Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
On Thu, Dec 3, 2015 at 6:26 PM, Richard Earnshaw wrote: > On 03/12/15 05:26, Bin.Cheng wrote: >> On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw >> wrote: >>> On 01/12/15 03:19, Bin.Cheng wrote: On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw wrote: > On 24/11/15 09:56, Richard Earnshaw wrote: >> On 24/11/15 02:51, Bin.Cheng wrote: > The aarch64's problem is we don't define addptr3 pattern, and we don't >>> have direct insn pattern describing the "x + y << z". According to >>> gcc internal: >>> >>> ‘addptrm3’ >>> Like addm3 but is guaranteed to only be used for address >>> calculations. >>> The expanded code is not allowed to clobber the condition code. It >>> only needs to be defined if addm3 sets the condition code. > > addm3 on aarch64 does not set the condition codes, so by this rule we > shouldn't need to define this pattern. >>> Hi Richard, >>> I think that rule has a prerequisite that backend needs to support >>> register shifted addition in addm3 pattern. >> >> addm3 is a named pattern and its format is well defined. It does not >> take a shifted operand and never has. >> >>> Apparently for AArch64, >>> addm3 only supports "reg+reg" or "reg+imm". Also we don't really >>> "does not set the condition codes" actually, because both >>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags. >> >> You appear to be confusing named patterns (used by expand) with >> recognizers. Anyway, we have >> >> (define_insn "*add__" >> [(set (match_operand:GPI 0 "register_operand" "=r") >> (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" >> "r") >> (match_operand:QI 2 >> "aarch64_shift_imm_" "n")) >> (match_operand:GPI 3 "register_operand" "r")))] >> >> Which is a non-flag setting add with shifted operand. >> >>> Either way I think it is another backend issue, so do you approve that >>> I commit this patch now? >> >> Not yet. I think there's something fundamental amiss here. 
>> >> BTW, it looks to me as though addptr3 should have exactly the same >> operand rules as add3 (documentation reads "like add3"), so a >> shifted operand shouldn't be supported there either. If that isn't the >> case then that should be clearly called out in the documentation. >> >> R. >> > > PS. > > I presume you are aware of the canonicalization rules for add? That is, > for a shift-and-add operation, the shift operand must appear first. Ie. > > (plus (shift (op, op)), op) > > not > > (plus (op, (shift (op, op)) Hi Richard, Thanks for the comments. I realized that the not-recognized insn issue is because the original patch build non-canonical expressions. When reloading address expression, LRA generates non-canonical register scaled insn, which can't be recognized by aarch64 backend. Here is the updated patch using canonical form pattern, it passes bootstrap and regression test. Well, the ivo failure still exists, but it analyzed in the original message. Is this patch OK? As for Jiong's concern about the additional extension instruction, I think this only stands for atmoic load store instructions. For general load store, AArch64 supports zext/sext in register scaling addressing mode, the additional instruction can be forward propagated into memory reference. The problem for atomic load store is AArch64 only supports direct register addressing mode. After LRA reloads address expression out of memory reference, there is no combine/fwprop optimizer to merge instructions. The problem is atomic_store's predicate doesn't match its constraint. The predicate used for atomic_store is memory_operand, while all other atomic patterns use aarch64_sync_memory_operand. I think this might be a typo. With this change, expand will not generate addressing mode requiring reload anymore. I will test another patch fixing this. Thanks, bin >>> >>> Some comments inline. >>> > > R. 
> > aarch64_legitimize_addr-20151128.txt
> >
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index 3fe2f0f..5b3e3c4 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
> >      We try to pick as large a range for the offset as possible to
> >      maximize the chance of a CSE.  However, for aligned addresses
> >      we limit the range to 4k so that structures with different sized
> > -    elements are likely to use the same base.  */
Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
On 03/12/15 05:26, Bin.Cheng wrote: > On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw > wrote: >> On 01/12/15 03:19, Bin.Cheng wrote: >>> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw >>> wrote: On 24/11/15 09:56, Richard Earnshaw wrote: > On 24/11/15 02:51, Bin.Cheng wrote: The aarch64's problem is we don't define addptr3 pattern, and we don't >> have direct insn pattern describing the "x + y << z". According to >> gcc internal: >> >> ‘addptrm3’ >> Like addm3 but is guaranteed to only be used for address >> calculations. >> The expanded code is not allowed to clobber the condition code. It >> only needs to be defined if addm3 sets the condition code. addm3 on aarch64 does not set the condition codes, so by this rule we shouldn't need to define this pattern. >> Hi Richard, >> I think that rule has a prerequisite that backend needs to support >> register shifted addition in addm3 pattern. > > addm3 is a named pattern and its format is well defined. It does not > take a shifted operand and never has. > >> Apparently for AArch64, >> addm3 only supports "reg+reg" or "reg+imm". Also we don't really >> "does not set the condition codes" actually, because both >> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags. > > You appear to be confusing named patterns (used by expand) with > recognizers. Anyway, we have > > (define_insn "*add__" > [(set (match_operand:GPI 0 "register_operand" "=r") > (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r") > (match_operand:QI 2 > "aarch64_shift_imm_" "n")) > (match_operand:GPI 3 "register_operand" "r")))] > > Which is a non-flag setting add with shifted operand. > >> Either way I think it is another backend issue, so do you approve that >> I commit this patch now? > > Not yet. I think there's something fundamental amiss here. 
> > BTW, it looks to me as though addptr3 should have exactly the same > operand rules as add3 (documentation reads "like add3"), so a > shifted operand shouldn't be supported there either. If that isn't the > case then that should be clearly called out in the documentation. > > R. > PS. I presume you are aware of the canonicalization rules for add? That is, for a shift-and-add operation, the shift operand must appear first. Ie. (plus (shift (op, op)), op) not (plus (op, (shift (op, op)) >>> >>> Hi Richard, >>> Thanks for the comments. I realized that the not-recognized insn >>> issue is because the original patch build non-canonical expressions. >>> When reloading address expression, LRA generates non-canonical >>> register scaled insn, which can't be recognized by aarch64 backend. >>> >>> Here is the updated patch using canonical form pattern, it passes >>> bootstrap and regression test. Well, the ivo failure still exists, >>> but it analyzed in the original message. >>> >>> Is this patch OK? >>> >>> As for Jiong's concern about the additional extension instruction, I >>> think this only stands for atmoic load store instructions. For >>> general load store, AArch64 supports zext/sext in register scaling >>> addressing mode, the additional instruction can be forward propagated >>> into memory reference. The problem for atomic load store is AArch64 >>> only supports direct register addressing mode. After LRA reloads >>> address expression out of memory reference, there is no combine/fwprop >>> optimizer to merge instructions. The problem is atomic_store's >>> predicate doesn't match its constraint. The predicate used for >>> atomic_store is memory_operand, while all other atomic patterns >>> use aarch64_sync_memory_operand. I think this might be a typo. With >>> this change, expand will not generate addressing mode requiring reload >>> anymore. I will test another patch fixing this. >>> >>> Thanks, >>> bin >> >> Some comments inline. >> R. 
aarch64_legitimize_addr-20151128.txt

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3fe2f0f..5b3e3c4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
      We try to pick as large a range for the offset as possible to
      maximize the chance of a CSE.  However, for aligned addresses
      we limit the range to 4k so that structures with different sized
-     elements are likely to use the same base.  */
+     elements are likely to use the same base.  We need to be careful
+     not split CONST for some forms address expressions, otherwise it

not to split a CONST for some forms of address expression,
Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw wrote: > On 01/12/15 03:19, Bin.Cheng wrote: >> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw >> wrote: >>> On 24/11/15 09:56, Richard Earnshaw wrote: On 24/11/15 02:51, Bin.Cheng wrote: >>> The aarch64's problem is we don't define addptr3 pattern, and we don't > have direct insn pattern describing the "x + y << z". According to > gcc internal: > > ‘addptrm3’ > Like addm3 but is guaranteed to only be used for address calculations. > The expanded code is not allowed to clobber the condition code. It > only needs to be defined if addm3 sets the condition code. >>> >>> addm3 on aarch64 does not set the condition codes, so by this rule we >>> shouldn't need to define this pattern. > Hi Richard, > I think that rule has a prerequisite that backend needs to support > register shifted addition in addm3 pattern. addm3 is a named pattern and its format is well defined. It does not take a shifted operand and never has. > Apparently for AArch64, > addm3 only supports "reg+reg" or "reg+imm". Also we don't really > "does not set the condition codes" actually, because both > "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags. You appear to be confusing named patterns (used by expand) with recognizers. Anyway, we have (define_insn "*add__" [(set (match_operand:GPI 0 "register_operand" "=r") (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r") (match_operand:QI 2 "aarch64_shift_imm_" "n")) (match_operand:GPI 3 "register_operand" "r")))] Which is a non-flag setting add with shifted operand. > Either way I think it is another backend issue, so do you approve that > I commit this patch now? Not yet. I think there's something fundamental amiss here. BTW, it looks to me as though addptr3 should have exactly the same operand rules as add3 (documentation reads "like add3"), so a shifted operand shouldn't be supported there either. 
If that isn't the case then that should be clearly called out in the documentation. R. >>> >>> PS. >>> >>> I presume you are aware of the canonicalization rules for add? That is, >>> for a shift-and-add operation, the shift operand must appear first. Ie. >>> >>> (plus (shift (op, op)), op) >>> >>> not >>> >>> (plus (op, (shift (op, op)) >> >> Hi Richard, >> Thanks for the comments. I realized that the not-recognized insn >> issue is because the original patch build non-canonical expressions. >> When reloading address expression, LRA generates non-canonical >> register scaled insn, which can't be recognized by aarch64 backend. >> >> Here is the updated patch using canonical form pattern, it passes >> bootstrap and regression test. Well, the ivo failure still exists, >> but it analyzed in the original message. >> >> Is this patch OK? >> >> As for Jiong's concern about the additional extension instruction, I >> think this only stands for atmoic load store instructions. For >> general load store, AArch64 supports zext/sext in register scaling >> addressing mode, the additional instruction can be forward propagated >> into memory reference. The problem for atomic load store is AArch64 >> only supports direct register addressing mode. After LRA reloads >> address expression out of memory reference, there is no combine/fwprop >> optimizer to merge instructions. The problem is atomic_store's >> predicate doesn't match its constraint. The predicate used for >> atomic_store is memory_operand, while all other atomic patterns >> use aarch64_sync_memory_operand. I think this might be a typo. With >> this change, expand will not generate addressing mode requiring reload >> anymore. I will test another patch fixing this. >> >> Thanks, >> bin > > Some comments inline. > >>> >>> R. 
>>> >>> aarch64_legitimize_addr-20151128.txt >>> >>> >>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >>> index 3fe2f0f..5b3e3c4 100644 >>> --- a/gcc/config/aarch64/aarch64.c >>> +++ b/gcc/config/aarch64/aarch64.c >>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x >>> */, machine_mode mode) >>> We try to pick as large a range for the offset as possible to >>> maximize the chance of a CSE. However, for aligned addresses >>> we limit the range to 4k so that structures with different sized >>> - elements are likely to use the same base. */ >>> + elements are likely to use the same base. We need to be careful >>> + not split CONST for some forms address expressions, otherwise it > > not to split a CONST for some forms of address expression, > >>> + will generate sub-optimal code. */ >>> >>>if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))) >>>
Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
On 01/12/15 03:19, Bin.Cheng wrote: > On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw > wrote: >> On 24/11/15 09:56, Richard Earnshaw wrote: >>> On 24/11/15 02:51, Bin.Cheng wrote: >> The aarch64's problem is we don't define addptr3 pattern, and we don't have direct insn pattern describing the "x + y << z". According to gcc internal: ‘addptrm3’ Like addm3 but is guaranteed to only be used for address calculations. The expanded code is not allowed to clobber the condition code. It only needs to be defined if addm3 sets the condition code. >> >> addm3 on aarch64 does not set the condition codes, so by this rule we >> shouldn't need to define this pattern. Hi Richard, I think that rule has a prerequisite that backend needs to support register shifted addition in addm3 pattern. >>> >>> addm3 is a named pattern and its format is well defined. It does not >>> take a shifted operand and never has. >>> Apparently for AArch64, addm3 only supports "reg+reg" or "reg+imm". Also we don't really "does not set the condition codes" actually, because both "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags. >>> >>> You appear to be confusing named patterns (used by expand) with >>> recognizers. Anyway, we have >>> >>> (define_insn "*add__" >>> [(set (match_operand:GPI 0 "register_operand" "=r") >>> (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r") >>> (match_operand:QI 2 >>> "aarch64_shift_imm_" "n")) >>> (match_operand:GPI 3 "register_operand" "r")))] >>> >>> Which is a non-flag setting add with shifted operand. >>> Either way I think it is another backend issue, so do you approve that I commit this patch now? >>> >>> Not yet. I think there's something fundamental amiss here. >>> >>> BTW, it looks to me as though addptr3 should have exactly the same >>> operand rules as add3 (documentation reads "like add3"), so a >>> shifted operand shouldn't be supported there either. 
If that isn't the >>> case then that should be clearly called out in the documentation. >>> >>> R. >>> >> >> PS. >> >> I presume you are aware of the canonicalization rules for add? That is, >> for a shift-and-add operation, the shift operand must appear first. Ie. >> >> (plus (shift (op, op)), op) >> >> not >> >> (plus (op, (shift (op, op)) > > Hi Richard, > Thanks for the comments. I realized that the not-recognized insn > issue is because the original patch build non-canonical expressions. > When reloading address expression, LRA generates non-canonical > register scaled insn, which can't be recognized by aarch64 backend. > > Here is the updated patch using canonical form pattern, it passes > bootstrap and regression test. Well, the ivo failure still exists, > but it analyzed in the original message. > > Is this patch OK? > > As for Jiong's concern about the additional extension instruction, I > think this only stands for atmoic load store instructions. For > general load store, AArch64 supports zext/sext in register scaling > addressing mode, the additional instruction can be forward propagated > into memory reference. The problem for atomic load store is AArch64 > only supports direct register addressing mode. After LRA reloads > address expression out of memory reference, there is no combine/fwprop > optimizer to merge instructions. The problem is atomic_store's > predicate doesn't match its constraint. The predicate used for > atomic_store is memory_operand, while all other atomic patterns > use aarch64_sync_memory_operand. I think this might be a typo. With > this change, expand will not generate addressing mode requiring reload > anymore. I will test another patch fixing this. > > Thanks, > bin Some comments inline. >> >> R. 
>> >> aarch64_legitimize_addr-20151128.txt >> >> >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >> index 3fe2f0f..5b3e3c4 100644 >> --- a/gcc/config/aarch64/aarch64.c >> +++ b/gcc/config/aarch64/aarch64.c >> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x >> */, machine_mode mode) >> We try to pick as large a range for the offset as possible to >> maximize the chance of a CSE. However, for aligned addresses >> we limit the range to 4k so that structures with different sized >> - elements are likely to use the same base. */ >> + elements are likely to use the same base. We need to be careful >> + not split CONST for some forms address expressions, otherwise it not to split a CONST for some forms of address expression, >> + will generate sub-optimal code. */ >> >>if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))) >> { >>HOST_WIDE_INT offset = INTVAL (XEXP (x, 1)); >>HOST_WIDE_INT base_offset; >> >> + if (GET_CODE (XEXP (x, 0)) == PLUS) >> +{ >> + rt
Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw wrote: > On 24/11/15 09:56, Richard Earnshaw wrote: >> On 24/11/15 02:51, Bin.Cheng wrote: > The aarch64's problem is we don't define addptr3 pattern, and we don't >>> have direct insn pattern describing the "x + y << z". According to >>> gcc internal: >>> >>> ‘addptrm3’ >>> Like addm3 but is guaranteed to only be used for address calculations. >>> The expanded code is not allowed to clobber the condition code. It >>> only needs to be defined if addm3 sets the condition code. > > addm3 on aarch64 does not set the condition codes, so by this rule we > shouldn't need to define this pattern. >>> Hi Richard, >>> I think that rule has a prerequisite that backend needs to support >>> register shifted addition in addm3 pattern. >> >> addm3 is a named pattern and its format is well defined. It does not >> take a shifted operand and never has. >> >>> Apparently for AArch64, >>> addm3 only supports "reg+reg" or "reg+imm". Also we don't really >>> "does not set the condition codes" actually, because both >>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags. >> >> You appear to be confusing named patterns (used by expand) with >> recognizers. Anyway, we have >> >> (define_insn "*add__" >> [(set (match_operand:GPI 0 "register_operand" "=r") >> (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r") >> (match_operand:QI 2 >> "aarch64_shift_imm_" "n")) >> (match_operand:GPI 3 "register_operand" "r")))] >> >> Which is a non-flag setting add with shifted operand. >> >>> Either way I think it is another backend issue, so do you approve that >>> I commit this patch now? >> >> Not yet. I think there's something fundamental amiss here. >> >> BTW, it looks to me as though addptr3 should have exactly the same >> operand rules as add3 (documentation reads "like add3"), so a >> shifted operand shouldn't be supported there either. 
>> If that isn't the case then that should be clearly called out in the
>> documentation.
>>
>> R.
>
> PS.
>
> I presume you are aware of the canonicalization rules for add?  That is,
> for a shift-and-add operation, the shift operand must appear first.  Ie.
>
> (plus (shift (op, op)), op)
>
> not
>
> (plus (op, (shift (op, op))

Hi Richard,
Thanks for the comments.  I realized that the not-recognized-insn issue arose because the original patch built non-canonical expressions.  When reloading an address expression, LRA generates a non-canonical register-scaled insn, which can't be recognized by the aarch64 backend.

Here is the updated patch using the canonical-form pattern; it passes bootstrap and regression testing.  Well, the ivo failure still exists, but it was analyzed in the original message.

Is this patch OK?

As for Jiong's concern about the additional extension instruction, I think it only applies to atomic load/store instructions.  For general loads/stores, AArch64 supports zext/sext in the register-scaled addressing mode, so the additional instruction can be forward-propagated into the memory reference.  The problem for atomic loads/stores is that AArch64 supports only the direct register addressing mode for them, and after LRA reloads the address expression out of the memory reference there is no combine/fwprop pass left to merge the instructions.  The underlying problem is that atomic_store's predicate doesn't match its constraint: the predicate used for atomic_store is memory_operand, while all other atomic patterns use aarch64_sync_memory_operand.  I think this might be a typo.  With that changed, expand will no longer generate an addressing mode that requires a reload.  I will test another patch fixing this.

Thanks,
bin

> R.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3fe2f0f..5b3e3c4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
      We try to pick as large a range for the offset as possible to
      maximize the chance of a CSE.  However, for aligned addresses
      we limit the range to 4k so that structures with different sized
-     elements are likely to use the same base.  */
+     elements are likely to use the same base.  We need to be careful
+     not split CONST for some forms address expressions, otherwise it
+     will generate sub-optimal code.  */

   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
     {
       HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
       HOST_WIDE_INT base_offset;

+      if (GET_CODE (XEXP (x, 0)) == PLUS)
+	{
+	  rtx op0 = XEXP (XEXP (x, 0), 0);
+	  rtx op1 = XEXP (XEXP (x, 0), 1);
+
+	  /* For addr expression in the form like "r1 + r2 + 0x3ffc".
+	     Since the offset is within range supported by addressing
+	     mode "reg+offset", we don't split the const and legalize
+	     it into below insn and expr sequence:
+	       r3
Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
On Tue, Nov 24, 2015 at 5:56 PM, Richard Earnshaw wrote:
> On 24/11/15 02:51, Bin.Cheng wrote:
>>>> The aarch64's problem is we don't define the addptr3 pattern, and we don't
>> have direct insn pattern describing the "x + y << z".  According to
>> gcc internal:
>>
>> ‘addptrm3’
>> Like addm3 but is guaranteed to only be used for address calculations.
>> The expanded code is not allowed to clobber the condition code.  It
>> only needs to be defined if addm3 sets the condition code.
>>>
>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>> shouldn't need to define this pattern.
>> Hi Richard,
>> I think that rule has a prerequisite that backend needs to support
>> register shifted addition in addm3 pattern.
>
> addm3 is a named pattern and its format is well defined.  It does not
> take a shifted operand and never has.
>
>> Apparently for AArch64,
>> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
>> "does not set the condition codes" actually, because both
>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>
> You appear to be confusing named patterns (used by expand) with
> recognizers.  Anyway, we have

I understand the difference between expanders and recognizers.  I had missed the non-flag-setting insn patterns you point out below; that is what I meant by saying we don't support "does not set the condition codes", with respect to either expanders or recognizers.  Anyway, I was wrong: we do have such nameless patterns.

> (define_insn "*add__"
>   [(set (match_operand:GPI 0 "register_operand" "=r")
>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>                               (match_operand:QI 2 "aarch64_shift_imm_" "n"))
>                   (match_operand:GPI 3 "register_operand" "r")))]
>
> Which is a non-flag setting add with shifted operand.
>
>> Either way I think it is another backend issue, so do you approve that
>> I commit this patch now?
>
> Not yet.  I think there's something fundamental amiss here.
>
> BTW, it looks to me as though addptr3 should have exactly the same
> operand rules as add3 (documentation reads "like add3"), so a
> shifted operand shouldn't be supported there either.  If that isn't the
> case then that should be clearly called out in the documentation.

The direct cause of this ICE is that LRA generates the insn sequence below when reloading "x = y + z*const", where the mult part is actually a register-scale operation:

    r1 = z*const
    x = y + r1

It generates the mult directly, but aarch64 doesn't have a pattern describing mult by a constant; we only have a pattern for the corresponding shift insn.  As a result, the constant-mult insn is not recognized.  This could be fixed by using force_operand to generate valid mult or shift instructions; is that feasible in LRA?

For this case, we could do a better reload if we supported scaled add in the addptr pattern.  After some archaeological work, I found addptr was added for s390, which supports addressing modes like "base + index + disp" and its variants.  Maybe that's why the pattern's documentation doesn't explicitly describe the scaled operation.  IMHO, this pattern should be target dependent and able to handle whatever addressing modes each target supports.  A side proof is in the code and comments of the function lra_emit_add, which explicitly handle and describe the scaled-add address expression.

CCing Vlad and Andreas since they are the original authors and may help here.

Thanks,
bin
Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
On 24/11/15 14:36, Jiong Wang wrote:
>
> On 24/11/15 13:23, Richard Earnshaw wrote:
>> On 24/11/15 13:06, Jiong Wang wrote:
>>>
>>> On 24/11/15 10:18, Richard Earnshaw wrote:
>>>> I presume you are aware of the canonicalization rules for add?  That is,
>>>> for a shift-and-add operation, the shift operand must appear first.  Ie.
>>>>
>>>> (plus (shift (op, op)), op)
>>>>
>>>> not
>>>>
>>>> (plus (op, (shift (op, op))
>>>>
>>>> R.
>>> Looks to me it's not optimal to generate invalid mem addr, for example
>>> (mem (plus reg, (mult reg, imm))) or even the simple (mem (plus (plus r, r), imm),
>>> in the first place.  Those complex rtx inside is hidden by the permissive
>>> memory_operand predication, and only exposed during reload by stricter
>>> constraints, then reload need to extra work.  If we expose those complex rtx
>>> earlier then some earlier rtl pass may find more optimization
>>> opportunities, for example combine.
>>>
>>> The following simple modification fix the ICE and generates best
>>> sequences to me:
>>>
>>> -  return gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
>>> +  addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
>>> +  emit_insn (gen_rtx_SET (base, addr));
>>> +  return base;
>>>
>> That wouldn't be right either if op1 could be a const_int.
>
> Indeed, though it would be strange to me if op1 would be const_int here, given
> those early if check, if it is, then the incoming address rtx would be something
> like the following without two const_int folded.
>
>            +
>           / \
>          /   \
>         +     const_int
>        / \
>       /   \
>     Reg   const_int
>
> or we could sync the rtx checked in the early aarch64_legitimate_address_hook_p,
> it will return false if op1 be const_int.
>
> -  rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
> +  rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
>

The safest thing is to insert a call to swap_commutative_operands_p and then
switch the order over if that returns true.

R.
>>
>>>    67 add     x1, x29, 48
>>>    68 add     x1, x1, x0, sxtw 3
>>>    69 stlr    x19, [x1]
>>>
>>> instead of
>>>
>>>    67 add     x1, x29, 64
>>>    68 add     x0, x1, x0, sxtw 3
>>>    69 sub     x0, x0, #16
>>>    70 stlr    x19, [x0]
>>>
>>> or
>>>
>>>    67 sxtw    x0, w0
>>>    68 add     x1, x29, 48
>>>    69 add     x1, x1, x0, sxtw 3
>>>    70 stlr    x19, [x1]
Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
On 24/11/15 13:23, Richard Earnshaw wrote:
> On 24/11/15 13:06, Jiong Wang wrote:
>>
>> On 24/11/15 10:18, Richard Earnshaw wrote:
>>> I presume you are aware of the canonicalization rules for add?  That is,
>>> for a shift-and-add operation, the shift operand must appear first.  Ie.
>>>
>>> (plus (shift (op, op)), op)
>>>
>>> not
>>>
>>> (plus (op, (shift (op, op))
>>>
>>> R.
>> Looks to me it's not optimal to generate invalid mem addr, for example
>> (mem (plus reg, (mult reg, imm))) or even the simple (mem (plus (plus r, r), imm),
>> in the first place.  Those complex rtx inside is hidden by the permissive
>> memory_operand predication, and only exposed during reload by stricter
>> constraints, then reload need to extra work.  If we expose those complex rtx
>> earlier then some earlier rtl pass may find more optimization
>> opportunities, for example combine.
>>
>> The following simple modification fix the ICE and generates best
>> sequences to me:
>>
>> -  return gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
>> +  addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
>> +  emit_insn (gen_rtx_SET (base, addr));
>> +  return base;
>>
> That wouldn't be right either if op1 could be a const_int.

Indeed, though it would be strange to me if op1 would be const_int here, given
those early if check, if it is, then the incoming address rtx would be something
like the following without two const_int folded.

           +
          / \
         /   \
        +     const_int
       / \
      /   \
    Reg   const_int

or we could sync the rtx checked in the early aarch64_legitimate_address_hook_p,
it will return false if op1 be const_int.

-  rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
+  rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
On 24/11/15 13:06, Jiong Wang wrote:
>
> On 24/11/15 10:18, Richard Earnshaw wrote:
>> I presume you are aware of the canonicalization rules for add?  That is,
>> for a shift-and-add operation, the shift operand must appear first.  Ie.
>>
>> (plus (shift (op, op)), op)
>>
>> not
>>
>> (plus (op, (shift (op, op))
>>
>> R.
>
> Looks to me it's not optimal to generate invalid mem addr, for example
> (mem (plus reg, (mult reg, imm))) or even the simple (mem (plus (plus r, r), imm),
> in the first place.  Those complex rtx inside is hidden by the permissive
> memory_operand predication, and only exposed during reload by stricter
> constraints, then reload need to extra work.  If we expose those complex rtx
> earlier then some earlier rtl pass may find more optimization
> opportunities, for example combine.
>
> The following simple modification fix the ICE and generates best
> sequences to me:
>
> -  return gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
> +  addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
> +  emit_insn (gen_rtx_SET (base, addr));
> +  return base;
>

That wouldn't be right either if op1 could be a const_int.

R.

>    67 add     x1, x29, 48
>    68 add     x1, x1, x0, sxtw 3
>    69 stlr    x19, [x1]
>
> instead of
>
>    67 add     x1, x29, 64
>    68 add     x0, x1, x0, sxtw 3
>    69 sub     x0, x0, #16
>    70 stlr    x19, [x0]
>
> or
>
>    67 sxtw    x0, w0
>    68 add     x1, x29, 48
>    69 add     x1, x1, x0, sxtw 3
>    70 stlr    x19, [x1]
Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
On 24/11/15 10:18, Richard Earnshaw wrote:
> I presume you are aware of the canonicalization rules for add?  That is,
> for a shift-and-add operation, the shift operand must appear first.  Ie.
>
> (plus (shift (op, op)), op)
>
> not
>
> (plus (op, (shift (op, op))))
>
> R.

Looks to me it's not optimal to generate an invalid mem addr, for example
(mem (plus reg, (mult reg, imm))) or even the simple
(mem (plus (plus r, r), imm)), in the first place.  Those complex rtx
inside are hidden by the permissive memory_operand predicate, and only
exposed during reload by stricter constraints, so reload needs to do extra
work.  If we expose those complex rtx earlier then some earlier rtl pass
may find more optimization opportunities, for example combine.

The following simple modification fixes the ICE and generates the best
sequences to me:

-  return gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
+  addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
+  emit_insn (gen_rtx_SET (base, addr));
+  return base;

 67 add  x1, x29, 48
 68 add  x1, x1, x0, sxtw 3
 69 stlr x19, [x1]

instead of

 67 add  x1, x29, 64
 68 add  x0, x1, x0, sxtw 3
 69 sub  x0, x0, #16
 70 stlr x19, [x0]

or

 67 sxtw x0, w0
 68 add  x1, x29, 48
 69 add  x1, x1, x0, sxtw 3
 70 stlr x19, [x1]
Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
On 24/11/15 09:56, Richard Earnshaw wrote:
> On 24/11/15 02:51, Bin.Cheng wrote:
>>>> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>>> have direct insn pattern describing the "x + y << z".  According to
>>>> gcc internal:
>>>>
>>>> ‘addptrm3’
>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>> The expanded code is not allowed to clobber the condition code.  It
>>>> only needs to be defined if addm3 sets the condition code.
>>>
>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>> shouldn't need to define this pattern.
>> Hi Richard,
>> I think that rule has a prerequisite that backend needs to support
>> register shifted addition in addm3 pattern.
>
> addm3 is a named pattern and its format is well defined.  It does not
> take a shifted operand and never has.
>
>> Apparently for AArch64,
>> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
>> "does not set the condition codes" actually, because both
>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>
> You appear to be confusing named patterns (used by expand) with
> recognizers.  Anyway, we have
>
> (define_insn "*add__"
>   [(set (match_operand:GPI 0 "register_operand" "=r")
>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>                               (match_operand:QI 2 "aarch64_shift_imm_" "n"))
>                   (match_operand:GPI 3 "register_operand" "r")))]
>
> Which is a non-flag-setting add with a shifted operand.
>
>> Either way I think it is another backend issue, so do you approve that
>> I commit this patch now?
>
> Not yet.  I think there's something fundamental amiss here.
>
> BTW, it looks to me as though addptr3 should have exactly the same
> operand rules as add3 (documentation reads "like add3"), so a
> shifted operand shouldn't be supported there either.  If that isn't the
> case then that should be clearly called out in the documentation.
>
> R.

PS.  I presume you are aware of the canonicalization rules for add?  That
is, for a shift-and-add operation, the shift operand must appear first.
Ie.

(plus (shift (op, op)), op)

not

(plus (op, (shift (op, op))))

R.
Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
On 24/11/15 02:51, Bin.Cheng wrote:
>>> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>> have direct insn pattern describing the "x + y << z".  According to
>>> gcc internal:
>>>
>>> ‘addptrm3’
>>> Like addm3 but is guaranteed to only be used for address calculations.
>>> The expanded code is not allowed to clobber the condition code.  It
>>> only needs to be defined if addm3 sets the condition code.
>>
>> addm3 on aarch64 does not set the condition codes, so by this rule we
>> shouldn't need to define this pattern.
> Hi Richard,
> I think that rule has a prerequisite that backend needs to support
> register shifted addition in addm3 pattern.

addm3 is a named pattern and its format is well defined.  It does not
take a shifted operand and never has.

> Apparently for AArch64,
> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
> "does not set the condition codes" actually, because both
> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.

You appear to be confusing named patterns (used by expand) with
recognizers.  Anyway, we have

(define_insn "*add__"
  [(set (match_operand:GPI 0 "register_operand" "=r")
        (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
                              (match_operand:QI 2 "aarch64_shift_imm_" "n"))
                  (match_operand:GPI 3 "register_operand" "r")))]

Which is a non-flag-setting add with a shifted operand.

> Either way I think it is another backend issue, so do you approve that
> I commit this patch now?

Not yet.  I think there's something fundamental amiss here.

BTW, it looks to me as though addptr3 should have exactly the same
operand rules as add3 (documentation reads "like add3"), so a shifted
operand shouldn't be supported there either.  If that isn't the case
then that should be clearly called out in the documentation.

R.
Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
> Hi,
> GIMPLE IVO needs to call backend interface to calculate costs for addr
> expressions like below:
>   FORM1: "r73 + r74 + 16380"
>   FORM2: "r73 << 2 + r74 + 16380"
>
> They are invalid address expressions on AArch64, so will be legitimized
> by aarch64_legitimize_address.  Below are what we got from that function:
>
> For FORM1, the address expression is legitimized into below insn
> sequence and rtx:
>   r84:DI=r73:DI+r74:DI
>   r85:DI=r84:DI+0x3000
>   r83:DI=r85:DI
>   "r83 + 4092"
>
> For FORM2, the address expression is legitimized into below insn
> sequence and rtx:
>   r108:DI=r73:DI<<0x2
>   r109:DI=r108:DI+r74:DI
>   r110:DI=r109:DI+0x3000
>   r107:DI=r110:DI
>   "r107 + 4092"
>
> So the costs computed are 12/16 respectively.  The high cost prevents
> IVO from choosing the right candidates.  Besides cost computation, I
> also think the legitimization is bad in terms of code generation.
> The root cause in aarch64_legitimize_address can be described by its
> comment:
>   /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>      where mask is selected by alignment and size of the offset.
>      We try to pick as large a range for the offset as possible to
>      maximize the chance of a CSE.  However, for aligned addresses
>      we limit the range to 4k so that structures with different sized
>      elements are likely to use the same base.  */
> I think the split of CONST is intended for REG+CONST where the const
> offset is not in the range of AArch64's addressing modes.
> Unfortunately, it doesn't explicitly handle/reject "REG+REG+CONST" and
> "REG+REG<<SCALE+CONST" when the CONST are in the range of addressing
> modes.  As a result, these two cases fall through this logic, resulting
> in sub-optimal results.
>
> It's obvious we can do below legitimization:
> FORM1:
>   r83:DI=r73:DI+r74:DI
>   "r83 + 16380"
> FORM2:
>   r107:DI=0x3ffc
>   r106:DI=r74:DI+r107:DI
>     REG_EQUAL r74:DI+0x3ffc
>   "r106 + r73 << 2"
>
> This patch handles these two cases as described.

Thanks for the description, it made the patch very easy to review.  I only
have a style comment.

> Bootstrap & test on AArch64 along with other patch.  Is it OK?
>
> 2015-11-04  Bin Cheng
>             Jiong Wang
>
>       * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
>       address expressions like REG+REG+CONST and REG+NON_REG+CONST.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5c8604f..47875ac 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>      {
>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>        HOST_WIDE_INT base_offset;
> +      rtx op0 = XEXP (x,0);
> +
> +      if (GET_CODE (op0) == PLUS)
> +	{
> +	  rtx op0_ = XEXP (op0, 0);
> +	  rtx op1_ = XEXP (op0, 1);

I don't see this trailing _ on a variable name in many places in the source
tree (mostly in the Go frontend), and certainly not in the aarch64 backend.
Can we pick a different name for op0_ and op1_?

> +
> +	  /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
> +	     reach here, the 'CONST' may be valid in which case we should
> +	     not split.  */
> +	  if (REG_P (op0_) && REG_P (op1_))
> +	    {
> +	      machine_mode addr_mode = GET_MODE (op0);
> +	      rtx addr = gen_reg_rtx (addr_mode);
> +
> +	      rtx ret = plus_constant (addr_mode, addr, offset);
> +	      if (aarch64_legitimate_address_hook_p (mode, ret, false))
> +		{
> +		  emit_insn (gen_adddi3 (addr, op0_, op1_));
> +		  return ret;
> +		}
> +	    }
> +	  /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
> +	     will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
> +	     we split it into Y=REG+CONST, Y+NON_REG.  */
> +	  else if (REG_P (op0_) || REG_P (op1_))
> +	    {
> +	      machine_mode addr_mode = GET_MODE (op0);
> +	      rtx addr = gen_reg_rtx (addr_mode);
> +
> +	      /* Switch to make sure that register is in op0_.  */
> +	      if (REG_P (op1_))
> +		std::swap (op0_, op1_);
> +
> +	      rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
> +	      if (aarch64_legitimate_address_hook_p (mode, ret, false))
> +		{
> +		  addr = force_operand (plus_constant (addr_mode,
> +						       op0_, offset),
> +					NULL_RTX);
> +		  ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
> +		  return ret;
> +		}

The logic here is a bit hairy to follow, you construct a PLUS RTX to check
aarch64_legitimate_address_hook_p, then construct a different PLUS RTX to
use as the return value.  This can pr
[PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
Hi,
GIMPLE IVO needs to call backend interface to calculate costs for addr
expressions like below:
  FORM1: "r73 + r74 + 16380"
  FORM2: "r73 << 2 + r74 + 16380"

They are invalid address expressions on AArch64, so will be legitimized by
aarch64_legitimize_address.  Below are what we got from that function:

For FORM1, the address expression is legitimized into below insn sequence
and rtx:
  r84:DI=r73:DI+r74:DI
  r85:DI=r84:DI+0x3000
  r83:DI=r85:DI
  "r83 + 4092"

For FORM2, the address expression is legitimized into below insn sequence
and rtx:
  r108:DI=r73:DI<<0x2
  r109:DI=r108:DI+r74:DI
  r110:DI=r109:DI+0x3000
  r107:DI=r110:DI
  "r107 + 4092"

So the costs computed are 12/16 respectively.  The high cost prevents IVO
from choosing the right candidates.  Besides cost computation, I also think
the legitimization is bad in terms of code generation.
The root cause in aarch64_legitimize_address can be described by its
comment:
  /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
     where mask is selected by alignment and size of the offset.
     We try to pick as large a range for the offset as possible to
     maximize the chance of a CSE.  However, for aligned addresses
     we limit the range to 4k so that structures with different sized
     elements are likely to use the same base.  */
I think the split of CONST is intended for REG+CONST where the const offset
is not in the range of AArch64's addressing modes.  Unfortunately, it
doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<<SCALE+CONST"
when the CONST are in the range of addressing modes.  As a result, these
two cases fall through this logic, resulting in sub-optimal results.

It's obvious we can do below legitimization:
FORM1:
  r83:DI=r73:DI+r74:DI
  "r83 + 16380"
FORM2:
  r107:DI=0x3ffc
  r106:DI=r74:DI+r107:DI
    REG_EQUAL r74:DI+0x3ffc
  "r106 + r73 << 2"

This patch handles these two cases as described.
Bootstrap & test on AArch64 along with other patch.  Is it OK?

2015-11-04  Bin Cheng
            Jiong Wang

      * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
      address expressions like REG+REG+CONST and REG+NON_REG+CONST.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5c8604f..47875ac 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
     {
       HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
       HOST_WIDE_INT base_offset;
+      rtx op0 = XEXP (x,0);
+
+      if (GET_CODE (op0) == PLUS)
+	{
+	  rtx op0_ = XEXP (op0, 0);
+	  rtx op1_ = XEXP (op0, 1);
+
+	  /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
+	     reach here, the 'CONST' may be valid in which case we should
+	     not split.  */
+	  if (REG_P (op0_) && REG_P (op1_))
+	    {
+	      machine_mode addr_mode = GET_MODE (op0);
+	      rtx addr = gen_reg_rtx (addr_mode);
+
+	      rtx ret = plus_constant (addr_mode, addr, offset);
+	      if (aarch64_legitimate_address_hook_p (mode, ret, false))
+		{
+		  emit_insn (gen_adddi3 (addr, op0_, op1_));
+		  return ret;
+		}
+	    }
+	  /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
+	     will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
+	     we split it into Y=REG+CONST, Y+NON_REG.  */
+	  else if (REG_P (op0_) || REG_P (op1_))
+	    {
+	      machine_mode addr_mode = GET_MODE (op0);
+	      rtx addr = gen_reg_rtx (addr_mode);
+
+	      /* Switch to make sure that register is in op0_.  */
+	      if (REG_P (op1_))
+		std::swap (op0_, op1_);
+
+	      rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
+	      if (aarch64_legitimate_address_hook_p (mode, ret, false))
+		{
+		  addr = force_operand (plus_constant (addr_mode,
+						       op0_, offset),
+					NULL_RTX);
+		  ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
+		  return ret;
+		}
+	    }
+	}

   /* Does it look like we'll need a load/store-pair operation?  */
   if (GET_MODE_SIZE (mode) > 16