Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-12-03 Thread Bin.Cheng
On Thu, Dec 3, 2015 at 6:26 PM, Richard Earnshaw
 wrote:
> On 03/12/15 05:26, Bin.Cheng wrote:
>> On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw
>>  wrote:
>>> On 01/12/15 03:19, Bin.Cheng wrote:
 On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
  wrote:
> On 24/11/15 09:56, Richard Earnshaw wrote:
>> On 24/11/15 02:51, Bin.Cheng wrote:
> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>> have direct insn pattern describing the "x + y << z".  According to
>>> gcc internal:
>>>
>>> ‘addptrm3’
>>> Like addm3 but is guaranteed to only be used for address 
>>> calculations.
>>> The expanded code is not allowed to clobber the condition code. It
>>> only needs to be defined if addm3 sets the condition code.
>
> addm3 on aarch64 does not set the condition codes, so by this rule we
> shouldn't need to define this pattern.
>>> Hi Richard,
>>> I think that rule has a prerequisite that backend needs to support
>>> register shifted addition in addm3 pattern.
>>
>> addm3 is a named pattern and its format is well defined.  It does not
>> take a shifted operand and never has.
>>
>>> Apparently for AArch64,
>>> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
>>> "does not set the condition codes" actually, because both
>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>
>> You appear to be confusing named patterns (used by expand) with
>> recognizers.  Anyway, we have
>>
>> (define_insn "*add_<shift>_<mode>"
>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>        (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>                              (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>>                  (match_operand:GPI 3 "register_operand" "r")))]
>>
>> Which is a non-flag setting add with shifted operand.
>>
>>> Either way I think it is another backend issue, so do you approve that
>>> I commit this patch now?
>>
>> Not yet.  I think there's something fundamental amiss here.
>>
>> BTW, it looks to me as though addptrm3 should have exactly the same
>> operand rules as addm3 (documentation reads "like addm3"), so a
>> shifted operand shouldn't be supported there either.  If that isn't the
>> case then that should be clearly called out in the documentation.
>>
>> R.
>>
>
> PS.
>
> I presume you are aware of the canonicalization rules for add?  That is,
> for a shift-and-add operation, the shift operand must appear first.  Ie.
>
> (plus (shift (op, op)), op)
>
> not
>
> (plus (op, (shift (op, op))

 Hi Richard,
 Thanks for the comments.  I realized that the not-recognized insn
 issue is because the original patch builds non-canonical expressions.
 When reloading address expression, LRA generates non-canonical
 register scaled insn, which can't be recognized by aarch64 backend.

 Here is the updated patch using canonical form pattern,  it passes
 bootstrap and regression test.  Well, the IVO failure still exists,
 but it is analyzed in the original message.

 Is this patch OK?

 As for Jiong's concern about the additional extension instruction, I
 think this only stands for atomic load store instructions.  For
 general load store, AArch64 supports zext/sext in register scaling
 addressing mode, the additional instruction can be forward propagated
 into memory reference.  The problem for atomic load store is AArch64
 only supports direct register addressing mode.  After LRA reloads
 address expression out of memory reference, there is no combine/fwprop
 optimizer to merge instructions.  The problem is atomic_store's
 predicate doesn't match its constraint.   The predicate used for
 atomic_store is memory_operand, while all other atomic patterns
 use aarch64_sync_memory_operand.  I think this might be a typo.  With
 this change, expand will not generate addressing mode requiring reload
 anymore.  I will test another patch fixing this.

 Thanks,
 bin
>>>
>>> Some comments inline.
>>>
>
> R.
>
> aarch64_legitimize_addr-20151128.txt
>
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 3fe2f0f..5b3e3c4 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  
> */, machine_mode mode)
>   We try to pick as large a range for the offset as possible to
>   maximize the chance of a CSE.  However, for aligned addresses
>   we limit the range to 4k so that structures with different sized
> - elements are likely to use the same base.  */

Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-12-03 Thread Richard Earnshaw
On 03/12/15 05:26, Bin.Cheng wrote:
> On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw
>  wrote:
>> On 01/12/15 03:19, Bin.Cheng wrote:
>>> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
>>>  wrote:
 On 24/11/15 09:56, Richard Earnshaw wrote:
> On 24/11/15 02:51, Bin.Cheng wrote:
 The aarch64's problem is we don't define addptr3 pattern, and we don't
>> have direct insn pattern describing the "x + y << z".  According to
>> gcc internal:
>>
>> ‘addptrm3’
>> Like addm3 but is guaranteed to only be used for address 
>> calculations.
>> The expanded code is not allowed to clobber the condition code. It
>> only needs to be defined if addm3 sets the condition code.

 addm3 on aarch64 does not set the condition codes, so by this rule we
 shouldn't need to define this pattern.
>> Hi Richard,
>> I think that rule has a prerequisite that backend needs to support
>> register shifted addition in addm3 pattern.
>
> addm3 is a named pattern and its format is well defined.  It does not
> take a shifted operand and never has.
>
>> Apparently for AArch64,
>> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
>> "does not set the condition codes" actually, because both
>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>
> You appear to be confusing named patterns (used by expand) with
> recognizers.  Anyway, we have
>
> (define_insn "*add_<shift>_<mode>"
>   [(set (match_operand:GPI 0 "register_operand" "=r")
>        (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>                              (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>                  (match_operand:GPI 3 "register_operand" "r")))]
>
> Which is a non-flag setting add with shifted operand.
>
>> Either way I think it is another backend issue, so do you approve that
>> I commit this patch now?
>
> Not yet.  I think there's something fundamental amiss here.
>
> BTW, it looks to me as though addptrm3 should have exactly the same
> operand rules as addm3 (documentation reads "like addm3"), so a
> shifted operand shouldn't be supported there either.  If that isn't the
> case then that should be clearly called out in the documentation.
>
> R.
>

 PS.

 I presume you are aware of the canonicalization rules for add?  That is,
 for a shift-and-add operation, the shift operand must appear first.  Ie.

 (plus (shift (op, op)), op)

 not

 (plus (op, (shift (op, op))
>>>
>>> Hi Richard,
>>> Thanks for the comments.  I realized that the not-recognized insn
>>> issue is because the original patch builds non-canonical expressions.
>>> When reloading address expression, LRA generates non-canonical
>>> register scaled insn, which can't be recognized by aarch64 backend.
>>>
>>> Here is the updated patch using canonical form pattern,  it passes
>>> bootstrap and regression test.  Well, the IVO failure still exists,
>>> but it is analyzed in the original message.
>>>
>>> Is this patch OK?
>>>
>>> As for Jiong's concern about the additional extension instruction, I
>>> think this only stands for atomic load store instructions.  For
>>> general load store, AArch64 supports zext/sext in register scaling
>>> addressing mode, the additional instruction can be forward propagated
>>> into memory reference.  The problem for atomic load store is AArch64
>>> only supports direct register addressing mode.  After LRA reloads
>>> address expression out of memory reference, there is no combine/fwprop
>>> optimizer to merge instructions.  The problem is atomic_store's
>>> predicate doesn't match its constraint.   The predicate used for
>>> atomic_store is memory_operand, while all other atomic patterns
>>> use aarch64_sync_memory_operand.  I think this might be a typo.  With
>>> this change, expand will not generate addressing mode requiring reload
>>> anymore.  I will test another patch fixing this.
>>>
>>> Thanks,
>>> bin
>>
>> Some comments inline.
>>

 R.

 aarch64_legitimize_addr-20151128.txt


 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
 index 3fe2f0f..5b3e3c4 100644
 --- a/gcc/config/aarch64/aarch64.c
 +++ b/gcc/config/aarch64/aarch64.c
 @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  
 */, machine_mode mode)
   We try to pick as large a range for the offset as possible to
   maximize the chance of a CSE.  However, for aligned addresses
   we limit the range to 4k so that structures with different sized
 - elements are likely to use the same base.  */
 + elements are likely to use the same base.  We need to be careful
 + not split CONST for some forms address expressions, otherwise it
>>
>> not to split a CONST for some forms of address expression,

Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-12-02 Thread Bin.Cheng
On Tue, Dec 1, 2015 at 6:25 PM, Richard Earnshaw
 wrote:
> On 01/12/15 03:19, Bin.Cheng wrote:
>> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
>>  wrote:
>>> On 24/11/15 09:56, Richard Earnshaw wrote:
 On 24/11/15 02:51, Bin.Cheng wrote:
>>> The aarch64's problem is we don't define addptr3 pattern, and we don't
> have direct insn pattern describing the "x + y << z".  According to
> gcc internal:
>
> ‘addptrm3’
> Like addm3 but is guaranteed to only be used for address calculations.
> The expanded code is not allowed to clobber the condition code. It
> only needs to be defined if addm3 sets the condition code.
>>>
>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>> shouldn't need to define this pattern.
> Hi Richard,
> I think that rule has a prerequisite that backend needs to support
> register shifted addition in addm3 pattern.

 addm3 is a named pattern and its format is well defined.  It does not
 take a shifted operand and never has.

> Apparently for AArch64,
> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
> "does not set the condition codes" actually, because both
> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.

 You appear to be confusing named patterns (used by expand) with
 recognizers.  Anyway, we have

 (define_insn "*add_<shift>_<mode>"
   [(set (match_operand:GPI 0 "register_operand" "=r")
        (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
                              (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
                  (match_operand:GPI 3 "register_operand" "r")))]

 Which is a non-flag setting add with shifted operand.

> Either way I think it is another backend issue, so do you approve that
> I commit this patch now?

 Not yet.  I think there's something fundamental amiss here.

 BTW, it looks to me as though addptrm3 should have exactly the same
 operand rules as addm3 (documentation reads "like addm3"), so a
 shifted operand shouldn't be supported there either.  If that isn't the
 case then that should be clearly called out in the documentation.

 R.

>>>
>>> PS.
>>>
>>> I presume you are aware of the canonicalization rules for add?  That is,
>>> for a shift-and-add operation, the shift operand must appear first.  Ie.
>>>
>>> (plus (shift (op, op)), op)
>>>
>>> not
>>>
>>> (plus (op, (shift (op, op))
>>
>> Hi Richard,
>> Thanks for the comments.  I realized that the not-recognized insn
>> issue is because the original patch builds non-canonical expressions.
>> When reloading address expression, LRA generates non-canonical
>> register scaled insn, which can't be recognized by aarch64 backend.
>>
>> Here is the updated patch using canonical form pattern,  it passes
>> bootstrap and regression test.  Well, the IVO failure still exists,
>> but it is analyzed in the original message.
>>
>> Is this patch OK?
>>
>> As for Jiong's concern about the additional extension instruction, I
>> think this only stands for atomic load store instructions.  For
>> general load store, AArch64 supports zext/sext in register scaling
>> addressing mode, the additional instruction can be forward propagated
>> into memory reference.  The problem for atomic load store is AArch64
>> only supports direct register addressing mode.  After LRA reloads
>> address expression out of memory reference, there is no combine/fwprop
>> optimizer to merge instructions.  The problem is atomic_store's
>> predicate doesn't match its constraint.   The predicate used for
>> atomic_store is memory_operand, while all other atomic patterns
>> use aarch64_sync_memory_operand.  I think this might be a typo.  With
>> this change, expand will not generate addressing mode requiring reload
>> anymore.  I will test another patch fixing this.
>>
>> Thanks,
>> bin
>
> Some comments inline.
>
>>>
>>> R.
>>>
>>> aarch64_legitimize_addr-20151128.txt
>>>
>>>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index 3fe2f0f..5b3e3c4 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  
>>> */, machine_mode mode)
>>>   We try to pick as large a range for the offset as possible to
>>>   maximize the chance of a CSE.  However, for aligned addresses
>>>   we limit the range to 4k so that structures with different sized
>>> - elements are likely to use the same base.  */
>>> + elements are likely to use the same base.  We need to be careful
>>> + not split CONST for some forms address expressions, otherwise it
>
> not to split a CONST for some forms of address expression,
>
>>> + will generate sub-optimal code.  */
>>>
>>>if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>>

Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-12-01 Thread Richard Earnshaw
On 01/12/15 03:19, Bin.Cheng wrote:
> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
>  wrote:
>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>> On 24/11/15 02:51, Bin.Cheng wrote:
>> The aarch64's problem is we don't define addptr3 pattern, and we don't
 have direct insn pattern describing the "x + y << z".  According to
 gcc internal:

 ‘addptrm3’
 Like addm3 but is guaranteed to only be used for address calculations.
 The expanded code is not allowed to clobber the condition code. It
 only needs to be defined if addm3 sets the condition code.
>>
>> addm3 on aarch64 does not set the condition codes, so by this rule we
>> shouldn't need to define this pattern.
 Hi Richard,
 I think that rule has a prerequisite that backend needs to support
 register shifted addition in addm3 pattern.
>>>
>>> addm3 is a named pattern and its format is well defined.  It does not
>>> take a shifted operand and never has.
>>>
 Apparently for AArch64,
 addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
 "does not set the condition codes" actually, because both
 "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>
>>> You appear to be confusing named patterns (used by expand) with
>>> recognizers.  Anyway, we have
>>>
>>> (define_insn "*add_<shift>_<mode>"
>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>        (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>                              (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>>>                  (match_operand:GPI 3 "register_operand" "r")))]
>>>
>>> Which is a non-flag setting add with shifted operand.
>>>
 Either way I think it is another backend issue, so do you approve that
 I commit this patch now?
>>>
>>> Not yet.  I think there's something fundamental amiss here.
>>>
>>> BTW, it looks to me as though addptrm3 should have exactly the same
>>> operand rules as addm3 (documentation reads "like addm3"), so a
>>> shifted operand shouldn't be supported there either.  If that isn't the
>>> case then that should be clearly called out in the documentation.
>>>
>>> R.
>>>
>>
>> PS.
>>
>> I presume you are aware of the canonicalization rules for add?  That is,
>> for a shift-and-add operation, the shift operand must appear first.  Ie.
>>
>> (plus (shift (op, op)), op)
>>
>> not
>>
>> (plus (op, (shift (op, op))
> 
> Hi Richard,
> Thanks for the comments.  I realized that the not-recognized insn
> issue is because the original patch builds non-canonical expressions.
> When reloading address expression, LRA generates non-canonical
> register scaled insn, which can't be recognized by aarch64 backend.
> 
> Here is the updated patch using canonical form pattern,  it passes
> bootstrap and regression test.  Well, the IVO failure still exists,
> but it is analyzed in the original message.
> 
> Is this patch OK?
> 
> As for Jiong's concern about the additional extension instruction, I
> think this only stands for atomic load store instructions.  For
> general load store, AArch64 supports zext/sext in register scaling
> addressing mode, the additional instruction can be forward propagated
> into memory reference.  The problem for atomic load store is AArch64
> only supports direct register addressing mode.  After LRA reloads
> address expression out of memory reference, there is no combine/fwprop
> optimizer to merge instructions.  The problem is atomic_store's
> predicate doesn't match its constraint.   The predicate used for
> atomic_store is memory_operand, while all other atomic patterns
> use aarch64_sync_memory_operand.  I think this might be a typo.  With
> this change, expand will not generate addressing mode requiring reload
> anymore.  I will test another patch fixing this.
> 
> Thanks,
> bin

Some comments inline.

>>
>> R.
>>
>> aarch64_legitimize_addr-20151128.txt
>>
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 3fe2f0f..5b3e3c4 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  
>> */, machine_mode mode)
>>   We try to pick as large a range for the offset as possible to
>>   maximize the chance of a CSE.  However, for aligned addresses
>>   we limit the range to 4k so that structures with different sized
>> - elements are likely to use the same base.  */
>> + elements are likely to use the same base.  We need to be careful
>> + not split CONST for some forms address expressions, otherwise it

not to split a CONST for some forms of address expression,

>> + will generate sub-optimal code.  */
>>  
>>if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>  {
>>HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>HOST_WIDE_INT base_offset;
>>  
>> +  if (GET_CODE (XEXP (x, 0)) == PLUS)
>> +{
>> +  rt

Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-11-30 Thread Bin.Cheng
On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
 wrote:
> On 24/11/15 09:56, Richard Earnshaw wrote:
>> On 24/11/15 02:51, Bin.Cheng wrote:
> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>> have direct insn pattern describing the "x + y << z".  According to
>>> gcc internal:
>>>
>>> ‘addptrm3’
>>> Like addm3 but is guaranteed to only be used for address calculations.
>>> The expanded code is not allowed to clobber the condition code. It
>>> only needs to be defined if addm3 sets the condition code.
>
> addm3 on aarch64 does not set the condition codes, so by this rule we
> shouldn't need to define this pattern.
>>> Hi Richard,
>>> I think that rule has a prerequisite that backend needs to support
>>> register shifted addition in addm3 pattern.
>>
>> addm3 is a named pattern and its format is well defined.  It does not
>> take a shifted operand and never has.
>>
>>> Apparently for AArch64,
>>> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
>>> "does not set the condition codes" actually, because both
>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>
>> You appear to be confusing named patterns (used by expand) with
>> recognizers.  Anyway, we have
>>
>> (define_insn "*add_<shift>_<mode>"
>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>        (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>                              (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>>                  (match_operand:GPI 3 "register_operand" "r")))]
>>
>> Which is a non-flag setting add with shifted operand.
>>
>>> Either way I think it is another backend issue, so do you approve that
>>> I commit this patch now?
>>
>> Not yet.  I think there's something fundamental amiss here.
>>
>> BTW, it looks to me as though addptrm3 should have exactly the same
>> operand rules as addm3 (documentation reads "like addm3"), so a
>> shifted operand shouldn't be supported there either.  If that isn't the
>> case then that should be clearly called out in the documentation.
>>
>> R.
>>
>
> PS.
>
> I presume you are aware of the canonicalization rules for add?  That is,
> for a shift-and-add operation, the shift operand must appear first.  Ie.
>
> (plus (shift (op, op)), op)
>
> not
>
> (plus (op, (shift (op, op))

Hi Richard,
Thanks for the comments.  I realized that the not-recognized insn
issue is because the original patch builds non-canonical expressions.
When reloading address expression, LRA generates non-canonical
register scaled insn, which can't be recognized by aarch64 backend.

Here is the updated patch using canonical form pattern,  it passes
bootstrap and regression test.  Well, the IVO failure still exists,
but it is analyzed in the original message.

Is this patch OK?

As for Jiong's concern about the additional extension instruction, I
think this only stands for atomic load store instructions.  For
general load store, AArch64 supports zext/sext in register scaling
addressing mode, the additional instruction can be forward propagated
into memory reference.  The problem for atomic load store is AArch64
only supports direct register addressing mode.  After LRA reloads
address expression out of memory reference, there is no combine/fwprop
optimizer to merge instructions.  The problem is atomic_store's
predicate doesn't match its constraint.   The predicate used for
atomic_store is memory_operand, while all other atomic patterns
use aarch64_sync_memory_operand.  I think this might be a typo.  With
this change, expand will not generate addressing mode requiring reload
anymore.  I will test another patch fixing this.

Thanks,
bin
>
> R.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3fe2f0f..5b3e3c4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
machine_mode mode)
  We try to pick as large a range for the offset as possible to
  maximize the chance of a CSE.  However, for aligned addresses
  we limit the range to 4k so that structures with different sized
- elements are likely to use the same base.  */
+ elements are likely to use the same base.  We need to be careful
+ not split CONST for some forms address expressions, otherwise it
+ will generate sub-optimal code.  */
 
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
 {
   HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
   HOST_WIDE_INT base_offset;
 
+  if (GET_CODE (XEXP (x, 0)) == PLUS)
+   {
+ rtx op0 = XEXP (XEXP (x, 0), 0);
+ rtx op1 = XEXP (XEXP (x, 0), 1);
+
+ /* For addr expression in the form like "r1 + r2 + 0x3ffc".
+Since the offset is within range supported by addressing
+mode "reg+offset", we don't split the const and legalize
+it into below insn and expr sequence:
+  r3

Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-11-24 Thread Bin.Cheng
On Tue, Nov 24, 2015 at 5:56 PM, Richard Earnshaw
 wrote:
> On 24/11/15 02:51, Bin.Cheng wrote:
 The aarch64's problem is we don't define addptr3 pattern, and we don't
 >> have direct insn pattern describing the "x + y << z".  According to
 >> gcc internal:
 >>
 >> ‘addptrm3’
 >> Like addm3 but is guaranteed to only be used for address calculations.
 >> The expanded code is not allowed to clobber the condition code. It
 >> only needs to be defined if addm3 sets the condition code.
>>> >
>>> > addm3 on aarch64 does not set the condition codes, so by this rule we
>>> > shouldn't need to define this pattern.
>> Hi Richard,
>> I think that rule has a prerequisite that backend needs to support
>> register shifted addition in addm3 pattern.
>
> addm3 is a named pattern and its format is well defined.  It does not
> take a shifted operand and never has.
>
>> Apparently for AArch64,
>> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
>> "does not set the condition codes" actually, because both
>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>
> You appear to be confusing named patterns (used by expand) with
> recognizers.  Anyway, we have
I understand the difference between expanders and recognizers.  I
missed the non-flag-setting insn pattern below, as you pointed out;
that's what I meant by saying we don't support "does not set the
condition codes" with respect to either expanders or recognizers.
Anyway, I was wrong and we do have such nameless patterns.

>
> (define_insn "*add_<shift>_<mode>"
>   [(set (match_operand:GPI 0 "register_operand" "=r")
>        (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>                              (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>                  (match_operand:GPI 3 "register_operand" "r")))]
>
> Which is a non-flag setting add with shifted operand.
>
>> Either way I think it is another backend issue, so do you approve that
>> I commit this patch now?
>
> Not yet.  I think there's something fundamental amiss here.
>
> BTW, it looks to me as though addptrm3 should have exactly the same
> operand rules as addm3 (documentation reads "like addm3"), so a
> shifted operand shouldn't be supported there either.  If that isn't the
> case then that should be clearly called out in the documentation.
The direct cause of this ICE is that LRA generates the insn sequence below
when reloading "x=y+z*const", where the mult part is actually a register
scale operation:
r1 = z*const
x = y + r1
It generates the mult directly, but aarch64 doesn't have a pattern
describing mult by a constant; we only have a pattern for the
corresponding shift insn.  As a result, the constant mult insn is not
recognized.  This can be fixed by using force_operand to generate valid
mult or shift instructions; is that feasible in LRA?
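
As a rough sketch of the force_operand idea (illustrative only; "y" and
"z" stand for the pseudos in the sequence above, and the scale 8 is an
arbitrary power of two, not taken from the patch):

  /* Instead of emitting (set r1 (mult z 8)) directly, let force_operand
     pick a recognizable form; for a power-of-two scale it expands the
     multiplication as a shift insn.  */
  rtx scaled = force_operand (gen_rtx_MULT (DImode, z, GEN_INT (8)),
                              NULL_RTX);
  rtx sum = force_operand (gen_rtx_PLUS (DImode, y, scaled), NULL_RTX);
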
For this case, we could do a better reload if we supported scaled add in
the addptr pattern.  After some archaeological work, I found addptr was
added for s390, which supports addressing modes like "base + index + disp"
and its variants.  Maybe that's why the pattern's documentation doesn't
explicitly describe the scaled operation.  IMHO, this pattern should be
target dependent and able to handle the addressing modes each target
supports.  A side proof is in the code and comments of function
lra_emit_add, which explicitly handles and describes scaled-add address
expressions.

CCing Vlad and Andreas since they are the original authors and may help here.

Thanks,
bin


Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-11-24 Thread Richard Earnshaw
On 24/11/15 14:36, Jiong Wang wrote:
> 
> 
> On 24/11/15 13:23, Richard Earnshaw wrote:
>> On 24/11/15 13:06, Jiong Wang wrote:
>>>
>>> On 24/11/15 10:18, Richard Earnshaw wrote:
 I presume you are aware of the canonicalization rules for add?  That
 is,
 for a shift-and-add operation, the shift operand must appear first. 
 Ie.

 (plus (shift (op, op)), op)

 not

 (plus (op, (shift (op, op))

 R.
>>> Looks to me it's not optimal to generate invalid mem addr, for example
>>> (mem (plus reg, (mult reg, imm))) or even the simple (mem (plus (plus r,
>>> r), imm),
>>> in the first place.  Those complex rtx inside are hidden by the permissive
>>> memory_operand predication, and only exposed during reload by stricter
>>> constraints, so reload needs to do extra work.  If we expose those complex
>>> rtx earlier, then some earlier rtl pass may find more optimization
>>> opportunities, for example combine.
>>>
>>> The following simple modification fixes the ICE and generates the best
>>> sequence to me:
>>>
>>> - return gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
>>> + addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
>>> + emit_insn (gen_rtx_SET (base, addr));
>>> + return base;
>>>
>> That wouldn't be right either if op1 could be a const_int.
> 
> Indeed, though it would be strange to me if op1 were a const_int here,
> given those early if checks.  If it is, then the incoming address rtx
> would be something like the following, without the two const_ints folded.
> 
>         +
>        / \
>       /   \
>      +     const_int
>     / \
>    /   \
>  Reg   const_int
> 
> or we could sync with the rtx checked in the early
> aarch64_legitimate_address_hook_p; it will return false if op1 is a
> const_int.
> 
> - rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
> + rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
> 
> 

The safest thing is to insert a call to swap_commutative_operands_p and
then switch the order over if that returns true.
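
A rough sketch of that, reusing the "base", "op1" and "addr_mode" names
from the hunk quoted earlier (illustrative only, not the committed fix):

  /* Build the PLUS in canonical order; swap_commutative_operands_p
     returns true when the two operands should be exchanged.  */
  rtx lhs = base, rhs = op1;
  if (swap_commutative_operands_p (lhs, rhs))
    std::swap (lhs, rhs);
  rtx addr = gen_rtx_PLUS (addr_mode, lhs, rhs);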

R.

>>
>> R.
>>
>>>    67 add  x1, x29, 48
>>>    68 add  x1, x1, x0, sxtw 3
>>>    69 stlr x19, [x1]
>>>
>>> instead of
>>>
>>>    67 add  x1, x29, 64
>>>    68 add  x0, x1, x0, sxtw 3
>>>    69 sub  x0, x0, #16
>>>    70 stlr x19, [x0]
>>>
>>> or
>>>
>>>    67 sxtw x0, w0
>>>    68 add  x1, x29, 48
>>>    69 add  x1, x1, x0, sxtw 3
>>>    70 stlr x19, [x1]
>>>
>>>
>>>
>>>
>>>
> 



Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-11-24 Thread Jiong Wang



On 24/11/15 13:23, Richard Earnshaw wrote:

On 24/11/15 13:06, Jiong Wang wrote:


On 24/11/15 10:18, Richard Earnshaw wrote:

I presume you are aware of the canonicalization rules for add?  That is,
for a shift-and-add operation, the shift operand must appear first.  Ie.

(plus (shift (op, op)), op)

not

(plus (op, (shift (op, op))

R.

Looks to me it's not optimal to generate invalid mem addr, for example
(mem (plus reg, (mult reg, imm))) or even the simple (mem (plus (plus r,
r), imm),
in the first place.  Those complex rtx inside are hidden by the permissive
memory_operand predication, and only exposed during reload by stricter
constraints, so reload needs to do extra work.  If we expose those complex rtx
earlier, then some earlier rtl pass may find more optimization
opportunities, for example combine.

The following simple modification fixes the ICE and generates the best
sequence to me:

- return gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
+ addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
+ emit_insn (gen_rtx_SET (base, addr));
+ return base;


That wouldn't be right either if op1 could be a const_int.


Indeed, though it would be strange to me if op1 were a const_int here,
given those early if checks.  If it is, then the incoming address rtx
would be something like the following, without the two const_ints folded.
like the following without two const_int folded.

        +
       / \
      /   \
     +     const_int
    / \
   /   \
  Reg   const_int

or we could sync with the rtx checked in the early
aarch64_legitimate_address_hook_p; it will return false if op1 is a
const_int.

- rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
+ rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);




R.


   67 add  x1, x29, 48
   68 add  x1, x1, x0, sxtw 3
   69 stlr x19, [x1]

instead of

   67 add  x1, x29, 64
   68 add  x0, x1, x0, sxtw 3
   69 sub  x0, x0, #16
   70 stlr x19, [x0]

or

   67 sxtw x0, w0
   68 add  x1, x29, 48
   69 add  x1, x1, x0, sxtw 3
   70 stlr x19, [x1]









Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-11-24 Thread Richard Earnshaw
On 24/11/15 13:06, Jiong Wang wrote:
> 
> 
> On 24/11/15 10:18, Richard Earnshaw wrote:
>> I presume you are aware of the canonicalization rules for add?  That is,
>> for a shift-and-add operation, the shift operand must appear first.  Ie.
>>
>> (plus (shift (op, op)), op)
>>
>> not
>>
>> (plus (op, (shift (op, op))
>>
>> R.
> 
> Looks to me it's not optimal to generate invalid mem addr, for example
> (mem (plus reg, (mult reg, imm))) or even the simple (mem (plus (plus r,
> r), imm),
> in the first place.  Those complex rtx inside are hidden by the permissive
> memory_operand predication, and only exposed during reload by stricter
> constraints, so reload needs to do extra work.  If we expose those complex rtx
> earlier, then some earlier rtl pass may find more optimization
> opportunities, for example combine.
> 
> The following simple modification fixes the ICE and generates the best
> sequence to me:
> 
> - return gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
> + addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
> + emit_insn (gen_rtx_SET (base, addr));
> + return base;
> 

That wouldn't be right either if op1 could be a const_int.

R.

>   67 add  x1, x29, 48
>   68 add  x1, x1, x0, sxtw 3
>   69 stlr x19, [x1]
> 
> instead of
> 
>   67 add  x1, x29, 64
>   68 add  x0, x1, x0, sxtw 3
>   69 sub  x0, x0, #16
>   70 stlr x19, [x0]
> 
> or
> 
>   67 sxtw x0, w0
>   68 add  x1, x29, 48
>   69 add  x1, x1, x0, sxtw 3
>   70 stlr x19, [x1]
> 
> 
> 
> 
> 



Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-11-24 Thread Jiong Wang



On 24/11/15 10:18, Richard Earnshaw wrote:

I presume you are aware of the canonicalization rules for add?  That is,
for a shift-and-add operation, the shift operand must appear first.  Ie.

(plus (shift (op, op)), op)

not

(plus (op, (shift (op, op))

R.


Looks to me it's not optimal to generate invalid mem addr, for example
(mem (plus reg, (mult reg, imm))) or even the simple (mem (plus (plus r, 
r), imm),

in the first place.  Those complex rtx inside are hidden by the permissive
memory_operand predication, and only exposed during reload by stricter
constraints, so reload needs to do extra work.  If we expose those complex rtx
earlier, then some earlier rtl pass may find more optimization
opportunities, for example combine.

The following simple modification fixes the ICE and generates the best
sequence to me:


- return gen_rtx_fmt_ee (PLUS, addr_mode, base, op1);
+ addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
+ emit_insn (gen_rtx_SET (base, addr));
+ return base;

  67 add  x1, x29, 48
  68 add  x1, x1, x0, sxtw 3
  69 stlr x19, [x1]

instead of

  67 add  x1, x29, 64
  68 add  x0, x1, x0, sxtw 3
  69 sub  x0, x0, #16
  70 stlr x19, [x0]

or

  67 sxtw x0, w0
  68 add  x1, x29, 48
  69 add  x1, x1, x0, sxtw 3
  70 stlr x19, [x1]







Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-11-24 Thread Richard Earnshaw
On 24/11/15 09:56, Richard Earnshaw wrote:
> On 24/11/15 02:51, Bin.Cheng wrote:
 The aarch64's problem is we don't define addptr3 pattern, and we don't
>> have direct insn pattern describing the "x + y << z".  According to
>> gcc internal:
>>
>> ‘addptrm3’
>> Like addm3 but is guaranteed to only be used for address calculations.
>> The expanded code is not allowed to clobber the condition code. It
>> only needs to be defined if addm3 sets the condition code.

 addm3 on aarch64 does not set the condition codes, so by this rule we
 shouldn't need to define this pattern.
>> Hi Richard,
>> I think that rule has a prerequisite that backend needs to support
>> register shifted addition in addm3 pattern.  
> 
> addm3 is a named pattern and its format is well defined.  It does not
> take a shifted operand and never has.
> 
>> Apparently for AArch64,
>> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
>> "does not set the condition codes" actually, because both
>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
> 
> You appear to be confusing named patterns (used by expand) with
> recognizers.  Anyway, we have
> 
> (define_insn "*add_<shift>_<mode>"
>   [(set (match_operand:GPI 0 "register_operand" "=r")
>        (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>                              (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>                  (match_operand:GPI 3 "register_operand" "r")))]
> 
> Which is a non-flag setting add with shifted operand.
> 
>> Either way I think it is another backend issue, so do you approve that
>> I commit this patch now?
> 
> Not yet.  I think there's something fundamental amiss here.
> 
> BTW, it looks to me as though addptrm3 should have exactly the same
> operand rules as addm3 (documentation reads "like addm3"), so a
> shifted operand shouldn't be supported there either.  If that isn't the
> case then that should be clearly called out in the documentation.
> 
> R.
> 

PS.

I presume you are aware of the canonicalization rules for add?  That is,
for a shift-and-add operation, the shift operand must appear first.  Ie.

(plus (shift (op, op)), op)

not

(plus (op, (shift (op, op))

R.
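
For concreteness, a minimal illustration of building that canonical form
with the usual rtx constructors (the register names and the shift count
here are placeholders, not taken from the patch):

  /* Canonical: the shifted (scaled) operand is the first arm of the PLUS.  */
  rtx scaled = gen_rtx_ASHIFT (DImode, index_reg, GEN_INT (2));
  rtx addr = gen_rtx_PLUS (DImode, scaled, base_reg);
  /* (plus base_reg (ashift index_reg 2)) would be the non-canonical order.  */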


Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-11-24 Thread Richard Earnshaw
On 24/11/15 02:51, Bin.Cheng wrote:
>>> The aarch64's problem is we don't define addptr3 pattern, and we don't
>>> >> have direct insn pattern describing the "x + y << z".  According to
>>> >> gcc internal:
>>> >>
>>> >> ‘addptrm3’
>>> >> Like addm3 but is guaranteed to only be used for address calculations.
>>> >> The expanded code is not allowed to clobber the condition code. It
>>> >> only needs to be defined if addm3 sets the condition code.
>> >
>> > addm3 on aarch64 does not set the condition codes, so by this rule we
>> > shouldn't need to define this pattern.
> Hi Richard,
> I think that rule has a prerequisite that backend needs to support
> register shifted addition in addm3 pattern.  

addm3 is a named pattern and its format is well defined.  It does not
take a shifted operand and never has.

> Apparently for AArch64,
> addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
> "does not set the condition codes" actually, because both
> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.

You appear to be confusing named patterns (used by expand) with
recognizers.  Anyway, we have

(define_insn "*add_<shift>_<mode>"
  [(set (match_operand:GPI 0 "register_operand" "=r")
       (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
                             (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
                 (match_operand:GPI 3 "register_operand" "r")))]

Which is a non-flag setting add with shifted operand.

> Either way I think it is another backend issue, so do you approve that
> I commit this patch now?

Not yet.  I think there's something fundamental amiss here.

BTW, it looks to me as though addptrm3 should have exactly the same
operand rules as addm3 (documentation reads "like addm3"), so a
shifted operand shouldn't be supported there either.  If that isn't the
case then that should be clearly called out in the documentation.

R.


Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-11-23 Thread Bin.Cheng
On Sat, Nov 21, 2015 at 1:39 AM, Richard Earnshaw
 wrote:
> On 20/11/15 08:31, Bin.Cheng wrote:
>> On Thu, Nov 19, 2015 at 10:32 AM, Bin.Cheng  wrote:
>>> On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh
>>>  wrote:
 On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
> Hi,
> GIMPLE IVO needs to call backend interface to calculate costs for addr
> expressions like below:
>FORM1: "r73 + r74 + 16380"
>FORM2: "r73 << 2 + r74 + 16380"
>
> They are invalid address expressions on AArch64, so they will be legitimized by
> aarch64_legitimize_address.  Below are what we got from that function:
>
> For FORM1, the address expression is legitimized into below insn sequence
> and rtx:
>r84:DI=r73:DI+r74:DI
>r85:DI=r84:DI+0x3000
>r83:DI=r85:DI
>"r83 + 4092"
>
> For FORM2, the address expression is legitimized into below insn sequence
> and rtx:
>r108:DI=r73:DI<<0x2
>r109:DI=r108:DI+r74:DI
>r110:DI=r109:DI+0x3000
>r107:DI=r110:DI
>"r107 + 4092"
>
> So the costs computed are 12/16 respectively.  The high cost prevents IVO
> from choosing the right candidates.  Besides cost computation, I also
> think the legitimization is bad in terms of code generation.
> The root cause in aarch64_legitimize_address can be described by its
> comment:
>/* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>   where mask is selected by alignment and size of the offset.
>   We try to pick as large a range for the offset as possible to
>   maximize the chance of a CSE.  However, for aligned addresses
>   we limit the range to 4k so that structures with different sized
>   elements are likely to use the same base.  */
> I think the split of CONST is intended for REG+CONST where the const 
> offset
> is not in the range of AArch64's addressing modes.  Unfortunately, it
> doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<<SCALE+CONST"
> when the CONST are in the range of addressing modes.  As a result, these
> two cases fall through this logic, resulting in sub-optimal results.
>
> It's obvious we can do below legitimization:
> FORM1:
>r83:DI=r73:DI+r74:DI
>"r83 + 16380"
> FORM2:
>r107:DI=0x3ffc
>r106:DI=r74:DI+r107:DI
>   REG_EQUAL r74:DI+0x3ffc
>"r106 + r73 << 2"
>
> This patch handles these two cases as described.

 Thanks for the description, it made the patch very easy to review. I only
 have a style comment.

> Bootstrap & test on AArch64 along with other patch.  Is it OK?
>
> 2015-11-04  Bin Cheng  
>   Jiong Wang  
>
>   * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
>   address expressions like REG+REG+CONST and REG+NON_REG+CONST.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5c8604f..47875ac 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  
> */, machine_mode mode)
>  {
>HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>HOST_WIDE_INT base_offset;
> +  rtx op0 = XEXP (x,0);
> +
> +  if (GET_CODE (op0) == PLUS)
> + {
> +   rtx op0_ = XEXP (op0, 0);
> +   rtx op1_ = XEXP (op0, 1);

 I don't see this trailing _ on a variable name in many places in the source
 tree (mostly in the Go frontend), and certainly not in the aarch64 backend.
 Can we pick a different name for op0_ and op1_?

> +
> +   /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
> +  reach here, the 'CONST' may be valid in which case we should
> +  not split.  */
> +   if (REG_P (op0_) && REG_P (op1_))
> + {
> +   machine_mode addr_mode = GET_MODE (op0);
> +   rtx addr = gen_reg_rtx (addr_mode);
> +
> +   rtx ret = plus_constant (addr_mode, addr, offset);
> +   if (aarch64_legitimate_address_hook_p (mode, ret, false))
> + {
> +   emit_insn (gen_adddi3 (addr, op0_, op1_));
> +   return ret;
> + }
> + }
> +   /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
> +  will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
> +  we split it into Y=REG+CONST, Y+NON_REG.  */
> +   else if (REG_P (op0_) || REG_P (op1_))
> + {
> +   machine_mode addr_mode = GET_MODE (op0);
> +   rtx addr = gen_reg_rtx (addr_mode);
> +
> +   /* Switch to make sure that register is in op0_.  */
> +   if (REG_P (op1_))
> + 

Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-11-20 Thread Richard Earnshaw
On 20/11/15 08:31, Bin.Cheng wrote:
> On Thu, Nov 19, 2015 at 10:32 AM, Bin.Cheng  wrote:
>> On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh
>>  wrote:
>>> On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
 Hi,
 GIMPLE IVO needs to call backend interface to calculate costs for addr
 expressions like below:
FORM1: "r73 + r74 + 16380"
FORM2: "r73 << 2 + r74 + 16380"

 They are invalid address expressions on AArch64, so they will be legitimized by
 aarch64_legitimize_address.  Below are what we got from that function:

 For FORM1, the address expression is legitimized into below insn sequence
 and rtx:
r84:DI=r73:DI+r74:DI
r85:DI=r84:DI+0x3000
r83:DI=r85:DI
"r83 + 4092"

 For FORM2, the address expression is legitimized into below insn sequence
 and rtx:
r108:DI=r73:DI<<0x2
r109:DI=r108:DI+r74:DI
r110:DI=r109:DI+0x3000
r107:DI=r110:DI
"r107 + 4092"

 So the costs computed are 12/16 respectively.  The high cost prevents IVO
 from choosing the right candidates.  Besides cost computation, I also think the
 legitimization is bad in terms of code generation.
 The root cause in aarch64_legitimize_address can be described by its
 comment:
/* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
   where mask is selected by alignment and size of the offset.
   We try to pick as large a range for the offset as possible to
   maximize the chance of a CSE.  However, for aligned addresses
   we limit the range to 4k so that structures with different sized
   elements are likely to use the same base.  */
 I think the split of CONST is intended for REG+CONST where the const offset
 is not in the range of AArch64's addressing modes.  Unfortunately, it
 doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<<SCALE+CONST"
 when the CONST are in the range of addressing modes.  As a result, these
 two cases fall through this logic, resulting in sub-optimal results.

 It's obvious we can do below legitimization:
 FORM1:
r83:DI=r73:DI+r74:DI
"r83 + 16380"
 FORM2:
r107:DI=0x3ffc
r106:DI=r74:DI+r107:DI
   REG_EQUAL r74:DI+0x3ffc
"r106 + r73 << 2"

 This patch handles these two cases as described.
>>>
>>> Thanks for the description, it made the patch very easy to review. I only
>>> have a style comment.
>>>
 Bootstrap & test on AArch64 along with other patch.  Is it OK?

 2015-11-04  Bin Cheng  
   Jiong Wang  

   * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
   address expressions like REG+REG+CONST and REG+NON_REG+CONST.
>>>
 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
 index 5c8604f..47875ac 100644
 --- a/gcc/config/aarch64/aarch64.c
 +++ b/gcc/config/aarch64/aarch64.c
 @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  
 */, machine_mode mode)
  {
HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
HOST_WIDE_INT base_offset;
 +  rtx op0 = XEXP (x,0);
 +
 +  if (GET_CODE (op0) == PLUS)
 + {
 +   rtx op0_ = XEXP (op0, 0);
 +   rtx op1_ = XEXP (op0, 1);
>>>
>>> I don't see this trailing _ on a variable name in many places in the source
>>> tree (mostly in the Go frontend), and certainly not in the aarch64 backend.
>>> Can we pick a different name for op0_ and op1_?
>>>
 +
 +   /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
 +  reach here, the 'CONST' may be valid in which case we should
 +  not split.  */
 +   if (REG_P (op0_) && REG_P (op1_))
 + {
 +   machine_mode addr_mode = GET_MODE (op0);
 +   rtx addr = gen_reg_rtx (addr_mode);
 +
 +   rtx ret = plus_constant (addr_mode, addr, offset);
 +   if (aarch64_legitimate_address_hook_p (mode, ret, false))
 + {
 +   emit_insn (gen_adddi3 (addr, op0_, op1_));
 +   return ret;
 + }
 + }
 +   /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
 +  will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
 +  we split it into Y=REG+CONST, Y+NON_REG.  */
 +   else if (REG_P (op0_) || REG_P (op1_))
 + {
 +   machine_mode addr_mode = GET_MODE (op0);
 +   rtx addr = gen_reg_rtx (addr_mode);
 +
 +   /* Switch to make sure that register is in op0_.  */
 +   if (REG_P (op1_))
 + std::swap (op0_, op1_);
 +
 +   rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
 +   if (aarch64_legitimate_address_hook_p (mode, ret, false))
 +

Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-11-20 Thread Bin.Cheng
On Thu, Nov 19, 2015 at 10:32 AM, Bin.Cheng  wrote:
> On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh
>  wrote:
>> On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
>>> Hi,
>>> GIMPLE IVO needs to call backend interface to calculate costs for addr
>>> expressions like below:
>>>FORM1: "r73 + r74 + 16380"
>>>FORM2: "r73 << 2 + r74 + 16380"
>>>
>>> They are invalid address expressions on AArch64, so they will be legitimized by
>>> aarch64_legitimize_address.  Below are what we got from that function:
>>>
>>> For FORM1, the address expression is legitimized into below insn sequence
>>> and rtx:
>>>r84:DI=r73:DI+r74:DI
>>>r85:DI=r84:DI+0x3000
>>>r83:DI=r85:DI
>>>"r83 + 4092"
>>>
>>> For FORM2, the address expression is legitimized into below insn sequence
>>> and rtx:
>>>r108:DI=r73:DI<<0x2
>>>r109:DI=r108:DI+r74:DI
>>>r110:DI=r109:DI+0x3000
>>>r107:DI=r110:DI
>>>"r107 + 4092"
>>>
>>> So the costs computed are 12/16 respectively.  The high cost prevents IVO
>>> from choosing the right candidates.  Besides cost computation, I also think the
>>> legitimization is bad in terms of code generation.
>>> The root cause in aarch64_legitimize_address can be described by its
>>> comment:
>>>/* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>>>   where mask is selected by alignment and size of the offset.
>>>   We try to pick as large a range for the offset as possible to
>>>   maximize the chance of a CSE.  However, for aligned addresses
>>>   we limit the range to 4k so that structures with different sized
>>>   elements are likely to use the same base.  */
>>> I think the split of CONST is intended for REG+CONST where the const offset
>>> is not in the range of AArch64's addressing modes.  Unfortunately, it
>>> doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<<SCALE+CONST"
>>> when the CONST are in the range of addressing modes.  As a result, these two
>>> cases fall through this logic, resulting in sub-optimal results.
>>>
>>> It's obvious we can do below legitimization:
>>> FORM1:
>>>r83:DI=r73:DI+r74:DI
>>>"r83 + 16380"
>>> FORM2:
>>>r107:DI=0x3ffc
>>>r106:DI=r74:DI+r107:DI
>>>   REG_EQUAL r74:DI+0x3ffc
>>>"r106 + r73 << 2"
>>>
>>> This patch handles these two cases as described.
>>
>> Thanks for the description, it made the patch very easy to review. I only
>> have a style comment.
>>
>>> Bootstrap & test on AArch64 along with other patch.  Is it OK?
>>>
>>> 2015-11-04  Bin Cheng  
>>>   Jiong Wang  
>>>
>>>   * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
>>>   address expressions like REG+REG+CONST and REG+NON_REG+CONST.
>>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index 5c8604f..47875ac 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  
>>> */, machine_mode mode)
>>>  {
>>>HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>>HOST_WIDE_INT base_offset;
>>> +  rtx op0 = XEXP (x,0);
>>> +
>>> +  if (GET_CODE (op0) == PLUS)
>>> + {
>>> +   rtx op0_ = XEXP (op0, 0);
>>> +   rtx op1_ = XEXP (op0, 1);
>>
>> I don't see this trailing _ on a variable name in many places in the source
>> tree (mostly in the Go frontend), and certainly not in the aarch64 backend.
>> Can we pick a different name for op0_ and op1_?
>>
>>> +
>>> +   /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
>>> +  reach here, the 'CONST' may be valid in which case we should
>>> +  not split.  */
>>> +   if (REG_P (op0_) && REG_P (op1_))
>>> + {
>>> +   machine_mode addr_mode = GET_MODE (op0);
>>> +   rtx addr = gen_reg_rtx (addr_mode);
>>> +
>>> +   rtx ret = plus_constant (addr_mode, addr, offset);
>>> +   if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>> + {
>>> +   emit_insn (gen_adddi3 (addr, op0_, op1_));
>>> +   return ret;
>>> + }
>>> + }
>>> +   /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>>> +  will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
>>> +  we split it into Y=REG+CONST, Y+NON_REG.  */
>>> +   else if (REG_P (op0_) || REG_P (op1_))
>>> + {
>>> +   machine_mode addr_mode = GET_MODE (op0);
>>> +   rtx addr = gen_reg_rtx (addr_mode);
>>> +
>>> +   /* Switch to make sure that register is in op0_.  */
>>> +   if (REG_P (op1_))
>>> + std::swap (op0_, op1_);
>>> +
>>> +   rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>>> +   if (aarch64_legitimate_address_hook_p (mode, ret, false))
>>> + {
>>> +   addr = force_operand (plus_constant (addr_mode,
>>> +op0_, offset),
>>> +  

Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-11-18 Thread Bin.Cheng
On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh
 wrote:
> On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
>> Hi,
>> GIMPLE IVO needs to call backend interface to calculate costs for addr
>> expressions like below:
>>FORM1: "r73 + r74 + 16380"
>>FORM2: "r73 << 2 + r74 + 16380"
>>
>> They are invalid address expressions on AArch64, so they will be legitimized by
>> aarch64_legitimize_address.  Below are what we got from that function:
>>
>> For FORM1, the address expression is legitimized into below insn sequence
>> and rtx:
>>r84:DI=r73:DI+r74:DI
>>r85:DI=r84:DI+0x3000
>>r83:DI=r85:DI
>>"r83 + 4092"
>>
>> For FORM2, the address expression is legitimized into below insn sequence
>> and rtx:
>>r108:DI=r73:DI<<0x2
>>r109:DI=r108:DI+r74:DI
>>r110:DI=r109:DI+0x3000
>>r107:DI=r110:DI
>>"r107 + 4092"
>>
>> So the costs computed are 12/16 respectively.  The high cost prevents IVO
>> from choosing the right candidates.  Besides cost computation, I also think the
>> legitimization is bad in terms of code generation.
>> The root cause in aarch64_legitimize_address can be described by its
>> comment:
>>/* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>>   where mask is selected by alignment and size of the offset.
>>   We try to pick as large a range for the offset as possible to
>>   maximize the chance of a CSE.  However, for aligned addresses
>>   we limit the range to 4k so that structures with different sized
>>   elements are likely to use the same base.  */
>> I think the split of CONST is intended for REG+CONST where the const offset
>> is not in the range of AArch64's addressing modes.  Unfortunately, it
>> doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<<SCALE+CONST"
>> when the CONST are in the range of addressing modes.  As a result, these two
>> cases fall through this logic, resulting in sub-optimal results.
>>
>> It's obvious we can do below legitimization:
>> FORM1:
>>r83:DI=r73:DI+r74:DI
>>"r83 + 16380"
>> FORM2:
>>r107:DI=0x3ffc
>>r106:DI=r74:DI+r107:DI
>>   REG_EQUAL r74:DI+0x3ffc
>>"r106 + r73 << 2"
>>
>> This patch handles these two cases as described.
>
> Thanks for the description, it made the patch very easy to review. I only
> have a style comment.
>
>> Bootstrap & test on AArch64 along with other patch.  Is it OK?
>>
>> 2015-11-04  Bin Cheng  
>>   Jiong Wang  
>>
>>   * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
>>   address expressions like REG+REG+CONST and REG+NON_REG+CONST.
>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 5c8604f..47875ac 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
>> machine_mode mode)
>>  {
>>HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>HOST_WIDE_INT base_offset;
>> +  rtx op0 = XEXP (x,0);
>> +
>> +  if (GET_CODE (op0) == PLUS)
>> + {
>> +   rtx op0_ = XEXP (op0, 0);
>> +   rtx op1_ = XEXP (op0, 1);
>
> I don't see this trailing _ on a variable name in many places in the source
> tree (mostly in the Go frontend), and certainly not in the aarch64 backend.
> Can we pick a different name for op0_ and op1_?
>
>> +
>> +   /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
>> +  reach here, the 'CONST' may be valid in which case we should
>> +  not split.  */
>> +   if (REG_P (op0_) && REG_P (op1_))
>> + {
>> +   machine_mode addr_mode = GET_MODE (op0);
>> +   rtx addr = gen_reg_rtx (addr_mode);
>> +
>> +   rtx ret = plus_constant (addr_mode, addr, offset);
>> +   if (aarch64_legitimate_address_hook_p (mode, ret, false))
>> + {
>> +   emit_insn (gen_adddi3 (addr, op0_, op1_));
>> +   return ret;
>> + }
>> + }
>> +   /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
>> +  will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
>> +  we split it into Y=REG+CONST, Y+NON_REG.  */
>> +   else if (REG_P (op0_) || REG_P (op1_))
>> + {
>> +   machine_mode addr_mode = GET_MODE (op0);
>> +   rtx addr = gen_reg_rtx (addr_mode);
>> +
>> +   /* Switch to make sure that register is in op0_.  */
>> +   if (REG_P (op1_))
>> + std::swap (op0_, op1_);
>> +
>> +   rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>> +   if (aarch64_legitimate_address_hook_p (mode, ret, false))
>> + {
>> +   addr = force_operand (plus_constant (addr_mode,
>> +op0_, offset),
>> + NULL_RTX);
>> +   ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
>> +   return ret;
>> + }
>
> The logic here is a bit hairy to follow: you construct a PLUS RTX to check
> aarch64_legitimate_address_hook_p, then construct a different PLUS RTX
> to use as the return value.
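To make the cost problem quoted above concrete, below is a minimal,
self-contained sketch of the offset split that the quoted comment in
aarch64_legitimize_address describes, applied to the 16380 offset from
FORM1.  The 0xfff mask is an assumption based on the comment's "we limit
the range to 4k" remark; the real code selects the mask by alignment and
size, so this is illustrative arithmetic only, not the backend source.

  /* Illustrative only: splitting a 16380 offset under an assumed 4k mask,
     which is why FORM1 currently grows an extra "+0x3000" instruction.  */
  #include <stdio.h>

  int main (void)
  {
    long offset = 16380;                    /* 0x3ffc, from FORM1 */
    long mask = 0xfff;                      /* assumed 4k limit */
    long base_offset = offset & ~mask;      /* 0x3000: needs its own add */
    long remainder = offset - base_offset;  /* 4092: stays in the address */

    printf ("base_offset = %#lx, remainder = %ld\n", base_offset, remainder);
    /* Prints base_offset = 0x3000, remainder = 4092, matching the
       r85:DI=r84:DI+0x3000 / "r83 + 4092" sequence shown for FORM1
       earlier in the thread.  */
    return 0;
  }

With the proposed change the whole 16380 stays in the address
("r83 + 16380"), so the extra add and its rtx cost disappear for FORM1.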

Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-11-17 Thread James Greenhalgh
On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
> Hi,
> GIMPLE IVO needs to call backend interface to calculate costs for addr
> expressions like below:
>FORM1: "r73 + r74 + 16380"
>FORM2: "r73 << 2 + r74 + 16380"
> 
> They are invalid address expressions on AArch64, so they will be legitimized by
> aarch64_legitimize_address.  Below are what we got from that function:
> 
> For FORM1, the address expression is legitimized into below insn sequence
> and rtx:
>r84:DI=r73:DI+r74:DI
>r85:DI=r84:DI+0x3000
>r83:DI=r85:DI
>"r83 + 4092"
> 
> For FORM2, the address expression is legitimized into below insn sequence
> and rtx:
>r108:DI=r73:DI<<0x2
>r109:DI=r108:DI+r74:DI
>r110:DI=r109:DI+0x3000
>r107:DI=r110:DI
>"r107 + 4092"
> 
> So the costs computed are 12/16 respectively.  The high cost prevents IVO
> from choosing the right candidates.  Besides cost computation, I also think
> the legitimization is bad in terms of code generation.
> The root cause in aarch64_legitimize_address can be described by its
> comment:
>/* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
>   where mask is selected by alignment and size of the offset.
>   We try to pick as large a range for the offset as possible to
>   maximize the chance of a CSE.  However, for aligned addresses
>   we limit the range to 4k so that structures with different sized
>   elements are likely to use the same base.  */
> I think the split of CONST is intended for REG+CONST where the const offset
> is not in the range of AArch64's addressing modes.  Unfortunately, it
> doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<<SCALE+CONST"
> when the CONST are in the range of addressing modes.  As a result, these two
> cases fall through this logic, resulting in sub-optimal results.
> 
> It's obvious we can do below legitimization:
> FORM1:
>r83:DI=r73:DI+r74:DI
>"r83 + 16380"
> FORM2:
>r107:DI=0x3ffc
>r106:DI=r74:DI+r107:DI
>   REG_EQUAL r74:DI+0x3ffc
>"r106 + r73 << 2"
> 
> This patch handles these two cases as described.

Thanks for the description, it made the patch very easy to review. I only
have a style comment.

> Bootstrap & test on AArch64 along with other patch.  Is it OK?
> 
> 2015-11-04  Bin Cheng  
>   Jiong Wang  
> 
>   * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
>   address expressions like REG+REG+CONST and REG+NON_REG+CONST.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5c8604f..47875ac 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
> machine_mode mode)
>  {
>HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>HOST_WIDE_INT base_offset;
> +  rtx op0 = XEXP (x,0);
> +
> +  if (GET_CODE (op0) == PLUS)
> + {
> +   rtx op0_ = XEXP (op0, 0);
> +   rtx op1_ = XEXP (op0, 1);

I don't see this trailing _ on a variable name in many places in the source
tree (mostly in the Go frontend), and certainly not in the aarch64 backend.
Can we pick a different name for op0_ and op1_?

> +
> +   /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
> +  reach here, the 'CONST' may be valid in which case we should
> +  not split.  */
> +   if (REG_P (op0_) && REG_P (op1_))
> + {
> +   machine_mode addr_mode = GET_MODE (op0);
> +   rtx addr = gen_reg_rtx (addr_mode);
> +
> +   rtx ret = plus_constant (addr_mode, addr, offset);
> +   if (aarch64_legitimate_address_hook_p (mode, ret, false))
> + {
> +   emit_insn (gen_adddi3 (addr, op0_, op1_));
> +   return ret;
> + }
> + }
> +   /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
> +  will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
> +  we split it into Y=REG+CONST, Y+NON_REG.  */
> +   else if (REG_P (op0_) || REG_P (op1_))
> + {
> +   machine_mode addr_mode = GET_MODE (op0);
> +   rtx addr = gen_reg_rtx (addr_mode);
> +
> +   /* Switch to make sure that register is in op0_.  */
> +   if (REG_P (op1_))
> + std::swap (op0_, op1_);
> +
> +   rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
> +   if (aarch64_legitimate_address_hook_p (mode, ret, false))
> + {
> +   addr = force_operand (plus_constant (addr_mode,
> +op0_, offset),
> + NULL_RTX);
> +   ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
> +   return ret;
> + }

The logic here is a bit hairy to follow: you construct a PLUS RTX to check
aarch64_legitimate_address_hook_p, then construct a different PLUS RTX
to use as the return value. This can pr
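For illustration, one way to address this point is to build only the PLUS
rtx that will be returned and validate that same rtx.  The sketch below is
a possible restructuring of the REG+NON_REG+CONST branch, reusing the
helpers already present in the posted patch; the use of force_operand with
an explicit target and the emit_move_insn fallback are assumptions here,
not the committed code.

  /* Sketch only, not the committed fix: check and return the same rtx.  */
  else if (REG_P (op0_) || REG_P (op1_))
    {
      machine_mode addr_mode = GET_MODE (op0);
      rtx addr = gen_reg_rtx (addr_mode);

      /* Make sure the register operand ends up in op0_.  */
      if (REG_P (op1_))
        std::swap (op0_, op1_);

      /* Build the PLUS we intend to return, then validate that rtx.  */
      rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
      if (aarch64_legitimate_address_hook_p (mode, ret, false))
        {
          /* Compute addr = op0_ + offset.  force_operand may pick a
             different register, so copy into ADDR if it did.  */
          rtx t = force_operand (plus_constant (addr_mode, op0_, offset),
                                 addr);
          if (t != addr)
            emit_move_insn (addr, t);
          return ret;
        }
    }

The REG+REG branch of the patch already checks the rtx it returns, so only
this branch needs the reshuffle.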

[PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-11-17 Thread Bin Cheng
Hi,
GIMPLE IVO needs to call backend interface to calculate costs for addr
expressions like below:
   FORM1: "r73 + r74 + 16380"
   FORM2: "r73 << 2 + r74 + 16380"

They are invalid address expressions on AArch64, so they will be legitimized by
aarch64_legitimize_address.  Below are what we got from that function:

For FORM1, the address expression is legitimized into below insn sequence
and rtx:
   r84:DI=r73:DI+r74:DI
   r85:DI=r84:DI+0x3000
   r83:DI=r85:DI
   "r83 + 4092"

For FORM2, the address expression is legitimized into below insn sequence
and rtx:
   r108:DI=r73:DI<<0x2
   r109:DI=r108:DI+r74:DI
   r110:DI=r109:DI+0x3000
   r107:DI=r110:DI
   "r107 + 4092"

So the costs computed are 12/16 respectively.  The high cost prevents IVO
from choosing the right candidates.  Besides cost computation, I also think
the legitimization is bad in terms of code generation.
The root cause in aarch64_legitimize_address can be described by its
comment:
   /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
  where mask is selected by alignment and size of the offset.
  We try to pick as large a range for the offset as possible to
  maximize the chance of a CSE.  However, for aligned addresses
  we limit the range to 4k so that structures with different sized
  elements are likely to use the same base.  */
I think the split of CONST is intended for REG+CONST where the const offset
is not in the range of AArch64's addressing modes.  Unfortunately, it
doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<<SCALE+CONST"
when the CONST are in the range of addressing modes.  As a result, these two
cases fall through this logic, resulting in sub-optimal results.

It's obvious we can do below legitimization:
FORM1:
   r83:DI=r73:DI+r74:DI
   "r83 + 16380"
FORM2:
   r107:DI=0x3ffc
   r106:DI=r74:DI+r107:DI
  REG_EQUAL r74:DI+0x3ffc
   "r106 + r73 << 2"

This patch handles these two cases as described.

Bootstrap & test on AArch64 along with other patch.  Is it OK?

2015-11-04  Bin Cheng  
Jiong Wang  

* config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
address expressions like REG+REG+CONST and REG+NON_REG+CONST.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5c8604f..47875ac 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
machine_mode mode)
 {
   HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
   HOST_WIDE_INT base_offset;
+  rtx op0 = XEXP (x,0);
+
+  if (GET_CODE (op0) == PLUS)
+   {
+ rtx op0_ = XEXP (op0, 0);
+ rtx op1_ = XEXP (op0, 1);
+
+ /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
+reach here, the 'CONST' may be valid in which case we should
+not split.  */
+ if (REG_P (op0_) && REG_P (op1_))
+   {
+ machine_mode addr_mode = GET_MODE (op0);
+ rtx addr = gen_reg_rtx (addr_mode);
+
+ rtx ret = plus_constant (addr_mode, addr, offset);
+ if (aarch64_legitimate_address_hook_p (mode, ret, false))
+   {
+ emit_insn (gen_adddi3 (addr, op0_, op1_));
+ return ret;
+   }
+   }
+ /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
+will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
+we split it into Y=REG+CONST, Y+NON_REG.  */
+ else if (REG_P (op0_) || REG_P (op1_))
+   {
+ machine_mode addr_mode = GET_MODE (op0);
+ rtx addr = gen_reg_rtx (addr_mode);
+
+ /* Switch to make sure that register is in op0_.  */
+ if (REG_P (op1_))
+   std::swap (op0_, op1_);
+
+ rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
+ if (aarch64_legitimate_address_hook_p (mode, ret, false))
+   {
+ addr = force_operand (plus_constant (addr_mode,
+  op0_, offset),
+   NULL_RTX);
+ ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_);
+ return ret;
+   }
+   }
+   }
 
   /* Does it look like we'll need a load/store-pair operation?  */
   if (GET_MODE_SIZE (mode) > 16