Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions

2016-11-23 Thread Christophe Lyon
On 23 November 2016 at 17:30, Michael Collison  wrote:
> Hi Christophe,
>
> This is not a regression per se; the patch causes the test case to generate 
> one fewer instruction overall, but one additional 'and'. Trunk before the 
> patch (-O2):
>
> foo:
> and w0, w0, 255
> lsl w1, w0, 20
> orr w0, w1, w0, lsl 8
> mov w1, 65024
> movk w1, 0xfe0, lsl 16
> and w0, w0, w1
> ret
>
> The new trunk aarch64 compiler with the patch generates fewer instructions 
> but one additional 'and':
>
> foo:
> and w0, w0, 255
> lsl w1, w0, 20
> orr w0, w1, w0, lsl 8
> and x0, x0, 268434944
> and x0, x0, -2031617
> ret
>

OK,
by "regression" I meant that a test used to pass and now fails.
It looks like you have to update it to take into account the new, better code.

Christophe

> -Original Message-
> From: Christophe Lyon [mailto:christophe.l...@linaro.org]
> Sent: Wednesday, November 23, 2016 6:42 AM
> To: James Greenhalgh
> Cc: Michael Collison; gcc-patches@gcc.gnu.org; nd
> Subject: Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions
>
> Hi Michael,
>
>
> On 21 November 2016 at 10:52, James Greenhalgh  
> wrote:
>> On Fri, Nov 18, 2016 at 07:41:58AM +, Michael Collison wrote:
>>> James,
>>>
>>> I incorporated all your suggestions, and successfully bootstrapped
>>> and re-ran the testsuite.
>>>
>>> Okay for trunk?
>>>
>>> 2016-11-18  Michael Collison  
>>>
>>>   * config/aarch64/aarch64-protos.h
>>>   (aarch64_and_split_imm1, aarch64_and_split_imm2)
>>>   (aarch64_and_bitmask_imm): New prototypes.
>>>   * config/aarch64/aarch64.c (aarch64_and_split_imm1):
>>>   New overloaded function to create bit mask covering the
>>>   lowest to highest bits set.
>>>   (aarch64_and_split_imm2): New overloaded function to create bit
>>>   mask of zeros between first and last bit set.
>>>   (aarch64_and_bitmask_imm): New function to determine if an integer
>>>   is a valid two-instruction "and" operation.
>>>   * config/aarch64/aarch64.md (and<mode>3): New define_insn_and_split
>>>   allowing wider range of constants with "and" operations.
>>>   * (ior<mode>3, xor<mode>3): Use new LOGICAL2 iterator to prevent
>>>   "and" operator from matching restricted constant range used for
>>>   ior and xor operators.
>>>   * config/aarch64/constraints.md (UsO constraint): New SImode
>>> constraint
>>>   for constants in "and" operations.
>>>   (UsP constraint): New DImode constraint for constants in "and" 
>>> operations.
>>>   * config/aarch64/iterators.md (lconst2): New mode iterator.
>>>   (LOGICAL2): New code iterator.
>>>   * config/aarch64/predicates.md (aarch64_logical_and_immediate): New
>>>   predicate
>>>   (aarch64_logical_and_operand): New predicate allowing extended 
>>> constants
>>>   for "and" operations.
>>>   * testsuite/gcc.target/aarch64/and_const.c: New test to verify
>>>   additional constants are recognized and fewer instructions generated.
>>>   * testsuite/gcc.target/aarch64/and_const2.c: New test to verify
>>>   additional constants are recognized and fewer instructions generated.
>>>
>>
>>> diff --git a/gcc/config/aarch64/aarch64-protos.h
>>> b/gcc/config/aarch64/aarch64-protos.h
>>> index 3cdd69b..7ef8cdf 100644
>>> --- a/gcc/config/aarch64/aarch64-protos.h
>>> +++ b/gcc/config/aarch64/aarch64-protos.h
>>> @@ -300,6 +300,9 @@ extern struct tune_params aarch64_tune_params;
>>>  HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
>>>  int aarch64_get_condition_code (rtx);
>>>  bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>>> +unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
>>> +unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
>>> +bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode mode);
>>>  int aarch64_branch_cost (bool, bool);
>>>  enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
>>>  bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch

RE: [Aarch64][PATCH] Improve Logical And Immediate Expressions

2016-11-23 Thread Michael Collison
Hi Christophe,

This is not a regression per se; the patch causes the test case to generate one 
fewer instruction overall, but one additional 'and'. Trunk before the patch 
(-O2):

foo:
and w0, w0, 255
lsl w1, w0, 20
orr w0, w1, w0, lsl 8
mov w1, 65024
movk w1, 0xfe0, lsl 16
and w0, w0, w1
ret

The new trunk aarch64 compiler with the patch generates fewer instructions but 
one additional 'and':

foo:
and w0, w0, 255
lsl w1, w0, 20
orr w0, w1, w0, lsl 8
and x0, x0, 268434944
and x0, x0, -2031617
ret
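
For reference, the two AND immediates are just the two-way split of the single
mask the old mov/movk sequence builds; here is a quick standalone check (an
illustrative sketch using the constants from the listings above, not part of
the patch):

#include <assert.h>

int
main (void)
{
  /* mov w1, 65024; movk w1, 0xfe0, lsl 16  ->  0x0fe0fe00.  */
  unsigned int old_mask = 65024u | (0x0fe0u << 16);
  unsigned int imm1 = 268434944u;               /* 0x0ffffe00  */
  unsigned int imm2 = (unsigned int) -2031617;  /* 0xffe0ffff  */

  /* The two AND immediates combine to the same mask as the old sequence.  */
  assert ((imm1 & imm2) == old_mask);
  return 0;
}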

-Original Message-
From: Christophe Lyon [mailto:christophe.l...@linaro.org] 
Sent: Wednesday, November 23, 2016 6:42 AM
To: James Greenhalgh
Cc: Michael Collison; gcc-patches@gcc.gnu.org; nd
Subject: Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions

Hi Michael,


On 21 November 2016 at 10:52, James Greenhalgh  wrote:
> On Fri, Nov 18, 2016 at 07:41:58AM +, Michael Collison wrote:
>> James,
>>
>> I incorporated all your suggestions, and successfully bootstrapped 
>> and re-ran the testsuite.
>>
>> Okay for trunk?
>>
>> 2016-11-18  Michael Collison  
>>
>>   * config/aarch64/aarch64-protos.h
>>   (aarch64_and_split_imm1, aarch64_and_split_imm2)
>>   (aarch64_and_bitmask_imm): New prototypes.
>>   * config/aarch64/aarch64.c (aarch64_and_split_imm1):
>>   New overloaded function to create bit mask covering the
>>   lowest to highest bits set.
>>   (aarch64_and_split_imm2): New overloaded function to create bit
>>   mask of zeros between first and last bit set.
>>   (aarch64_and_bitmask_imm): New function to determine if an integer
>>   is a valid two-instruction "and" operation.
>>   * config/aarch64/aarch64.md (and<mode>3): New define_insn_and_split
>>   allowing wider range of constants with "and" operations.
>>   * (ior<mode>3, xor<mode>3): Use new LOGICAL2 iterator to prevent
>>   "and" operator from matching restricted constant range used for
>>   ior and xor operators.
>>   * config/aarch64/constraints.md (UsO constraint): New SImode constraint
>>   for constants in "and" operations.
>>   (UsP constraint): New DImode constraint for constants in "and" 
>> operations.
>>   * config/aarch64/iterators.md (lconst2): New mode iterator.
>>   (LOGICAL2): New code iterator.
>>   * config/aarch64/predicates.md (aarch64_logical_and_immediate): New
>>   predicate
>>   (aarch64_logical_and_operand): New predicate allowing extended 
>> constants
>>   for "and" operations.
>>   * testsuite/gcc.target/aarch64/and_const.c: New test to verify
>>   additional constants are recognized and fewer instructions generated.
>>   * testsuite/gcc.target/aarch64/and_const2.c: New test to verify
>>   additional constants are recognized and fewer instructions generated.
>>
>
>> diff --git a/gcc/config/aarch64/aarch64-protos.h 
>> b/gcc/config/aarch64/aarch64-protos.h
>> index 3cdd69b..7ef8cdf 100644
>> --- a/gcc/config/aarch64/aarch64-protos.h
>> +++ b/gcc/config/aarch64/aarch64-protos.h
>> @@ -300,6 +300,9 @@ extern struct tune_params aarch64_tune_params;
>>  HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
>>  int aarch64_get_condition_code (rtx);
>>  bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>> +unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
>> +unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
>> +bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode mode);
>>  int aarch64_branch_cost (bool, bool);
>>  enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
>>  bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 3e663eb..8e33c40 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -3600,6 +3600,43 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, 
>> machine_mode mode)
>>return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];  
>> }
>>
>> +/* Create mask of ones, covering the lowest to highest bits set in 
>> +VAL_IN.  */
>> +
>> +unsigned HOST_WIDE_INT
>> +aarch64_and_split_imm1 (HOST_WIDE_INT val_in)
>> +{
>> +  int lowest_bit_set = ctz_hwi (val_in);
>> +  int highest_bit_set = floor_log2 (val_in);
>> +  gcc_assert (val_in != 0);
>
> The comment above the function should make this precondition clear. Or 
> you could pick a well defined behaviour for when val_in == 0 (return 
> 0?), if that fits the rest of the algorithm.
>
> Otherwise, this patch looks OK to me.
>
> Thanks,
> James
>

This patch (r242739) causes a regression on aarch64:

  gcc.dg/zero_bits_compound-2.c scan-assembler-times \\(and: 2 now fails.

Christophe.


Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions

2016-11-23 Thread Christophe Lyon
Hi Michael,


On 21 November 2016 at 10:52, James Greenhalgh  wrote:
> On Fri, Nov 18, 2016 at 07:41:58AM +, Michael Collison wrote:
>> James,
>>
>> I incorporated all your suggestions, and successfully bootstrapped and re-ran
>> the testsuite.
>>
>> Okay for trunk?
>>
>> 2016-11-18  Michael Collison  
>>
>>   * config/aarch64/aarch64-protos.h
>>   (aarch64_and_split_imm1, aarch64_and_split_imm2)
>>   (aarch64_and_bitmask_imm): New prototypes.
>>   * config/aarch64/aarch64.c (aarch64_and_split_imm1):
>>   New overloaded function to create bit mask covering the
>>   lowest to highest bits set.
>>   (aarch64_and_split_imm2): New overloaded function to create bit
>>   mask of zeros between first and last bit set.
>>   (aarch64_and_bitmask_imm): New function to determine if an integer
>>   is a valid two-instruction "and" operation.
>>   * config/aarch64/aarch64.md (and<mode>3): New define_insn_and_split
>>   allowing wider range of constants with "and" operations.
>>   * (ior<mode>3, xor<mode>3): Use new LOGICAL2 iterator to prevent
>>   "and" operator from matching restricted constant range used for
>>   ior and xor operators.
>>   * config/aarch64/constraints.md (UsO constraint): New SImode constraint
>>   for constants in "and" operations.
>>   (UsP constraint): New DImode constraint for constants in "and" 
>> operations.
>>   * config/aarch64/iterators.md (lconst2): New mode iterator.
>>   (LOGICAL2): New code iterator.
>>   * config/aarch64/predicates.md (aarch64_logical_and_immediate): New
>>   predicate
>>   (aarch64_logical_and_operand): New predicate allowing extended 
>> constants
>>   for "and" operations.
>>   * testsuite/gcc.target/aarch64/and_const.c: New test to verify
>>   additional constants are recognized and fewer instructions generated.
>>   * testsuite/gcc.target/aarch64/and_const2.c: New test to verify
>>   additional constants are recognized and fewer instructions generated.
>>
>
>> diff --git a/gcc/config/aarch64/aarch64-protos.h 
>> b/gcc/config/aarch64/aarch64-protos.h
>> index 3cdd69b..7ef8cdf 100644
>> --- a/gcc/config/aarch64/aarch64-protos.h
>> +++ b/gcc/config/aarch64/aarch64-protos.h
>> @@ -300,6 +300,9 @@ extern struct tune_params aarch64_tune_params;
>>  HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
>>  int aarch64_get_condition_code (rtx);
>>  bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>> +unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
>> +unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
>> +bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode 
>> mode);
>>  int aarch64_branch_cost (bool, bool);
>>  enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
>>  bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 3e663eb..8e33c40 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -3600,6 +3600,43 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, 
>> machine_mode mode)
>>return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];
>>  }
>>
>> +/* Create mask of ones, covering the lowest to highest bits set in VAL_IN.  
>> */
>> +
>> +unsigned HOST_WIDE_INT
>> +aarch64_and_split_imm1 (HOST_WIDE_INT val_in)
>> +{
>> +  int lowest_bit_set = ctz_hwi (val_in);
>> +  int highest_bit_set = floor_log2 (val_in);
>> +  gcc_assert (val_in != 0);
>
> The comment above the function should make this precondition clear. Or you
> could pick a well defined behaviour for when val_in == 0 (return 0?), if
> that fits the rest of the algorithm.
>
> Otherwise, this patch looks OK to me.
>
> Thanks,
> James
>

This patch (r242739) causes a regression on aarch64:

  gcc.dg/zero_bits_compound-2.c scan-assembler-times \\(and: 2
now fails.

Christophe.


Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions

2016-11-21 Thread James Greenhalgh
On Fri, Nov 18, 2016 at 07:41:58AM +, Michael Collison wrote:
> James,
> 
> I incorporated all your suggestions, and successfully bootstrapped and re-ran
> the testsuite.
> 
> Okay for trunk?
> 
> 2016-11-18  Michael Collison  
> 
>   * config/aarch64/aarch64-protos.h
>   (aarch64_and_split_imm1, aarch64_and_split_imm2)
>   (aarch64_and_bitmask_imm): New prototypes.
>   * config/aarch64/aarch64.c (aarch64_and_split_imm1):
>   New overloaded function to create bit mask covering the
>   lowest to highest bits set.
>   (aarch64_and_split_imm2): New overloaded function to create bit
>   mask of zeros between first and last bit set.
>   (aarch64_and_bitmask_imm): New function to determine if an integer
>   is a valid two-instruction "and" operation.
>   * config/aarch64/aarch64.md (and<mode>3): New define_insn_and_split
>   allowing wider range of constants with "and" operations.
>   * (ior<mode>3, xor<mode>3): Use new LOGICAL2 iterator to prevent
>   "and" operator from matching restricted constant range used for
>   ior and xor operators.
>   * config/aarch64/constraints.md (UsO constraint): New SImode constraint
>   for constants in "and" operations.
>   (UsP constraint): New DImode constraint for constants in "and" 
> operations.
>   * config/aarch64/iterators.md (lconst2): New mode iterator.
>   (LOGICAL2): New code iterator.
>   * config/aarch64/predicates.md (aarch64_logical_and_immediate): New
>   predicate
>   (aarch64_logical_and_operand): New predicate allowing extended constants
>   for "and" operations.
>   * testsuite/gcc.target/aarch64/and_const.c: New test to verify
>   additional constants are recognized and fewer instructions generated.
>   * testsuite/gcc.target/aarch64/and_const2.c: New test to verify
>   additional constants are recognized and fewer instructions generated.
> 

> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 3cdd69b..7ef8cdf 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -300,6 +300,9 @@ extern struct tune_params aarch64_tune_params;
>  HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
>  int aarch64_get_condition_code (rtx);
>  bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
> +unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
> +unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
> +bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode 
> mode);
>  int aarch64_branch_cost (bool, bool);
>  enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
>  bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 3e663eb..8e33c40 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3600,6 +3600,43 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, 
> machine_mode mode)
>return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];
>  }
>  
> +/* Create mask of ones, covering the lowest to highest bits set in VAL_IN.  
> */
> +
> +unsigned HOST_WIDE_INT
> +aarch64_and_split_imm1 (HOST_WIDE_INT val_in)
> +{
> +  int lowest_bit_set = ctz_hwi (val_in);
> +  int highest_bit_set = floor_log2 (val_in);
> +  gcc_assert (val_in != 0);

The comment above the function should make this precondition clear. Or you
could pick a well defined behaviour for when val_in == 0 (return 0?), if
that fits the rest of the algorithm.
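
Something like this (an illustrative sketch only; the exact wording and the
placement of the assert are of course up to you) would make the precondition
explicit:

/* Create mask of ones, covering the lowest to highest bits set in VAL_IN.
   VAL_IN must be nonzero.  */

unsigned HOST_WIDE_INT
aarch64_and_split_imm1 (HOST_WIDE_INT val_in)
{
  gcc_assert (val_in != 0);

  int lowest_bit_set = ctz_hwi (val_in);
  int highest_bit_set = floor_log2 (val_in);

  return ((HOST_WIDE_INT_UC (2) << highest_bit_set)
          - (HOST_WIDE_INT_1U << lowest_bit_set));
}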

Otherwise, this patch looks OK to me.

Thanks,
James 



RE: [Aarch64][PATCH] Improve Logical And Immediate Expressions

2016-11-17 Thread Michael Collison
James,

I incorporated all your suggestions, and successfully bootstrapped and re-ran 
the testsuite.

Okay for trunk?

2016-11-18  Michael Collison  

* config/aarch64/aarch64-protos.h
(aarch64_and_split_imm1, aarch64_and_split_imm2)
(aarch64_and_bitmask_imm): New prototypes.
* config/aarch64/aarch64.c (aarch64_and_split_imm1):
New overloaded function to create bit mask covering the
lowest to highest bits set.
(aarch64_and_split_imm2): New overloaded function to create bit
mask of zeros between first and last bit set.
(aarch64_and_bitmask_imm): New function to determine if an integer
is a valid two-instruction "and" operation.
* config/aarch64/aarch64.md (and<mode>3): New define_insn_and_split
allowing wider range of constants with "and" operations.
* (ior<mode>3, xor<mode>3): Use new LOGICAL2 iterator to prevent
"and" operator from matching restricted constant range used for
ior and xor operators.
* config/aarch64/constraints.md (UsO constraint): New SImode constraint
for constants in "and" operations.
(UsP constraint): New DImode constraint for constants in "and" 
operations.
* config/aarch64/iterators.md (lconst2): New mode iterator.
(LOGICAL2): New code iterator.
* config/aarch64/predicates.md (aarch64_logical_and_immediate): New
predicate
(aarch64_logical_and_operand): New predicate allowing extended constants
for "and" operations.
* testsuite/gcc.target/aarch64/and_const.c: New test to verify
additional constants are recognized and fewer instructions generated.
* testsuite/gcc.target/aarch64/and_const2.c: New test to verify
additional constants are recognized and fewer instructions generated.

-Original Message-
From: James Greenhalgh [mailto:james.greenha...@arm.com] 
Sent: Thursday, November 17, 2016 9:26 AM
To: Michael Collison
Cc: gcc-patches@gcc.gnu.org; nd
Subject: Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions

On Thu, Oct 27, 2016 at 08:44:02PM +, Michael Collison wrote:
> This patch is designed to improve code generation for "and" 
> instructions with certain immediate operands.
> 
> For the following test case:
> 
> int f2(int x)
> {
>    x &= 0x0ffffff8;
> 
>    x &= 0xff001fff;
> 
>    return x;
> }
> 
> the trunk aarch64 compiler generates:
> 
> mov   w1, 8184
> movk  w1, 0xf00, lsl 16
> and   w0, w0, w1
> 
> We can generate one fewer instruction if we recognize certain 
> constants. With the attached patch the current trunk compiler generates:
> 
> and   w0, w0, 268435448
> and   w0, w0, -16769025
> 
> Bootstrapped and make check successfully completed with no regressions 
> on aarch64-linux-gnu.
> 
> Okay for trunk?

Hi Michael,

I have some minor comments in line.

> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 3cdd69b..7ef8cdf 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -300,6 +300,9 @@ extern struct tune_params aarch64_tune_params;
>  HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
>  int aarch64_get_condition_code (rtx);
>  bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
> +unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
> +unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
> +bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode mode);
>  int aarch64_branch_cost (bool, bool);
>  enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
>  bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 3e663eb..db82c5c 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3600,6 +3600,46 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, 
> machine_mode mode)
>return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];  }
>  
> +/* Create mask of ones, covering the lowest to highest bits set in 
> +VAL_IN.  */
> +
> +unsigned HOST_WIDE_INT
> +aarch64_and_split_imm1 (HOST_WIDE_INT val_in)
> +{
> +  int first_bit_set = ctz_hwi (val_in);
> +  int last_bit_set = floor_log2 (val_in);
> +  gcc_assert (first_bit_set < last_bit_set);

This assert can never trigger (by definition) unless VAL_IN == 0 (in which case 
floor_log2 = -1, ctz_hwi = HOST_BITS_PER_WORD_SIZE). If that's the case you're 
testing for, then this assert would be more explicit as

 gcc_assert (val_in != 0)

I'd suggest renaming these to lowest_bit_set and highest_bit_set, as first
and last are ambiguous (you have them the opposite way round from what
I'd consider first [highest, thus leading bits] and last
[lowest/trailing bits]).

Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions

2016-11-17 Thread James Greenhalgh
On Thu, Oct 27, 2016 at 08:44:02PM +, Michael Collison wrote:
> This patch is designed to improve code generation for "and" instructions with
> certain immediate operands.
> 
> For the following test case:
> 
> int f2(int x)
> {
>    x &= 0x0ffffff8;
> 
>    x &= 0xff001fff;
> 
>    return x;
> }
> 
> the trunk aarch64 compiler generates:
> 
> mov   w1, 8184
> movk  w1, 0xf00, lsl 16
> and   w0, w0, w1
> 
> We can generate one fewer instruction if we recognize certain constants. With
> the attached patch the current trunk compiler generates:
> 
> and   w0, w0, 268435448
> and   w0, w0, -16769025
> 
> Bootstrapped and make check successfully completed with no regressions on
> aarch64-linux-gnu.
> 
> Okay for trunk?

Hi Michael,

I have some minor comments in line.

> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 3cdd69b..7ef8cdf 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -300,6 +300,9 @@ extern struct tune_params aarch64_tune_params;
>  HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
>  int aarch64_get_condition_code (rtx);
>  bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
> +unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
> +unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
> +bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode 
> mode);
>  int aarch64_branch_cost (bool, bool);
>  enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
>  bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 3e663eb..db82c5c 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3600,6 +3600,46 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, 
> machine_mode mode)
>return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];
>  }
>  
> +/* Create mask of ones, covering the lowest to highest bits set in VAL_IN.  
> */
> +
> +unsigned HOST_WIDE_INT
> +aarch64_and_split_imm1 (HOST_WIDE_INT val_in)
> +{
> +  int first_bit_set = ctz_hwi (val_in);
> +  int last_bit_set = floor_log2 (val_in);
> +  gcc_assert (first_bit_set < last_bit_set);

This assert can never trigger (by definition) unless VAL_IN == 0 (in which
case floor_log2 = -1, ctz_hwi = HOST_BITS_PER_WORD_SIZE). If that's the case
you're testing for, then this assert would be more explicit as

 gcc_assert (val_in != 0)

I'd suggest renaming these to lowest_bit_set and highest_bit_set, as first
and last are ambiguous (you have them the opposite way round from what
I'd consider first [highest, thus leading bits] and last
[lowest/trailing bits]).

> +
> +  return ((HOST_WIDE_INT_UC (2) << last_bit_set) -
> +   (HOST_WIDE_INT_1U << first_bit_set));
> +}
> +
> +/* Create constant with zero bits between first and last bit set and one
> +   bits elsewhere.  */

I can't parse this comment in relation to what the code below does.
Looking at the code, you're building a new constant where the bits between
the first and last bits are unmodified, and all other bits are set to one.

i.e. for 0000 0000 1010 1000 you'd generate 1111 1111 1010 1111.
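
To make that concrete, here is a small standalone sketch of the split, using
standard builtins instead of GCC's ctz_hwi/floor_log2 (the 0x0f001ff8 value is
the combined mask from the f2 test case; everything else is illustrative):

#include <assert.h>

static unsigned long long
split_imm1 (unsigned long long val)   /* ones from lowest to highest set bit */
{
  int lowest = __builtin_ctzll (val);
  int highest = 63 - __builtin_clzll (val);
  return (2ULL << highest) - (1ULL << lowest);
}

static unsigned long long
split_imm2 (unsigned long long val)   /* keep bits in that range, set the rest */
{
  return val | ~split_imm1 (val);
}

int
main (void)
{
  unsigned long long mask = 0x0f001ff8;         /* not a valid bitmask immediate */
  unsigned long long imm1 = split_imm1 (mask);  /* 0x0ffffff8 == 268435448       */
  unsigned long long imm2 = split_imm2 (mask);  /* low 32 bits: 0xff001fff       */
  assert ((imm1 & imm2) == mask);               /* two ANDs == the one AND       */
  return 0;
}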

> +unsigned HOST_WIDE_INT
> +aarch64_and_split_imm2 (HOST_WIDE_INT val_in)
> +{
> +  return val_in | ~aarch64_and_split_imm1 (val_in);
> +}
> +
> +/* Return true if VAL_IN is a valid 'and' bitmask immediate.  */
> +
> +bool
> +aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode mode)
> +{
> +  if (aarch64_bitmask_imm (val_in, mode))
> +return false;
> +
> +  if (aarch64_move_imm (val_in, mode))
> +return false;
> +
> +  unsigned HOST_WIDE_INT imm2 = aarch64_and_split_imm2 (val_in);
> +
> +  if (!aarch64_bitmask_imm (imm2, mode))
> +return false;
> +
> +  return true;

You can replace these four lines with:

  return aarch64_bitmask_imm (imm2, mode);

> +}
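
With that change folded in, the helper would read roughly as follows (just a
sketch; the committed version may differ):

/* Return true if VAL_IN is not encodable as a single AND immediate or a
   single move immediate, but can be handled as two AND immediates.  */

bool
aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode mode)
{
  if (aarch64_bitmask_imm (val_in, mode))
    return false;

  if (aarch64_move_imm (val_in, mode))
    return false;

  unsigned HOST_WIDE_INT imm2 = aarch64_and_split_imm2 (val_in);
  return aarch64_bitmask_imm (imm2, mode);
}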
>  
>  /* Return true if val is an immediate that can be loaded into a
> register in a single instruction.  */

> diff --git a/gcc/testsuite/gcc.target/aarch64/and_const.c 
> b/gcc/testsuite/gcc.target/aarch64/and_const.c
> new file mode 100644
> index 0000000..9c377d7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/and_const.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int f2 (int x)
> +{
> +   x &= 0x0ffffff8;
> +
> +   x &= 0xff001fff;
> +
> +   return x;
> +}

It would be good to have a test for the DImode cases too (using long long).
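
Something along the following lines could be a starting point for that; the
constant and the expected instruction count here are my guesses and would need
checking against the actual code generation:

/* { dg-do compile } */
/* { dg-options "-O2" } */

long long
f3 (long long x)
{
  /* 64-bit mask that is neither a bitmask nor a move immediate, but whose
     two-way split should give two valid AND immediates.  */
  x &= 0x0f001ff800000000LL;
  return x;
}

/* { dg-final { scan-assembler-times "and\t" 2 } } */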

Thanks,
James