[02/nn] Add more vec_duplicate simplifications

2017-10-23 Thread Richard Sandiford
This patch adds a vec_duplicate_p helper that tests for constant
or non-constant vector duplicates.  Together with the existing
const_vec_duplicate_p, this complements the gen_vec_duplicate
and gen_const_vec_duplicate added by a previous patch.
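
For readers without the tree to hand, the relationship between the two predicates can be sketched on a toy expression type. This is only an illustration under stated assumptions: Expr, make, same and the toy_* functions are hypothetical stand-ins, not GCC's rtx or the code in rtl.h.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy expression tree. This is NOT GCC's rtx; every name here is hypothetical.
enum class Code { Reg, ConstInt, ConstVector, VecDuplicate };

struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

struct Expr {
  Code code;
  long value;                // register number or integer value
  std::vector<ExprPtr> ops;  // vector elements, or the duplicated operand
};

ExprPtr make(Code code, long value, std::vector<ExprPtr> ops = {}) {
  return std::make_shared<Expr>(Expr{code, value, std::move(ops)});
}

// Structural equality on the toy tree.
bool same(const ExprPtr &a, const ExprPtr &b) {
  if (a->code != b->code || a->value != b->value
      || a->ops.size() != b->ops.size())
    return false;
  for (std::size_t i = 0; i < a->ops.size(); ++i)
    if (!same(a->ops[i], b->ops[i]))
      return false;
  return true;
}

// Constant case only: a ConstVector whose elements are all equal.
bool toy_const_vec_duplicate_p(const ExprPtr &x, ExprPtr *elt) {
  if (x->code != Code::ConstVector || x->ops.empty())
    return false;
  for (const ExprPtr &op : x->ops)
    if (!same(op, x->ops[0]))
      return false;
  *elt = x->ops[0];
  return true;
}

// Constant or non-constant case: also accept an explicit VecDuplicate.
bool toy_vec_duplicate_p(const ExprPtr &x, ExprPtr *elt) {
  assert(x && elt);
  if (x->code == Code::VecDuplicate) {
    *elt = x->ops[0];
    return true;
  }
  return toy_const_vec_duplicate_p(x, elt);
}
```

In this sketch the general predicate checks for the variable form first and otherwise falls back to the constant-only one, so a caller that accepts either form only needs a single test.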

The patch uses the new routines to add more rtx simplifications
involving vector duplicates.  These mirror simplifications that
we already do for CONST_VECTOR broadcasts and are needed for
variable-length SVE, which uses:

  (const:M (vec_duplicate:M X))

to represent constant broadcasts instead.  The simplifications do
trigger on the testsuite for variable duplicates too, and in each
case I saw the change was an improvement.  E.g.:

- Several targets had this simplification in gcc.dg/pr49948.c
  when compiled at -O3:

-Failed to match this instruction:
+Successfully matched this instruction:
 (set (reg:DI 88)
-    (subreg:DI (vec_duplicate:V2DI (reg/f:DI 75 [ _4 ])) 0))
+    (reg/f:DI 75 [ _4 ]))

  On aarch64 this gives:

        ret
        .p2align 2
 .L8:
+       adrp    x1, b
        sub     sp, sp, #80
-       adrp    x2, b
-       add     x1, sp, 12
+       add     x2, sp, 12
        str     wzr, [x0, #:lo12:a]
+       str     x2, [x1, #:lo12:b]
        mov     w0, 0
-       dup     v0.2d, x1
-       str     d0, [x2, #:lo12:b]
        add     sp, sp, 80
        ret
        .size   foo, .-foo

  On x86_64:

        jg      .L2
        leaq    -76(%rsp), %rax
        movl    $0, a(%rip)
-       movq    %rax, -96(%rsp)
-       movq    -96(%rsp), %xmm0
-       punpcklqdq      %xmm0, %xmm0
-       movq    %xmm0, b(%rip)
+       movq    %rax, b(%rip)
 .L2:
        xorl    %eax, %eax
        ret

  etc.
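
  As a value-level illustration of why this is safe (a sketch only, not the simplify_subreg code; V2DI, broadcast and read_di_at are hypothetical names): reading an element-sized scalar at any element-aligned offset of a broadcast vector gives back the broadcast value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Two 64-bit lanes, standing in for a V2DI value.
using V2DI = std::array<std::uint64_t, 2>;

// Value-level analogue of (vec_duplicate:V2DI x).
V2DI broadcast(std::uint64_t x) { return {x, x}; }

// Value-level analogue of a DI-sized subreg at the given byte offset.
std::uint64_t read_di_at(const V2DI &v, std::size_t byte_offset) {
  assert(byte_offset % sizeof(std::uint64_t) == 0);
  assert(byte_offset < sizeof(std::uint64_t) * v.size());
  std::uint64_t out;
  std::memcpy(&out,
              reinterpret_cast<const unsigned char *>(v.data()) + byte_offset,
              sizeof out);
  return out;
}
```

  Because every lane holds the same value, the result does not depend on which lane the offset selects, so the broadcast and the extraction cancel out.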

- gcc.dg/torture/pr58018.c compiled at -O3 on aarch64 has an instance of:

 Trying 50, 52, 46 -> 53:
 Failed to match this instruction:
 (set (reg:V4SI 167)
-    (and:V4SI (and:V4SI (vec_duplicate:V4SI (reg:SI 132 [ _165 ]))
-            (reg:V4SI 209))
-        (const_vector:V4SI [
-                (const_int 1 [0x1])
-                (const_int 1 [0x1])
-                (const_int 1 [0x1])
-                (const_int 1 [0x1])
-            ])))
+    (and:V4SI (vec_duplicate:V4SI (reg:SI 132 [ _165 ]))
+        (reg:V4SI 209)))
 Successfully matched this instruction:
 (set (reg:V4SI 163 [ vect_patt_16.14 ])
     (vec_duplicate:V4SI (reg:SI 132 [ _165 ])))
+Successfully matched this instruction:
+(set (reg:V4SI 167)
+    (and:V4SI (reg:V4SI 163 [ vect_patt_16.14 ])
+        (reg:V4SI 209)))

  where (reg:SI 132) is the result of a scalar comparison and so
  is known to be 0 or 1.  This saves a MOVI and vector AND:

        cmp     w7, 4
        bls     .L15
        dup     v1.4s, w2
-       lsr     w2, w1, 2
+       dup     v2.4s, w6
        movi    v3.4s, 0
-       mov     w0, 0
-       movi    v2.4s, 0x1
+       lsr     w2, w1, 2
        mvni    v0.4s, 0
+       mov     w0, 0
        cmge    v1.4s, v1.4s, v3.4s
        and     v1.16b, v2.16b, v1.16b
-       dup     v2.4s, w6
-       and     v1.16b, v1.16b, v2.16b
        .p2align 3
 .L7:
        and     v0.16b, v0.16b, v1.16b
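
  The reasoning can be checked at the value level with a small sketch (an illustration only; V4SI, dup, vand and mask_with_one_is_redundant are hypothetical names, not anything in simplify-rtx.c). If the broadcast scalar is 0 or 1, ANDing the result with a vector of ones cannot change it; for other scalars it can.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Four 32-bit lanes, standing in for a V4SI value.
using V4SI = std::array<std::uint32_t, 4>;

// Value-level analogue of (vec_duplicate:V4SI x).
V4SI dup(std::uint32_t x) { return {x, x, x, x}; }

// Lane-wise AND.
V4SI vand(const V4SI &a, const V4SI &b) {
  V4SI r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = a[i] & b[i];
  return r;
}

// True when masking (dup (scalar) & other) with all-ones lanes of value 1
// leaves the result unchanged.
bool mask_with_one_is_redundant(std::uint32_t scalar, const V4SI &other) {
  V4SI inner = vand(dup(scalar), other);
  return vand(inner, dup(1)) == inner;
}
```

  With the scalar restricted to 0 or 1, every lane of the inner AND is already 0 or 1, which is why the outer AND with the constant can be dropped.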

- powerpc64le has many instances of things like:

-Failed to match this instruction:
+Successfully matched this instruction:
 (set (reg:V4SI 161 [ vect_cst__24 ])
-    (vec_select:V4SI (vec_duplicate:V4SI (vec_select:SI (reg:V4SI 143)
-                (parallel [
-                        (const_int 0 [0])
-                    ])))
-        (parallel [
-                (const_int 2 [0x2])
-                (const_int 3 [0x3])
-                (const_int 0 [0])
-                (const_int 1 [0x1])
-            ])))
+    (vec_duplicate:V4SI (vec_select:SI (reg:V4SI 143)
+            (parallel [
+                    (const_int 0 [0])
+                ]))))

  This removes redundant XXPERMDIs from many tests.
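
  A value-level sketch of why the permutation is redundant (an illustration only; V4SI, dup and vec_select here are hypothetical helpers, not the simplify-rtx.c code): selecting lanes from a broadcast in any order gives the same broadcast back.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Four 32-bit lanes, standing in for a V4SI value.
using V4SI = std::array<std::uint32_t, 4>;

// Value-level analogue of (vec_duplicate:V4SI x).
V4SI dup(std::uint32_t x) { return {x, x, x, x}; }

// Value-level analogue of a full-width vec_select: lane i of the result
// is lane lanes[i] of the input.
V4SI vec_select(const V4SI &v, const std::array<std::size_t, 4> &lanes) {
  V4SI r{};
  for (std::size_t i = 0; i < r.size(); ++i) {
    assert(lanes[i] < v.size());
    r[i] = v[lanes[i]];
  }
  return r;
}
```

  The same selection applied to a vector with distinct lanes does reorder it, so the rewrite depends on the operand being a duplicate.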

The best way of testing the new simplifications seemed to be
via selftests.  The patch cribs part of David's patch here:
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00270.html .


2017-10-23  Richard Sandiford  
David Malcolm  
Alan Hayward  
David Sherwood  

gcc/
* rtl.h (vec_duplicate_p): New function.
* selftest-rtl.c (assert_rtx_eq_at): New function.
* selftest-rtl.h (ASSERT_RTX_EQ): New macro.
(assert_rtx_eq_at): Declare.
* selftest.h (selftest::simplify_rtx_c_tests): Declare.
* selftest-run-tests.c (selftest::run_tests): Call it.
* simplify-rtx.c: Include selftest.h and selftest-rtl.h.
(simplify_unary_operation_1): Recursively handle vector duplicates.
(simplify_binary_operation_1): Likewise.  Handle VEC_SELECTs of
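vector duplicates.
(simplify_subreg): Handle subregs of vector duplicates.
(make_test_reg, test_vector_ops_duplicate, test_vector_ops)
(selftest::simplify_rtx_c_tests): New functions.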

Re: [02/nn] Add more vec_duplicate simplifications

2017-10-25 Thread Jeff Law
On 10/23/2017 05:17 AM, Richard Sandiford wrote:
> This patch adds a vec_duplicate_p helper that tests for constant
> or non-constant vector duplicates.  Together with the existing
> const_vec_duplicate_p, this complements the gen_vec_duplicate
> and gen_const_vec_duplicate added by a previous patch.
> 
> The patch uses the new routines to add more rtx simplifications
> involving vector duplicates.  These mirror simplifications that
> we already do for CONST_VECTOR broadcasts and are needed for
> variable-length SVE, which uses:
> 
>   (const:M (vec_duplicate:M X))
> 
> to represent constant broadcasts instead.  The simplifications do
> trigger on the testsuite for variable duplicates too, and in each
> case I saw the change was an improvement.  E.g.:
> 
[ snip ]

> 
> The best way of testing the new simplifications seemed to be
> via selftests.  The patch cribs part of David's patch here:
> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00270.html .
Cool.  I really wish I had more time to promote David's work by adding
selftests to various things.  There's certainly cases where it's the
most direct and useful way to test certain bits of lower level
infrastructure we have.  Glad to see you found it useful here.



> 
> 
> 2017-10-23  Richard Sandiford  
>   David Malcolm  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * rtl.h (vec_duplicate_p): New function.
>   * selftest-rtl.c (assert_rtx_eq_at): New function.
>   * selftest-rtl.h (ASSERT_RTX_EQ): New macro.
>   (assert_rtx_eq_at): Declare.
>   * selftest.h (selftest::simplify_rtx_c_tests): Declare.
>   * selftest-run-tests.c (selftest::run_tests): Call it.
>   * simplify-rtx.c: Include selftest.h and selftest-rtl.h.
>   (simplify_unary_operation_1): Recursively handle vector duplicates.
>   (simplify_binary_operation_1): Likewise.  Handle VEC_SELECTs of
>   vector duplicates.
>   (simplify_subreg): Handle subregs of vector duplicates.
>   (make_test_reg, test_vector_ops_duplicate, test_vector_ops)
>   (selftest::simplify_rtx_c_tests): New functions.
Thanks for the examples of how this affects various targets.  Seems like
it ought to be a consistent win when they trigger.

jeff


Re: [02/nn] Add more vec_duplicate simplifications

2017-11-10 Thread Christophe Lyon
On 25 October 2017 at 18:29, Jeff Law  wrote:
> On 10/23/2017 05:17 AM, Richard Sandiford wrote:
>> This patch adds a vec_duplicate_p helper that tests for constant
>> or non-constant vector duplicates.  Together with the existing
>> const_vec_duplicate_p, this complements the gen_vec_duplicate
>> and gen_const_vec_duplicate added by a previous patch.
>>
>> The patch uses the new routines to add more rtx simplifications
>> involving vector duplicates.  These mirror simplifications that
>> we already do for CONST_VECTOR broadcasts and are needed for
>> variable-length SVE, which uses:
>>
>>   (const:M (vec_duplicate:M X))
>>
>> to represent constant broadcasts instead.  The simplifications do
>> trigger on the testsuite for variable duplicates too, and in each
>> case I saw the change was an improvement.  E.g.:
>>
> [ snip ]
>
>>
>> The best way of testing the new simplifications seemed to be
>> via selftests.  The patch cribs part of David's patch here:
>> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00270.html .
> Cool.  I really wish I had more time to promote David's work by adding
> selftests to various things.  There's certainly cases where it's the
> most direct and useful way to test certain bits of lower level
> infrastructure we have.  Glad to see you found it useful here.
>
>
>
>>
>>
>> 2017-10-23  Richard Sandiford  
>>   David Malcolm  
>>   Alan Hayward  
>>   David Sherwood  
>>
>> gcc/
>>   * rtl.h (vec_duplicate_p): New function.
>>   * selftest-rtl.c (assert_rtx_eq_at): New function.
>>   * selftest-rtl.h (ASSERT_RTX_EQ): New macro.
>>   (assert_rtx_eq_at): Declare.
>>   * selftest.h (selftest::simplify_rtx_c_tests): Declare.
>>   * selftest-run-tests.c (selftest::run_tests): Call it.
>>   * simplify-rtx.c: Include selftest.h and selftest-rtl.h.
>>   (simplify_unary_operation_1): Recursively handle vector duplicates.
>>   (simplify_binary_operation_1): Likewise.  Handle VEC_SELECTs of
>>   vector duplicates.
>>   (simplify_subreg): Handle subregs of vector duplicates.
>>   (make_test_reg, test_vector_ops_duplicate, test_vector_ops)
>>   (selftest::simplify_rtx_c_tests): New functions.

Hi Richard,

I've noticed that this patch (r254294) causes
FAIL: gcc.dg/vect/vect-126.c (internal compiler error)
FAIL: gcc.dg/vect/vect-126.c -flto -ffat-lto-objects (internal compiler error)
on arm* targets.
Sorry if this has been reported before. I've restarted validations
only recently, so the process is still catching up.

gcc.log has this:
spawn -ignore SIGHUP
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/xgcc
-B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/
/gcc/testsuite/gcc.dg/vect/vect-126.c -fno-diagnostics-show-caret
-fdiagnostics-color=never -ffast-math -ftree-vectorize
-fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -S -o
vect-126.s
during RTL pass: combine
/gcc/testsuite/gcc.dg/vect/vect-126.c: In function 'f5':
/gcc/testsuite/gcc.dg/vect/vect-126.c:53:1: internal compiler error:
in neon_valid_immediate, at config/arm/arm.c:11850
0xf3e6c8 neon_valid_immediate
/gcc/config/arm/arm.c:11850
0xf3ea9a neon_immediate_valid_for_move(rtx_def*, machine_mode, rtx_def**, int*)
/gcc/config/arm/arm.c:11968
0xf40a20 arm_rtx_costs_internal
/gcc/config/arm/arm.c:10695
0xf40a20 arm_rtx_costs
/gcc/config/arm/arm.c:10946
0xb113ef rtx_cost(rtx_def*, machine_mode, rtx_code, int, bool)
/gcc/rtlanal.c:4187
0xb1169f set_src_cost
/gcc/rtl.h:2700
0xb1169f pattern_cost(rtx_def*, bool)
/gcc/rtlanal.c:5315
0x128bb3b combine_validate_cost
/gcc/combine.c:893
0x128bb3b try_combine
/gcc/combine.c:4113
0x12923d5 combine_instructions
/gcc/combine.c:1452
0x12926ed rest_of_handle_combine
/gcc/combine.c:14795
0x12926ed execute
/gcc/combine.c:14840
Please submit a full bug report,


Thanks,

Christophe

> Thanks for the examples of how this affects various targets.  Seems like
> it ought to be a consistent win when they trigger.
>
> jeff