[PATCH][ARM] Fix more -Wreturn-type fallout
Hi

This patch fixes a couple more tests that give warnings with -Wreturn-type:
- g++.dg/ext/pr57735.C
- gcc.target/arm/pr54300.C

*** gcc/testsuite/ChangeLog ***

2017-11-10  Sudakshina Das

	* g++.dg/ext/pr57735.C: Add -Wno-return-type for test.
	* gcc.target/arm/pr54300.C (main): Add return type and
	return a value.

diff --git a/gcc/testsuite/g++.dg/ext/pr57735.C b/gcc/testsuite/g++.dg/ext/pr57735.C
index a8f7d05712c9a823564d58e2a149c7bc356a5e9e..d9fc9e4aa5e4f47a77c2f9266456e274a9f06d68 100644
--- a/gcc/testsuite/g++.dg/ext/pr57735.C
+++ b/gcc/testsuite/g++.dg/ext/pr57735.C
@@ -2,7 +2,7 @@
 /* { dg-require-effective-target arm_arch_v5te_ok } */
 /* { dg-require-effective-target arm_arm_ok } */
 /* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } {"-mfloat-abi=soft" } } */
-/* { dg-options "-march=armv5te -marm -mtune=xscale -mfloat-abi=soft -O1" } */
+/* { dg-options "-march=armv5te -marm -mtune=xscale -mfloat-abi=soft -O1 -Wno-return-type" } */
 
 typedef unsigned int size_t;
 __extension__
diff --git a/gcc/testsuite/gcc.target/arm/pr54300.C b/gcc/testsuite/gcc.target/arm/pr54300.C
index eb1a74e36cff6d57f7444e1ca34b704c80efbf54..9105e279b331180aed8c5cadef2194cfe5b446ea 100644
--- a/gcc/testsuite/gcc.target/arm/pr54300.C
+++ b/gcc/testsuite/gcc.target/arm/pr54300.C
@@ -51,6 +51,7 @@ test(unsigned short *_Inp, int32_t *_Out,
   vst1q_s32( _Out, c );
 }
 
+int
 main()
 {
   unsigned short a[4] = {1, 2, 3, 4};
@@ -58,4 +59,5 @@ main()
   test(a, b, 1, 1, ~0);
   if (b[0] != 1 || b[1] != 2 || b[2] != 3 || b[3] != 4)
     abort();
+  return 0;
 }
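For context, a minimal sketch (not the actual testsuite sources) of the two
situations -Wreturn-type complains about and that these fixes address:

/* pr54300.C-style fix: give main an explicit return type and return a value.  */
int
main ()
{
  return 0;
}

/* pr57735.C-style case: a path can fall off the end of a non-void function,
   giving "control reaches end of non-void function"; in that test the
   warning is silenced with -Wno-return-type instead.  */
int
missing_return (int x)
{
  if (x > 0)
    return x;
}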
[PATCH][ARM] Fix test armv8_2-fp16-move-1.c
Hi

This patch fixes the test case armv8_2-fp16-move-1.c for arm-none-linux-gnueabihf,
where 2 of the scan-assembler directives were failing. We now generate fewer vmov
instructions between core and VFP registers, so those directives are updated to
reflect that.

Is this ok for trunk?
If yes could someone commit it on my behalf?

Sudi

*** gcc/testsuite/ChangeLog ***

2017-11-16  Sudakshina Das

	* gcc.target/arm/armv8_2-fp16-move-1.c: Edit vmov scan-assembler
	directives.

diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
index bb4e68f..0ed8560 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
@@ -101,8 +101,8 @@ test_select_8 (__fp16 a, __fp16 b, __fp16 c)
 /* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
 /* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
 
-/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 4 } } */
-/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, s[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vmov\ts[0-9]+, s[0-9]+} 4 } } */
 
 int
 test_compare_1 (__fp16 a, __fp16 b)
Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c
Hi Kyrill Thanks I have made the change. Sudi From: Kyrill Tkachov Sent: Thursday, November 16, 2017 5:03 PM To: Sudi Das; gcc-patches@gcc.gnu.org Cc: nd; Ramana Radhakrishnan; Richard Earnshaw Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c Hi Sudi, On 16/11/17 16:37, Sudi Das wrote: > Hi > > This patch fixes the test case armv8_2-fp16-move-1.c for > arm-none-linux-gnueabihf where 2 of the scan-assembler directives were > failing. We now generate less vmov between core and VFP registers. > Thus changing those directives to reflect that. > > Is this ok for trunk? > If yes could someone commit it on my behalf? > > Sudi > > > *** gcc/testsuite/ChangeLog *** > > 2017-11-16 Sudakshina Das > > * gcc.target/arm/armv8_2-fp16-move-1.c: Edit vmov scan-assembler > directives. > diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c index bb4e68f..0ed8560 100644 --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c @@ -101,8 +101,8 @@ test_select_8 (__fp16 a, __fp16 b, __fp16 c) /* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */ /* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */ -/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 4 } } */ -/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, s[0-9]+} 4 } } */ +/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 2 } } */ +/* { dg-final { scan-assembler-times {vmov\ts[0-9]+, s[0-9]+} 4 } } */ Some of the moves between core and fp registers were the result of inefficient codegen and in hindsight scanning for them was not very useful. Now that we emit only the required ones I think scanning for the plain vmovs between two S-registers doesn't test anything useful. So can you please just remove the second scan-assembler directive here? Thanks, Kyrill diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c index bb4e68f..8c0a53c 100644 --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c @@ -101,8 +101,7 @@ test_select_8 (__fp16 a, __fp16 b, __fp16 c) /* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */ /* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */ -/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 4 } } */ -/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, s[0-9]+} 4 } } */ +/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 2 } } */ int test_compare_1 (__fp16 a, __fp16 b)
Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c
Hi Kyrill and Christophe In case of soft fp testing, there are other assembly directives apart from the vmov one which are also failing. The directives probably make more sense in the hard fp context so instead of removing the vmov, I have added the -mfloat-abi=hard option. Is this ok for trunk? If yes could someone post it on my behalf? Sudi *** gcc/testsuite/ChangeLog *** 2017-11-22 Sudakshina Das * gcc.target/arm/armv8_2-fp16-move-1.c: Add -mfloat-abi=hard option. From: Kyrill Tkachov Sent: Monday, November 20, 2017 2:20 PM To: Christophe Lyon Cc: Sudi Das; gcc-patches@gcc.gnu.org; nd; Ramana Radhakrishnan; Richard Earnshaw Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c On 20/11/17 14:14, Christophe Lyon wrote: > Hi, > > On 17 November 2017 at 12:12, Kyrill Tkachov > wrote: >> On 17/11/17 10:45, Sudi Das wrote: >>> Hi Kyrill >>> >>> Thanks I have made the change. >> >> Thanks Sudi, I've committed this on your behalf with r254863. >> >> Kyrill >> >> >>> Sudi >>> >>> >>> >>> From: Kyrill Tkachov >>> Sent: Thursday, November 16, 2017 5:03 PM >>> To: Sudi Das; gcc-patches@gcc.gnu.org >>> Cc: nd; Ramana Radhakrishnan; Richard Earnshaw >>> Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c >>> >>> Hi Sudi, >>> >>> On 16/11/17 16:37, Sudi Das wrote: >>>> Hi >>>> >>>> This patch fixes the test case armv8_2-fp16-move-1.c for >>>> arm-none-linux-gnueabihf where 2 of the scan-assembler directives were >>>> failing. We now generate less vmov between core and VFP registers. >>>> Thus changing those directives to reflect that. >>>> >>>> Is this ok for trunk? >>>> If yes could someone commit it on my behalf? >>>> >>>> Sudi >>>> >>>> >>>> *** gcc/testsuite/ChangeLog *** >>>> >>>> 2017-11-16 Sudakshina Das >>>> >>>> * gcc.target/arm/armv8_2-fp16-move-1.c: Edit vmov >>>> scan-assembler >>>> directives. >>>> >>> diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c >>> b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c >>> index bb4e68f..0ed8560 100644 >>> --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c >>> +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c >>> @@ -101,8 +101,8 @@ test_select_8 (__fp16 a, __fp16 b, __fp16 c) >>> /* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+, >>> s[0-9]+} 1 } } */ >>> /* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+, >>> s[0-9]+} 1 } } */ >>> -/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 4 } >>> } */ >>> -/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, s[0-9]+} 4 } } >>> */ >>> +/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 2 } } >>> */ >>> +/* { dg-final { scan-assembler-times {vmov\ts[0-9]+, s[0-9]+} 4 } } */ >>> Some of the moves between core and fp registers were the result of >>> inefficient codegen and in hindsight >>> scanning for them was not very useful. Now that we emit only the required >>> ones I think scanning for the plain >>> vmovs between two S-registers doesn't test anything useful. >>> So can you please just remove the second scan-assembler directive here? >>> > You are probably already aware of that: the tests fail on > arm-none-linux-gnueabi/arm-none-eabi > FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times > vmov\\.f16\\ts[0-9]+, r[0-9]+ 2 (found 38 times) > > but this is not a regression, the previous version of the test had the > same problem. 
Grrr, that's because the softfp ABI necessitates moves between core and FP registers, so scanning for a particular number of vmovs between them is just not gonna be stable across soft-float ABIs. At this point I'd just remove the scan for vmovs completely as it doesn't seem to check anything useful. A patch to remove that scan for VMOV is pre-approved. Thanks for reminding me of this Christophe, softfp tends to slip my mind :( Kyrill > Christophe > >>> Thanks, >>> Kyrill >>> >>> >> diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c index 8c0a53c..167286d 100644 --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */ -/* { dg-options "-O2" } */ +/* { dg-options "-O2 -mfloat-abi=hard" } */ /* { dg-add-options arm_v8_2a_fp16_scalar } */ __fp16
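As a side note, a minimal sketch (not one of the functions in
armv8_2-fp16-move-1.c) of why a fixed vmov count cannot be stable across
float ABIs:

/* Under -mfloat-abi=softfp the __fp16 arguments arrive in core registers
   (r0/r1), so vmov instructions are needed to move them into S-registers
   before any VFP arithmetic and to move the result back.  Under
   -mfloat-abi=hard they are already passed in S-registers, so far fewer
   core/VFP moves appear.  */
__fp16
add_fp16 (__fp16 a, __fp16 b)
{
  return a + b;
}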
[COMMITTED, AARCH64, SVE TESTSUITE] Remove a couple of xfail from slp_5.c
Hi

Since Richard Biener's commit from 16th May:

commit 2f05c3c7324cd293b7b2ba197e0a88d9749361cc
Author: rguenth
Date:   Wed May 16 13:08:04 2018 +

the test case slp_5.c had started showing a couple of XPASS failures.

XPASS: gcc.target/aarch64/sve/slp_5.c -march=armv8.2-a+sve scan-assembler-not \\tld2d\\t
XPASS: gcc.target/aarch64/sve/slp_5.c -march=armv8.2-a+sve scan-assembler-times \\tld1d\\t 3

Richard Sandiford helped to confirm that these xfails can now be removed, since the said
commit fixed the issue that was preventing the expected behavior. This patch does the same.

Testing done: Only test case change, so only ran the testcase on aarch64-none-linux-gnu.

Committed as an obvious fix.

Thanks
Sudi

*** gcc/testsuite/ChangeLog ***

2018-05-18  Sudakshina Das

	* gcc.target/aarch64/sve/slp_5.c: Remove xfail for tld1d
	and tld2d.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp_5.c b/gcc/testsuite/gcc.target/aarch64/sve/slp_5.c
index 7ff12c5..b75edc69e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/slp_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/slp_5.c
@@ -42,11 +42,11 @@ TEST_ALL (VEC_PERM)
 /* { dg-final { scan-assembler-times {\tld1b\t} 1 } } */
 /* { dg-final { scan-assembler-times {\tld1h\t} 2 } } */
 /* { dg-final { scan-assembler-times {\tld1w\t} 3 } } */
-/* { dg-final { scan-assembler-times {\tld1d\t} 3 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tld1d\t} 3 } } */
 /* { dg-final { scan-assembler-not {\tld2b\t} } } */
 /* { dg-final { scan-assembler-not {\tld2h\t} } } */
 /* { dg-final { scan-assembler-not {\tld2w\t} } } */
-/* { dg-final { scan-assembler-not {\tld2d\t} { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not {\tld2d\t} } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b} 4 { xfail *-*-* } } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h} 4 { xfail *-*-* } } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */
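For readers unfamiliar with these scans, a rough sketch only (not the real
slp_5.c source, which is generated from the TEST_ALL/VEC_PERM macros) of the
kind of even/odd access pattern where the vectorizer chooses between
contiguous ld1d loads plus permutes and ld2d structure loads; the removed
xfails covered the 64-bit element variant of that choice:

void
sum_pairs (double *restrict out, double *restrict in, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = in[i * 2] + in[i * 2 + 1];
}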
Re: [PATCH][GCC] Simplification of 1U << (31 - x)
Sorry about the delayed response but looking at the above discussion, should I conclude that this is a valid tree simplification?

I am pasting the diff of the assembly that AArch64 generates with the test case that I added. I see fewer instructions generated with the patch.

--- pr80131-1.s	2017-08-01 10:02:43.243374174 +0100
+++ pr80131-1.s-patched	2017-08-01 10:00:54.776455630 +0100
@@ -24,10 +24,8 @@
 	str	x0, [sp, 8]
 	ldr	x0, [sp, 8]
 	mov	w1, w0
-	mov	w0, 63
-	sub	w0, w0, w1
-	mov	x1, 1
-	lsl	x0, x1, x0
+	mov	x0, -9223372036854775808
+	lsr	x0, x0, x1
 	add	sp, sp, 16
 	ret
 	.size	f2, .-f2
@@ -39,10 +37,8 @@
 	str	x0, [sp, 8]
 	ldr	x0, [sp, 8]
 	mov	w1, w0
-	mov	w0, 63
-	sub	w0, w0, w1
-	mov	x1, 1
-	lsl	x0, x1, x0
+	mov	x0, -9223372036854775808
+	lsr	x0, x0, x1
 	add	sp, sp, 16
 	ret
 	.size	f3, .-f3
@@ -52,11 +48,9 @@
 f4:
 	sub	sp, sp, #16
 	str	w0, [sp, 12]
-	mov	w1, 31
 	ldr	w0, [sp, 12]
-	sub	w0, w1, w0
-	mov	w1, 1
-	lsl	w0, w1, w0
+	mov	w1, -2147483648
+	lsr	w0, w1, w0
 	add	sp, sp, 16
 	ret
 	.size	f4, .-f4

Thanks

Sudi

From: Wilco Dijkstra
Sent: Thursday, April 13, 2017 1:01 PM
To: Richard Biener; Jakub Jelinek
Cc: Sudi Das; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)

Richard Biener wrote:
> It is IMHO a valid GIMPLE optimization / canonicalization.
>
>   movabsq $-9223372036854775808, %rax
>
> so this should then have been generated as 1<<63?
>
> At some point variable shifts were quite expensive as well..

Yes I don't see a major difference between movabsq and

  movl	$1, %eax
  salq	$63, %rax

on my Sandy Bridge, but if the above is faster then that is what the x64
backend should emit - it's 1 byte smaller as well, so probably better in
all cases.

Wilco
Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern
Hi Richard I have updated the patch according to your comments. Thanks for pointing it out and sorry for the delay! Sudi 2017-08-07 Sudakshina Das * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New check type for aarch64_simd_valid_immediate. (aarch64_output_simd_general_immediate): New declaration. (aarch64_simd_valid_immediate): Update prototype. * config/aarch64/aarch64-simd.md (orr3): modified pattern to add support for ORR-immediate. (and3): modified pattern to add support for BIC-immediate. * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function now checks for valid immediate for BIC and ORR based on new enum argument. (aarch64_output_simd_general_immediate): New function to output new BIC/ORR. * config/aarch64/constraints.md (Do): New vector immediate constraint. (Db): Likewise. 2017-08-07 Sudakshina Das * gcc.target/aarch64/bic_imm_1.c: New test. * gcc.target/aarch64/orr_imm_1.c: Likewise. From: Richard Earnshaw (lists) Sent: Friday, May 5, 2017 2:30 PM To: Sudi Das; gcc-patches@gcc.gnu.org Cc: nd; Marcus Shawcroft; James Greenhalgh Subject: Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern On 18/04/17 17:39, Sudi Das wrote: > > Hello all > > This patch adds the support for BIC (vector, immediate) and ORR (vector, > immediate) SIMD patterns to the AArch64 backend. > One of the examples of this is : (with -O2 -ftree-vectorize) > > void > bic_s (short *a) > { > for (int i = 0; i < 1024; i++) > a[i] &= ~(0xff); > } > > which now produces : > bic_s: > add x1, x0, 2048 > .p2align 2 > .L2: > ldr q0, [x0] > bic v0.8h, #255 > str q0, [x0], 16 > cmp x1, x0 > bne .L2 > ret > > instead of > bic_s: > movi v1.8h, 0xff, lsl 8 > add x1, x0, 2048 > .p2align 2 > .L2: > ldr q0, [x0] > and v0.16b, v0.16b, v1.16b > str q0, [x0], 16 > cmp x1, x0 > bne .L2 > ret > > Added new tests and checked for regressions on bootstrapped > aarch64-none-linux-gnu > Ok for stage 1? > > Thanks > Sudi > > 2017-04-04 Sudakshina Das > > * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New >check type > for aarch64_simd_valid_immediate. > (aarch64_output_simd_general_immediate): New declaration. > (aarch64_simd_valid_immediate): Update prototype. > > * config/aarch64/aarch64-simd.md (*bic_imm_3): New pattern. > (*ior_imm_3): Likewise. > > * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function >now checks > for valid immediate for BIC and ORR based on new enum argument. > (aarch64_output_simd_general_immediate): New function to output new >BIC/ORR. > > * config/aarch64/predicates.md (aarch64_simd_valid_bic_imm_p) : New. > (aarch64_simd_valid_orr_imm_p) : Likewise. > > 2017-04-04 Sudakshina Das > > * gcc.target/aarch64/bic_imm_1.c: New test. > * gcc.target/aarch64/orr_imm_1.c: Likewise. > > > patch-7260-2.diff > > > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index 9543f8c..89cc455 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -297,6 +297,15 @@ enum aarch64_parse_opt_result > AARCH64_PARSE_INVALID_ARG /* Invalid arch, tune, cpu arg. */ > }; > > +/* Enum to distinguish which type of check is to be done in > + aarch64_simd_valid_immediate. This is used as a bitmask where CHECK_ALL > + has both bits set. Adding new types would require changes accordingly. > */ > +enum simd_immediate_check { > + CHECK_I = 1, /* Perform only non-inverted immediate checks (ORR). */ > + CHECK_NI = 2, /* Perform only inverted immediate checks (BIC). 
*/ > + CHECK_ALL = 3 /* Perform all checks (MOVI/MNVI). */ > +}; > + > extern struct tune_params aarch64_tune_params; > > HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); > @@ -334,6 +343,8 @@ rtx aarch64_reverse_mask (enum machine_mode); > bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT); > char *aarch64_output_scalar_simd_mov_immediate (rtx, machine_mode); > char *aarch64_output_simd_mov_immediate (rtx, machine_mode, unsigned); > +char *aarch64_output_simd_general_immediate (rtx, machine_mode, unsigned, > + const char*); > bool aarch64_pad_arg_upward (machi
[PATCH][AArch64] Fix missing optimization for CMP+AND
Hi all During combine GCC tries to merge CMP (with zero) and AND into a TST. However, in cases where an ANDS operand is not compatible, this was being missed. Adding a define_split where this operand was moved to a register seems to help out. For example for a test : int f (unsigned char *p) { return p[0] == 50 || p[0] == 52; } int g (unsigned char *p) { return (p[0] >> 4 & 0xfd) == 0; } we are now generating f: ldrbw0, [x0] mov w1, 253 sub w0, w0, #50 tst w0, w1 csetw0, eq ret .size f, .-f .align 2 .p2align 3,,7 .global g .type g, %function g: ldrbw1, [x0] mov w0, 13 tst w0, w1, lsr 4 csetw0, eq ret instead of f: ldrbw0, [x0] sub w0, w0, #50 and w0, w0, -3 and w0, w0, 255 cmp w0, 0 csetw0, eq ret .size f, .-f .align 2 .p2align 3,,7 .global g .type g, %function g: ldrbw0, [x0] lsr w0, w0, 4 and w0, w0, -3 cmp w0, 0 csetw0, eq ret Added this new test and checked for regressions on bootstrapped aarch64-none-linux-gnu Ok for stage 1? Thanks Sudi 2017-03-17 Kyrylo Tkachov Sudakshina Das * config/aarch64/aarch64.md (define_split for and3nr_compare0): Move non aarch64_logical_operand to a register. (define_split for and_3nr_compare0): Move non register immediate operand to a register. * config/aarch64/predicates.md (aarch64_mov_imm_operand): New. 2017-03-17 Kyrylo Tkachov Sudakshina Das * gcc.target/aarch64/tst_imm_split_1.c: New Test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 5adc5ed..5e5dbff 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3881,6 +3881,22 @@ [(set_attr "type" "logics_reg,logics_imm")] ) +(define_split + [(set (reg:CC_NZ CC_REGNUM) + (compare:CC_NZ + (and:GPI (match_operand:GPI 0 "register_operand") + (match_operand:GPI 1 "aarch64_mov_imm_operand")) + (const_int 0))) + (clobber (match_operand:SI 2 "register_operand"))] + "" + [(set (match_dup 2) (match_dup 1)) + (set (reg:CC_NZ CC_REGNUM) + (compare:CC_NZ + (and:GPI (match_dup 0) + (match_dup 2)) + (const_int 0)))] +) + (define_insn "*and3nr_compare0_zextract" [(set (reg:CC_NZ CC_REGNUM) (compare:CC_NZ @@ -3916,6 +3932,26 @@ [(set_attr "type" "logics_shift_imm")] ) +(define_split + [(set (reg:CC_NZ CC_REGNUM) + (compare:CC_NZ + (and:GPI (SHIFT:GPI + (match_operand:GPI 0 "register_operand") + (match_operand:QI 1 "aarch64_shift_imm_")) + (match_operand:GPI 2 "aarch64_mov_imm_operand")) + (const_int 0))) +(clobber (match_operand:SI 3 "register_operand"))] + "" + [(set (match_dup 3) (match_dup 2)) + (set (reg:CC_NZ CC_REGNUM) + (compare:CC_NZ + (and:GPI (SHIFT:GPI + (match_dup 0) + (match_dup 1)) + (match_dup 3)) + (const_int 0)))] +) + ;; --- ;; Shifts ;; --- diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index e83d45b..ed62b8e 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -106,6 +106,10 @@ (ior (match_operand 0 "register_operand") (match_operand 0 "aarch64_logical_immediate"))) +(define_predicate "aarch64_mov_imm_operand" + (and (match_code "const_int") + (match_test "aarch64_move_imm (INTVAL (op), mode)"))) + (define_predicate "aarch64_logical_and_immediate" (and (match_code "const_int") (match_test "aarch64_and_bitmask_imm (INTVAL (op), mode)"))) diff --git a/gcc/testsuite/gcc.target/aarch64/tst_imm_split_1.c b/gcc/testsuite/gcc.target/aarch64/tst_imm_split_1.c new file mode 100644 index 000..33a2c0f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/tst_imm_split_1.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +int +f (unsigned char *p) +{ + return 
p[0] == 50 || p[0] == 52; +} + +int +g (unsigned char *p) +{ + return (p[0] >> 4 & 0xfd) == 0; +} + +/* { dg-final { scan-assembler-not "and\\t\[xw\]\[0-9\]+, \[xw\]\[0-9\]+.*" } } */ +/* { dg-final { scan-assembler "tst\\t\[xw\]\[0-9\]+, \[xw\]\[0-9\]+" } } */ +/* { dg-final { scan-assembler "tst\\t\[xw\]\[0-9\]+, \[xw\]\[0-9\]+, lsr 4" } } */
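For readers wondering where the 253 mask in f comes from: the two equality
tests collapse into one masked compare, and the new split is what lets that
compare become a SUB followed by a single TST. A small stand-alone self-check
of that arithmetic (not part of the patch):

#include <assert.h>

int
main (void)
{
  for (unsigned int x = 0; x < 256; x++)
    {
      int slow = (x == 50) || (x == 52);
      /* 0xfd is ~2 as a byte: after subtracting 50, only the values 0 and 2
         survive the mask, i.e. x was 50 or 52.  */
      int fast = (((x - 50) & 0xfdu) == 0);
      assert (slow == fast);
    }
  return 0;
}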
[PATCH][GCC] Simplification of 1U << (31 - x)
Hi all

This is a fix for PR 80131.

Currently the code A << (B - C) is not simplified. However, at least the more
specific case of 1U << (C - x), where C = precision(type) - 1, can be simplified
to (1 << C) >> x. This is done by adding a new simplification rule in match.pd.

So for a test case:

unsigned int f1 (unsigned int i)
{
  return 1U << (31 - i);
}

We see a gimple dump of

f1 (unsigned int i)
{
  unsigned int D.3121;

  D.3121 = 2147483648 >> i;
  return D.3121;
}

instead of

f1 (unsigned int i)
{
  unsigned int D.3121;

  _1 = 31 - i;
  D.3121 = 1 << _1;
  return D.3121;
}

Added a new test case and checked for regressions on bootstrapped aarch64-none-linux-gnu.
Ok for stage 1?

Thanks
Sudi

2017-03-23  Sudakshina Das

	PR middle-end/80131
	* match.pd: Simplify 1 << (C - x) where C = precision (type) - 1.

2017-03-23  Sudakshina Das

	PR middle-end/80131
	* testsuite/gcc.dg/pr80131-1.c: New Test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 7b96800..be20fb7 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -508,6 +508,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       && tree_nop_conversion_p (type, TREE_TYPE (@1)))
   (lshift @0 @2)))
 
+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x). */
+(simplify
+ (lshift integer_onep@0 (minus INTEGER_CST@1 @2))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
+      && tree_to_uhwi (@1) == (unsigned)(TYPE_PRECISION (type) - 1))
+  (if (TYPE_UNSIGNED(type))
+   (rshift (lshift @0 @1) @2)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert:type (rshift (lshift (convert:utype @0) @1) @2))))))
+
 /* Fold (C1/X)*C2 into (C1*C2)/X. */
 (simplify
  (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
diff --git a/gcc/testsuite/gcc.dg/pr80131-1.c b/gcc/testsuite/gcc.dg/pr80131-1.c
new file mode 100644
index 000..2bb6ff3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr80131-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-gimple" } */
+
+/* Checks the simplification of:
+   1 << (C - x) to (1 << C) >> x, where C = precision (type) - 1
+   f1 is not simplified but f2, f3 and f4 are. */
+
+__INT64_TYPE__ f1 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (31 - i);
+}
+
+__INT64_TYPE__ f2 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (63 - i);
+}
+
+__UINT64_TYPE__ f3 (__INT64_TYPE__ i)
+{
+  return (__UINT64_TYPE__)1 << (63 - i);
+}
+
+__INT32_TYPE__ f4 (__INT32_TYPE__ i)
+{
+  return (__INT32_TYPE__)1 << (31 - i);
+}
+
+/* { dg-final { scan-tree-dump-times "= 31 -" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "9223372036854775808 >>" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump "2147483648 >>" "gimple" } } */
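As a quick aside, a stand-alone sanity check (not part of the patch or the
testsuite) of the identity being folded, for the unsigned 32-bit case:

#include <assert.h>

int
main (void)
{
  /* 1U << (31 - x) and (1U << 31) >> x pick out the same single bit for
     every valid shift amount x.  */
  for (unsigned int x = 0; x < 32; x++)
    assert ((1U << (31 - x)) == (0x80000000U >> x));
  return 0;
}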
[PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern
Hello all This patch adds the support for BIC (vector, immediate) and ORR (vector, immediate) SIMD patterns to the AArch64 backend. One of the examples of this is : (with -O2 -ftree-vectorize) void bic_s (short *a) { for (int i = 0; i < 1024; i++) a[i] &= ~(0xff); } which now produces : bic_s: add x1, x0, 2048 .p2align 2 .L2: ldr q0, [x0] bic v0.8h, #255 str q0, [x0], 16 cmp x1, x0 bne .L2 ret instead of bic_s: moviv1.8h, 0xff, lsl 8 add x1, x0, 2048 .p2align 2 .L2: ldr q0, [x0] and v0.16b, v0.16b, v1.16b str q0, [x0], 16 cmp x1, x0 bne .L2 ret Added new tests and checked for regressions on bootstrapped aarch64-none-linux-gnu Ok for stage 1? Thanks Sudi 2017-04-04 Sudakshina Das * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New check type for aarch64_simd_valid_immediate. (aarch64_output_simd_general_immediate): New declaration. (aarch64_simd_valid_immediate): Update prototype. * config/aarch64/aarch64-simd.md (*bic_imm_3): New pattern. (*ior_imm_3): Likewise. * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function now checks for valid immediate for BIC and ORR based on new enum argument. (aarch64_output_simd_general_immediate): New function to output new BIC/ORR. * config/aarch64/predicates.md (aarch64_simd_valid_bic_imm_p) : New. (aarch64_simd_valid_orr_imm_p) : Likewise. 2017-04-04 Sudakshina Das * gcc.target/aarch64/bic_imm_1.c: New test. * gcc.target/aarch64/orr_imm_1.c: Likewise.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 9543f8c..89cc455 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -297,6 +297,15 @@ enum aarch64_parse_opt_result AARCH64_PARSE_INVALID_ARG /* Invalid arch, tune, cpu arg. */ }; +/* Enum to distinguish which type of check is to be done in + aarch64_simd_valid_immediate. This is used as a bitmask where CHECK_ALL + has both bits set. Adding new types would require changes accordingly. */ +enum simd_immediate_check { + CHECK_I = 1, /* Perform only non-inverted immediate checks (ORR). */ + CHECK_NI = 2, /* Perform only inverted immediate checks (BIC). */ + CHECK_ALL = 3 /* Perform all checks (MOVI/MNVI). 
*/ +}; + extern struct tune_params aarch64_tune_params; HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); @@ -334,6 +343,8 @@ rtx aarch64_reverse_mask (enum machine_mode); bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT); char *aarch64_output_scalar_simd_mov_immediate (rtx, machine_mode); char *aarch64_output_simd_mov_immediate (rtx, machine_mode, unsigned); +char *aarch64_output_simd_general_immediate (rtx, machine_mode, unsigned, + const char*); bool aarch64_pad_arg_upward (machine_mode, const_tree); bool aarch64_pad_reg_upward (machine_mode, const_tree, bool); bool aarch64_regno_ok_for_base_p (int, bool); @@ -345,7 +356,8 @@ bool aarch64_simd_imm_zero_p (rtx, machine_mode); bool aarch64_simd_scalar_immediate_valid_for_move (rtx, machine_mode); bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool); bool aarch64_simd_valid_immediate (rtx, machine_mode, bool, - struct simd_immediate_info *); + struct simd_immediate_info *, + enum simd_immediate_check w = CHECK_ALL); bool aarch64_split_dimode_const_store (rtx, rtx); bool aarch64_symbolic_address_p (rtx); bool aarch64_uimm12_shift (HOST_WIDE_INT); diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index c462164..92275dc 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -280,6 +280,26 @@ [(set_attr "type" "neon_logic")] ) +(define_insn "*bic_imm_3" + [(set (match_operand:VDQ_I 0 "register_operand" "=w") + (and:VDQ_I (match_operand:VDQ_I 1 "register_operand" "0") + (match_operand:VDQ_I 2 "aarch64_simd_valid_bic_imm_p" "")))] + "TARGET_SIMD" + { return aarch64_output_simd_general_immediate (operands[2], + mode, GET_MODE_BITSIZE (mode), "bic"); } + [(set_attr "type" "neon_logic")] +) + +(define_insn "*ior_imm_3" + [(set (match_operand:VDQ_I 0 "register_operand" "=w") + (ior:VDQ_I (match_operand:VDQ_I 1 "register_operand" "0") + (match_operand:VDQ_I 2 "aarch64_simd_valid_orr_imm_p" "")))] + "TARGET_SIMD" + { return aarch64_output_simd_general_immediate (operands[2], + mode, GET_MODE_BITSIZE (mode), "orr"); } + [(set_attr "type" "neon_logic")] +) + (define_insn "add3" [(set (match_operand:VDQ_I 0 "register_operand" "=w") (plus:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w") diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4f769a4..450c42d 100644 ---
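For completeness, an illustrative OR-immediate counterpart to the bic_s
example above (a sketch in the spirit of the new orr_imm_1.c test, not
claimed to be its exact source):

void
orr_s (short *a)
{
  for (int i = 0; i < 1024; i++)
    a[i] |= 0xff;
}

With the new pattern one would expect the loop body to use a single
ORR-immediate (orr v0.8h, #255 style) instead of materialising the mask in a
register and using a register-register ORR.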
Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern
Hi James I put aarch64_output_simd_general_immediate looking at the similarities of the immediates for mov/mvni and orr/bic. The CHECK macro in aarch64_simd_valid_immediate both checks and converts the immediates in a manner that are needed for the instructions. Having said that, I agree that maybe I could have refactored aarch64_output_simd_mov_immediate to do the work rather than creating a new functions to do similar things. I have done so in this patch. I have also changed the names of the enum simd_immediate_check to be better indicative of what they are doing. Lastly I have added more cases in the tests (according to all the possible CHECKs) and made them dg-do assemble (although I had to add --save-temps so that the scan-assembler would work). Do you think I should not put that option and rather create separate tests? Thanks, Sudi From: James Greenhalgh Sent: Wednesday, September 20, 2017 11:39 AM To: Sudi Das Cc: Richard Earnshaw; gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft Subject: Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern On Mon, Aug 07, 2017 at 02:56:09PM +0100, Sudi Das wrote: > > Hi Richard > > I have updated the patch according to your comments. Thanks for pointing it > out and sorry for the delay! Hi Sudi, I've taken a look at your patch - at a high level, I think that adding aarch64_output_simd_general_immediate seems like an over-complication, and I'm not sure I follow why we need some of the logic you add. I think we might be able to really simplify this patch, by adding a new character to aarch64_print_operand for an (optionally) shifted immmediate to bic/orr. > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index beff28e..5ae73c7 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -308,6 +308,15 @@ enum aarch64_parse_opt_result > AARCH64_PARSE_INVALID_ARG /* Invalid arch, tune, cpu arg. */ > }; > > +/* Enum to distinguish which type of check is to be done in > + aarch64_simd_valid_immediate. This is used as a bitmask where CHECK_ALL > + has both bits set. Adding new types would require changes accordingly. > */ > +enum simd_immediate_check { > + CHECK_I = 1, /* Perform only non-inverted immediate checks (ORR). */ > + CHECK_NI = 2, /* Perform only inverted immediate checks (BIC). */ Are these comments the right way round? If so, why is I non-inverted and NI inverted - those names seem back to front to me. These should probably all be AARCH64_CHECK_* rather than just CHECK_* > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 055ebaf..2cce5af 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -13029,6 +13041,47 @@ aarch64_output_simd_mov_immediate (rtx const_vector, > return templ; > } > > +/* This function is similar to aarch64_output_simd_mov_immediate, used for > + immediate versions of 'bic' or 'orr'. */ > +char* > +aarch64_output_simd_general_immediate (rtx const_vector, > + machine_mode mode, > + unsigned width, You can drop this parameter - it is always GET_MODE_BITSIZE (mode) so just calculate that in the function body. > + const char *mnemonic) > +{ > + bool is_valid; > + static char templ[40]; > + unsigned int lane_count = 0; > + char element_char; > + > + struct simd_immediate_info info = { NULL_RTX, 0, 0, false, false }; > + > + if (strcmp (mnemonic, "orr") == 0) For all the difference it makes, I'd just pass a bool and save save yourself the short string comparison. 
> + is_valid = aarch64_simd_valid_immediate (const_vector, mode, false, > + &info, CHECK_I); > + else > + is_valid = aarch64_simd_valid_immediate (const_vector, mode, false, > + &info, CHECK_NI); > + > + gcc_assert (is_valid); > + gcc_assert (CONST_INT_P (info.value)); > + element_char = sizetochar (info.element_width); > + lane_count = width / info.element_width; Width is always GET_MODE_BITSIZE (mode), and mode is one of V8QI V16QI V4HI V8HI V2SI V4SI V2DI - so width is either 64 or 128. in the CHECK_I and CHECK_NI cases of aarch64_simd_valid_immediate the elsize (used to set info.element_width) is only ever 16 or 32. That means lane count is either going to be 2, 4 or 8... > + if (lane_count == 1) > + sprintf (templ, "%s\t%%d0, #" HOST_WIDE_INT_PRINT_DEC, > + mnemonic, UINTVAL (info.value)); Which means this case never fires -- Which is a good thing, as my copy of the Arm Architecture Reference Manual does
Re: [PATCH][GCC] Simplification of 1U << (31 - x)
Still waiting on Jakub's comment on whether there are more things needed at the backend. But I have updated the patch according to Richard's comments. Thanks Sudi From: Richard Biener Sent: Friday, August 4, 2017 11:16 AM To: Sudi Das Cc: Wilco Dijkstra; Jakub Jelinek; GCC Patches; nd; Richard Earnshaw; James Greenhalgh Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x) On Tue, Aug 1, 2017 at 11:14 AM, Sudi Das wrote: > > > > > Sorry about the delayed response but looking at the above discussion, should > I conclude that this is a valid tree simplification? Yes, I think so. Jakub requested code to undo this at RTL expansion based on target costs, not sure if we really should require that from you given the user could have written the target sequence himself. Few comments about the patch: +/* Fold (1 << (C - x)) where C = precision(type) - 1 + into ((1 << C) >> x). */ +(simplify + (lshift integer_onep@0 (minus INTEGER_CST@1 @2)) I think this warrants a single_use check on the minus (note :s isn't enough as with the unsigned case we'd happily ignore it by design). + (if (INTEGRAL_TYPE_P (type) + && TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT + && tree_to_uhwi (@1) == (unsigned)(TYPE_PRECISION (type) - 1)) You can relax this with using && wi::eq_p (@1, TYPE_PRECISION (type) - 1) + (if (TYPE_UNSIGNED(type)) + (rshift (lshift @0 @1) @2) + (with + { tree utype = unsigned_type_for (type); } + (convert:type (rshift (lshift (convert:utype @0) @1) @2)) + You can write (convert (rshift ...)), without the :type. I'm leaving it to Jakub whether you need to write that RTL expansion tweak. Thanks, Richard. > I am pasting the diff of the assembly that AArch64 generates with the test > case that I added. I see fewer instructions generated with the patch. > > --- pr80131-1.s 2017-08-01 10:02:43.243374174 +0100 > +++ pr80131-1.s-patched 2017-08-01 10:00:54.776455630 +0100 > @@ -24,10 +24,8 @@ > str x0, [sp, 8] > ldr x0, [sp, 8] > mov w1, w0 > - mov w0, 63 > - sub w0, w0, w1 > - mov x1, 1 > - lsl x0, x1, x0 > + mov x0, -9223372036854775808 > + lsr x0, x0, x1 > add sp, sp, 16 > ret > .size f2, .-f2 > @@ -39,10 +37,8 @@ > str x0, [sp, 8] > ldr x0, [sp, 8] > mov w1, w0 > - mov w0, 63 > - sub w0, w0, w1 > - mov x1, 1 > - lsl x0, x1, x0 > + mov x0, -9223372036854775808 > + lsr x0, x0, x1 > add sp, sp, 16 > ret > .size f3, .-f3 > @@ -52,11 +48,9 @@ > f4: > sub sp, sp, #16 > str w0, [sp, 12] > - mov w1, 31 > ldr w0, [sp, 12] > - sub w0, w1, w0 > - mov w1, 1 > - lsl w0, w1, w0 > + mov w1, -2147483648 > + lsr w0, w1, w0 > add sp, sp, 16 > ret > .size f4, .-f4 > > > Thanks > > Sudi > > > > > From: Wilco Dijkstra > Sent: Thursday, April 13, 2017 1:01 PM > To: Richard Biener; Jakub Jelinek > Cc: Sudi Das; GCC Patches; nd; Richard Earnshaw; James Greenhalgh > Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x) > > Richard Biener wrote: >> It is IMHO a valid GIMPLE optimization / canonicalization. >> >> movabsq $-9223372036854775808, %rax >> >> so this should then have been generated as 1<<63? >> >> At some point variable shifts were quite expensive as well.. > > Yes I don't see a major difference between movabsq and > > movl $1, %eax > salq $63, %rax > > on my Sandy Bridge, but if the above is faster then that is what the x64 > backend should emit - it's 1 byte smaller as well, so probably better in all > cases. 
> > Wilco diff --git a/gcc/match.pd b/gcc/match.pd index e9017e4..160c12d 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -600,6 +600,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && tree_nop_conversion_p (type, TREE_TYPE (@1))) (lshift @0 @2))) +/* Fold (1 << (C - x)) where C = precision(type) - 1 + into ((1 << C) >> x). */ +(simplify + (lshift integer_onep@0 (minus@1 INTEGER_CST@2 @3)) + (if (INTEGRAL_TYPE_P (type) + && wi::eq_p (@2, TYPE_PRECISION (type) - 1) + && single_use (@1)) + (if (TYPE_UNSIGNED(type)) + (rshift (lshift @0 @2) @3) + (with +{ tree utype = unsigned_type_for (type); } +(convert (rshift (lshift (convert:utype @0) @2) @3)) + /* Fold (C1/X)*C2 into (C1*C2)/X. */ (simplify (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2) diff --git a/gcc/testsuite/gcc.dg/pr80131-1.c b/gcc/testsuite/gcc.dg/p
Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern
Hi James I have made the requested changes to the patch. 2017-09-27 Sudakshina Das * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New check type for aarch64_simd_valid_immediate. (aarch64_output_simd_mov_immediate): Update prototype. (aarch64_simd_valid_immediate): Update prototype. * config/aarch64/aarch64-simd.md (orr3): modified pattern to add support for ORR-immediate. (and3): modified pattern to add support for BIC-immediate. * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function now checks for valid immediate for BIC and ORR based on new enum argument. (aarch64_output_simd_mov_immediate): Function now used to output BIC/ORR imm as well based on new enum argument. * config/aarch64/constraints.md (Do): New vector immediate constraint. (Db): Likewise. 2017-09-27 Sudakshina Das * gcc.target/aarch64/bic_imm_1.c: New test. * gcc.target/aarch64/orr_imm_1.c: Likewise. Thanks Sudi From: James Greenhalgh Sent: Tuesday, September 26, 2017 8:04:38 PM To: Sudi Das Cc: Richard Earnshaw; gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft Subject: Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern On Mon, Sep 25, 2017 at 11:13:57AM +0100, Sudi Das wrote: > > Hi James > > I put aarch64_output_simd_general_immediate looking at the similarities of > the immediates for mov/mvni and orr/bic. The CHECK macro in > aarch64_simd_valid_immediate both checks > and converts the immediates in a manner that are needed for the instructions. > > Having said that, I agree that maybe I could have refactored > aarch64_output_simd_mov_immediate to do the work rather than creating a new > functions to do similar things. I have done so in this patch. Thanks, this looks much neater. > I have also changed the names of the enum simd_immediate_check to be better > indicative of what they are doing. Thanks, I'd tweak them to look more like the bitmasks you use them as, but that is a small change for my personal preference. > Lastly I have added more cases in the tests (according to all the possible > CHECKs) and made them dg-do assemble (although I had to add --save-temps so > that the scan-assembler would work). Do you think I should not put that > option and rather create separate tests? This is good - thanks. I think clean up the enum definitions and this patch will be good. > @@ -308,6 +308,16 @@ enum aarch64_parse_opt_result > AARCH64_PARSE_INVALID_ARG /* Invalid arch, tune, cpu arg. */ > }; > > +/* Enum to distinguish which type of check is to be done in > + aarch64_simd_valid_immediate. This is used as a bitmask where > + AARCH64_CHECK_MOV has both bits set. Thus AARCH64_CHECK_MOV will > + perform all checks. Adding new types would require changes accordingly. > */ > +enum simd_immediate_check { > + AARCH64_CHECK_ORR = 1, /* Perform immediate checks for ORR. */ > + AARCH64_CHECK_BIC = 2, /* Perform immediate checks for BIC. */ > + AARCH64_CHECK_MOV = 3 /* Perform all checks (used for MOVI/MNVI). */ These are used in bit-mask style, so how about: AARCH64_CHECK_ORR = 1 << 0, AARCH64_CHECK_BIC = 1 << 1, AARCH64_CHECK_MOV = AARCH64_CHECK_ORR | AARCH64_CHECK_BIC Which is more self-documenting. > @@ -13001,7 +13013,8 @@ aarch64_float_const_representable_p (rtx x) > char* > aarch64_output_simd_mov_immediate (rtx const_vector, > machine_mode mode, > - unsigned width) > + unsigned width, > + enum simd_immediate_check which) This function is sorely missing a comment explaining the parameters - it would be very helpful if you could add one as part of this patch. 
Thanks, James diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index e67c2ed..5d7c5df 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -308,6 +308,16 @@ enum aarch64_parse_opt_result AARCH64_PARSE_INVALID_ARG /* Invalid arch, tune, cpu arg. */ }; +/* Enum to distinguish which type of check is to be done in + aarch64_simd_valid_immediate. This is used as a bitmask where + AARCH64_CHECK_MOV has both bits set. Thus AARCH64_CHECK_MOV will + perform all checks. Adding new types would require changes accordingly. */ +enum simd_immediate_check { + AARCH64_CHECK_ORR = 1 << 0, + AARCH64_CHECK_BIC = 1 << 1, + AARCH64_CHECK_MOV = AARCH64_CHECK_ORR | AARCH64_CHECK_BIC +}; + extern struct tune_params aarch64_tune_params; HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); @@ -345,7 +355,8 @@ bool aarch64_mov_operand_p (
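As an aside on the enum shape suggested above, a minimal sketch (not the
actual GCC implementation) of how such a bitmask-style check enum gets
consumed: AARCH64_CHECK_MOV has both bits set, so a MOV query runs both the
non-inverted (ORR) and inverted (BIC) immediate checks. The encoding helper
below is a deliberately simplified stand-in for the real
aarch64_simd_valid_immediate logic.

enum simd_immediate_check {
  AARCH64_CHECK_ORR = 1 << 0,
  AARCH64_CHECK_BIC = 1 << 1,
  AARCH64_CHECK_MOV = AARCH64_CHECK_ORR | AARCH64_CHECK_BIC
};

/* Stand-in: accept an 8-bit value placed in any byte of a 32-bit pattern.  */
static int
encodable_p (unsigned int imm)
{
  for (int shift = 0; shift < 32; shift += 8)
    if ((imm & ~(0xffu << shift)) == 0)
      return 1;
  return 0;
}

static int
simd_valid_immediate_sketch (unsigned int imm, enum simd_immediate_check which)
{
  int ok = 0;
  if (which & AARCH64_CHECK_ORR)   /* non-inverted forms: ORR, MOVI.  */
    ok |= encodable_p (imm);
  if (which & AARCH64_CHECK_BIC)   /* inverted forms: BIC, MVNI.  */
    ok |= encodable_p (~imm);
  return ok;
}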
Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern
Hi Richard Thanks, I have made the change to the patch. 2017-10-02 Sudakshina Das * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New check type for aarch64_simd_valid_immediate. (aarch64_output_simd_mov_immediate): Update prototype. (aarch64_simd_valid_immediate): Update prototype. * config/aarch64/aarch64-simd.md (orr3): modified pattern to add support for ORR-immediate. (and3): modified pattern to add support for BIC-immediate. * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function now checks for valid immediate for BIC and ORR based on new enum argument. (aarch64_output_simd_mov_immediate): Function now used to output BIC/ORR imm as well based on new enum argument. * config/aarch64/constraints.md (Do): New vector immediate constraint. (Db) : Likewise. * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): New predicate. (aarch64_reg_or_bic_imm): Likewise. 2017-10-02 Sudakshina Das * gcc.target/aarch64/bic_imm_1.c: New test. * gcc.target/aarch64/orr_imm_1.c: Likewise. From: Richard Earnshaw (lists) Sent: Thursday, September 28, 2017 9:55 AM To: Sudi Das; James Greenhalgh Cc: gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft Subject: Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern On 27/09/17 18:57, Sudi Das wrote: > > > Hi James > > I have made the requested changes to the patch. > > > 2017-09-27 Sudakshina Das > > * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New >check type > for aarch64_simd_valid_immediate. > (aarch64_output_simd_mov_immediate): Update prototype. > (aarch64_simd_valid_immediate): Update prototype. > > * config/aarch64/aarch64-simd.md (orr3): modified pattern to add > support for ORR-immediate. > (and3): modified pattern to add support for BIC-immediate. > > * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function >now checks > for valid immediate for BIC and ORR based on new enum argument. > (aarch64_output_simd_mov_immediate): Function now used to output >BIC/ORR imm > as well based on new enum argument. > > * config/aarch64/constraints.md (Do): New vector immediate constraint. > (Db): Likewise. > > 2017-09-27 Sudakshina Das > > * gcc.target/aarch64/bic_imm_1.c: New test. > * gcc.target/aarch64/orr_imm_1.c: Likewise. > > > Thanks > Sudi > > > From: James Greenhalgh > Sent: Tuesday, September 26, 2017 8:04:38 PM > To: Sudi Das > Cc: Richard Earnshaw; gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft > Subject: Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern > > On Mon, Sep 25, 2017 at 11:13:57AM +0100, Sudi Das wrote: >> >> Hi James >> >> I put aarch64_output_simd_general_immediate looking at the similarities of >> the immediates for mov/mvni and orr/bic. The CHECK macro in >> aarch64_simd_valid_immediate both checks >> and converts the immediates in a manner that are needed for the instructions. >> >> Having said that, I agree that maybe I could have refactored >> aarch64_output_simd_mov_immediate to do the work rather than creating a new >> functions to do similar things. I have done so in this patch. > > Thanks, this looks much neater. > >> I have also changed the names of the enum simd_immediate_check to be better >> indicative of what they are doing. > > Thanks, I'd tweak them to look more like the bitmasks you use them as, but > that is a small change for my personal preference. > >> Lastly I have added more cases in the tests (according to all the possible >> CHECKs) and made them dg-do assemble (although I had to add --save-temps so >> that the scan-assembler would work). 
Do you think I should not put that >> option and rather create separate tests? > > This is good - thanks. > > I think clean up the enum definitions and this patch will be good. > >> @@ -308,6 +308,16 @@ enum aarch64_parse_opt_result >> AARCH64_PARSE_INVALID_ARG /* Invalid arch, tune, cpu arg. */ >> }; >> >> +/* Enum to distinguish which type of check is to be done in >> + aarch64_simd_valid_immediate. This is used as a bitmask where >> + AARCH64_CHECK_MOV has both bits set. Thus AARCH64_CHECK_MOV will >> + perform all checks. Adding new types would require changes accordingly. >> */ >> +enum simd_immediate_check { >> + AARCH64_CHECK_ORR = 1, /* Perform immediate checks for ORR. */ >> + AARCH64_CHECK_BIC = 2, /* Perform imm
Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern
Hi Steve Sorry about this. I am on it. I have a fix and I am running tests on it right now. Sudi From: Steve Ellcey Sent: Thursday, October 5, 2017 12:05 AM To: Richard Earnshaw; Sudi Das; James Greenhalgh Cc: gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft Subject: Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern On Wed, 2017-10-04 at 16:41 +0100, Richard Earnshaw (lists) wrote: > On 02/10/17 10:05, Sudi Das wrote: > > > > 2017-10-02 Sudakshina Das > > > > * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New > >check type > > for aarch64_simd_valid_immediate. > > (aarch64_output_simd_mov_immediate): Update prototype. > > (aarch64_simd_valid_immediate): Update prototype. > > > > * config/aarch64/aarch64-simd.md (orr3): modified pattern to add > > support for ORR-immediate. > > (and3): modified pattern to add support for BIC-immediate. > > > > * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function > >now checks > > for valid immediate for BIC and ORR based on new enum argument. > > (aarch64_output_simd_mov_immediate): Function now used to output > >BIC/ORR imm > > as well based on new enum argument. > > > > * config/aarch64/constraints.md (Do): New vector immediate constraint. > > (Db) : Likewise. > > > > * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): New > > predicate. > > (aarch64_reg_or_bic_imm): Likewise. I think this patch is causing a bunch of test failures on aarch64. I had to apply the patch for PR82396 (that was reverted) in order to build ToT GCC, but when I did that and ran the testsuite I got a bunch of failures like: /home/sellcey/cavium-pr-27386/src/gcc/gcc/testsuite/gcc.c- torture/compile/pr54713-1.c:45:18: internal compiler error: in aarch64_simd_valid_immediate, at config/aarch64/aarch64.c:11539 0xf2227b aarch64_simd_valid_immediate(rtx_def*, machine_mode, bool, simd_immediate_info*, simd_immediate_check) ../../../src/gcc/gcc/config/aarch64/aarch64.c:11539 0x11047b3 aarch64_reg_or_bic_imm(rtx_def*, machine_mode) ../../../src/gcc/gcc/config/aarch64/predicates.md:79 0xab29ab insn_operand_matches(insn_code, unsigned int, rtx_def*) ../../../src/gcc/gcc/optabs.c:6891 0xab29ab maybe_legitimize_operand_same_code ../../../src/gcc/gcc/optabs.c:6919 0xab545f maybe_legitimize_operand ../../../src/gcc/gcc/optabs.c:6990 0xab545f maybe_legitimize_operands(insn_code, unsigned int, unsigned int, expand_operand*) ../../../src/gcc/gcc/optabs.c:7055 0xab5a8f maybe_gen_insn(insn_code, unsigned int, expand_operand*) ../../../src/gcc/gcc/optabs.c:7073 0xab8503 expand_binop_directly ../../../src/gcc/gcc/optabs.c:1075 0xab87af expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*, rtx_def*, int, optab_methods) ../../../src/gcc/gcc/optabs.c:1156 0x8736d7 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier) ../../../src/gcc/gcc/expr.c:9582 0x749027 expand_gimple_stmt_1 ../../../src/gcc/gcc/cfgexpand.c:3691 0x749027 expand_gimple_stmt ../../../src/gcc/gcc/cfgexpand.c:3751 0x750387 expand_gimple_basic_block ../../../src/gcc/gcc/cfgexpand.c:5750 0x751ef7 execute ../../../src/gcc/gcc/cfgexpand.c:6357
[PATCH][AArch64] Fix ICE caused in aarch64_simd_valid_immediate
Hi This patch is a fix for PR 82440. The predicates aarch64_reg_or_bic_imm and aarch64_reg_or_orr_imm missed out on checking for a CONST_VECTOR before calling aarch64_simd_valid_immediate function. Also I think James forgot to add the test cases in the original patch submitted. Testing done : Checked for regressions on bootstrapped aarch64-none-linux-gnu. Ok for trunk? Thanks Sudi The ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-10-06 Sudakshina Das PR target/82440 * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): Modified. (aarch64_reg_or_bic_imm): Likewise. *** gcc/testsuite/ChangeLog *** 2017-10-06 Sudakshina Das * gcc.target/aarch64/bic_imm_1.c: New test. * gcc.target/aarch64/orr_imm_1.c: Likewise.diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 887a13e..bf23b88 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -71,13 +71,15 @@ (define_predicate "aarch64_reg_or_orr_imm" (ior (match_operand 0 "register_operand") - (match_test "aarch64_simd_valid_immediate (op, mode, false, - NULL, AARCH64_CHECK_ORR)"))) + (and (match_code "const_vector") + (match_test "aarch64_simd_valid_immediate (op, mode, false, + NULL, AARCH64_CHECK_ORR)" (define_predicate "aarch64_reg_or_bic_imm" (ior (match_operand 0 "register_operand") - (match_test "aarch64_simd_valid_immediate (op, mode, false, - NULL, AARCH64_CHECK_BIC)"))) + (and (match_code "const_vector") + (match_test "aarch64_simd_valid_immediate (op, mode, false, + NULL, AARCH64_CHECK_BIC)" (define_predicate "aarch64_fp_compare_operand" (ior (match_operand 0 "register_operand") diff --git a/gcc/testsuite/gcc.target/aarch64/bic_imm_1.c b/gcc/testsuite/gcc.target/aarch64/bic_imm_1.c new file mode 100644 index 000..b14f009 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/bic_imm_1.c @@ -0,0 +1,56 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 --save-temps -ftree-vectorize" } */ + +/* Each function uses the correspoding 'CLASS' in + Marco CHECK (aarch64_simd_valid_immediate). */ + +void +bic_6 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] &= ~(0xab); +} + +void +bic_7 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] &= ~(0xcd00); +} + +void +bic_8 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] &= ~(0xef); +} + +void +bic_9 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] &= ~(0x1200); +} + +void +bic_10 (short *a) +{ + for (int i = 0; i < 1024; i++) +a[i] &= ~(0x34); +} + + +void +bic_11 (short *a) +{ + for (int i = 0; i < 1024; i++) +a[i] &= ~(0x5600); +} + + +/* { dg-final { scan-assembler "bic\\tv\[0-9\]+.4s, #171" } } */ +/* { dg-final { scan-assembler "bic\\tv\[0-9\]+.4s, #205, lsl #8" } } */ +/* { dg-final { scan-assembler "bic\\tv\[0-9\]+.4s, #239, lsl #16" } } */ +/* { dg-final { scan-assembler "bic\\tv\[0-9\]+.4s, #18, lsl #24" } } */ +/* { dg-final { scan-assembler "bic\\tv\[0-9\]+.8h, #52" } } */ +/* { dg-final { scan-assembler "bic\\tv\[0-9\]+.8h, #86, lsl #8" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/orr_imm_1.c b/gcc/testsuite/gcc.target/aarch64/orr_imm_1.c new file mode 100644 index 000..ff6f683 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/orr_imm_1.c @@ -0,0 +1,54 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 --save-temps -ftree-vectorize" } */ + +/* Each function uses the correspoding 'CLASS' in + Marco CHECK (aarch64_simd_valid_immediate). 
*/ + +void +orr_0 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] |= 0xab; +} + +void +orr_1 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] |= 0xcd00; +} + +void +orr_2 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] |= 0x00ef; +} + +void +orr_3 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] |= 0x1200; +} + +void +orr_4 (short *a) +{ + for (int i = 0; i < 1024; i++) +a[i] |= 0x00340034; +} + +void +orr_5 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] |= 0x56005600; +} + +/* { dg-final { scan-assembler "orr\\tv\[0-9\]+.4s, #171" } } */ +/* { dg-final { scan-assembler "orr\\tv\[0-9\]+.4s, #205, lsl #8" } } */ +/* { dg-final { scan-assembler "orr\\tv\[0-9\]+.4s, #239, lsl #16" } } */ +/* { dg-final { scan-assembler "orr\\tv\[0-9\]+.4s, #18, lsl #24" } } */ +/* { dg-final { scan-assembler "orr\\tv\[0-9\]+.8h, #52" } } */ +/* { dg-final { scan-assembler "orr\\tv\[0-9\]+.8h, #86, lsl #8" } } */
Re: [PATCH][AArch64] Fix ICE caused in aarch64_simd_valid_immediate
Hi Jakub I have modified the entries: *** gcc/ChangeLog *** 2017-10-05 Sudakshina Das PR target/82440 * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): Changed to to only call aarch64_simd_valid_immediate on CONST_VECTORs. (aarch64_reg_or_bic_imm): Likewise. *** gcc/testsuite/ChangeLog *** 2017-10-05 Sudakshina Das * gcc.target/aarch64/bic_imm_1.c: New test. * gcc.target/aarch64/orr_imm_1.c: Likewise.. Thanks Sudi From: Jakub Jelinek Sent: Friday, October 6, 2017 11:11 AM To: Sudi Das Cc: gcc-patches@gcc.gnu.org; nd; sell...@cavium.com; Marcus Shawcroft; Richard Earnshaw; James Greenhalgh Subject: Re: [PATCH][AArch64] Fix ICE caused in aarch64_simd_valid_immediate On Fri, Oct 06, 2017 at 09:52:35AM +, Sudi Das wrote: > This patch is a fix for PR 82440. > The predicates aarch64_reg_or_bic_imm and aarch64_reg_or_orr_imm missed out on > checking for a CONST_VECTOR before calling aarch64_simd_valid_immediate > function. > Also I think James forgot to add the test cases in the original patch > submitted. > > Testing done : Checked for regressions on bootstrapped aarch64-none-linux-gnu. > Ok for trunk? > > Thanks > Sudi > > The ChangeLog entry is as follows: > > *** gcc/ChangeLog *** > > 2017-10-06 Sudakshina Das > > PR target/82440 > * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): Modified. > (aarch64_reg_or_bic_imm): Likewise. I'll defer the actual review to aarch64 maintainers, just want to say that this is not a correct ChangeLog entry. You should say what has changed, not just that something has changed. Something like Only call aarch64_simd_valid_immediate on CONST_VECTORs. or similar. Jakub
Re: [PATCH][AArch64] Fix ICE caused in aarch64_simd_valid_immediate
Hi Richard *** gcc/ChangeLog *** 2017-10-05 Sudakshina Das PR target/82440 * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): Only call aarch64_simd_valid_immediate on CONST_VECTORs. (aarch64_reg_or_bic_imm): Likewise. *** gcc/testsuite/ChangeLog *** 2017-10-05 Sudakshina Das * gcc.target/aarch64/bic_imm_1.c: New test. * gcc.target/aarch64/orr_imm_1.c: Likewise. Also can someone please apply it for me. I do not have commit access. Thanks Sudi From: Richard Earnshaw (lists) Sent: Friday, October 6, 2017 2:01 PM To: Sudi Das; Jakub Jelinek Cc: gcc-patches@gcc.gnu.org; nd; sell...@cavium.com; Marcus Shawcroft; James Greenhalgh Subject: Re: [PATCH][AArch64] Fix ICE caused in aarch64_simd_valid_immediate On 06/10/17 12:01, Sudi Das wrote: > > Hi Jakub > > I have modified the entries: > > *** gcc/ChangeLog *** > > 2017-10-05 Sudakshina Das > > PR target/82440 > * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): Changed to > to only call aarch64_simd_valid_immediate on CONST_VECTORs. You don't need to say 'Changed to' (or even 'Changed to to' :-). Simply say 'Only call ...'. > (aarch64_reg_or_bic_imm): Likewise. > > *** gcc/testsuite/ChangeLog *** > > 2017-10-05 Sudakshina Das > > * gcc.target/aarch64/bic_imm_1.c: New test. > * gcc.target/aarch64/orr_imm_1.c: Likewise.. too many full stops. OK with those nits fixed. R. > > > Thanks > Sudi > > > From: Jakub Jelinek > Sent: Friday, October 6, 2017 11:11 AM > To: Sudi Das > Cc: gcc-patches@gcc.gnu.org; nd; sell...@cavium.com; Marcus Shawcroft; > Richard Earnshaw; James Greenhalgh > Subject: Re: [PATCH][AArch64] Fix ICE caused in aarch64_simd_valid_immediate > > On Fri, Oct 06, 2017 at 09:52:35AM +, Sudi Das wrote: >> This patch is a fix for PR 82440. >> The predicates aarch64_reg_or_bic_imm and aarch64_reg_or_orr_imm missed out >> on >> checking for a CONST_VECTOR before calling aarch64_simd_valid_immediate >> function. >> Also I think James forgot to add the test cases in the original patch >> submitted. >> >> Testing done : Checked for regressions on bootstrapped >> aarch64-none-linux-gnu. >> Ok for trunk? >> >> Thanks >> Sudi >> >> The ChangeLog entry is as follows: >> >> *** gcc/ChangeLog *** >> >> 2017-10-06 Sudakshina Das >> >> PR target/82440 >> * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): Modified. >> (aarch64_reg_or_bic_imm): Likewise. > > I'll defer the actual review to aarch64 maintainers, just want to say that > this is not a correct ChangeLog entry. You should say what has changed, not > just that something has changed. Something like > Only call aarch64_simd_valid_immediate on CONST_VECTORs. > or similar. > > Jakub > >
Re: [PATCH][GCC] Simplification of 1U << (31 - x)
Hi Jakub

As per the discussions, I have created a bug report for the possible regression this may cause:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82454

Sudi

From: Wilco Dijkstra
Sent: Tuesday, September 26, 2017 2:20 PM
To: Sudi Das; Jakub Jelinek
Cc: Richard Biener; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)

Jakub Jelinek wrote:
> Well, we don't want to regress performance wise on one of the most important
> primary targets.  I don't care that much if the RTL/backend work is done
> together with the patch, or as a follow-up during stage1/3, but it should be
> done, the testcases I've posted can be used as a basis of a P1 runtime
> performance regression.

It should be sufficient to file a bug about inefficient 64-bit constant
expansions on x64.  I didn't see a significant difference in my benchmarking
of it on x64, so I'd say it's only a performance regression if large
benchmarks regress measurably (quite unlikely).

Wilco
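For context, the source-level shape of the trade-off being discussed is roughly the following. This is a sketch for illustration only, not code taken from the patch or from PR 82454, and the cost remark about x86-64 is the general property of 64-bit immediates rather than a measurement.

    /* The two expressions are equivalent for 0 <= x <= 63, but the rewritten
       form needs a full 64-bit constant, which on x86-64 typically has to be
       materialised with a separate mov (movabs) instead of shifting a cheap 1.  */
    unsigned long long before (unsigned long long x)
    {
      return 1ULL << (63 - x);            /* constant 1 is trivial to materialise */
    }

    unsigned long long after (unsigned long long x)
    {
      return 0x8000000000000000ULL >> x;  /* requires the 64-bit immediate */
    }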
Re: [PATCH][GCC] Simplification of 1U << (31 - x)
Thanks, I have made the changes to the patch.
Also can someone please apply it for me. I do not have commit access.

2017-10-10  Sudakshina Das

	PR middle-end/80131
	* match.pd: Simplify 1 << (C - x) where C = precision (x) - 1.

2017-10-10  Sudakshina Das

	PR middle-end/80131
	* testsuite/gcc.dg/pr80131-1.c: New test.

With regards to the existing missed optimizations in the x86 RTL expansion, I think the discussion can take place on the bug report that I created and maybe someone will pick it up.

Thanks
Sudi

From: Wilco Dijkstra
Sent: Monday, October 9, 2017 2:02 PM
To: Richard Biener; Sudi Das
Cc: Jakub Jelinek; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)

Richard Biener wrote:
> I think the patch is ok with these changes but obviously we should try
> to address
> the code-generation issue on x86 at RTL expansion time.  They are sort-of
> existing missing optimizations.

Note the only x64 specific issue is the backend expansion of 64-bit
immediates which could be improved like I suggested.

However what we're really missing is a generic optimization pass that tries
to simplify immediates using accurate target costs.  In eg. x >= C or x > C
we can use either C or C-1 - on many targets one option may be a single
instruction, while the other might take 2 or even 3.

Wilco

diff --git a/gcc/match.pd b/gcc/match.pd
index e58a65a..7a25a1b 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -600,6 +600,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       && tree_nop_conversion_p (type, TREE_TYPE (@1)))
   (lshift @0 @2)))
 
+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x).  */
+(simplify
+ (lshift integer_onep@0 (minus@1 INTEGER_CST@2 @3))
+ (if (INTEGRAL_TYPE_P (type)
+      && wi::eq_p (@2, TYPE_PRECISION (type) - 1)
+      && single_use (@1))
+  (if (TYPE_UNSIGNED (type))
+   (rshift (lshift @0 @2) @3)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert (rshift (lshift (convert:utype @0) @2) @3))))))
+
 /* Fold (C1/X)*C2 into (C1*C2)/X.  */
 (simplify
  (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
diff --git a/gcc/testsuite/gcc.dg/pr80131-1.c b/gcc/testsuite/gcc.dg/pr80131-1.c
new file mode 100644
index 000..0bfe1f4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr80131-1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target int32plus } */
+/* { dg-options "-fdump-tree-gimple" } */
+
+/* Checks the simplification of:
+   1 << (C - x) to (1 << C) >> x, where C = precision (type) - 1
+   f1 is not simplified but f2, f3 and f4 are.  */
+
+__INT64_TYPE__ f1 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (31 - i);
+}
+
+__INT64_TYPE__ f2 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (63 - i);
+}
+
+__UINT64_TYPE__ f3 (__INT64_TYPE__ i)
+{
+  return (__UINT64_TYPE__)1 << (63 - i);
+}
+
+__INT32_TYPE__ f4 (__INT32_TYPE__ i)
+{
+  return (__INT32_TYPE__)1 << (31 - i);
+}
+
+/* { dg-final { scan-tree-dump-times "= 31 -" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "9223372036854775808 >>" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump "2147483648 >>" "gimple" } } */
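The identity the new match.pd rule relies on can be sanity-checked with a few lines of plain C. This snippet was written for this note and is not part of the patch or the testsuite: for an unsigned type of precision P and 0 <= x < P, 1 << ((P - 1) - x) equals (1 << (P - 1)) >> x.

    /* Self-contained check of the identity behind the simplification.  */
    #include <assert.h>

    int main (void)
    {
      for (unsigned x = 0; x < 64; x++)
        {
          unsigned long long lhs = 1ULL << (63 - x);
          unsigned long long rhs = 0x8000000000000000ULL >> x;
          assert (lhs == rhs);          /* 64-bit case (f2/f3 above) */
        }
      for (unsigned x = 0; x < 32; x++)
        assert ((1U << (31 - x)) == (0x80000000U >> x));  /* 32-bit case (f4) */
      return 0;
    }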
[PATCH] [ARM] Cleanup macro TARGET_EITHER
Hi all

This is a cleanup patch to remove the macro TARGET_EITHER. The macro has become irrelevant: its original expansion, (TARGET_ARM | TARGET_THUMB), was long ago commented out and replaced by the constant 1, so every condition that uses it is trivially true.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Sudi

2017-03-10  Sudakshina Das

	* config/arm/arm.h (TARGET_EITHER): Delete.
	* config/arm/arm.md: Delete instances of TARGET_EITHER.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index e95eda3..6afb54b 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -130,7 +130,6 @@ extern tree arm_fp16_type_node;
 #define TARGET_REALLY_IWMMXT2	(TARGET_IWMMXT2 && TARGET_32BIT)
 #define TARGET_IWMMXT_ABI (TARGET_32BIT && arm_abi == ARM_ABI_IWMMXT)
 #define TARGET_ARM                      (! TARGET_THUMB)
-#define TARGET_EITHER			1 /* (TARGET_ARM | TARGET_THUMB) */
 #define TARGET_BACKTRACE	        (crtl->is_leaf \
 				         ? TARGET_TPCS_LEAF_FRAME \
 				         : TARGET_TPCS_FRAME)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 8720a71..3d9a09f 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -448,7 +448,7 @@
 	  (plus:DI (match_operand:DI 1 "s_register_operand" "")
 	           (match_operand:DI 2 "arm_adddi_operand" "")))
    (clobber (reg:CC CC_REGNUM))])]
-  "TARGET_EITHER"
+  ""
   "
  if (TARGET_THUMB1)
    {
@@ -577,7 +577,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(plus:SI (match_operand:SI 1 "s_register_operand" "")
 		 (match_operand:SI 2 "reg_or_int_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "
   if (TARGET_32BIT && CONST_INT_P (operands[2]))
     {
@@ -1254,7 +1254,7 @@
 	  (minus:DI (match_operand:DI 1 "s_register_operand" "")
 	            (match_operand:DI 2 "s_register_operand" "")))
    (clobber (reg:CC CC_REGNUM))])]
-  "TARGET_EITHER"
+  ""
   "
   if (TARGET_THUMB1)
     {
@@ -1430,7 +1430,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(minus:SI (match_operand:SI 1 "reg_or_int_operand" "")
 		  (match_operand:SI 2 "s_register_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "
   if (CONST_INT_P (operands[1]))
     {
@@ -1566,7 +1566,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(mult:SI (match_operand:SI 2 "s_register_operand" "")
 		 (match_operand:SI 1 "s_register_operand" "")))]
-  "TARGET_EITHER"
+  ""
   ""
 )
 
@@ -2345,7 +2345,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(and:SI (match_operand:SI 1 "s_register_operand" "")
 		(match_operand:SI 2 "reg_or_int_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "
   if (TARGET_32BIT)
     {
@@ -3210,7 +3210,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(ior:SI (match_operand:SI 1 "s_register_operand" "")
 		(match_operand:SI 2 "reg_or_int_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "
   if (CONST_INT_P (operands[2]))
     {
@@ -3394,7 +3394,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(xor:SI (match_operand:SI 1 "s_register_operand" "")
 		(match_operand:SI 2 "reg_or_int_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "if (CONST_INT_P (operands[2]))
     {
       if (TARGET_32BIT)
@@ -4052,7 +4052,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(ashift:SI (match_operand:SI 1 "s_register_operand" "")
 		   (match_operand:SI 2 "arm_rhs_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "
   if (CONST_INT_P (operands[2])
       && (UINTVAL (operands[2])) > 31)
@@ -4121,7 +4121,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
 		     (match_operand:SI 2 "arm_rhs_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "
   if (CONST_INT_P (operands[2])
      && UINTVAL (operands[2]) > 31)
@@ -4187,7 +4187,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(lshiftrt:SI (match_operand:SI 1 "s_register_operand" "")
 		     (match_operand:SI 2 "arm_rhs_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "
   if (CONST_INT_P (operands[2])
      && (UINTVAL (operands[2])) > 31)
@@ -4219,7 +4219,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(rotatert:SI (match_operand:SI 1 "s_register_operand" "")
 		     (match_operand:SI 2 "arm_rhs_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "
   if (TARGET_32BIT)
     {
@@ -4677,7 +4677,7 @@
   [(set (match_operand:DI 0 "s_register_operand" "")
 	(neg:DI (match_operand:DI 1 "s_register_operand" "")))
    (clobber (reg:CC CC_REGNUM))])]
-  "TARGET_EITHER"
+  ""
 {
   if (TARGET_NEON)
     {
@@ -4730,7 +4730,7 @@
 (define_expand "negsi2"
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(neg:SI (match_operand:SI 1 "s_register_operand" "")))]
-  "TARGET_EITHER"
+  ""
   ""
 )
 
@@ -4882,7 +4882,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(abs:SI (match_operand:SI 1 "s_register_operand" "")))
    (clobber (match_dup 2))])]
-  "TARGET_EITHER"
+  ""
   "
   if (TARGET_THUMB1)
[PATCH][AArch64] Allow CMP+SHIFT when comparing with zero
Hi all

The backend pattern for combining a CMP+SHIFT was missing a case when comparing with zero. This happened because aarch64_select_cc_mode (SELECT_CC_MODE) did not return the mode needed to identify the combine (CC_SWP in this case). This patch adds the missing case.

For the test case:

int f3 (int x, int y)
{
  int res = x << 3;
  return res != 0;
}

we now generate (at -O2):

f3:
	cmp	wzr, w0, lsl 3
	cset	w0, ne
	ret

instead of:

f3:
	lsl	w0, w0, 3
	cmp	w0, 0
	cset	w0, ne
	ret

Added this new test and checked for regressions on bootstrapped aarch64-none-linux-gnu.

Ok for stage 1?

Thanks
Sudi

2017-03-10  Sudakshina Das

	* config/aarch64/aarch64.c (aarch64_select_cc_mode): Return CC_SWP
	for comparison with zero.

2017-03-10  Sudakshina Das

	* gcc.target/aarch64/cmp_shifted_reg_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 714bb79..01af2a7 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4707,7 +4707,7 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
      the comparison will have to be swapped when we emit the assembly
      code.  */
   if ((GET_MODE (x) == SImode || GET_MODE (x) == DImode)
-      && (REG_P (y) || GET_CODE (y) == SUBREG)
+      && (REG_P (y) || GET_CODE (y) == SUBREG || y == const0_rtx)
       && (GET_CODE (x) == ASHIFT || GET_CODE (x) == ASHIFTRT
 	  || GET_CODE (x) == LSHIFTRT
 	  || GET_CODE (x) == ZERO_EXTEND || GET_CODE (x) == SIGN_EXTEND))
diff --git a/gcc/testsuite/gcc.target/aarch64/cmp_shifted_reg_1.c b/gcc/testsuite/gcc.target/aarch64/cmp_shifted_reg_1.c
new file mode 100644
index 000..cacecf4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmp_shifted_reg_1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 " } */
+
+int f3 (int x, int y)
+{
+  int res = x << 3;
+  return res != 0;
+}
+
+/* We should combine the shift and compare */
+/* { dg-final { scan-assembler "cmp\.*\twzr, w\[0-9\]+, lsl 3" } } */
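One way to read the CC_SWP detail, as a paraphrase rather than the author's own words: on AArch64 only the second operand of cmp may carry a shift, so comparing a shifted value against zero has to place the zero register first, and the resulting flags describe the operands in swapped order.

    /* Sketch for illustration, not part of the patch or its testsuite.
       For a function of the same shape as the f3 test above:
    
           cmp  wzr, w0, lsl 3    // flags describe  0 <=> (x << 3)
    
       CC_SWP tells the rest of the compiler to read the condition codes with
       the operands reversed (lt/gt and similar pairs trade places), while
       eq/ne, as used here, are unaffected by the swap.  */
    int f3_equivalent (int x)
    {
      return (x << 3) != 0;
    }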