Ping [PATCHv2, rs6000] Merge two vector shift when their sources are the same

2023-04-23 Thread HAO CHEN GUI via Gcc-patches
Hi
  Gently ping this.
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612944.html

Thanks
Gui Haochen

On 2023/2/28 10:31, HAO CHEN GUI wrote:
> Hi,
>   This patch merges two "vsldoi" insns when their sources are the
> same. In particular, the pair is simplified to a single move when the
> total shift is a multiple of 16 bytes.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions.
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> 2023-02-28  Haochen Gui 
> 
> gcc/
>   * config/rs6000/altivec.md (*altivec_vsldoi_dup_): New
>   insn_and_split to merge two vsldoi when the sources are the same.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/vsldoi_merge.c: New.
> 
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 84660073f32..fae8ec2b2e8 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2529,6 +2529,35 @@ (define_insn "altivec_vsldoi_"
>"vsldoi %0,%1,%2,%3"
>[(set_attr "type" "vecperm")])
> 
> +(define_insn_and_split "*altivec_vsldoi_dup_"
> +  [(set (match_operand:VM 0 "register_operand" "=v")
> + (unspec:VM [(unspec:VM [(match_operand:VM 1 "register_operand" "v")
> + (match_dup 1)
> + (match_operand:QI 2 "immediate_operand" "i")]
> +UNSPEC_VSLDOI)
> + (unspec:VM [(match_dup 1)
> + (match_dup 1)
> + (match_dup 2)]
> +UNSPEC_VSLDOI)
> + (match_operand:QI 3 "immediate_operand" "i")]
> +UNSPEC_VSLDOI))]
> +  "TARGET_ALTIVEC"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +{
> +  unsigned int shift1 = UINTVAL (operands[2]);
> +  unsigned int shift2 = UINTVAL (operands[3]);
> +
> +  unsigned int shift = (shift1 + shift2) % 16;
> +  if (shift)
> +emit_insn (gen_altivec_vsldoi_ (operands[0], operands[1],
> +   operands[1], GEN_INT (shift)));
> +  else
> +emit_move_insn (operands[0], operands[1]);
> +  DONE;
> +})
> +
>  (define_insn "altivec_vupkhs"
>[(set (match_operand:VP 0 "register_operand" "=v")
>   (unspec:VP [(match_operand: 1 "register_operand" "v")]
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsldoi_merge.c 
> b/gcc/testsuite/gcc.target/powerpc/vsldoi_merge.c
> new file mode 100644
> index 000..eebd7b4d382
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vsldoi_merge.c
> @@ -0,0 +1,59 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx -save-temps" } */
> +
> +#include "altivec.h"
> +
> +#ifdef DEBUG
> +#include 
> +#endif
> +
> +void abort (void);
> +
> +__attribute__ ((noipa)) vector signed int
> +test1 (vector signed int a)
> +{
> +  a = vec_sld (a, a, 2);
> +  a = vec_sld (a, a, 6);
> +  return a;
> +}
> +
> +__attribute__ ((noipa)) vector signed int
> +test2 (vector signed int a)
> +{
> +  a = vec_sld (a, a, 14);
> +  a = vec_sld (a, a, 2);
> +  return a;
> +}
> +
> +int main (void)
> +{
> +  vector signed int a = {1,2,3,4};
> +  vector signed int result_a;
> +  int i;
> +
> +  result_a = test1 (a);
> +  vector signed int expect_a = {3,4,1,2};
> +
> +  for (i = 0; i< 4; i++)
> +if (result_a[i] != expect_a[i])
> +#ifdef DEBUG
> +  printf("ERROR: test1 result[%d] = %d, not expected[%d] = %d\n",
> +  i, result_a[i], i, expect_a[i]);
> +#else
> +  abort ();
> +#endif
> +
> +  result_a = test2 (a);
> +
> +  for (i = 0; i< 4; i++)
> +if (result_a[i] != a[i])
> +#ifdef DEBUG
> +  printf("ERROR: test2 result[%d] = %d, not expected[%d] = %d\n",
> +  i, result_a[i], i, a[i]);
> +#else
> +  abort ();
> +#endif
> +}
> +
> +/* { dg-final { scan-assembler-times {\mvsldoi\M} 1 } } */
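
The merge described in the patch above rests on simple modular arithmetic. The following is a minimal sketch in portable C, not the rs6000 implementation: it assumes that vsldoi with both sources equal behaves as a left rotation of the 16 bytes (big-endian byte numbering), and the names rotl_bytes, merged_shift and check_all_shift_pairs are hypothetical helpers introduced only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Model of vsldoi with identical sources: rotate the 16 bytes left by n
   positions (big-endian byte numbering is assumed here).  */
static void rotl_bytes(const uint8_t in[16], unsigned n, uint8_t out[16])
{
    for (unsigned i = 0; i < 16; i++)
        out[i] = in[(i + n) % 16];
}

/* The single shift amount that replaces two consecutive shifts.
   A result of 0 corresponds to the plain-move case.  */
unsigned merged_shift(unsigned shift1, unsigned shift2)
{
    return (shift1 + shift2) % 16;
}

/* Return 1 if, for every pair of shift amounts in 0..15, two rotations
   give the same bytes as one rotation by merged_shift().  */
int check_all_shift_pairs(void)
{
    uint8_t v[16], tmp[16], twice[16], once[16];

    for (unsigned i = 0; i < 16; i++)
        v[i] = (uint8_t)(i * 7 + 1);   /* 16 distinct byte values */

    for (unsigned s1 = 0; s1 < 16; s1++)
        for (unsigned s2 = 0; s2 < 16; s2++) {
            rotl_bytes(v, s1, tmp);
            rotl_bytes(tmp, s2, twice);
            rotl_bytes(v, merged_shift(s1, s2), once);
            if (memcmp(twice, once, 16) != 0)
                return 0;
        }
    return 1;
}
```

With the shift amounts from the new test case, merged_shift (2, 6) gives 8 and merged_shift (14, 2) gives 0, the latter being the case that reduces to a move.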


Ping^2 [PATCH, rs6000] Split TImode for logical operations in expand pass [PR100694]

2023-04-23 Thread HAO CHEN GUI via Gcc-patches
Hi,
  Gently ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611550.html

Thanks
Gui Haochen

On 2023/2/20 10:10, HAO CHEN GUI wrote:
> Hi,
>   Gently ping this:
>   https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611550.html
> 
> Gui Haochen
> Thanks
> 
> On 2023/2/8 13:08, HAO CHEN GUI wrote:
>> Hi,
>>   The logical operations for TImode are currently split after the reload
>> pass. Some potential optimizations are missed because the split happens
>> too late. This patch removes TImode from the "AND", "IOR", "XOR" and "NOT"
>> expanders so that these logical operations can be split at the expand
>> pass. The new test case illustrates the optimization.
>>
>>   Two test cases of pr92398 are merged into one as all sub-targets generate
>> the same sequence of instructions with the patch.
>>
>>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
>>
>> Thanks
>> Gui Haochen
>>
>>
>> ChangeLog
>> 2023-02-08  Haochen Gui 
>>
>> gcc/
>>  PR target/100694
>>  * config/rs6000/rs6000.md (BOOL_128_V): New mode iterator for 128-bit
>>  vector types.
>>  (and3): Replace BOOL_128 with BOOL_128_V.
>>  (ior3): Likewise.
>>  (xor3): Likewise.
>>  (one_cmpl2 expander): New expander with BOOL_128_V.
>>  (one_cmpl2 insn_and_split): Rename to ...
>>  (*one_cmpl2): ... this.
>>
>> gcc/testsuite/
>>  PR target/100694
>>  * gcc.target/powerpc/pr100694.c: New.
>>  * gcc.target/powerpc/pr92398.c: New.
>>  * gcc.target/powerpc/pr92398.h: Remove.
>>  * gcc.target/powerpc/pr92398.p9-.c: Remove.
>>  * gcc.target/powerpc/pr92398.p9+.c: Remove.
>>
>>
>> patch.diff
>> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>> index 4bd1dfd3da9..455b7329643 100644
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -743,6 +743,15 @@ (define_mode_iterator BOOL_128  [TI
>>   (V2DF  "TARGET_ALTIVEC")
>>   (V1TI  "TARGET_ALTIVEC")])
>>
>> +;; Mode iterator for logical operations on 128-bit vector types
>> +(define_mode_iterator BOOL_128_V[(V16QI "TARGET_ALTIVEC")
>> + (V8HI  "TARGET_ALTIVEC")
>> + (V4SI  "TARGET_ALTIVEC")
>> + (V4SF  "TARGET_ALTIVEC")
>> + (V2DI  "TARGET_ALTIVEC")
>> + (V2DF  "TARGET_ALTIVEC")
>> + (V1TI  "TARGET_ALTIVEC")])
>> +
>>  ;; For the GPRs we use 3 constraints for register outputs, two that are the
>>  ;; same as the output register, and a third where the output register is an
>>  ;; early clobber, so we don't have to deal with register overlaps.  For the
>> @@ -7135,23 +7144,23 @@ (define_expand "subti3"
>>  ;; 128-bit logical operations expanders
>>
>>  (define_expand "and3"
>> -  [(set (match_operand:BOOL_128 0 "vlogical_operand")
>> -(and:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
>> -  (match_operand:BOOL_128 2 "vlogical_operand")))]
>> +  [(set (match_operand:BOOL_128_V 0 "vlogical_operand")
>> +(and:BOOL_128_V (match_operand:BOOL_128_V 1 "vlogical_operand")
>> +(match_operand:BOOL_128_V 2 "vlogical_operand")))]
>>""
>>"")
>>
>>  (define_expand "ior3"
>> -  [(set (match_operand:BOOL_128 0 "vlogical_operand")
>> -(ior:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
>> -  (match_operand:BOOL_128 2 "vlogical_operand")))]
>> +  [(set (match_operand:BOOL_128_V 0 "vlogical_operand")
>> +(ior:BOOL_128_V (match_operand:BOOL_128_V 1 "vlogical_operand")
>> +(match_operand:BOOL_128_V 2 "vlogical_operand")))]
>>""
>>"")
>>
>>  (define_expand "xor3"
>> -  [(set (match_operand:BOOL_128 0 "vlogical_operand")
>> -(xor:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
>> -  (match_operand:BOOL_128 2 "vlogical_operand")))]
>> +  [(set (match_operand:BOOL_128_V 0 "vlogical_operand")
>> +(xor:BOOL_128_V (match_operand:BOOL_128_V 1 "vlogical_operand")
>> +(match_operand:BOOL_128_V 2 "vlogical_operand")))]
>>""
>>"")
>>
>> @@ -7449,7 +7458,14 @@ (define_insn_and_split "*eqv3_internal2"
>>   (const_string "16")))])
>>
>>  ;; 128-bit one's complement
>> -(define_insn_and_split "one_cmpl2"
>> +(define_expand "one_cmpl2"
>> +[(set (match_operand:BOOL_128_V 0 "vlogical_operand" "=")
>> +(not:BOOL_128_V
>> +  (match_operand:BOOL_128_V 1 "vlogical_operand" "")))]
>> +  ""
>> +  "")
>> +
>> +(define_insn_and_split "*one_cmpl2"
>>[(set (match_operand:BOOL_128 0 "vlogical_operand" "=")
>>  (not:BOOL_128
>>(match_operand:BOOL_128 1 "vlogical_operand" "")))]
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr100694.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr100694.c
>> new file mode 100644
>> index 

Re: [PATCH-4, rs6000] Change ilp32 target check for some scalar-extract-sig and scalar-insert-exp test cases

2023-04-23 Thread HAO CHEN GUI via Gcc-patches
Hi,
  Gently ping this.
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609372.html

Thanks
Gui Haochen

On 2023/1/4 14:17, HAO CHEN GUI wrote:
> Hi,
>   "ilp32" is used in these test cases to make sure test cases only run on a
> 32-bit environment. Unfortunately, these cases also run with
> "-m32/-mpowerpc64" which causes unexpected errors. This patch changes the
> target check to skip if "has_arch_ppc64" is set. So the test cases won't run
> when arch_ppc64 has been set.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> Gui Haochen
> 
> ChangeLog
> 2023-01-03  Haochen Gui  
> 
> gcc/testsuite/
>   * gcc.target/powerpc/bfp/scalar-extract-sig-2.c: Replace ilp32 check
>   with dg-skip-if has_arch_ppc64.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-2.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-5.c: Likewise.
> 
> patch.diff
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c
> index 39ee74c94dc..148b5fbd9fa 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target ilp32 } */
> +/* { dg-skip-if "" { has_arch_ppc64 } } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c
> index efd69725905..956c1183beb 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target ilp32 } */
> +/* { dg-skip-if "" { has_arch_ppc64 } } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c
> index f85966a6fdf..9a7949fb89a 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target ilp32 } */
> +/* { dg-skip-if "" { has_arch_ppc64 } } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
> 


Re: [PATCH-3, rs6000] Change mode and insn condition for scalar insert exp instruction

2023-04-23 Thread HAO CHEN GUI via Gcc-patches
Hi,
  Gently ping this.
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609371.html

Thanks
Gui Haochen

On 2023/1/4 14:17, HAO CHEN GUI wrote:
> Hi,
>   This patch changes the mode of exponent to GPR in scalar insert exp
> pattern, as the exponent can be put into a 32-bit register. Also the
> condition check is changed from TARGET_64BIT to TARGET_POWERPC64.
> 
>   The test cases are modified according to the changes of expand pattern.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> Gui Haochen
> 
> ChangeLog
> 2023-01-03  Haochen Gui  
> 
> gcc/
>   * config/rs6000/rs6000-builtins.def
>   (__builtin_vsx_scalar_insert_exp): Replace bif-pattern from xsiexpdp
>   to xsiexpdp_di.
>   (__builtin_vsx_scalar_insert_exp_dp): Replace bif-pattern from
>   xsiexpdpf to xsiexpdpf_di.
>   * config/rs6000/vsx.md (xsiexpdp): Rename to...
>   (xsiexpdp_): ..., set the mode of second operand to GPR and
>   replace TARGET_64BIT with TARGET_POWERPC64.
>   (xsiexpdpf): Rename to...
>   (xsiexpdpf_): ..., set the mode of second operand to GPR and
>   replace TARGET_64BIT with TARGET_POWERPC64.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/bfp/scalar-insert-exp-0.c: Replace lp64 check
>   with has_arch_ppc64.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-1.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-12.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-13.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-3.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-4.c: Likewise.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 25647b7bdd2..b1b5002d7d9 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -2854,10 +2854,10 @@
> 
>const double __builtin_vsx_scalar_insert_exp (unsigned long long, \
>  unsigned long long);
> -VSIEDP xsiexpdp {}
> +VSIEDP xsiexpdp_di {}
> 
>const double __builtin_vsx_scalar_insert_exp_dp (double, unsigned long 
> long);
> -VSIEDPF xsiexpdpf {}
> +VSIEDPF xsiexpdpf_di {}
> 
>pure vsc __builtin_vsx_xl_len_r (void *, signed long);
>  XL_LEN_R xl_len_r {}
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 27e03a4cf6c..3376090cc6f 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5137,22 +5137,22 @@ (define_insn "xsiexpqp_"
>[(set_attr "type" "vecmove")])
> 
>  ;; VSX Scalar Insert Exponent Double-Precision
> -(define_insn "xsiexpdp"
> +(define_insn "xsiexpdp_"
>[(set (match_operand:DF 0 "vsx_register_operand" "=wa")
>   (unspec:DF [(match_operand:DI 1 "register_operand" "r")
> - (match_operand:DI 2 "register_operand" "r")]
> + (match_operand:GPR 2 "register_operand" "r")]
>UNSPEC_VSX_SIEXPDP))]
> -  "TARGET_P9_VECTOR && TARGET_64BIT"
> +  "TARGET_P9_VECTOR && TARGET_POWERPC64"
>"xsiexpdp %x0,%1,%2"
>[(set_attr "type" "fpsimple")])
> 
>  ;; VSX Scalar Insert Exponent Double-Precision Floating Point Argument
> -(define_insn "xsiexpdpf"
> +(define_insn "xsiexpdpf_"
>[(set (match_operand:DF 0 "vsx_register_operand" "=wa")
>   (unspec:DF [(match_operand:DF 1 "register_operand" "r")
> - (match_operand:DI 2 "register_operand" "r")]
> + (match_operand:GPR 2 "register_operand" "r")]
>UNSPEC_VSX_SIEXPDP))]
> -  "TARGET_P9_VECTOR && TARGET_64BIT"
> +  "TARGET_P9_VECTOR && TARGET_POWERPC64"
>"xsiexpdp %x0,%1,%2"
>[(set_attr "type" "fpsimple")])
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-0.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-0.c
> index d8243258a67..88d77564158 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-0.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-0.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target lp64 } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> 
>  /* This test should succeed only on 64-bit configurations.  */
>  #include 
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-1.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-1.c
> index 8260b107178..2f219ddc83a 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-1.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target lp64 } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power8" } */
> 

Ping [PATCH-2, rs6000] Change mode and insn condition for scalar extract sig instruction

2023-04-23 Thread HAO CHEN GUI via Gcc-patches
Hi,
  Gently ping this.
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609370.html

Thanks
Gui Haochen

On 2023/1/4 14:16, HAO CHEN GUI wrote:
> Hi,
>   This patch changes the return type of __builtin_vsx_scalar_extract_sig
> from const signed long to const signed long long, so that it can be called
> with "-m32/-mpowerpc64" option. The bif needs TARGET_POWERPC64 instead of
> TARGET_64BIT. So the condition check in the expander is changed.
> 
>   The test cases are modified according to the changes of expand pattern.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> Gui Haochen
> 
> ChangeLog
> 2023-01-03  Haochen Gui  
> 
> gcc/
>   * config/rs6000/rs6000-builtins.def
>   (__builtin_vsx_scalar_extract_sig): Set return type to const signed
>   long long.
>   * config/rs6000/vsx.md (xsxsigdp): Replace TARGET_64BIT with
>   TARGET_POWERPC64.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/bfp/scalar-extract-sig-0.c: Replace lp64 check
>   with has_arch_ppc64.
>   * gcc.target/powerpc/bfp/scalar-extract-sig-1.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-extract-sig-6.c: Likewise.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index a8f1d3f1b3d..25647b7bdd2 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -2849,7 +2849,7 @@
>pure vsc __builtin_vsx_lxvl (const void *, signed long);
>  LXVL lxvl {}
> 
> -  const signed long __builtin_vsx_scalar_extract_sig (double);
> +  const signed long long __builtin_vsx_scalar_extract_sig (double);
>  VSESDP xsxsigdp {}
> 
>const double __builtin_vsx_scalar_insert_exp (unsigned long long, \
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 229c26c3a61..27e03a4cf6c 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5111,7 +5111,7 @@ (define_insn "xsxsigdp"
>[(set (match_operand:DI 0 "register_operand" "=r")
>   (unspec:DI [(match_operand:DF 1 "vsx_register_operand" "wa")]
>UNSPEC_VSX_SXSIG))]
> -  "TARGET_P9_VECTOR && TARGET_64BIT"
> +  "TARGET_P9_VECTOR && TARGET_POWERPC64"
>"xsxsigdp %0,%x1"
>[(set_attr "type" "integer")])
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-0.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-0.c
> index 637080652b7..d22f7d1b274 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-0.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-0.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target lp64 } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> 
>  /* This test should succeed only on 64-bit configurations.  */
>  #include 
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-1.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-1.c
> index f12eed3d9d5..64747d73a51 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-1.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target lp64 } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power8" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> 
>  /* This test should succeed only on 64-bit configurations.  */
>  #include 
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-6.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-6.c
> index c85072da138..561be53fb9b 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-6.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-6.c
> @@ -1,7 +1,7 @@
>  /* { dg-do run { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target lp64 } */
>  /* { dg-require-effective-target p9vector_hw } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> 
>  /* This test should succeed only on 64-bit configurations.  */
>  #include 
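
The return-type change described in the patch above matters because "long" is only 32 bits wide under "-m32/-mpowerpc64", while the significand of a double needs up to 53 bits. The following is a minimal sketch in portable C that models the field widths; it is not the builtin itself. The names bits_of, model_biased_exponent and model_significand are hypothetical, and the handling of the implicit leading bit is an assumption modelled on normalized inputs.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit IEEE 754 binary64 encoding.  */
static uint64_t bits_of(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Biased exponent: an 11-bit field, so it always fits in an int.  */
int model_biased_exponent(double d)
{
    return (int)((bits_of(d) >> 52) & 0x7ff);
}

/* Significand: 52 fraction bits, plus the implicit leading bit for
   normal numbers (assumption: the bit is left clear for zero,
   subnormal, infinity and NaN).  Up to 53 bits, so a 64-bit type such
   as long long is required to hold it.  */
long long model_significand(double d)
{
    uint64_t u = bits_of(d);
    uint64_t frac = u & 0x000fffffffffffffULL;
    unsigned exp = (unsigned)((u >> 52) & 0x7ff);

    if (exp != 0 && exp != 0x7ff)
        frac |= 0x0010000000000000ULL;
    return (long long)frac;
}
```

For 1.0 the modelled significand is 0x0010000000000000, which already exceeds what a 32-bit signed long can represent, while the biased exponent is 1023.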


Ping [PATCH-1, rs6000] Change mode and insn condition for scalar extract exp instruction

2023-04-23 Thread HAO CHEN GUI via Gcc-patches
Hi,
  Gently ping this.
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609369.html

Thanks
Gui Haochen

On 2023/1/4 14:16, HAO CHEN GUI wrote:
> Hi,
>   This patch changes the return type of __builtin_vsx_scalar_extract_exp
> from const signed long to const signed int, as the exponent can be put in
> a signed int. It is also in line with the external interface definition of
> the bif. The mode of exponent operand in "xsxexpdp" is changed to GPR mode
> and TARGET_64BIT check is removed, as the instruction can be executed on
> a 32-bit environment.
> 
>   The test cases are modified according to the changes of expand pattern.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> Gui Haochen
> 
> ChangeLog
> 2022-12-23  Haochen Gui  
> 
> gcc/
>   * config/rs6000/rs6000-builtins.def
>   (__builtin_vsx_scalar_extract_exp): Set return type to const unsigned
>   int and set its bif-pattern to xsxexpdp_si, move it from power9-64 to
>   power9 catalog.
>   * config/rs6000/vsx.md (xsxexpdp): Rename to ...
>   (xsxexpdp_): ..., set mode of operand 0 to GPR and remove
>   TARGET_64BIT check.
>   * doc/extend.texi (scalar_extract_exp): Remove 64-bit environment
>   requirement when it has a 64-bit argument.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/bfp/scalar-extract-exp-0.c: Remove lp64 check.
>   * gcc.target/powerpc/bfp/scalar-extract-exp-1.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-extract-exp-2.c: Deleted as the case is
>   invalid.
>   * gcc.target/powerpc/bfp/scalar-extract-exp-6.c: Remove lp64 check.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index f76f54793d7..a8f1d3f1b3d 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -2833,6 +2833,8 @@
>const signed int __builtin_dtstsfi_ov_td (const int<6>, _Decimal128);
>  TSTSFI_OV_TD dfptstsfi_unordered_td {}
> 
> +  const signed int  __builtin_vsx_scalar_extract_exp (double);
> +VSEEDP xsxexpdp_si {}
> 
>  [power9-64]
>void __builtin_altivec_xst_len_r (vsc, void *, long);
> @@ -2847,9 +2849,6 @@
>pure vsc __builtin_vsx_lxvl (const void *, signed long);
>  LXVL lxvl {}
> 
> -  const signed long __builtin_vsx_scalar_extract_exp (double);
> -VSEEDP xsxexpdp {}
> -
>const signed long __builtin_vsx_scalar_extract_sig (double);
>  VSESDP xsxsigdp {}
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 992fbc983be..229c26c3a61 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5089,11 +5089,11 @@ (define_insn "xsxexpqp_"
>[(set_attr "type" "vecmove")])
> 
>  ;; VSX Scalar Extract Exponent Double-Precision
> -(define_insn "xsxexpdp"
> -  [(set (match_operand:DI 0 "register_operand" "=r")
> - (unspec:DI [(match_operand:DF 1 "vsx_register_operand" "wa")]
> +(define_insn "xsxexpdp_"
> +  [(set (match_operand:GPR 0 "register_operand" "=r")
> + (unspec:GPR [(match_operand:DF 1 "vsx_register_operand" "wa")]
>UNSPEC_VSX_SXEXPDP))]
> -  "TARGET_P9_VECTOR && TARGET_64BIT"
> +  "TARGET_P9_VECTOR"
>"xsxexpdp %0,%x1"
>[(set_attr "type" "integer")])
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index d3812fa55b0..7c087967234 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -19598,7 +19598,10 @@ bool scalar_test_neg (double source);
>  bool scalar_test_neg (__ieee128 source);
>  @end smallexample
> 
> -The @code{scalar_extract_exp} and @code{scalar_extract_sig}
> +The @code{scalar_extract_exp} with a 64-bit source argument
> +function requires an environment supporting ISA 3.0 or later.
> +The @code{scalar_extract_exp} with a 128-bit source argument
> +and @code{scalar_extract_sig}
>  functions require a 64-bit environment supporting ISA 3.0 or later.
>  The @code{scalar_extract_exp} and @code{scalar_extract_sig} built-in
>  functions return the significand and the biased exponent value
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-0.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-0.c
> index 35bf1b240f3..d971833748e 100644
> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-0.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-0.c
> @@ -1,9 +1,7 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
> -/* { dg-require-effective-target lp64 } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
> 
> -/* This test should succeed only on 64-bit configurations.  */
>  #include 
> 
>  unsigned int
> diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-1.c 
> b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-1.c
> index 9737762c1d4..1cb438f9b70 100644
> --- 

PING^2 [PATCH, rs6000] Splat vector small V2DI constants with ISA 2.07 instructions [PR104124]

2023-04-23 Thread HAO CHEN GUI via Gcc-patches
Hi,
   Gentle ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601909.html

Thanks
Gui Haochen

On 2022/12/14 13:30, HAO CHEN GUI wrote:
> Hi,
>Gentle ping this:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601909.html
> 
> Thanks
> Gui Haochen
> 
> On 2022/9/21 13:13, HAO CHEN GUI wrote:
>> Hi,
>>   This patch adds a new insn for vector splat with small V2DI constants on
>> P8. If the value of the constant is in the range (-16, 15) and is not 0 or
>> -1, it can be loaded with vspltisw and vupkhsw on P8. This should be more
>> efficient than loading the vector from the TOC.
>>
>>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
>> Is this okay for trunk? Any recommendations? Thanks a lot.
>>
>> ChangeLog
>> 2022-09-21 Haochen Gui 
>>
>> gcc/
>>  PR target/104124
>>  * config/rs6000/altivec.md (*altivec_vupkhs_direct): Renamed
>>  to...
>>  (altivec_vupkhs_direct): ...this.
>>  * config/rs6000/constraints.md (wT constraint): New constant for a
>>  vector constraint that can be loaded with vspltisw and vupkhsw.
>>  * config/rs6000/predicates.md (vspltisw_constant_split): New
>>  predicate for wT constraint.
>>  * config/rs6000/rs6000-protos.h (vspltisw_constant_p): Add declaration.
>>  * config/rs6000/rs6000.cc (easy_altivec_constant): Call
>>  vspltisw_constant_p to judge if a V2DI constant can be synthesized with
>>  a vspltisw and a vupkhsw.
>>  * (vspltisw_constant_p): New function to return true if OP mode is
>>  V2DI and can be synthesized with ISA 2.07 instruction vupkhsw and
>>  vspltisw.
>>  * gcc/config/rs6000/vsx.md (*vspltisw_v2di_split): New insn to load up
>>  constants with vspltisw and vupkhsw.
>>
>> gcc/testsuite/
>>  PR target/104124
>>  * gcc.target/powerpc/p8-splat.c: New.
>>
>> patch.diff
>> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
>> index 2c4940f2e21..185414df021 100644
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -2542,7 +2542,7 @@ (define_insn "altivec_vupkhs"
>>  }
>>[(set_attr "type" "vecperm")])
>>
>> -(define_insn "*altivec_vupkhs_direct"
>> +(define_insn "altivec_vupkhs_direct"
>>[(set (match_operand:VP 0 "register_operand" "=v")
>>  (unspec:VP [(match_operand: 1 "register_operand" "v")]
>>   UNSPEC_VUNPACK_HI_SIGN_DIRECT))]
>> diff --git a/gcc/config/rs6000/constraints.md 
>> b/gcc/config/rs6000/constraints.md
>> index 5a44a92142e..f65dea6e0c7 100644
>> --- a/gcc/config/rs6000/constraints.md
>> +++ b/gcc/config/rs6000/constraints.md
>> @@ -150,6 +150,10 @@ (define_constraint "wS"
>>"@internal Vector constant that can be loaded with XXSPLTIB & sign 
>> extension."
>>(match_test "xxspltib_constant_split (op, mode)"))
>>
>> +(define_constraint "wT"
>> +  "@internal Vector constant that can be loaded with vspltisw & vupkhsw."
>> +  (match_test "vspltisw_constant_split (op, mode)"))
>> +
>>  ;; ISA 3.0 DS-form instruction that has the bottom 2 bits 0 and no update 
>> form.
>>  ;; Used by LXSD/STXSD/LXSSP/STXSSP.  In contrast to "Y", the 
>> multiple-of-four
>>  ;; offset is enforced for 32-bit too.
>> diff --git a/gcc/config/rs6000/predicates.md 
>> b/gcc/config/rs6000/predicates.md
>> index b1fcc69bb60..00cf60bbe58 100644
>> --- a/gcc/config/rs6000/predicates.md
>> +++ b/gcc/config/rs6000/predicates.md
>> @@ -694,6 +694,19 @@ (define_predicate "xxspltib_constant_split"
>>return num_insns > 1;
>>  })
>>
>> +;; Return true if the operand is a constant that can be loaded with a 
>> vspltisw
>> +;; instruction and then a vupkhsw instruction.
>> +
>> +(define_predicate "vspltisw_constant_split"
>> +  (match_code "const_vector,vec_duplicate")
>> +{
>> +  int value = 32;
>> +
>> +  if (!vspltisw_constant_p (op, mode, ))
>> +return false;
>> +
>> +  return true;
>> +})
>>
>>  ;; Return 1 if the operand is constant that can loaded directly with a 
>> XXSPLTIB
>>  ;; instruction.
>> diff --git a/gcc/config/rs6000/rs6000-protos.h 
>> b/gcc/config/rs6000/rs6000-protos.h
>> index b3c16e7448d..45f3d044eee 100644
>> --- a/gcc/config/rs6000/rs6000-protos.h
>> +++ b/gcc/config/rs6000/rs6000-protos.h
>> @@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, 
>> rtx, int, int, int,
>>
>>  extern int easy_altivec_constant (rtx, machine_mode);
>>  extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
>> +extern bool vspltisw_constant_p (rtx, machine_mode, int *);
>>  extern int vspltis_shifted (rtx);
>>  extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
>>  extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index df491bee2ea..984624026c2 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -6292,6 +6292,12 @@ easy_altivec_constant (rtx op, machine_mode mode)
>>&& INTVAL (CONST_VECTOR_ELT (op, 1)) 
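
The two-instruction sequence described in the patch above can be modelled in portable C. This is a rough sketch under stated assumptions, not the rs6000 code: vspltisw is modelled as splatting a sign-extended 5-bit immediate into four 32-bit words, vupkhsw as sign-extending two of those words into two 64-bit elements, and elements 0 and 1 are taken as the "high" half. The names model_vspltisw, model_vupkhsw and model_p8_splat_v2di are hypothetical.

```c
#include <stdint.h>

/* Model of vspltisw: copy a 5-bit signed immediate (-16..15) into
   all four 32-bit words.  */
static void model_vspltisw(int imm5, int32_t w[4])
{
    for (int i = 0; i < 4; i++)
        w[i] = imm5;
}

/* Model of vupkhsw: sign-extend two 32-bit words into two 64-bit
   elements (assumption: elements 0 and 1 are the high half).  */
static void model_vupkhsw(const int32_t w[4], int64_t d[2])
{
    d[0] = (int64_t)w[0];
    d[1] = (int64_t)w[1];
}

/* Return 1 and fill d[] if the value falls in the range the patch
   targets (-16..15, excluding 0 and -1); return 0 otherwise.  */
int model_p8_splat_v2di(int value, int64_t d[2])
{
    int32_t w[4];

    if (value < -16 || value > 15 || value == 0 || value == -1)
        return 0;

    model_vspltisw(value, w);
    model_vupkhsw(w, d);
    return 1;
}
```

Because every word holds the same value after the splat, the sign-extended doublewords are identical regardless of which half is unpacked, which is why the pair of instructions can stand in for a V2DI splat of a small constant.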

Re: Re: [PATCH] RISC-V: Fix redundant vmv1r.v instruction in vmsge.vx codegen

2023-04-23 Thread juzhe.zh...@rivai.ai
I can't push code yet. Can you push it for me?



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-22 04:42
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Fix redundant vmv1r.v instruction in vmsge.vx codegen
 
 
On 3/22/23 06:15, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> Current expansion of vmsge will make RA produce redundant vmv1r.v.
> 
> testcase:
> void f1 (void * in, void *out, int32_t x)
> {
>  vbool32_t mask = *(vbool32_t*)in;
>  asm volatile ("":::"memory");
>  vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
>  vint32m1_t v2 = __riscv_vle32_v_i32m1_m (mask, in, 4);
>  vbool32_t m3 = __riscv_vmsge_vx_i32m1_b32 (v, x, 4);
>  vbool32_t m4 = __riscv_vmsge_vx_i32m1_b32_mu (mask, m3, v, x, 4);
>  m4 = __riscv_vmsge_vv_i32m1_b32_m (m4, v2, v2, 4);
>  __riscv_vsm_v_b32 (out, m4, 4);
> }
> 
> Before this patch:
> f1:
>  vsetvli a5,zero,e8,mf4,ta,ma
>  vlm.v   v0,0(a0)
>  vsetivli zero,4,e32,m1,ta,mu
>  vle32.v v3,0(a0)
>  vle32.v v2,0(a0),v0.t
>  vmslt.vx v1,v3,a2
>  vmnot.m v1,v1
>  vmslt.vx v1,v3,a2,v0.t
>  vmxor.mm v1,v1,v0
>  vmv1r.v v0,v1
>  vmsge.vv v2,v2,v2,v0.t
>  vsm.v   v2,0(a1)
>  ret
> 
> After this patch:
> f1:
>  vsetvli a5,zero,e8,mf4,ta,ma
>  vlm.v   v0,0(a0)
>  vsetivli zero,4,e32,m1,ta,mu
>  vle32.v v3,0(a0)
>  vle32.v v2,0(a0),v0.t
>  vmslt.vx v1,v3,a2
>  vmnot.m v1,v1
>  vmslt.vx v1,v3,a2,v0.t
>  vmxor.mm v0,v1,v0
>  vmsge.vv v2,v2,v2,v0.t
>  vsm.v   v2,0(a1)
>  ret
> 
> 
> gcc/ChangeLog:
> 
>  * config/riscv/vector.md: Fix redundant vmv1r.v.
> 
> gcc/testsuite/ChangeLog:
> 
>  * gcc.target/riscv/rvv/base/binop_vx_constraint-150.c: Adapt 
> assembly check.
OK.  Please push this to the trunk.
 
jeff
 


Re: Re: [PATCH] RISC-V: Fine tune gather load RA constraint

2023-04-23 Thread juzhe.zh...@rivai.ai
The earlyclobber is added so that the dest operand does not overlap with the source operand.
For example:
for a gather load, vluxei.v v8,(a5),v8 is illegal according to the RVV ISA.
GCC does this the same way as LLVM, which also adds earlyclobber to model
the prohibition on overlap between dest and source operands.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-22 04:36
To: juzhe.zhong; gcc-patches
CC: kito.cheng
Subject: Re: [PATCH] RISC-V: Fine tune gather load RA constraint
 
 
On 3/13/23 02:28, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> For DEST EEW < SOURCE EEW, we can partial overlap register
> according to RVV ISA.
> 
> gcc/ChangeLog:
> 
>  * config/riscv/vector.md: Fix RA constraint.
> 
> gcc/testsuite/ChangeLog:
> 
>  * gcc.target/riscv/rvv/base/narrow_constraint-12.c: New test.
This is OK.
 
The one question I keep having when I read these patterns is why we have 
the earlyclobber.
 
Earlyclobber means that the output is potentially written before the 
inputs are consumed.   Typically for a single instruction pattern such 
constraints wouldn't make a lot of sense as *usually* the inputs are 
consumed before the output is written.
 
Just looking for a clarification as to why the earlyclobbers are needed 
at all, particularly for non-reduction patterns.
 
jeff
 


Re: Re: [GCC14 QUEUE PATCH] RISC-V: Optimize fault only first load

2023-04-23 Thread 钟居哲
Hi, Jeff.
I have fixed patches as you suggested:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616515.html 
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616518.html 
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616516.html 

Can you merge these patches?


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-04-22 11:18
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer
Subject: Re: [GCC14 QUEUE PATCH] RISC-V: Optimize fault only first load
 
 
On 3/29/23 19:28, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv-vsetvl.cc (pass_vsetvl::cleanup_insns): Adapt 
> PASS.
This doesn't provide any useful information as far as I can tell. 
Perhaps something like:
Erase AVL from instructions with the fault first load property.
 
OK with a better ChangeLog entry.
 
Related.  As a separate patch, can you add a function comment to 
cleanup_insns?  It doesn't have one and it should.
 
Thanks,
jeff
 


Re: [Patch, fortran] PRs 105152, 100193, 87946, 103389, 104429 and 82774

2023-04-23 Thread Harald Anlauf via Gcc-patches

Hi Paul,

On 22.04.23 at 10:32, Paul Richard Thomas via Gcc-patches wrote:

Hi All,

As usual, I received a string of emails on retargeting for PRs for which I
was either responsible or was on the cc list. This time I decided to take a
look at them all, in order to reward the tireless efforts of Richi, Jakub
and Martin with some attention at least.

I have fixed the PRs in the title line: See the attached changelog, patch
and testcases.

OK for 14-branch?


the patch looks essentially good to me.

Can you please have a look at testcase pr100193.f90, which fails
for me because the module file is not generated and there is no
corresponding dg-pattern:

FAIL: gfortran.dg/pr100193.f90   -O  (test for excess errors)

=== gfortran Summary ===

# of expected passes            1
# of unexpected failures        1

You could either simply omit the main program or add a pattern.
(The shortened testcase would still fail w/o the patch.)


Of the others:
PR100815 - fixed already for 12-branch on. Martin located the fix from
Tobias, for which thanks. It's quite large but has stood the test of time.
Should I backport to 11-branch?
PR103366 - fixed on 12-branch on. I closed it.
PR103715 - might be fixed but the report is for gcc with checking enabled.
I will give that a go.
PR103716 - a gimple problem with assumed shape characters. A TODO.
PR103931 - I couldn't reproduce the bug, which involves 'ambiguous c_ptr'.
To judge by the comments, it seems that this bug is a bit elusive.
PR65381 - Seems to be fixed for 12-branch on
PR82064 - Seems to be fixed.
PR83209 - Coarray allocation - seems to be fixed.
PR84244 - Coarray segfault. I have no acquaintance with the inner works of
coarrays and so don't think that I can fix this one.
PR87674 - Segfault in runtime with non-overridable proc-pointer. A TODO.
PR96087 - A module procedure problem. A TODO.

I have dejagnu-ified testcases for the already fixed PRs ready to go.
Should these be committed or do we assume that the fixes already provided
adequate tests?


I think this depends.  A testcase that is "sufficiently orthogonal" to
those in the testsuite may be worth adding.  Otherwise randomly
adding testcases might just increase the runtime for regression testing,
which could be counter-productive for the development process.  So
better really decide on a case-by-case basis?

(Of course this is only my opinion, and others may have a different view
on this.)

I checked PR100815.  The testcase in comment#0 appears to work for me
for all open branches (10 to 14).  The commit that supposedly fixed
the issue applies to 12-branch and newer.  Either there is something
else in 11-branch which fixed it in a different way, or the bisecting
unfortunately pointed to the wrong commit.  And since there is no
traceback information in the PR, I am simply confused.
So do you think this testcase improves coverage and thus adds value?

PR103715: with valgrind I get invalid reads, so I guess there is
something lurking here and it only appears to be fixed.

PR103931: it is indeed a bit elusive, but very sensitive to code
changes.  Also Bernhard had a look at it.  Given that there are
a couple of bugs related to module reading, and rename-on-use,
I'd recommend to leave that open for further analysis.

PR65381: works for me even on 11-branch.  I think this looks very
much like a duplicate of a PR that was fixed by Tobias.  Still
fails on 10-branch, but might not be worth fixing there.  Simply
close it as 10-only?

PR103716: I think that one is interesting, as there are a couple
of PRs involving inquiry functions.


Regards

Paul


Cheers,
Harald



Re: [PATCH] Turn on LRA on all targets

2023-04-23 Thread Uros Bizjak via Gcc-patches
On Sun, Apr 23, 2023 at 6:48 PM Segher Boessenkool
 wrote:
>
> This minimal patch enables LRA for all targets.  It does not clean up
> the target code, nor does it do anything to generic code: it just
> deletes all target definitions of TARGET_LRA_P.
>
> There are three kinds of changes:
>
> 1) Targets that already always have LRA, but that redefine the hook
> anyway.  These are gcn, pdp11, rx, sparc, vax, and xtensa.  Nothing
> really changes for these targets with this patch (but later patches
> will delete the superfluous hook implementations).
> 2) Targets that have LRA selectable.  Some of those are probably fine,
> since they default to using LRA (arc, mips, s390).  Two others don't
> though, maybe because there are problems (ft32 and sh).  I'd love to
> hear from all targets in this category what the status is, how easy it
> was to convert, etc.
> 3) Targets that as of yet never used LRA.  Many of those will be fine,
> but some others will need a little tuning, and a few might need some
> actual improvements to LRA itself.  These are cris, epiphany, fr30,
> frv, h8300, ia64, iq2000, lm32, m32c, m32r, m68k, mcore, microblaze,
> mmix, mn10300, msp430, nvptx, pa, rl78, stormy16, and visium.  We'll
> find out how many of those targets are ever tested, and how many of
> those work with LRA without further changes, and how well.
>
> I send this patch now so that people can start testing.  I don't plan to
> commit this for another week at least, for a week after GCC 13 release I
> guess?  How does that plan sound to people?

An old patch to enable Alpha is at

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66207

Uros.


Re: [PATCH] Turn on LRA on all targets

2023-04-23 Thread Segher Boessenkool
On Sun, Apr 23, 2023 at 07:56:56PM +0100, Maciej W. Rozycki wrote:
> On Sun, 23 Apr 2023, Segher Boessenkool wrote:
> > 1) Targets that already always have LRA, but that redefine the hook
> > anyway.  These are gcn, pdp11, rx, sparc, vax, and xtensa.  Nothing
> > really changes for these targets with this patch (but later patches
> > will delete the superfluous hook implementations).
> 
>  Umm, no, VAX has LRA selectable and for a reason it defaults to off:

I read TARGET_LRA as something that denotes LRA is selected :-)  See the
other mail.  So, vax is in category 2).

>  There are extra ICEs in regression testing and code quality is poor; cf. 
> .  

Do you have something you can show for this?  Maybe in a PR?

And, are the ICEs in the generic code, or something vax-specific?


Segher


Re: [PATCH] Turn on LRA on all targets

2023-04-23 Thread Segher Boessenkool
(You didn't leave me in Cc: on the reply.  Maybe you did a
reply-to-only-one-person?)

On Sun, Apr 23, 2023 at 11:01:05AM -0600, Jeff Law via Gcc-patches wrote:
> On 4/23/23 10:47, Segher Boessenkool wrote:
> >3) Targets that as of yet never used LRA.  Many of those will be fine,
> >but some others will need a little tuning, and a few might need some
> >actual improvements to LRA itself.  These are cris, epiphany, fr30,
> >frv, h8300, ia64, iq2000, lm32, m32c, m32r, m68k, mcore, microblaze,
> >mmix, mn10300, msp430, nvptx, pa, rl78, stormy16, and visium.  We'll
> >find out how many of those targets are ever tested, and how many of
> >those work with LRA without further changes, and how well.
> These test daily except for m68k and pa which test weekly ;-)

Ah, very good!  But still none of them are tested with LRA at all, after
so many years :-(


Segher


Re: [PATCH] Turn on LRA on all targets

2023-04-23 Thread Segher Boessenkool
Hi!

On Sun, Apr 23, 2023 at 02:36:05PM -0400, Paul Koning wrote:
> > On Apr 23, 2023, at 12:47 PM, Segher Boessenkool 
> >  wrote:
> > 1) Targets that already always have LRA, but that redefine the hook
> > anyway.  These are gcn, pdp11, rx, sparc, vax, and xtensa.  Nothing
> > really changes for these targets with this patch (but later patches
> > will delete the superfluous hook implementations).
> 
> I thought that the existing coding for pdp11 makes LRA selectable (via -mlra) 
> and defaults to off.

static bool
pdp11_lra_p (void)
{
  return TARGET_LRA;
}

Ah, that is a target flag, not an enum value or such?  Aha!

mlra
Target Mask(LRA)
Use LRA register allocator.

This is true for sparc and vax and xtensa as well, and rx with
TARGET_ENABLE_LRA.  gcn does in fact do
#define TARGET_LRA_P hook_bool_void_true
which is a funny way to have the same effect as not defining it at all.

So these five targets are category 2).  Thanks for correcting me!

> I had planned to change it to default to on but leave it selectable.  I 
> suppose just having it on is ok too, although the code from LRA wasn't as 
> efficient as the old last I looked (which is a while ago).

The plan is to delete old reload completely, with all follow-up
simplifications and cleanups.


Segher


[PATCH] PR rtl-optimization/109476: Use ZERO_EXTEND instead of zeroing a SUBREG.

2023-04-23 Thread Roger Sayle

This patch fixes PR rtl-optimization/109476, which is a code quality
regression affecting AVR.  The cause is that the lower-subreg pass is
sometimes overly aggressive, lowering the LSHIFTRT below:

(insn 7 4 8 2 (set (reg:HI 51)
(lshiftrt:HI (reg/v:HI 49 [ b ])
(const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
 (nil))

into a pair of QImode SUBREG assignments:

(insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
(reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
 (nil))
(insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
(const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
 (nil))

but this idiom, SETs of SUBREGs, interferes with combine's ability
to associate/fuse instructions.  The solution, on targets that
have a suitable ZERO_EXTEND (i.e. where the lower-subreg pass
wouldn't itself split a ZERO_EXTEND, so "splitting_zext" is false),
is to split/lower LSHIFTRT to a ZERO_EXTEND.

To answer Richard's question in comment #10 of the bugzilla PR,
the function resolve_shift_zext is called with one of four RTX
codes, ASHIFTRT, LSHIFTRT, ZERO_EXTEND and ASHIFT, but only with
LSHIFTRT can the setting of low_part and high_part SUBREGs be
replaced by a ZERO_EXTEND.  For ASHIFTRT, we require a sign
extension, so don't set the high_part to zero; if we're splitting
a ZERO_EXTEND then it doesn't make sense to replace it with a
ZERO_EXTEND, and for ASHIFT we've played games to swap the
high_part and low_part SUBREGs, so that we assign the low_part
to zero (for double word shifts by greater than word size bits).

This patch has been tested on x86_64-pc-linux-gnu with a make
bootstrap and make -k check, both 64-bit and 32-bit, with no
new regressions.  Many thanks to Jeff Law for testing this patch
on his build farm, which spotted an issue on xstormy16, which
should now be fixed by (either of) my recent xstormy16 patches.
Ok for mainline?


2023-04-23  Roger Sayle  

gcc/ChangeLog
PR rtl-optimization/109476
* lower-subreg.cc: Include explow.h for force_reg.
(resolve_shift_zext): Pass an additional SPEED_P argument.
If decomposing a suitable LSHIFTRT and we're not splitting
ZERO_EXTEND (based on the current SPEED_P), then use a ZERO_EXTEND
instead of setting a high part SUBREG to zero, which helps combine.
(decompose_multiword_subregs): Update call to resolve_shift_zext.

gcc/testsuite/ChangeLog
PR rtl-optimization/109476
* gcc.target/avr/mmcu/pr109476.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/lower-subreg.cc b/gcc/lower-subreg.cc
index 481e1e8..81fc5380 100644
--- a/gcc/lower-subreg.cc
+++ b/gcc/lower-subreg.cc
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgbuild.h"
 #include "dce.h"
 #include "expr.h"
+#include "explow.h"
 #include "tree-pass.h"
 #include "lower-subreg.h"
 #include "rtl-iter.h"
@@ -1299,11 +1300,12 @@ find_decomposable_shift_zext (rtx_insn *insn, bool 
speed_p)
 
 /* Decompose a more than word wide shift (in INSN) of a multiword
pseudo or a multiword zero-extend of a wordmode pseudo into a move
-   and 'set to zero' insn.  Return a pointer to the new insn when a
-   replacement was done.  */
+   and 'set to zero' insn.  SPEED_P says whether we are optimizing
+   for speed or size, when checking if a ZERO_EXTEND is preferable.
+   Return a pointer to the new insn when a replacement was done.  */
 
 static rtx_insn *
-resolve_shift_zext (rtx_insn *insn)
+resolve_shift_zext (rtx_insn *insn, bool speed_p)
 {
   rtx set;
   rtx op;
@@ -1378,14 +1380,29 @@ resolve_shift_zext (rtx_insn *insn)
dest_reg, GET_CODE (op) != ASHIFTRT);
 }
 
-  if (dest_reg != src_reg)
-emit_move_insn (dest_reg, src_reg);
-  if (GET_CODE (op) != ASHIFTRT)
-emit_move_insn (dest_upper, CONST0_RTX (word_mode));
-  else if (INTVAL (XEXP (op, 1)) == 2 * BITS_PER_WORD - 1)
-emit_move_insn (dest_upper, copy_rtx (src_reg));
+  /* Consider using ZERO_EXTEND instead of setting DEST_UPPER to zero
+ if this is considered reasonable.  */
+  if (GET_CODE (op) == LSHIFTRT
+  && GET_MODE (op) == twice_word_mode
+  && REG_P (SET_DEST (set))
+  && !choices[speed_p].splitting_zext)
+{
+  rtx tmp = force_reg (word_mode, copy_rtx (src_reg));
+  tmp = simplify_gen_unary (ZERO_EXTEND, twice_word_mode, tmp, word_mode);
+  emit_move_insn (SET_DEST (set), tmp);
+}
   else
-emit_move_insn (dest_upper, upper_src);
+{
+  if (dest_reg != src_reg)
+   emit_move_insn (dest_reg, src_reg);
+  if (GET_CODE (op) != ASHIFTRT)
+   emit_move_insn (dest_upper, CONST0_RTX (word_mode));
+  else if (INTVAL (XEXP (op, 1)) == 2 * BITS_PER_WORD - 1)
+   emit_move_insn (dest_upper, copy_rtx (src_reg));
+  else
+   emit_move_insn (dest_upper, upper_src);
+}
+
   insns = get_insns ();
 
   end_sequence ();
@@ -1670,7 +1687,7 @@ decompose_multiword_subregs (bool 

[r14-162 Regression] FAIL: gcc.dg/guality/pr90716.c -Os -DPREVENT_OPTIMIZATION line 23 j + 1 == 9 on Linux/x86_64

2023-04-23 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

cda246f8b421ba855a9e5f9d7bfcd0fc49b7bd4b is the first bad commit
commit cda246f8b421ba855a9e5f9d7bfcd0fc49b7bd4b
Author: Jan Hubicka 
Date:   Sat Apr 22 09:20:45 2023 +0200

Update loop estimate after header duplication

caused

FAIL: gcc.dg/guality/pr43051-1.c   -O2  -DPREVENT_OPTIMIZATION  line 34 c == 
[0]
FAIL: gcc.dg/guality/pr43051-1.c   -O2  -DPREVENT_OPTIMIZATION  line 39 c == 
[0]
FAIL: gcc.dg/guality/pr43051-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION line 34 c == [0]
FAIL: gcc.dg/guality/pr43051-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION line 39 c == [0]
FAIL: gcc.dg/guality/pr43051-1.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 34 c == [0]
FAIL: gcc.dg/guality/pr43051-1.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 39 c == [0]
FAIL: gcc.dg/guality/pr43051-1.c   -O3 -g  -DPREVENT_OPTIMIZATION  line 34 c == 
[0]
FAIL: gcc.dg/guality/pr43051-1.c   -O3 -g  -DPREVENT_OPTIMIZATION  line 39 c == 
[0]
FAIL: gcc.dg/guality/pr54693-2.c   -O1  -DPREVENT_OPTIMIZATION  line 21 i == v 
+ 1
FAIL: gcc.dg/guality/pr54693-2.c   -O2  -DPREVENT_OPTIMIZATION  line 21 i == v 
+ 1
FAIL: gcc.dg/guality/pr54693-2.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION line 21 i == v + 1
FAIL: gcc.dg/guality/pr54693-2.c   -O3 -g  -DPREVENT_OPTIMIZATION  line 21 i == 
v + 1
FAIL: gcc.dg/guality/pr54693.c   -O1  -DPREVENT_OPTIMIZATION  line 22 i == c - 
48
FAIL: gcc.dg/guality/pr54693.c   -O2  -DPREVENT_OPTIMIZATION  line 22 i == c - 
48
FAIL: gcc.dg/guality/pr54693.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION line 22 i == c - 48
FAIL: gcc.dg/guality/pr54693.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 22 i == c - 48
FAIL: gcc.dg/guality/pr54693.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  -DPREVENT_OPTIMIZATION  line 22 i == 
c - 48
FAIL: gcc.dg/guality/pr54693.c   -O3 -g  -DPREVENT_OPTIMIZATION  line 22 i == c 
- 48
FAIL: gcc.dg/guality/pr54693.c   -Os  -DPREVENT_OPTIMIZATION  line 22 i == c - 
48
FAIL: gcc.dg/guality/pr89463.c   -O1  -DPREVENT_OPTIMIZATION  line 23 i + 1 == 7
FAIL: gcc.dg/guality/pr90074.c   -O1  -DPREVENT_OPTIMIZATION  line 28 c + 1 == 2
FAIL: gcc.dg/guality/pr90074.c   -O1  -DPREVENT_OPTIMIZATION  line 28 i + 1 == 8
FAIL: gcc.dg/guality/pr90074.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 28 c + 1 == 2
FAIL: gcc.dg/guality/pr90074.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 28 i + 1 == 8
FAIL: gcc.dg/guality/pr90716.c   -O1  -DPREVENT_OPTIMIZATION  line 23 j + 1 == 9
FAIL: gcc.dg/guality/pr90716.c   -O2  -DPREVENT_OPTIMIZATION  line 23 j + 1 == 9
FAIL: gcc.dg/guality/pr90716.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION line 23 j + 1 == 9
FAIL: gcc.dg/guality/pr90716.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 23 j + 1 == 9
FAIL: gcc.dg/guality/pr90716.c   -Os  -DPREVENT_OPTIMIZATION  line 23 j + 1 == 9

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-162/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr43051-1.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr43051-1.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr54693-2.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr54693-2.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr54693-2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr54693-2.c --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr54693.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr54693.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr54693.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr54693.c --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr89463.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && 

Re: [PATCH] Turn on LRA on all targets

2023-04-23 Thread Maciej W. Rozycki
On Sun, 23 Apr 2023, Segher Boessenkool wrote:

> 1) Targets that already always have LRA, but that redefine the hook
> anyway.  These are gcn, pdp11, rx, sparc, vax, and xtensa.  Nothing
> really changes for these targets with this patch (but later patches
> will delete the superfluous hook implementations).

 Umm, no, VAX has LRA selectable and for a reason it defaults to off:

> diff --git a/gcc/config/vax/vax.cc b/gcc/config/vax/vax.cc
> index 82a176d3bfc9..17fdc4483797 100644
> --- a/gcc/config/vax/vax.cc
> +++ b/gcc/config/vax/vax.cc
> @@ -114,9 +114,6 @@ static HOST_WIDE_INT vax_starting_frame_offset (void);
>  #undef TARGET_STRUCT_VALUE_RTX
>  #define TARGET_STRUCT_VALUE_RTX vax_struct_value_rtx
>  
> -#undef TARGET_LRA_P
> -#define TARGET_LRA_P vax_lra_p
> -

 There are extra ICEs in regression testing and code quality is poor; cf. 
.  
I'll have a look into it sometime, but it may not be soon and the broken 
VAX exception unwinder is more important and will have to take precedence 
anyway.  I have fixing the unwinder scheduled hopefully for this release 
cycle, but LRA will have to wait.

  Maciej


Re: [PATCH] Turn on LRA on all targets

2023-04-23 Thread Paul Koning via Gcc-patches



> On Apr 23, 2023, at 12:47 PM, Segher Boessenkool  
> wrote:
> 
> This minimal patch enables LRA for all targets.  It does not clean up
> the target code, nor does it do anything to generic code: it just
> deletes all target definitions of TARGET_LRA_P.
> 
> There are three kinds of changes:
> 
> 1) Targets that already always have LRA, but that redefine the hook
> anyway.  These are gcn, pdp11, rx, sparc, vax, and xtensa.  Nothing
> really changes for these targets with this patch (but later patches
> will delete the superfluous hook implementations).

I thought that the existing coding for pdp11 makes LRA selectable (via -mlra) 
and defaults to off.  I had planned to change it to default to on but leave it 
selectable.  I suppose just having it on is ok too, although the code from LRA 
wasn't as efficient as the old last I looked (which is a while ago).

paul




Re: [PATCH] Turn on LRA on all targets

2023-04-23 Thread Jeff Law via Gcc-patches




On 4/23/23 10:47, Segher Boessenkool wrote:

3) Targets that as of yet never used LRA.  Many of those will be fine,
but some others will need a little tuning, and a few might need some
actual improvements to LRA itself.  These are cris, epiphany, fr30,
frv, h8300, ia64, iq2000, lm32, m32c, m32r, m68k, mcore, microblaze,
mmix, mn10300, msp430, nvptx, pa, rl78, stormy16, and visium.  We'll
find out how many of those targets are ever tested, and how many of
those work with LRA without further changes, and how well.

These test daily except for m68k and pa which test weekly ;-)

Jeff


[PATCH] Turn on LRA on all targets

2023-04-23 Thread Segher Boessenkool
This minimal patch enables LRA for all targets.  It does not clean up
the target code, nor does it do anything to generic code: it just
deletes all target definitions of TARGET_LRA_P.

There are three kinds of changes:

1) Targets that already always have LRA, but that redefine the hook
anyway.  These are gcn, pdp11, rx, sparc, vax, and xtensa.  Nothing
really changes for these targets with this patch (but later patches
will delete the superfluous hook implementations).
2) Targets that have LRA selectable.  Some of those are probably fine,
since they default to using LRA (arc, mips, s390).  Two others don't
though, maybe because there are problems (ft32 and sh).  I'd love to
hear from all targets in this category what the status is, how easy it
was to convert, etc.
3) Targets that as of yet never used LRA.  Many of those will be fine,
but some others will need a little tuning, and a few might need some
actual improvements to LRA itself.  These are cris, epiphany, fr30,
frv, h8300, ia64, iq2000, lm32, m32c, m32r, m68k, mcore, microblaze,
mmix, mn10300, msp430, nvptx, pa, rl78, stormy16, and visium.  We'll
find out how many of those targets are ever tested, and how many of
those work with LRA without further changes, and how well.

I send this patch now so that people can start testing.  I don't plan to
commit this for another week at least, for a week after GCC 13 release I
guess?  How does that plan sound to people?

This is an RFC, so no changelog, no one can accidentally commit it :-)
I thought about Cc:ing lots and lots of people, should I do that?


Segher
---
 gcc/config/alpha/alpha.cc   | 3 ---
 gcc/config/arc/arc.cc   | 2 --
 gcc/config/avr/avr.cc   | 3 ---
 gcc/config/bfin/bfin.cc | 3 ---
 gcc/config/c6x/c6x.cc   | 3 ---
 gcc/config/cris/cris.cc | 3 ---
 gcc/config/epiphany/epiphany.cc | 2 --
 gcc/config/fr30/fr30.cc | 3 ---
 gcc/config/frv/frv.cc   | 3 ---
 gcc/config/ft32/ft32.cc | 3 ---
 gcc/config/gcn/gcn.cc   | 2 --
 gcc/config/h8300/h8300.cc   | 3 ---
 gcc/config/ia64/ia64.cc | 3 ---
 gcc/config/iq2000/iq2000.cc | 3 ---
 gcc/config/lm32/lm32.cc | 2 --
 gcc/config/m32c/m32c.cc | 3 ---
 gcc/config/m32r/m32r.cc | 3 ---
 gcc/config/m68k/m68k.cc | 3 ---
 gcc/config/mcore/mcore.cc   | 3 ---
 gcc/config/microblaze/microblaze.cc | 3 ---
 gcc/config/mips/mips.cc | 2 --
 gcc/config/mmix/mmix.cc | 3 ---
 gcc/config/mn10300/mn10300.cc   | 3 ---
 gcc/config/msp430/msp430.cc | 3 ---
 gcc/config/nvptx/nvptx.cc   | 3 ---
 gcc/config/pa/pa.cc | 3 ---
 gcc/config/pdp11/pdp11.cc   | 3 ---
 gcc/config/rl78/rl78.cc | 3 ---
 gcc/config/rx/rx.cc | 3 ---
 gcc/config/s390/s390.cc | 3 ---
 gcc/config/sh/sh.cc | 3 ---
 gcc/config/sparc/sparc.cc   | 3 ---
 gcc/config/stormy16/stormy16.cc | 3 ---
 gcc/config/vax/vax.cc   | 3 ---
 gcc/config/visium/visium.cc | 3 ---
 gcc/config/xtensa/xtensa.cc | 3 ---
 36 files changed, 103 deletions(-)

diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index 1d826085198e..784348dbe787 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -10094,9 +10094,6 @@ alpha_can_change_mode_class (machine_mode from, 
machine_mode to,
 #define TARGET_MANGLE_TYPE alpha_mangle_type
 #endif
 
-#undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
-
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P alpha_legitimate_address_p
 
diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
index 22eb2e9cb459..8cd5a794073f 100644
--- a/gcc/config/arc/arc.cc
+++ b/gcc/config/arc/arc.cc
@@ -804,8 +804,6 @@ static rtx arc_legitimize_address_0 (rtx, rtx, machine_mode 
mode);
 #define TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P \
   arc_no_speculation_in_delay_slots_p
 
-#undef TARGET_LRA_P
-#define TARGET_LRA_P arc_lra_p
 #define TARGET_REGISTER_PRIORITY arc_register_priority
 /* Stores with scaled offsets have different displacement ranges.  */
 #define TARGET_DIFFERENT_ADDR_DISPLACEMENT_P hook_bool_void_true
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index c193430cf073..f2a075381d47 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -14663,9 +14663,6 @@ avr_float_lib_compare_returns_bool (machine_mode mode, 
enum rtx_code)
 #undef  TARGET_CONVERT_TO_TYPE
 #define TARGET_CONVERT_TO_TYPE avr_convert_to_type
 
-#undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
-
 #undef  TARGET_ADDR_SPACE_SUBSET_P
 #define TARGET_ADDR_SPACE_SUBSET_P avr_addr_space_subset_p
 
diff --git a/gcc/config/bfin/bfin.cc b/gcc/config/bfin/bfin.cc
index c70d2281f068..565f817d2e77 100644
--- a/gcc/config/bfin/bfin.cc
+++ b/gcc/config/bfin/bfin.cc
@@ -5834,9 +5834,6 @@ 

[PATCH][committed] aarch64: Annotate fcvtn pattern for vec_concat with zeroes

2023-04-23 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

Using the define_substs in aarch64-simd.md, this is a straightforward annotation
to remove a redundant fmov insn.

So the codegen goes from:
foo_d:
fcvtn   v0.2s, v0.2d
fmov    d0, d0
ret

to the simple:
foo_d:
fcvtn   v0.2s, v0.2d
ret

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.

Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_float_truncate_lo_): Rename 
to...
(aarch64_float_truncate_lo_): ... This.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/float_truncate_zero.c: New test.


fcvtn.patch
Description: fcvtn.patch


[PATCH][committed] aarch64: Add vect_concat with zeroes annotation to addp pattern

2023-04-23 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

Similar to others, the addp pattern can be safely annotated with  
to create
the implicit vec_concat-with-zero variants.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_addp): Rename to...
(aarch64_addp): ... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add testing for vpadd intrinsics.


addp.patch
Description: addp.patch


[PATCH] LoongArch: Enable shrink wrapping

2023-04-23 Thread Xi Ruoyao via Gcc-patches
This commit implements the target macros for shrink wrapping of function
prologues/epilogues on LoongArch.

Bootstrapped and regtested on loongarch64-linux-gnu.  I don't have
access to SPEC CPU, so I hope the reviewer can run a benchmark to see
if there is a real benefit.

gcc/ChangeLog:

* config/loongarch/loongarch.h (struct machine_function): Add
reg_is_wrapped_separately array for register wrapping
information.
* config/loongarch/loongarch.cc
(loongarch_get_separate_components): New function.
(loongarch_components_for_bb): Likewise.
(loongarch_disqualify_components): Likewise.
(loongarch_process_components): Likewise.
(loongarch_emit_prologue_components): Likewise.
(loongarch_emit_epilogue_components): Likewise.
(loongarch_set_handled_components): Likewise.
(TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS): Define.
(TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB): Likewise.
(TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS): Likewise.
(TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS): Likewise.
(TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS): Likewise.
(TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Likewise.
(loongarch_for_each_saved_reg): Skip registers that are wrapped
separately.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/shrink-wrap.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 179 +-
 gcc/config/loongarch/loongarch.h  |   2 +
 .../gcc.target/loongarch/shrink-wrap.c|  22 +++
 3 files changed, 200 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/shrink-wrap.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index e523fcb6b7f..d0024237a6a 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "rtl-iter.h"
 #include "opts.h"
+#include "function-abi.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -1017,19 +1018,23 @@ loongarch_for_each_saved_reg (HOST_WIDE_INT sp_offset,
   for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
 if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
   {
-   loongarch_save_restore_reg (word_mode, regno, offset, fn);
+   if (!cfun->machine->reg_is_wrapped_separately[regno])
+ loongarch_save_restore_reg (word_mode, regno, offset, fn);
+
offset -= UNITS_PER_WORD;
   }
 
   /* This loop must iterate over the same space as its companion in
  loongarch_compute_frame_info.  */
   offset = cfun->machine->frame.fp_sp_offset - sp_offset;
+  machine_mode mode = TARGET_DOUBLE_FLOAT ? DFmode : SFmode;
+
   for (int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
 if (BITSET_P (cfun->machine->frame.fmask, regno - FP_REG_FIRST))
   {
-   machine_mode mode = TARGET_DOUBLE_FLOAT ? DFmode : SFmode;
+   if (!cfun->machine->reg_is_wrapped_separately[regno])
+ loongarch_save_restore_reg (word_mode, regno, offset, fn);
 
-   loongarch_save_restore_reg (mode, regno, offset, fn);
offset -= GET_MODE_SIZE (mode);
   }
 }
@@ -6644,6 +6649,151 @@ loongarch_asan_shadow_offset (void)
   return TARGET_64BIT ? (HOST_WIDE_INT_1 << 46) : 0;
 }
 
+static sbitmap
+loongarch_get_separate_components (void)
+{
+  HOST_WIDE_INT offset;
+  sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
+  bitmap_clear (components);
+  offset = cfun->machine->frame.gp_sp_offset;
+
+  /* The stack should be aligned to 16-bytes boundary, so we can make the use
+ of ldptr instructions.  */
+  gcc_assert (offset % UNITS_PER_WORD == 0);
+
+  for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
+if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
+  {
+   /* We can wrap general registers saved at [sp, sp + 32768) using the
+  ldptr/stptr instructions.  For large offsets a pseudo register
+  might be needed which cannot be created during the shrink
+  wrapping pass.
+
+  TODO: This may need a revise when we add LA32 as ldptr.w is not
+  guaranteed available by the manual.  */
+   if (offset < 32768)
+ bitmap_set_bit (components, regno);
+
+   offset -= UNITS_PER_WORD;
+  }
+
+  offset = cfun->machine->frame.fp_sp_offset;
+  for (unsigned int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
+if (BITSET_P (cfun->machine->frame.fmask, regno - FP_REG_FIRST))
+  {
+   /* We can only wrap FP registers with imm12 offsets.  For large
+  offsets a pseudo register might be needed which cannot be
+  created during the shrink wrapping pass.  */
+   if (IMM12_OPERAND (offset))
+ bitmap_set_bit (components, regno);
+
+   offset -= 
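
The component-selection rule described in the comments of the (truncated) hunk above can be sketched outside GCC as follows. This is only an illustration under stated assumptions: FrameSketch, fits_imm12, and separate_components are hypothetical names, GPRs are numbered 0..31 and FPRs 32..63 purely for the example, an 8-byte word and 8-byte FP save slot are assumed (LA64 with double float), and fits_imm12 assumes a signed 12-bit immediate.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the frame description; not GCC's real types.
struct FrameSketch {
  uint32_t gp_mask;      // bit i set: general register i is saved
  uint32_t fp_mask;      // bit i set: FP register i is saved
  int64_t gp_sp_offset;  // sp-relative offset of the first GPR save slot
  int64_t fp_sp_offset;  // sp-relative offset of the first FPR save slot
};

constexpr int64_t kWordBytes = 8;  // assumption: LA64
constexpr int64_t kFpBytes = 8;    // assumption: double-float FPRs

// Assumption: a signed 12-bit immediate, i.e. [-2048, 2047].
inline bool fits_imm12(int64_t v) { return v >= -2048 && v <= 2047; }

// Collect the registers whose save/restore could be wrapped separately.
// GPRs are reported as 0..31 and FPRs as 32..63 (illustrative numbering).
inline std::vector<int> separate_components(const FrameSketch &f) {
  assert(f.gp_sp_offset % kWordBytes == 0);
  std::vector<int> out;

  int64_t offset = f.gp_sp_offset;
  for (int r = 0; r < 32; ++r)
    if (f.gp_mask & (1u << r)) {
      // Slots below sp + 32768 are reachable by ldptr/stptr.
      if (offset < 32768) out.push_back(r);
      offset -= kWordBytes;
    }

  offset = f.fp_sp_offset;
  for (int r = 0; r < 32; ++r)
    if (f.fp_mask & (1u << r)) {
      // FP saves are only wrapped when the offset fits an imm12.
      if (fits_imm12(offset)) out.push_back(32 + r);
      offset -= kFpBytes;
    }
  return out;
}
```

Registers whose slot is out of range are simply left to the ordinary prologue/epilogue, which is why the sketch skips them rather than failing.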

[PATCH V2] RISC-V: Eliminate redundant vsetvli for duplicate AVL def

2023-04-23 Thread juzhe . zhong
From: Juzhe-Zhong 

This is V2 of the patch: https://patchwork.sourceware.org/project/gcc/patch/20230328010124.235703-1-juzhe.zh...@rivai.ai/

Address comments from Jeff: add a comment for all_avail_in_compatible_p and
refine the code comments.
 
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc 
(vector_infos_manager::all_avail_in_compatible_p): New function.
(pass_vsetvl::refine_vsetvls): Optimize vsetvls.
* config/riscv/riscv-vsetvl.h: New function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_single-102.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 44 +--
 gcc/config/riscv/riscv-vsetvl.h   |  1 +
 .../riscv/rvv/vsetvl/avl_single-102.c | 16 +++
 3 files changed, 58 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-102.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index fa68b8a0462..89a45a428a4 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2446,6 +2446,26 @@ vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const
   return true;
 }
 
+/* Return TRUE if the incoming vector configuration state
+   to CFG_BB is compatible with the vector configuration
+   state in CFG_BB, FALSE otherwise.  */
+bool
+vector_infos_manager::all_avail_in_compatible_p (const basic_block cfg_bb) const
+{
+  const auto &info = vector_block_infos[cfg_bb->index].local_dem;
+  sbitmap avin = vector_avin[cfg_bb->index];
+  unsigned int bb_index;
+  sbitmap_iterator sbi;
+  EXECUTE_IF_SET_IN_BITMAP (avin, 0, bb_index, sbi)
+  {
+const auto &avin_info
+  = static_cast (*vector_exprs[bb_index]);
+if (!info.compatible_p (avin_info))
+  return false;
+  }
+  return true;
+}
+
 bool
 vector_infos_manager::all_same_avl_p (const basic_block cfg_bb,
  sbitmap bitdata) const
@@ -3816,9 +3836,27 @@ pass_vsetvl::refine_vsetvls (void) const
  m_vector_manager->to_refine_vsetvls.add (rinsn);
  continue;
}
-  rinsn = PREV_INSN (rinsn);
-  rtx new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, info, NULL_RTX);
-  change_insn (rinsn, new_pat);
+
+  /* If all incoming edges to a block have a vector state that is compatible
+with the block. In such a case we need not emit a vsetvl in the current
+block.  */
+
+  gcc_assert (has_vtype_op (insn->rtl ()));
+  rinsn = PREV_INSN (insn->rtl ());
+  gcc_assert (vector_config_insn_p (PREV_INSN (insn->rtl ())));
+  if (m_vector_manager->all_avail_in_compatible_p (cfg_bb))
+   {
+ size_t id = m_vector_manager->get_expr_id (info);
+ if (bitmap_bit_p (m_vector_manager->vector_del[cfg_bb->index], id))
+   continue;
+ eliminate_insn (rinsn);
+   }
+  else
+   {
+ rtx new_pat
+   = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, info, NULL_RTX);
+ change_insn (rinsn, new_pat);
+   }
 }
 }
 
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index 9041eee1281..d7a6c14e931 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ b/gcc/config/riscv/riscv-vsetvl.h
@@ -452,6 +452,7 @@ public:
   bool all_same_ratio_p (sbitmap) const;
 
   bool all_empty_predecessor_p (const basic_block) const;
+  bool all_avail_in_compatible_p (const basic_block) const;
 
   void release (void);
   void create_bitmap_vectors (void);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-102.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-102.c
new file mode 100644
index 000..8236d4e7f18
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-102.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-schedule-insns -fno-schedule-insns2 -fno-tree-vectorize -frename-registers" } */
+
+#include "riscv_vector.h"
+
+void f (int8_t* base1,int8_t* base2,int8_t* out,int n)
+{
+  vint8mf4_t v = __riscv_vle8_v_i8mf4 (base1, 32);
+  for (int i = 0; i < n; i++){
+v = __riscv_vor_vx_i8mf4 (v, 101, 32);
+v = __riscv_vle8_v_i8mf4_tu (v, base2, 32);
+  }
+  __riscv_vse8_v_i8mf4 (out, v, 32);
+}
+
+/* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
-- 
2.36.1



[PATCH] RISC-V: Add function comment for cleanup_insns.

2023-04-23 Thread juzhe . zhong
From: Juzhe-Zhong 

Address Jeff's comment: 
https://patchwork.sourceware.org/project/gcc/patch/20230330012804.110539-1-juzhe.zh...@rivai.ai/
Add a function comment.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::pre_vsetvl): Add function comment for cleanup_insns.

---
 gcc/config/riscv/riscv-vsetvl.cc | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index ac99028df43..fa68b8a0462 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3998,6 +3998,21 @@ pass_vsetvl::pre_vsetvl (void)
 commit_edge_insertions ();
 }
 
+/* Before VSETVL PASS, RVV instructions pattern is depending on AVL operand
+   implicitly. Since we will emit VSETVL instruction and make RVV instructions
+   depending on VL/VTYPE global status registers, we remove such AVL operand
+   in the RVV instructions pattern here in order to remove AVL dependencies when
+   AVL operand is a register operand.
+
+   Before the VSETVL PASS:
+ li a5,32
+ ...
+ vadd.vv (..., a5)
+   After the VSETVL PASS:
+ li a5,32
+ vsetvli zero, a5, ...
+ ...
+ vadd.vv (..., const_int 0).  */
 void
 pass_vsetvl::cleanup_insns (void) const
 {
-- 
2.36.1



[PATCH V2] RISC-V: Optimize fault only first load

2023-04-23 Thread juzhe . zhong
From: Juzhe-Zhong 

V2 patch for: 
https://patchwork.sourceware.org/project/gcc/patch/20230330012804.110539-1-juzhe.zh...@rivai.ai/
which has been reviewed.

This patch addresses Jeff's comment and refines the ChangeLog to give
clearer information.

gcc/ChangeLog:

* config/riscv/vector-iterators.md: New unspec to refine fault first load pattern.
* config/riscv/vector.md: Refine fault first load pattern to erase avl from instructions with the fault first load property.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/ffload-1.c: New test.
* gcc.target/riscv/rvv/vsetvl/ffload-2.c: New test.
* gcc.target/riscv/rvv/vsetvl/ffload-3.c: New test.
* gcc.target/riscv/rvv/vsetvl/ffload-5.c: New test.
* gcc.target/riscv/rvv/vsetvl/ffload-6.c: New test.
* gcc.target/riscv/rvv/vsetvl/ffload-7.c: New test.

---
 gcc/config/riscv/vector-iterators.md  |  1 +
 gcc/config/riscv/vector.md| 10 +-
 .../gcc.target/riscv/rvv/vsetvl/ffload-1.c| 21 
 .../gcc.target/riscv/rvv/vsetvl/ffload-2.c| 28 
 .../gcc.target/riscv/rvv/vsetvl/ffload-3.c| 28 
 .../gcc.target/riscv/rvv/vsetvl/ffload-5.c| 29 +
 .../gcc.target/riscv/rvv/vsetvl/ffload-6.c| 29 +
 .../gcc.target/riscv/rvv/vsetvl/ffload-7.c| 32 +++
 8 files changed, 177 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/ffload-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/ffload-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/ffload-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/ffload-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/ffload-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/ffload-7.c

diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 3c6575208be..a8e856161d3 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -80,6 +80,7 @@
   UNSPEC_VRGATHEREI16
   UNSPEC_VCOMPRESS
   UNSPEC_VLEFF
+  UNSPEC_MODIFY_VL
 ])
 
 (define_mode_iterator V [
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 0fda11ed67d..959afac2283 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7414,7 +7414,15 @@
  (unspec:V
[(match_operand:V 3 "memory_operand" "m, m, m, m")] UNSPEC_VLEFF)
  (match_operand:V 2 "vector_merge_operand"  "   vu, 0,vu, 0")))
-   (set (reg:SI VL_REGNUM) (unspec:SI [(match_dup 0)] UNSPEC_VLEFF))]
+   (set (reg:SI VL_REGNUM)
+ (unspec:SI
+   [(if_then_else:V
+  (unspec:
+   [(match_dup 1) (match_dup 4) (match_dup 5)
+(match_dup 6) (match_dup 7)
+(reg:SI VL_REGNUM) (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+  (unspec:V [(match_dup 3)] UNSPEC_VLEFF)
+  (match_dup 2))] UNSPEC_MODIFY_VL))]
   "TARGET_VECTOR"
   "vleff.v\t%0,%3%p1"
   [(set_attr "type" "vldff")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/ffload-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/ffload-1.c
new file mode 100644
index 000..b2b7eafa945
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/ffload-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */
+
+#include "riscv_vector.h"
+
+void f (int8_t * restrict in, int8_t * restrict out, int n, int cond,size_t *new_vl,size_t *new_vl2)
+{
+  size_t vl = 101;
+  
+  vint8mf8_t v = __riscv_vle8_v_i8mf8 (in, vl);
+  __riscv_vse8_v_i8mf8 (out, v, vl);
+  vbool64_t mask = __riscv_vlm_v_b64 (in + 100, vl);
+  vint8mf8_t v2 = __riscv_vle8ff_v_i8mf8_tumu (mask, v, in + 100, new_vl, vl);
+  __riscv_vse8_v_i8mf8 (out + 100, v2, *new_vl);
+  v2 = __riscv_vle8ff_v_i8mf8_tumu (mask, v2, in + 200, new_vl2, vl);
+  __riscv_vse8_v_i8mf8 (out + 200, v2, *new_vl2);
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*mu} 2 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {csrr} 2 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-not {vmv} { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/ffload-2.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/ffload-2.c
new file mode 100644
index 000..c0e21d461e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/ffload-2.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */
+

RE: [xstormy16] Add extendhisi2 and zero_extendhisi2 patterns to stormy16.md

2023-04-23 Thread Roger Sayle
On 4/33/23, Jeff Law wrote:
> On 4/22/23 14:57, Roger Sayle wrote:
> > Whilst there, I also fixed the instruction lengths and formatting of
> > the zero_extendqihi2 pattern.  Then, mostly for documentation purposes
> > as the 'T' constraint isn't yet implemented, I've added a "and Rx,#255"
> > alternative to zero_extendqihi2 that takes advantage of its efficient
> > instruction encoding.
> >
> > This patch has been tested by building a cross-compiler to
> > xstormy16-elf on x86_64-pc-linux-gnu, and confirming that the new test
> > case passes with "make -k check-gcc".  Ok for mainline?
> >
> >
> > 2023-04-22  Roger Sayle  
> >
> > gcc/ChangeLog
> >  * config/stormy16/stormy16.cc (xstormy16_print_operand): Add %h
> >  format specifier to output high_part register name of SImode reg.
> >  * config/stormy16/stormy16.md (extendhisi2): New define_insn.
> >  (zero_extendqihi2): Fix lengths, consistent formatting and add
> >  "and Rx,#255" alternative, for documentation purposes.
> >  (zero_extendhisi2): New define_insn.
> >
> > gcc/testsuite/ChangeLog
> >  * gcc.target/xstormy16/extendhisi2.c: New test case.
> >  * gcc.target/xstormy16/zextendhisi2.c: Likewise.
> Does the "T" alternative ever match?  AFAICT its constraint check always
> fails:
> 
> > (define_constraint "T"
> >   "@internal"
> >   ;; For Rx; not implemented yet.
> >   (match_test "0"))
> 
> No objections, but just not sure what's going on with that T constraint.

This is an interesting/cool artifact of the xstormy16 architecture/instruction
set that isn't yet (fully) supported in GCC, but much of the infrastructure is
in place.  Instructions on xstormy16 are encoded by either one or two 16-bit
words.  If an immediate constant is between 0..15, arithmetic instructions can
be encoded in a single word; otherwise they require two words, with a full
16-bit immediate constant in the second word.  The possibly unique feature of
xstormy16 is an "Rx" addressing mode, which can be used when the destination
register is the same as the destination of the previous instruction.  Because
DEST is then encoded implicitly, an 8-bit immediate constant can be encoded in
a single-word instruction.
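
As a rough illustration of the encoding rule just described, the size of an arithmetic instruction with an immediate can be modelled as below.  This is a sketch only: Dest and encoded_words are hypothetical names invented for the example, and the model covers nothing beyond the three cases stated above (4-bit immediate, Rx with 8-bit immediate, full 16-bit immediate).

```cpp
#include <cassert>
#include <cstdint>

// How the destination is named in the instruction (hypothetical model).
enum class Dest { Explicit, Rx };

// Number of 16-bit words needed to encode "op DEST,#imm".
constexpr int encoded_words(Dest dest, uint16_t imm) {
  if (imm <= 15) return 1;                       // 0..15 fits in one word
  if (dest == Dest::Rx && imm <= 255) return 1;  // Rx form: 8-bit immediate
  return 2;                                      // second word holds imm16
}

// "and r2,#255" needs two words, "and Rx,#255" only one.
static_assert(encoded_words(Dest::Explicit, 255) == 2, "explicit dest");
static_assert(encoded_words(Dest::Rx, 255) == 1, "implicit dest");
```

Under this model the saving from Rx applies only to immediates in 16..255; smaller constants are already single-word, and larger ones need the second word either way.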

Handling this dependency between instructions is tricky, with Rx (aka Rpsw)
depending upon the N0..N3 bits in the flags register, and these bits being
modified/updated by almost every instruction.  The 'T' constraint is a
placeholder that currently always returns false, but in theory allows the
register allocator to identify/select this alternative, with the psw_operand
attribute on each instruction indicating how it updates the N0..N3 (DEST) bits
in the processor status word (PSW).

This feature is particularly useful for zero extension from QI to HI mode.  This
normally requires a shl/shr sequence, but when the register being extended was
modified in the preceding instruction, the single word instruction "and Rx,#255"
can be used.

Currently, for
unsigned char foo(unsigned char x) { return ~x; }
GCC -O2 generates:
foo:not r2
shl r2,#8 | shr r2,#8
ret
but more optimally could use:
foo:not r2  // Rx now means r2
and Rx,#255  // shorter than and r2,#255
ret
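
The two sequences compute the same value, which can be checked with a small model of a 16-bit register.  This is only a sketch: zext_by_shifts, zext_by_mask and forms_agree are names made up for the example, and the model assumes "shr" is a logical (zero-filling) shift.

```cpp
#include <cassert>
#include <cstdint>

// Model of "shl r,#8 | shr r,#8" on a 16-bit register.
inline uint16_t zext_by_shifts(uint16_t r) {
  uint16_t t = static_cast<uint16_t>(r << 8);  // discard the high byte
  return static_cast<uint16_t>(t >> 8);        // shift zeros back in
}

// Model of "and Rx,#255": keep only the low byte.
inline uint16_t zext_by_mask(uint16_t r) {
  return static_cast<uint16_t>(r & 0xFF);
}

// Compare the two forms over every possible 16-bit register value.
inline bool forms_agree() {
  for (uint32_t v = 0; v <= 0xFFFF; ++v) {
    uint16_t r = static_cast<uint16_t>(v);
    if (zext_by_shifts(r) != zext_by_mask(r)) return false;
  }
  return true;
}

// Convenience wrapper for callers who want a hard failure on mismatch.
inline void check_forms() { assert(forms_agree()); }
```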


I doubt this functionality will be supported by the register allocator and/or
scheduler any time soon, but there's plenty that can be done with "macro
instructions".  For example, a hypothetical "*onecmplqi_zexthi" which matches
the above RTL, something like
(zero_extend:HI (not:QI (match_operand:QI "register_operand"))),
could emit "not %0 | and Rx,#255".

Presumably, the semantics of "Rx" are correctly supported by the xstormy16
simulator?

I hope this helps explain things (as I understand them).
Thanks again for your help.
Roger
--




[COMMITTED] Handle NANs in frange::operator== [PR109593]

2023-04-23 Thread Aldy Hernandez via Gcc-patches
This patch...
commit 10e481b154c5fc63e6ce4b449ce86cecb87a6015
Return true from operator== for two identical ranges containing NAN.

removed the check for NANs, which caused us to read from m_min and
m_max which are undefined for NANs.

gcc/ChangeLog:

PR tree-optimization/109593
* value-range.cc (frange::operator==): Handle NANs.
---
 gcc/value-range.cc | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 17f4e1b9f59..97162413727 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -682,6 +682,16 @@ frange::operator== (const frange &src) const
   if (varying_p ())
return types_compatible_p (m_type, src.m_type);
 
+  bool nan1 = known_isnan ();
+  bool nan2 = src.known_isnan ();
+  if (nan1 || nan2)
+   {
+ if (nan1 && nan2)
+   return (m_pos_nan == src.m_pos_nan
+   && m_neg_nan == src.m_neg_nan);
+ return false;
+   }
+
   return (real_identical (&m_min, &src.m_min)
  && real_identical (&m_max, &src.m_max)
  && m_pos_nan == src.m_pos_nan
-- 
2.40.0
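
The shape of the fix can be illustrated outside GCC with a simplified range type.  This is a sketch under assumptions, not the frange implementation: RangeSketch and ranges_equal are hypothetical, the endpoints are plain doubles compared with ==, whereas the real code uses real_identical on REAL_VALUE_TYPE, and "known NaN" is modelled by a single flag.

```cpp
#include <cassert>

// Hypothetical simplified stand-in for frange.
struct RangeSketch {
  double min, max;        // meaningful only when the range is not NaN-only
  bool pos_nan, neg_nan;  // which NaN signs the range may contain
  bool nan_only;          // the range contains nothing but NaN

  bool known_isnan() const { return nan_only; }
};

// Equality that never looks at min/max when either side is NaN-only.
inline bool ranges_equal(const RangeSketch &a, const RangeSketch &b) {
  const bool nan1 = a.known_isnan();
  const bool nan2 = b.known_isnan();
  if (nan1 || nan2) {
    if (nan1 && nan2)
      return a.pos_nan == b.pos_nan && a.neg_nan == b.neg_nan;
    return false;  // a NaN-only range never equals a range with endpoints
  }
  return a.min == b.min && a.max == b.max
         && a.pos_nan == b.pos_nan && a.neg_nan == b.neg_nan;
}

// Two NaN-only ranges with unrelated endpoint bits must still compare equal.
inline void check_nan_ranges() {
  RangeSketch a{1.0, 2.0, true, false, true};
  RangeSketch b{99.0, -5.0, true, false, true};
  assert(ranges_equal(a, b));
}
```

The early return is the point of the example: the endpoint fields of a NaN-only range are never read.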



[PATCH] PoC: add -Wunused-result=strict

2023-04-23 Thread Andrew Church
As requested in ,
this is a proof-of-concept patch to change -Wunused-result to not warn
about return values explicitly discarded by casting to void, and add
-Wunused-result=strict for the current behavior (warn even on void casts).
The core behavior change is based on an earlier patch at
; it appears to
do the correct thing based on the tests added by the patch, but it also
breaks a number of other attribute- and analyzer-related tests, so some
fine-tuning is probably needed (and I don't have the GCC internals
knowledge to know where to start).

I haven't touched [[nodiscard]] behavior, since GCC already ignores
explicit discards in that case; for consistency's sake, it might make
sense to also warn for explicitly discarded [[nodiscard]] values with
-Wunused-result=strict.
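
To make the distinction concrete, here is a small example of the call forms involved.  It is only a sketch: checked_write, checked_read and demo are made-up names, and the comments describe the behavior as this message states it (current GCC warns on the void cast for warn_unused_result but not for [[nodiscard]]; the proof of concept would warn only with -Wunused-result=strict).

```cpp
#include <cassert>

// GNU attribute: the case -Wunused-result is about.
__attribute__((warn_unused_result)) inline int checked_write(int fd) {
  return fd + 1;
}

// Standard attribute: an explicit void cast already silences this one.
[[nodiscard]] inline int checked_read(int fd) { return fd - 1; }

inline int demo() {
  // Explicit discard.  Unpatched GCC still warns here; with the
  // proof-of-concept patch it would warn only under =strict.
  (void) checked_write(1);

  // Explicit discard of a [[nodiscard]] value: no warning.
  (void) checked_read(0);

  // The value is used, so no warning in any mode.
  int rc = checked_write(2);
  return rc;
}
```

Compiling this with an unpatched GCC is expected to produce one warning, for the first call in demo().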

Please CC me on any replies, as I am not subscribed to the list.

  --Andrew Church
https://achurch.org/

diff -urN gcc-12.2.0-orig/gcc/c-family/c.opt gcc-12.2.0/gcc/c-family/c.opt
--- gcc-12.2.0-orig/gcc/c-family/c.opt  2022-08-19 17:09:52 +0900
+++ gcc-12.2.0/gcc/c-family/c.opt   2023-04-23 03:39:58 +0900
@@ -1361,6 +1361,10 @@
 C ObjC C++ ObjC++ Var(warn_unused_result) Init(1) Warning
 Warn if a caller of a function, marked with attribute warn_unused_result, does not use its return value.
 
+Wunused-result=strict
+C ObjC C++ ObjC++ RejectNegative Var(warn_unused_result,2) Init(1) Warning
+Warn if a caller of a function, marked with attribute warn_unused_result, does not use its return value.
+
 Wunused-variable
 C ObjC C++ ObjC++ LangEnabledBy(C ObjC C++ ObjC++,Wunused)
 ; documented in common.opt
diff -urN gcc-12.2.0-orig/gcc/doc/invoke.texi gcc-12.2.0/gcc/doc/invoke.texi
--- gcc-12.2.0-orig/gcc/doc/invoke.texi 2022-08-19 17:09:52 +0900
+++ gcc-12.2.0/gcc/doc/invoke.texi  2023-04-23 03:32:36 +0900
@@ -406,7 +406,7 @@
 -Wunused-const-variable  -Wunused-const-variable=@var{n} @gol
 -Wunused-function  -Wunused-label  -Wunused-local-typedefs @gol
 -Wunused-macros @gol
--Wunused-parameter  -Wno-unused-result @gol
+-Wunused-parameter  -Wunused-result  -Wunused-result=strict @gol
 -Wunused-value  -Wunused-variable @gol
 -Wno-varargs  -Wvariadic-macros @gol
 -Wvector-operation-performance @gol
@@ -7037,12 +7037,20 @@
 To suppress this warning use the @code{unused} attribute
 (@pxref{Variable Attributes}).
 
-@item -Wno-unused-result
+@item -Wunused-result
 @opindex Wunused-result
 @opindex Wno-unused-result
-Do not warn if a caller of a function marked with attribute
+Warn if a caller of a function marked with attribute
 @code{warn_unused_result} (@pxref{Function Attributes}) does not use
-its return value. The default is @option{-Wunused-result}.
+its return value, unless the return value is explicitly discarded with a
+cast to @code{void}. This warning is enabled by default.
+
+@item -Wunused-result=strict
+@opindex Wunused-result=strict
+Warn if a caller of a function marked with attribute
+@code{warn_unused_result} (@pxref{Function Attributes}) does not use
+its return value, even if the return value is explicitly discarded with
+a cast to @code{void}.
 
 @item -Wunused-variable
 @opindex Wunused-variable
diff -urN gcc-12.2.0-orig/gcc/gimplify.cc gcc-12.2.0/gcc/gimplify.cc
--- gcc-12.2.0-orig/gcc/gimplify.cc 2022-08-19 17:09:52 +0900
+++ gcc-12.2.0/gcc/gimplify.cc  2023-04-23 03:34:02 +0900
@@ -15183,10 +15183,18 @@
  || fallback == fb_none)
{
  /* Just strip a conversion to void (or in void context) and
-try again.  */
- *expr_p = TREE_OPERAND (*expr_p, 0);
- ret = GS_OK;
- break;
+try again.  But if this is a function call cast to void
+and strict unused-result warnings are not enabled,
+preserve the cast so that do_warn_unused_result() knows
+not to emit a warning.  */
+ if (!(warn_unused_result == 1
+   && TREE_CODE (TREE_OPERAND (*expr_p, 0)) == CALL_EXPR
+   && VOID_TYPE_P (TREE_TYPE (*expr_p))))
+   {
+   *expr_p = TREE_OPERAND (*expr_p, 0);
+   ret = GS_OK;
+   break;
+   }
}
 
  ret = gimplify_conversion (expr_p);
diff -urN gcc-12.2.0-orig/gcc/testsuite/c-c++-common/attr-warn-unused-result.c gcc-12.2.0/gcc/testsuite/c-c++-common/attr-warn-unused-result.c
--- gcc-12.2.0-orig/gcc/testsuite/c-c++-common/attr-warn-unused-result.c 2022-08-19 17:09:53 +0900
+++ gcc-12.2.0/gcc/testsuite/c-c++-common/attr-warn-unused-result.c 2023-04-22 20:01:42 +0900
@@ -1,6 +1,6 @@
 /* warn_unused_result attribute tests.  */
 /* { dg-do compile } */
-/* { dg-options "-O -ftrack-macro-expansion=0" } */
+/* { dg-options "-O -ftrack-macro-expansion=0 -Wunused-result=strict" } */
 
 #define WUR 

Re: [match.pd] [SVE] Add pattern to transform svrev(svrev(v)) --> v

2023-04-23 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 21 Apr 2023 at 21:57, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Wed, 19 Apr 2023 at 16:17, Richard Biener  
> > wrote:
> >>
> >> On Wed, Apr 19, 2023 at 11:21 AM Prathamesh Kulkarni
> >>  wrote:
> >> >
> >> > On Tue, 11 Apr 2023 at 19:36, Prathamesh Kulkarni
> >> >  wrote:
> >> > >
> >> > > On Tue, 11 Apr 2023 at 14:17, Richard Biener 
> >> > >  wrote:
> >> > > >
> >> > > > On Wed, Apr 5, 2023 at 10:39 AM Prathamesh Kulkarni via Gcc-patches
> >> > > >  wrote:
> >> > > > >
> >> > > > > Hi,
> >> > > > > For the following test:
> >> > > > >
> >> > > > > svint32_t f(svint32_t v)
> >> > > > > {
> >> > > > >   return svrev_s32 (svrev_s32 (v));
> >> > > > > }
> >> > > > >
> >> > > > > We generate 2 rev instructions instead of nop:
> >> > > > > f:
> >> > > > > rev z0.s, z0.s
> >> > > > > rev z0.s, z0.s
> >> > > > > ret
> >> > > > >
> >> > > > > The attached patch tries to fix that by trying to recognize the 
> >> > > > > following
> >> > > > > pattern in match.pd:
> >> > > > > v1 = VEC_PERM_EXPR (v0, v0, mask)
> >> > > > > v2 = VEC_PERM_EXPR (v1, v1, mask)
> >> > > > > -->
> >> > > > > v2 = v0
> >> > > > > if mask is { nelts - 1, nelts - 2, nelts - 3, ... }
> >> > > > >
> >> > > > > Code-gen with patch:
> >> > > > > f:
> >> > > > > ret
> >> > > > >
> >> > > > > Bootstrap+test passes on aarch64-linux-gnu, and SVE bootstrap in 
> >> > > > > progress.
> >> > > > > Does it look OK for stage-1 ?
> >> > > >
> >> > > > I didn't look at the patch but 
> >> > > > tree-ssa-forwprop.cc:simplify_permutation should
> >> > > > handle two consecutive permutes with the 
> >> > > > is_combined_permutation_identity
> >> > > > which might need tweaking for VLA vectors
> >> > > Hi Richard,
> >> > > Thanks for the suggestions. The attached patch modifies
> >> > > is_combined_permutation_identity
> >> > > to recognize the above pattern.
> >> > > Does it look OK ?
> >> > > Bootstrap+test in progress on aarch64-linux-gnu and x86_64-linux-gnu.
> >> > Hi,
> >> > ping https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615502.html
> >>
> >> Can you instead of def_stmt pass in a bool whether rhs1 is equal to rhs2
> >> and amend the function comment accordingly, say,
> >>
> >>   tem = VEC_PERM ;
> >>   res = VEC_PERM ;
> >>
> >> SAME_P specifies whether op0 and op1 compare equal.  */
> >>
> >> +  if (def_stmt)
> >> +gcc_checking_assert (is_gimple_assign (def_stmt)
> >> +&& gimple_assign_rhs_code (def_stmt) == 
> >> VEC_PERM_EXPR);
> >> this is then unnecessary
> >>
> >>mask = fold_ternary (VEC_PERM_EXPR, TREE_TYPE (mask1), mask1, mask1, 
> >> mask2);
> >> +
> >> +  /* For VLA masks, check for the following pattern:
> >> + v1 = VEC_PERM_EXPR (v0, v0, mask)
> >> + v2 = VEC_PERM_EXPR (v1, v1, mask)
> >> + -->
> >> + v2 = v0
> >>
> >> you are not using 'mask' so please defer fold_ternary until after your
> >> special-case.
> >>
> >> +  if (operand_equal_p (mask1, mask2, 0)
> >> +  && !VECTOR_CST_NELTS (mask1).is_constant ()
> >> +  && def_stmt
> >> +  && operand_equal_p (gimple_assign_rhs1 (def_stmt),
> >> + gimple_assign_rhs2 (def_stmt), 0))
> >> +{
> >> +  vec_perm_builder builder;
> >> +  if (tree_to_vec_perm_builder (&builder, mask1))
> >> +   {
> >> + poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask1));
> >> + vec_perm_indices sel (builder, 1, nelts);
> >> + if (sel.series_p (0, 1, nelts - 1, -1))
> >> +   return 1;
> >> +   }
> >> +  return 0;
> >>
> >> I'm deferring to Richard whether this is the correct way to check for a
> >> vector reversing mask (I wonder how constructing such mask is even possible)
> > Hi Richard,
> > Thanks for the suggestions, I have updated the patch accordingly.
> >
> > The following hunk from svrev_impl::fold() constructs mask in reverse:
> > /* Permute as { nelts - 1, nelts - 2, nelts - 3, ... }.  */
> > poly_int64 nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (f.lhs));
> > vec_perm_builder builder (nelts, 1, 3);
> > for (int i = 0; i < 3; ++i)
> >   builder.quick_push (nelts - i - 1);
> > return fold_permute (f, builder);
> >
> > To see if mask chooses elements in reverse, I borrowed it from function 
> > comment
> > for series_p in vec-perm-indices.cc:
> > /* Return true if index OUT_BASE + I * OUT_STEP selects input
> >element IN_BASE + I * IN_STEP.  For example, the call to test
> >whether a permute reverses a vector of N elements would be:
> >
> >  series_p (0, 1, N - 1, -1)
> >
> >which would return true for { N - 1, N - 2, N - 3, ... }.  */
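
For fixed-length vectors the idea behind the check can be sketched in a few lines.  This is an illustration only, not the VLA-aware vec_perm_indices code: is_reverse_mask and permute are hypothetical helpers, and the mask is a plain array of indices into a single input vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if sel is { n - 1, n - 2, ..., 0 }: the fixed-length analogue of
// series_p (0, 1, N - 1, -1).
inline bool is_reverse_mask(const std::vector<std::size_t> &sel) {
  const std::size_t n = sel.size();
  for (std::size_t i = 0; i < n; ++i)
    if (sel[i] != n - 1 - i) return false;
  return true;
}

// Single-input permute: out[i] = in[sel[i]].
inline std::vector<int> permute(const std::vector<int> &in,
                                const std::vector<std::size_t> &sel) {
  std::vector<int> out(sel.size());
  for (std::size_t i = 0; i < sel.size(); ++i) {
    assert(sel[i] < in.size());
    out[i] = in[sel[i]];
  }
  return out;
}
```

Applying permute twice with a mask accepted by is_reverse_mask returns the original vector, which is the identity the proposed folding relies on.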
> >
> > Thanks,
> > Prathamesh
> >>
> >> Richard.
> >>
> >> > Thanks,
> >> > Prathamesh
> >> > >
> >> > > Thanks,
> >> > > Prathamesh
> >> > > >
> >> > > > Richard.
> >> > > >
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Prathamesh
> >
> > gcc/ChangeLog:
> >   * tree-ssa-forwprop.cc