Re: [PATCH v3] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-28 Thread chenglulu



On 2023/12/29 at 00:11, Xi Ruoyao wrote:

The problem with peephole2 is that it uses a naive sliding-window
algorithm and misses many cases.  For example:

 float a[1];
 float t() { return a[0] + a[8000]; }

is compiled to:

 la.local$r13,a
 la.local$r12,a+32768
 fld.s   $f1,$r13,0
 fld.s   $f0,$r12,-768
 fadd.s  $f0,$f1,$f0

by trunk.  But as we've explained in r14-4851, the following would be
better with -mexplicit-relocs=auto:

 pcalau12i   $r13,%pc_hi20(a)
 pcalau12i   $r12,%pc_hi20(a+32000)
 fld.s   $f1,$r13,%pc_lo12(a)
 fld.s   $f0,$r12,%pc_lo12(a+32000)
 fadd.s  $f0,$f1,$f0

However, the sliding-window algorithm just won't detect the pcalau12i/fld
pair to be optimized.  Using a define_insn_and_split in the combine pass
works around the issue.

gcc/ChangeLog:

* config/loongarch/predicates.md
(symbolic_pcrel_offset_operand): New define_predicate.
(mem_simple_ldst_operand): Likewise.
* config/loongarch/loongarch-protos.h
(loongarch_rewrite_mem_for_simple_ldst): Declare.
* config/loongarch/loongarch.cc
(loongarch_rewrite_mem_for_simple_ldst): Implement.
* config/loongarch/loongarch.md (simple_load): New
define_insn_and_rewrite.
(simple_load_ext): Likewise.
(simple_store): Likewise.
(define_peephole2): Remove la.local/[f]ld peepholes.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c:
New test.
* gcc.target/loongarch/explicit-relocs-auto-single-load-store-3.c:
New test.
---

Changes from [v2]:
- Match (mem (symbol_ref ...)) instead of (symbol_ref ...) to retain the
   attributes of the MEM.
- Add a test to make sure the attributes of the MEM are retained.

[v2]:https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641430.html

Bootstrapped & regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/loongarch-protos.h   |   1 +
  gcc/config/loongarch/loongarch.cc |  16 +++
  gcc/config/loongarch/loongarch.md | 114 +-
  gcc/config/loongarch/predicates.md|  13 ++
  ...explicit-relocs-auto-single-load-store-2.c |  11 ++
  ...explicit-relocs-auto-single-load-store-3.c |  18 +++
  6 files changed, 86 insertions(+), 87 deletions(-)
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-3.c

diff --git a/gcc/config/loongarch/loongarch-protos.h 
b/gcc/config/loongarch/loongarch-protos.h


/* snip */

  
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md

/* snip */

+(define_insn_and_rewrite "simple_load"
+  [(set (match_operand:LD_AT_LEAST_32_BIT 0 "register_operand" "=r,f")
+   (match_operand:LD_AT_LEAST_32_BIT 1 "mem_simple_ldst_operand" ""))]
+  "loongarch_pre_reload_split () \
+   && la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
Is the '\' here dispensable? I don't seem to have added it when I wrote 
the conditions.

+   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM)"
+  "#"
+  "&& true"
{
-emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
+operands[1] = loongarch_rewrite_mem_for_simple_ldst (operands[1]);
})

/* snip */

  ;; Synchronization instructions.
diff --git a/gcc/config/loongarch/predicates.md 
b/gcc/config/loongarch/predicates.md
index 83fea08315c..2158fe7538c 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -579,6 +579,19 @@ (define_predicate "symbolic_pcrel_operand"
return loongarch_symbolic_constant_p (op, &type) && type == SYMBOL_PCREL;
  })
  
+(define_predicate "symbolic_pcrel_offset_operand"
+  (and (match_code "plus")
+   (match_operand 0 "symbolic_pcrel_operand")
+   (match_operand 1 "const_int_operand")))
+
+(define_predicate "mem_simple_ldst_operand"
+  (match_code "mem")
+{
+  op = XEXP (op, 0);
+  return symbolic_pcrel_operand (op, Pmode) ||
+symbolic_pcrel_offset_operand (op, Pmode);
+})
+
  

The '||' operator shouldn't be at the end of the line.

+  return symbolic_pcrel_operand (op, Pmode)
+|| symbolic_pcrel_offset_operand (op, Pmode);

The rest LGTM.
Thanks!

/* snip */



Re: [PATCH] RISC-V: Fix misaligned stack offset for interrupt function

2023-12-28 Thread Fei Gao
On 2023-12-25 16:45  Kito Cheng  wrote:

>+++ b/gcc/testsuite/gcc.target/riscv/interrupt-misaligned.c
>@@ -0,0 +1,29 @@
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -march=rv64gc -mabi=lp64d -fno-schedule-insns 
>-fno-schedule-insns2" } */
>+/* { dg-skip-if "" { *-*-* } { "-flto -fno-fat-lto-objects" } } */
>+
>+/*  Make sure no stack offset are misaligned.
>+**  interrupt:
>+**  ...
>+**    sd\tt0,40\(sp\)
>+**    frcsr\tt0
>+**    sw\tt0,32\(sp\)
>+**    sd\tt1,24\(sp\)
>+**    fsd\tft0,8\(sp\)
>+**  ...
>+**    lw\tt0,32\(sp\)
>+**    fscsr\tt0
>+**    ld\tt0,40\(sp\)
>+**    ld\tt1,24\(sp\)
>+**    fld\tft0,8\(sp\)
>+**  ...
>+*/
Hi Kito

The fix is fine, but maybe using s0 instead of t0 is better:
1. simpler code.
2. smaller stack size.

current implementation:
>+**        sd\tt0,40\(sp\)
>+**        frcsr\tt0
>+**        sw\tt0,32\(sp\)      //save content of frcsr in stack

use s0:
>+**        sd\tt0,40\(sp\)
>+**        frcsr\ts0                // save the frcsr content in s0 instead of
>on the stack.  If s0 is used as a callee-saved register, it will be saved
>again later by the existing prologue code.

Also, making this change in riscv_expand_prologue & epilogue would be consistent
with the current stack allocation logic.

I can try it if you think it's necessary.

BR
Fei
>+
>+
>+void interrupt(void) __attribute__((interrupt));
>+void interrupt(void)
>+{
>+  asm volatile ("# clobber!":::"t0", "t1", "ft0");
>+}
>+
>+/* { dg-final { check-function-bodies "**" "" } } */
>--
>2.40.1

[PATCH v1] LoongArch: testsuite: Add loongarch to gcc.dg/vect/slp-26.c.

2023-12-28 Thread chenxiaolong
On LoongArch, GCC supports the vectorization tested by vect/slp-26.c,
but loongarch is not covered by the dg-final directives.  Add
loongarch to the appropriate dg-finals.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-26.c: Add loongarch.
---
 gcc/testsuite/gcc.dg/vect/slp-26.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-26.c 
b/gcc/testsuite/gcc.dg/vect/slp-26.c
index c964635c91c..cfb763bf519 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-26.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
@@ -47,7 +47,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { 
! { mips_msa || { amdgcn-*-* || riscv_v } } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { 
mips_msa || { amdgcn-*-* || riscv_v } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { 
target { ! { mips_msa || { amdgcn-*-* || riscv_v } } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target { mips_msa || { amdgcn-*-* || riscv_v } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { 
! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { 
mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { 
target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } 
} */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } */
-- 
2.20.1



[PATCH v1] LoongArch: testsuite: Add loongarch to gcc.dg/vect/slp-21.c.

2023-12-28 Thread chenxiaolong
The LoongArch GCC port does not support the IFN_STORE_LANES optimization,
so four SLP statements are used for vectorization in slp-21.c.  Add
loongarch*-*-* to the corresponding dg-finals.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-21.c: Add loongarch.
---
 gcc/testsuite/gcc.dg/vect/slp-21.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-21.c 
b/gcc/testsuite/gcc.dg/vect/slp-21.c
index 712a73b69d7..58751688414 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-21.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-21.c
@@ -213,7 +213,7 @@ int main (void)
 
Not all vect_perm targets support that, and it's a bit too specific to have
its own effective-target selector, so we just test targets directly.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { 
target { powerpc64*-*-* s390*-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { 
target { vect_strided4 && { ! { powerpc64*-*-* s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { 
target { powerpc64*-*-* s390*-*-* loongarch*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { 
target { vect_strided4 && { ! { powerpc64*-*-* s390*-*-* loongarch*-*-* } } } } 
} } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  { 
target { ! { vect_strided4 } } } } } */
   
-- 
2.20.1



Re: [PATCH v1 1/8] LoongArch: testsuite:Add detection procedures supported by the target.

2023-12-28 Thread Chenghua Xu


chenxiaolong writes:

> In order to improve and check the function of vector quantization in
> LoongArch architecture, tests on vector instruction set are provided
> in target-support.exp.
>
> gcc/testsuite/ChangeLog:
>
>   * lib/target-supports.exp:Add LoongArch to the list of supported
>   targets.
 ^ Should be a space after ":".
> ---
>  gcc/testsuite/lib/target-supports.exp | 219 +++---
>  1 file changed, 161 insertions(+), 58 deletions(-)
>
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 14e3e119792..b90aaf8cabe 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -3811,7 +3811,11 @@ proc add_options_for_bfloat16 { flags } {
>  # (fma, fms, fnma, and fnms) for both float and double.
>  
>  proc check_effective_target_scalar_all_fma { } {
> -return [istarget aarch64*-*-*]
> +if { [istarget aarch64*-*-*] 

Trailing whitespace.

> +  || [istarget loongarch*-*-*]} {
> + return 1
> +}
> +return 0
>  }
>  
>  # Return 1 if the target supports compiling fixed-point,
> @@ -4017,7 +4021,7 @@ proc check_effective_target_vect_cmdline_needed { } {
>|| ([istarget arm*-*-*] && [check_effective_target_arm_neon])
>|| [istarget aarch64*-*-*]
>|| [istarget amdgcn*-*-*]
> -  || [istarget riscv*-*-*]} {
> +  || [istarget riscv*-*-*] } {

Is something missing here?

>   return 0
>   } else {
>   return 1
> @@ -4047,6 +4051,8 @@ proc check_effective_target_vect_int { } {
>&& [check_effective_target_s390_vx])
>|| ([istarget riscv*-*-*]
>&& [check_effective_target_riscv_v])
> +  || ([istarget loongarch*-*-*]
> +  && [check_effective_target_loongarch_sx])
>   }}]
>  }
>  
> @@ -4176,7 +4182,9 @@ proc check_effective_target_vect_intfloat_cvt { } {
>|| ([istarget s390*-*-*]
>&& [check_effective_target_s390_vxe2])
>|| ([istarget riscv*-*-*]
> -  && [check_effective_target_riscv_v]) }}]
> +  && [check_effective_target_riscv_v])
> +  || ([istarget loongarch*-*-*]
> +  && [check_effective_target_loongarch_sx]) }}]
>  }
>  
>  # Return 1 if the target supports signed double->int conversion
> @@ -4197,7 +4205,9 @@ proc check_effective_target_vect_doubleint_cvt { } {
>|| ([istarget s390*-*-*]
>&& [check_effective_target_s390_vx])
>|| ([istarget riscv*-*-*]
> -  && [check_effective_target_riscv_v]) }}]
> +  && [check_effective_target_riscv_v])
> +  || ([istarget loongarch*-*-*]
> +  && [check_effective_target_loongarch_sx]) }}]
>  }
>  
>  # Return 1 if the target supports signed int->double conversion
> @@ -4218,7 +4228,9 @@ proc check_effective_target_vect_intdouble_cvt { } {
>|| ([istarget s390*-*-*]
>&& [check_effective_target_s390_vx])
>|| ([istarget riscv*-*-*]
> -  && [check_effective_target_riscv_v]) }}]
> +  && [check_effective_target_riscv_v])
> +  || ([istarget loongarch*-*-*]
> +  && [check_effective_target_loongarch_sx]) }}]
>  }
>  
>  #Return 1 if we're supporting __int128 for target, 0 otherwise.
> @@ -4251,7 +4263,9 @@ proc check_effective_target_vect_uintfloat_cvt { } {
>|| ([istarget s390*-*-*]
>&& [check_effective_target_s390_vxe2])
>|| ([istarget riscv*-*-*]
> -  && [check_effective_target_riscv_v]) }}]
> +  && [check_effective_target_riscv_v])
> +  || ([istarget loongarch*-*-*]
> +  && [check_effective_target_loongarch_sx]) }}]
>  }
>  
>  
> @@ -4270,7 +4284,9 @@ proc check_effective_target_vect_floatint_cvt { } {
>|| ([istarget s390*-*-*]
>&& [check_effective_target_s390_vxe2])
>|| ([istarget riscv*-*-*]
> -  && [check_effective_target_riscv_v]) }}]
> +  && [check_effective_target_riscv_v])
> +  || ([istarget loongarch*-*-*]
> +  && [check_effective_target_loongarch_sx]) }}]
>  }
>  
>  # Return 1 if the target supports unsigned float->int conversion
> @@ -4287,7 +4303,9 @@ proc check_effective_target_vect_floatuint_cvt { } {
>   || ([istarget s390*-*-*]
>   && [check_effective_target_s390_vxe2])
>   || ([istarget riscv*-*-*]
> - && [check_effective_target_riscv_v]) }}]
> + && [check_effective_target_riscv_v])
> + || ([istarget loongarch*-*-*]
> + && [check_effective_target_loongarch_sx]) }}]
>  }
>  
>  # Return 1 if the target supports vector integer char -> long long extend 
> optab
> @@ -4296,7 +4314,9 @@ proc check_effective_target_vect_floatuint_cvt { } {
>  proc check_effective_target_vect_ext_char_longlong { } {
>  

RE: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor

2023-12-28 Thread Li, Pan2
Thanks Jeff.

I think I have located where aarch64 performs the trick.

1. In the .final dump we have RTL like:

(insn:TI 6 8 29 (set (reg:SF 32 v0)
(const_double:SF -0.0 [-0x0.0p+0])) 
"/home/box/panli/gnu-toolchain/gcc/gcc/testsuite/gcc.dg/pr30957-1.c":31:7 79 
{*movsf_aarch64}
 (nil))

2. movsf_aarch64 comes from the pattern in aarch64.md shown below;
i.e., it will generate "movi\t%0.2s, #0" if
aarch64_reg_or_fp_zero is true.

1640 (define_insn "*mov_aarch64"
1641   [(set (match_operand:SFD 0 "nonimmediate_operand")
1642   (match_operand:SFD 1 "general_operand"))]
1643   "TARGET_FLOAT && (register_operand (operands[0], mode)
1644 || aarch64_reg_or_fp_zero (operands[1], mode))"
1645   {@ [ cons: =0 , 1   ; attrs: type , arch  ]
1646  [ w, Y   ; neon_move   , simd  ] movi\t%0.2s, #0

3. Then aarch64_float_const_zero_rtx_p is reached, and for the -0.0 input
rtx it returns true at line 10873 because -fno-signed-zeros is in effect.

10863 bool
10864 aarch64_float_const_zero_rtx_p (rtx x)
10865 {
10866   /* 0.0 in Decimal Floating Point cannot be represented by #0 or
10867  zr as our callers expect, so no need to check the actual
10868  value if X is of Decimal Floating Point type.  */
10869   if (GET_MODE_CLASS (GET_MODE (x)) == MODE_DECIMAL_FLOAT)
10870 return false;
10871
10872   if (REAL_VALUE_MINUS_ZERO (*CONST_DOUBLE_REAL_VALUE (x)))
10873 return !HONOR_SIGNED_ZEROS (GET_MODE (x));
10874   return real_equal (CONST_DOUBLE_REAL_VALUE (x), &dconst0);
10875 }

I think that explains why we have +0.0 on aarch64 here.

Pan

-Original Message-
From: Jeff Law  
Sent: Friday, December 29, 2023 9:04 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; Wang, Yanzhang ; 
kito.ch...@gmail.com; richard.guent...@gmail.com
Subject: Re: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with 
variable factor



On 12/28/23 17:42, Li, Pan2 wrote:
> Thanks Jeff for comments, and Happy new year!
> 
>> Interesting.  So I'd actually peel one more layer off this onion.  Why
>> do the aarch64 and riscv targets generate different constants (0.0 vs
>> -0.0)?
> 
> Yeah, it surprise me too when debugging the foo function. But didn't dig into 
> it in previous as it may be unrelated to vectorize.
> 
>> Is it possible that the aarch64 is generating 0.0 when asked for -0.0
>> and -fno-signed-zeros is in effect?  That's a valid thing to do when
>> -fno-signed-zeros is on.  Look for HONOR_SIGNED_ZEROs in the aarch64
>> backend.
> 
> Sure, will have a try for making the -0.0 happen in aarch64.
I would first look at the .optimized dump, then I'd look at the .final 
dump alongside the resulting assembly for aarch64.

I bet we're going to find that the aarch64 target internally converts 
-0.0 to 0.0 when we're not honoring signed zeros.

jeff


[PATCH] Fix gen-vect-26.c testcase after loops with multiple exits [PR113167]

2023-12-28 Thread Andrew Pinski
This fixes the gcc.dg/tree-ssa/gen-vect-26.c testcase by adding
`#pragma GCC novector` in front of the loop that checks the results.
We only want to test whether the first loop can be
vectorized.

Committed as obvious after testing on x86_64-linux-gnu with -m32.

gcc/testsuite/ChangeLog:

PR testsuite/113167
* gcc.dg/tree-ssa/gen-vect-26.c: Mark the test/check loop
as novector.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c 
b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
index 710696198bb..fdcec67bde6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
@@ -19,6 +19,7 @@ int main ()
 }
 
   /* check results:  */
+  #pragma GCC novector
   for (i = 1; i <= N; i++)
 {
   if (ia[i] != 5)
-- 
2.39.3



[PATCH v4 6/6] RISC-V: Add support for xtheadvector-specific intrinsics.

2023-12-28 Thread Jun Sha (Joshua)
This patch only involves the generation of the xtheadvector-specific
load/store instructions and vext instructions.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class th_loadstore_width): Define new builtin bases.
(BASE): Define new builtin bases.
* config/riscv/riscv-vector-builtins-bases.h:
Define new builtin class.
* config/riscv/riscv-vector-builtins-functions.def (vlsegff):
Include thead-vector-builtins-functions.def.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct th_loadstore_width_def): Define new builtin shapes.
(struct th_indexed_loadstore_width_def):
Define new builtin shapes.
(SHAPE): Define new builtin shapes.
* config/riscv/riscv-vector-builtins-shapes.h:
Define new builtin shapes.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_I8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I32_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U32_OPS): Add datatypes for XTheadVector.
(vint8m1_t): Add datatypes for XTheadVector.
(vint8m2_t): Likewise.
(vint8m4_t): Likewise.
(vint8m8_t): Likewise.
(vint16m1_t): Likewise.
(vint16m2_t): Likewise.
(vint16m4_t): Likewise.
(vint16m8_t): Likewise.
(vint32m1_t): Likewise.
(vint32m2_t): Likewise.
(vint32m4_t): Likewise.
(vint32m8_t): Likewise.
(vint64m1_t): Likewise.
(vint64m2_t): Likewise.
(vint64m4_t): Likewise.
(vint64m8_t): Likewise.
(vuint8m1_t): Likewise.
(vuint8m2_t): Likewise.
(vuint8m4_t): Likewise.
(vuint8m8_t): Likewise.
(vuint16m1_t): Likewise.
(vuint16m2_t): Likewise.
(vuint16m4_t): Likewise.
(vuint16m8_t): Likewise.
(vuint32m1_t): Likewise.
(vuint32m2_t): Likewise.
(vuint32m4_t): Likewise.
(vuint32m8_t): Likewise.
(vuint64m1_t): Likewise.
(vuint64m2_t): Likewise.
(vuint64m4_t): Likewise.
(vuint64m8_t): Likewise.
* config/riscv/riscv-vector-builtins.cc
(DEF_RVV_I8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I32_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U32_OPS): Add datatypes for XTheadVector.
* config/riscv/thead-vector-builtins-functions.def: New file.
* config/riscv/thead-vector.md: Add new patterns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c: New test.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config.gcc|   2 +-
 .../riscv/riscv-vector-builtins-shapes.cc | 126 +++
 .../riscv/riscv-vector-builtins-shapes.h  |   3 +
 .../riscv/riscv-vector-builtins-types.def | 120 +++
 gcc/config/riscv/riscv-vector-builtins.cc | 313 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   3 +
 gcc/config/riscv/t-riscv  |  16 +
 .../riscv/thead-vector-builtins-functions.def |  39 +++
 gcc/config/riscv/thead-vector-builtins.cc | 200 +++
 gcc/config/riscv/thead-vector-builtins.h  |  64 
 gcc/config/riscv/thead-vector.md  | 253 ++
 .../riscv/rvv/xtheadvector/vlb-vsb.c  |  68 
 .../riscv/rvv/xtheadvector/vlbu-vsb.c |  68 
 .../riscv/rvv/xtheadvector/vlh-vsh.c  |  68 
 .../riscv/rvv/xtheadvector/vlhu-vsh.c |  68 
 .../riscv/rvv/xtheadvector/vlw-vsw.c  |  68 
 .../riscv/rvv/xtheadvector/vlwu-vsw.c |  68 
 17 files changed, 1545 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/thead-vector-builtins-functions.def
 create mode 100644 gcc/config/riscv/thead-vector-builtins.cc
 create mode 100644 gcc/config/riscv/thead-vector-builtins.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c
 

[PATCH v4] RISC-V: Handle differences between XTheadvector and Vector

2023-12-28 Thread Jun Sha (Joshua)
This patch handles the differences in instruction generation between
Vector and XTheadVector. In this version, we only support the subset of
XTheadVector instructions that map directly onto current RVV 1.0
instructions by simply adding the "th." prefix. For XTheadVector
instructions that have different names but share the same patterns as
RVV 1.0 instructions, we will use the ASM target hook to rewrite the
whole instruction string in follow-up patches.

For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md in order
not to generate instructions that xtheadvector does not support,
like vmv1r and vsext.vf2.

gcc/ChangeLog:

* config.gcc:  Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
* config/riscv/riscv-v.cc (legitimize_move):
New expansion.
(get_prefer_tail_policy): Give specific value for tail.
(get_prefer_mask_policy): Give specific value for mask.
(vls_mode_valid_p): Avoid autovec.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
(build_one): New function.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
(DEF_THEAD_RVV_FUNCTION): Add new marcos.
(check_required_extensions):
(handle_pragma_vector):
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
(RVV_REQUIRE_XTHEADVECTOR):
Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
(struct function_group_info):
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
Guard XTheadVector.
(riscv_v_adjust_bytesize): Likewise.
(riscv_preferred_simd_mode): Likewsie.
(riscv_autovectorize_vector_modes): Likewise.
(riscv_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/riscv/vector-iterators.md: Remove fractional LMUL.
* config/riscv/vector.md: Include thead-vector.md.
* config/riscv/riscv_th_vector.h: New file.
* config/riscv/thead-vector.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
* gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
* lib/target-supports.exp: Add target for XTheadVector.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config.gcc|   2 +-
 gcc/config/riscv/autovec.md   |   2 +-
 gcc/config/riscv/predicates.md|   3 +-
 gcc/config/riscv/riscv-string.cc  |   3 +
 gcc/config/riscv/riscv-v.cc   |  13 +-
 .../riscv/riscv-vector-builtins-bases.cc  |   3 +
 .../riscv/riscv-vector-builtins-shapes.cc |  23 +++
 gcc/config/riscv/riscv-vector-switch.def  | 150 +++---
 gcc/config/riscv/riscv-vsetvl.cc  |  10 +
 gcc/config/riscv/riscv.cc |  20 +-
 gcc/config/riscv/riscv_th_vector.h|  49 +
 gcc/config/riscv/thead-vector.md  | 142 +
 gcc/config/riscv/vector-iterators.md  | 186 +-
 gcc/config/riscv/vector.md|  36 +++-
 .../gcc.target/riscv/rvv/base/abi-1.c |   2 +-
 .../gcc.target/riscv/rvv/base/pragma-1.c  |   2 +-
 gcc/testsuite/lib/target-supports.exp |  12 ++
 17 files changed, 471 insertions(+), 187 deletions(-)
 create mode 100644 gcc/config/riscv/riscv_th_vector.h
 create mode 100644 gcc/config/riscv/thead-vector.md

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f0676c830e8..1445d98c147 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -549,7 +549,7 @@ riscv*)
extra_objs="${extra_objs} riscv-vector-builtins.o 
riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o riscv-target-attr.o"
d_target_objs="riscv-d.o"
-   extra_headers="riscv_vector.h"
+   extra_headers="riscv_vector.h riscv_th_vector.h"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/riscv/riscv-vector-builtins.cc"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/riscv/riscv-vector-builtins.h"
;;
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 8b8a92f10a1..1fac56c7095 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2579,7 +2579,7 @@
   [(match_operand  0 "register_operand")
(match_operand  1 "memory_operand")
(match_operand:ANYI 2 "const_int_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && !TARGET_XTHEADVECTOR"
   {
 

[PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2023-12-28 Thread Jun Sha (Joshua)
This patch adds the th. prefix to all XTheadVector instructions by
implementing a new assembly output function. We only check whether the
mnemonic starts with 'v', so no extra attribute is needed.
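The rule can be sketched stand-alone as follows (buffer-based for testability; the real hook just fputs the prefix to the asm file and returns the pointer unchanged, and the helper name here is hypothetical):

```c
#include <stdio.h>
#include <string.h>

/* Illustrative helper (not in the patch): prepend "th." when the
   mnemonic is a vector instruction, i.e. starts with 'v'.  */
static void
th_prefix_opcode (char *out, size_t outsz, const char *mnemonic)
{
  snprintf (out, outsz, "%s%s",
            mnemonic[0] == 'v' ? "th." : "", mnemonic);
}
```

So e.g. "vadd.vv" becomes "th.vadd.vv" while scalar mnemonics pass through untouched.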

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_asm_output_opcode): 
New function to add assembler insn code prefix/suffix.
* config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise.
* config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config/riscv/riscv-protos.h|  1 +
 gcc/config/riscv/riscv.cc  | 14 ++
 gcc/config/riscv/riscv.h   |  4 
 .../gcc.target/riscv/rvv/xtheadvector/prefix.c | 12 
 4 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 31049ef7523..5ea54b45703 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -102,6 +102,7 @@ struct riscv_address_info {
 };
 
 /* Routines implemented in riscv.cc.  */
+extern const char *riscv_asm_output_opcode (FILE *asm_out_file, const char *p);
 extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
 extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
 extern int riscv_float_const_rtx_index_for_fli (rtx);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0d1cbc5cb5f..ea1d59d9cf2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5636,6 +5636,20 @@ riscv_get_v_regno_alignment (machine_mode mode)
   return lmul;
 }
 
+/* Define ASM_OUTPUT_OPCODE to do anything special before
+   emitting an opcode.  */
+const char *
+riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
+{
+  /* We need to add the th. prefix to all the xtheadvector
+ instructions here.  */
+  if (TARGET_XTHEADVECTOR && current_output_insn != NULL_RTX &&
+  p[0] == 'v')
+fputs ("th.", asm_out_file);
+
+  return p;
+}
+
 /* Implement TARGET_PRINT_OPERAND.  The RISCV-specific operand codes are:
 
'h' Print the high-part relocation associated with OP, after stripping
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 6df9ec73c5e..c33361a254d 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -826,6 +826,10 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
   asm_fprintf ((FILE), "%U%s", (NAME));\
   } while (0)
 
+#undef ASM_OUTPUT_OPCODE
+#define ASM_OUTPUT_OPCODE(STREAM, PTR) \
+  (PTR) = riscv_asm_output_opcode(STREAM, PTR)
+
 #define JUMP_TABLES_IN_TEXT_SECTION 0
 #define CASE_VECTOR_MODE SImode
 #define CASE_VECTOR_PC_RELATIVE (riscv_cmodel != CM_MEDLOW)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
new file mode 100644
index 000..eee727ef6b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_xtheadvector -mabi=ilp32 -O0" } */
+
+#include "riscv_vector.h"
+
+vint32m1_t
+prefix (vint32m1_t vx, vint32m1_t vy, size_t vl)
+{
+  return __riscv_vadd_vv_i32m1 (vx, vy, vl);
+}
+
+/* { dg-final { scan-assembler {\mth\.v\M} } } */
-- 
2.17.1



[PATCH v4] RISC-V: Introduce XTheadVector as a subset of V1.0.0

2023-12-28 Thread Jun Sha (Joshua)
This patch introduces basic XTheadVector support
(march string parsing and a test for __riscv_xtheadvector)
according to https://github.com/T-head-Semi/thead-extension-spec/

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse): Add new vendor extension.
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
Add test marco.
* config/riscv/riscv.opt:  Add new mask.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-__riscv_th_v_intrinsic.c: New test.
* gcc.target/riscv/rvv/xtheadvector.c: New test.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/common/config/riscv/riscv-common.cc   | 23 +++
 gcc/config/riscv/riscv-c.cc   |  8 +--
 gcc/config/riscv/riscv.opt|  2 ++
 .../riscv/predef-__riscv_th_v_intrinsic.c | 11 +
 .../gcc.target/riscv/rvv/xtheadvector.c   | 13 +++
 5 files changed, 55 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/predef-__riscv_th_v_intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index f20d179568d..66b20c154a9 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -368,6 +368,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"xtheadmemidx", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadmempair", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadsync", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xtheadvector", ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"xventanacondops", ISA_SPEC_CLASS_NONE, 1, 0},
 
@@ -1251,6 +1252,15 @@ riscv_subset_list::check_conflict_ext ()
   if (lookup ("zcmp"))
error_at (m_loc, "%<-march=%s%>: zcd conflicts with zcmp", m_arch);
 }
+
+  if ((lookup ("v") || lookup ("zve32x")
+|| lookup ("zve64x") || lookup ("zve32f")
+|| lookup ("zve64f") || lookup ("zve64d")
+|| lookup ("zvl32b") || lookup ("zvl64b")
+|| lookup ("zvl128b") || lookup ("zvfh"))
+&& lookup ("xtheadvector"))
+error_at (m_loc, "%<-march=%s%>: xtheadvector conflicts with vector "
+  "extension or its sub-extensions", m_arch);
 }
 
 /* Parsing function for multi-letter extensions.
@@ -1743,6 +1753,19 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
  {"xtheadmemidx",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADMEMIDX},
  {"xtheadmempair", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADMEMPAIR},
  {"xtheadsync",    &gcc_options::x_riscv_xthead_subext, MASK_XTHEADSYNC},
+  {"xtheadvector",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADVECTOR},
+  {"xtheadvector",  &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_32},
+  {"xtheadvector",  &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_64},
+  {"xtheadvector",  &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_32},
+  {"xtheadvector",  &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_64},
+  {"xtheadvector",  &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_16},
+  {"xtheadvector",  &gcc_options::x_riscv_zvl_flags, MASK_ZVL32B},
+  {"xtheadvector",  &gcc_options::x_riscv_zvl_flags, MASK_ZVL64B},
+  {"xtheadvector",  &gcc_options::x_riscv_zvl_flags, MASK_ZVL128B},
+  {"xtheadvector",  &gcc_options::x_riscv_zf_subext, MASK_ZVFHMIN},
+  {"xtheadvector",  &gcc_options::x_riscv_zf_subext, MASK_ZVFH},
+  {"xtheadvector",  &gcc_options::x_target_flags, MASK_FULL_V},
+  {"xtheadvector",  &gcc_options::x_target_flags, MASK_VECTOR},
 
  {"xventanacondops", &gcc_options::x_riscv_xventana_subext, MASK_XVENTANACONDOPS},
 
diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index d70eb8ed361..d7c63ead147 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -138,6 +138,10 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
 riscv_ext_version_value (0, 11));
 }
 
+  if (TARGET_XTHEADVECTOR)
+    builtin_define_with_int_value ("__riscv_th_v_intrinsic",
+				   riscv_ext_version_value (0, 11));
+
   /* Define architecture extension test macros.  */
   builtin_define_with_int_value ("__riscv_arch_test", 1);
 
@@ -191,8 +195,8 @@ riscv_pragma_intrinsic (cpp_reader *)
 {
   if (!TARGET_VECTOR)
{
-	error ("%<#pragma riscv intrinsic%> option %qs needs 'V' extension "
-	       "enabled",
+	error ("%<#pragma riscv intrinsic%> option %qs needs 'V' or "
+	       "'XTHEADVECTOR' extension enabled",
 name);
  return;
}
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index ede2d655e73..7de5f18e11b 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -449,6 +449,8 @@ Mask(XTHEADMEMPAIR) Var(riscv_xthead_subext)
 
 Mask(XTHEADSYNC)

[PATCH v4] RISC-V: Change csr_operand into vector_length_operand for vsetvl patterns.

2023-12-28 Thread Jun Sha (Joshua)
This patch uses vector_length_operand instead of csr_operand for
vsetvl patterns, so that changes for vector will not affect scalar
patterns using csr_operand in riscv.md.

gcc/ChangeLog:

* config/riscv/vector.md:
Use vector_length_operand for vsetvl patterns.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config/riscv/vector.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index f607d768b26..b5a9055cdc4 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1496,7 +1496,7 @@
 
 (define_insn "@vsetvl<mode>"
   [(set (match_operand:P 0 "register_operand" "=r")
-   (unspec:P [(match_operand:P 1 "csr_operand" "rK")
+   (unspec:P [(match_operand:P 1 "vector_length_operand" "rK")
   (match_operand 2 "const_int_operand" "i")
   (match_operand 3 "const_int_operand" "i")
   (match_operand 4 "const_int_operand" "i")
@@ -1542,7 +1542,7 @@
 ;; in vsetvl instruction pattern.
 (define_insn "@vsetvl<mode>_discard_result"
   [(set (reg:SI VL_REGNUM)
-   (unspec:SI [(match_operand:P 0 "csr_operand" "rK")
+   (unspec:SI [(match_operand:P 0 "vector_length_operand" "rK")
(match_operand 1 "const_int_operand" "i")
(match_operand 2 "const_int_operand" "i")] UNSPEC_VSETVL))
(set (reg:SI VTYPE_REGNUM)
@@ -1564,7 +1564,7 @@
 ;; such pattern can allow us gain benefits of these optimizations.
 (define_insn_and_split "@vsetvl<mode>_no_side_effects"
   [(set (match_operand:P 0 "register_operand" "=r")
-   (unspec:P [(match_operand:P 1 "csr_operand" "rK")
+   (unspec:P [(match_operand:P 1 "vector_length_operand" "rK")
   (match_operand 2 "const_int_operand" "i")
   (match_operand 3 "const_int_operand" "i")
   (match_operand 4 "const_int_operand" "i")
@@ -1608,7 +1608,7 @@
   [(set (match_operand:DI 0 "register_operand")
 (sign_extend:DI
   (subreg:SI
-   (unspec:DI [(match_operand:P 1 "csr_operand")
+   (unspec:DI [(match_operand:P 1 "vector_length_operand")
(match_operand 2 "const_int_operand")
(match_operand 3 "const_int_operand")
(match_operand 4 "const_int_operand")
-- 
2.17.1




[PATCH v4] RISC-V: Refactor riscv-vector-builtins-bases.cc

2023-12-28 Thread Jun Sha (Joshua)
This patch moves the definition of the enums lst_type and
frm_op_type into riscv-vector-builtins-bases.h and removes
the static visibility of fold_fault_load(), so that these
can be used in other compilation units.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc (enum lst_type)
(enum frm_op_type): Move to riscv-vector-builtins-bases.h.
(fold_fault_load): Remove static visibility.
* config/riscv/riscv-vector-builtins-bases.h
(GCC_RISCV_VECTOR_BUILTINS_BASES_H): Add header files.
(enum lst_type) (enum frm_op_type): Move from
riscv-vector-builtins-bases.cc.
(fold_fault_load): Declare.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 .../riscv/riscv-vector-builtins-bases.cc  | 18 +-
 .../riscv/riscv-vector-builtins-bases.h   | 19 +++
 2 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index d70468542ee..c51affde353 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -48,24 +48,8 @@ using namespace riscv_vector;
 
 namespace riscv_vector {
 
-/* Enumerates types of loads/stores operations.
-   It's only used in here so we don't define it
-   in riscv-vector-builtins-bases.h.  */
-enum lst_type
-{
-  LST_UNIT_STRIDE,
-  LST_STRIDED,
-  LST_INDEXED,
-};
-
-enum frm_op_type
-{
-  NO_FRM,
-  HAS_FRM,
-};
-
 /* Helper function to fold vleff and vlsegff.  */
-static gimple *
+gimple *
 fold_fault_load (gimple_folder &f)
 {
   /* fold fault_load (const *base, size_t *new_vl, size_t vl)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 131041ea66f..42d0cd17dc1 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -21,8 +21,27 @@
 #ifndef GCC_RISCV_VECTOR_BUILTINS_BASES_H
 #define GCC_RISCV_VECTOR_BUILTINS_BASES_H
 
+#include "gimple.h"
+#include "riscv-vector-builtins.h"
+
 namespace riscv_vector {
 
+/* Enumerates types of loads/stores operations.  */
+enum lst_type
+{
+  LST_UNIT_STRIDE,
+  LST_STRIDED,
+  LST_INDEXED,
+};
+
+enum frm_op_type
+{
+  NO_FRM,
+  HAS_FRM,
+};
+
+extern gimple *fold_fault_load (gimple_folder &f);
+
 namespace bases {
 extern const function_base *const vsetvl;
 extern const function_base *const vsetvlmax;
-- 
2.17.1



[PATCH v4] RISC-V: Support XTheadVector extension

2023-12-28 Thread Jun Sha (Joshua)
This patch series presents the GCC implementation of the XTheadVector
extension [1].

[1] https://github.com/T-head-Semi/thead-extension-spec/

For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them so that we do not
generate instructions that xtheadvector does not support;
this causes 36 changes in vector.md.

For the th. prefix issue, we use current_output_insn and
the ASM_OUTPUT_OPCODE hook instead of directly modifying
patterns in vector.md.

We have run the GCC test suite and can confirm that there
are no regressions.

All the test results can be found in the following links,
Run without xtheadvector:
https://gcc.gnu.org/pipermail/gcc-testresults/2023-December/803686.html

Run with xtheadvector:
https://gcc.gnu.org/pipermail/gcc-testresults/2023-December/803687.html

Furthermore, we have run the tests in 
https://github.com/riscv-non-isa/rvv-intrinsic-doc/tree/main/examples, 
and all the tests passed.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 

RISC-V: Refactor riscv-vector-builtins-bases.cc
RISC-V: Change csr_operand into vector_length_operand for vsetvl patterns
RISC-V: Introduce XTheadVector as a subset of V1.0.0
RISC-V: Adds the prefix "th." for the instructions of XTheadVector
RISC-V: Handle differences between XTheadvector and Vector
RISC-V: Add support for xtheadvector-specific intrinsics
RISC-V: ...

---
 gcc/common/config/riscv/riscv-common.cc   |   23 +
 gcc/config.gcc|4 +-
 gcc/config/riscv/autovec.md   |2 +-
 gcc/config/riscv/predicates.md|8 +-
 gcc/config/riscv/riscv-c.cc   |8 +-
 gcc/config/riscv/riscv-protos.h   |1 +
 gcc/config/riscv/riscv-string.cc  |3 +
 gcc/config/riscv/riscv-v.cc   |   13 +-
 .../riscv/riscv-vector-builtins-bases.cc  |   18 +-
 .../riscv/riscv-vector-builtins-bases.h   |   19 +
 .../riscv/riscv-vector-builtins-shapes.cc |  149 +
 .../riscv/riscv-vector-builtins-shapes.h  |3 +
 .../riscv/riscv-vector-builtins-types.def |  120 +
 gcc/config/riscv/riscv-vector-builtins.cc |  315 +-
 gcc/config/riscv/riscv-vector-builtins.h  |5 +-
 gcc/config/riscv/riscv-vector-switch.def  |  150 +-
 gcc/config/riscv/riscv.cc |   46 +-
 gcc/config/riscv/riscv.h  |4 +
 gcc/config/riscv/riscv.opt|2 +
 gcc/config/riscv/riscv_th_vector.h|   49 +
 gcc/config/riscv/t-riscv  |   16 +
 .../riscv/thead-vector-builtins-functions.def |  659 
 gcc/config/riscv/thead-vector-builtins.cc |  887 ++
 gcc/config/riscv/thead-vector-builtins.h  |  123 +
 gcc/config/riscv/thead-vector.md  | 2827 +
 gcc/config/riscv/vector-iterators.md  |  186 +-
 gcc/config/riscv/vector.md|   44 +-
 .../riscv/predef-__riscv_th_v_intrinsic.c |   11 +
 .../gcc.target/riscv/rvv/base/abi-1.c |2 +-
 .../gcc.target/riscv/rvv/base/pragma-1.c  |2 +-
 .../gcc.target/riscv/rvv/xtheadvector.c   |   13 +
 .../riscv/rvv/xtheadvector/prefix.c   |   12 +
 .../riscv/rvv/xtheadvector/vlb-vsb.c  |   68 +
 .../riscv/rvv/xtheadvector/vlbu-vsb.c |   68 +
 .../riscv/rvv/xtheadvector/vlh-vsh.c  |   68 +
 .../riscv/rvv/xtheadvector/vlhu-vsh.c |   68 +
 .../riscv/rvv/xtheadvector/vlw-vsw.c  |   68 +
 .../riscv/rvv/xtheadvector/vlwu-vsw.c |   68 +
 gcc/testsuite/lib/target-supports.exp |   12 +
 39 files changed, 5931 insertions(+), 213 deletions(-)
 create mode 100644 gcc/config/riscv/riscv_th_vector.h
 create mode 100644 gcc/config/riscv/thead-vector-builtins-functions.def
 create mode 100644 gcc/config/riscv/thead-vector-builtins.cc
 create mode 100644 gcc/config/riscv/thead-vector-builtins.h
 create mode 100644 gcc/config/riscv/thead-vector.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-__riscv_th_v_intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c


[PATCH v1] LoongArch: testsuite:Add the "-ffast-math" compilation option for the file vect-fmin-3.c.

2023-12-28 Thread chenxiaolong
After the detection of maximum reduction was enabled on the LoongArch
architecture, GCC regression testing found that vect-fmin-3.c fails.
Currently, in the target-supports.exp file, only the aarch64, arm, riscv,
and LoongArch architectures are supported.  Analysis shows that the
"-ffast-math" compilation option needs to be added to the test case so
that the reduction can be vectorized successfully.
The original patch was submitted by Richard Sandiford.

The initial patch information submitted is as follows:

commit e32b9eb32d7cd2d39bf9c70497890ac61b9ee14c

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-fmin-3.c: Add "-ffast-math" to the compilation
options so that the loop's max reduction can be vectorized
successfully.
---
 gcc/testsuite/gcc.dg/vect/vect-fmin-3.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-fmin-3.c b/gcc/testsuite/gcc.dg/vect/vect-fmin-3.c
index 2e282ba6878..edef57925c1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-fmin-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-fmin-3.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_float } */
+/* { dg-additional-options "-ffast-math" } */
 
 #include "tree-vect.h"
 
-- 
2.20.1



Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector

2023-12-28 Thread joshua
Hi Juzhe,

These vsetvl patterns were written by you with csr_operand initially.
Are you sure it can be replaced by vector_length_operand?

Joshua






--
From: juzhe.zh...@rivai.ai
Sent: Friday, December 29, 2023 10:25
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector


Change it into vector_length_operand.


juzhe.zh...@rivai.ai

 
From: joshua
Sent: 2023-12-29 10:25
To: juzhe.zh...@rivai.ai; gcc-patches
Cc: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu
Subject: Re:Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector

We do not have vector_length_operand in vsetvl patterns.
 
(define_insn "@vsetvl<mode>"
  [(set (match_operand:P 0 "register_operand" "=r")
(unspec:P [(match_operand:P 1 "vector_csr_operand" "rK")
   (match_operand 2 "const_int_operand" "i")
   (match_operand 3 "const_int_operand" "i")
   (match_operand 4 "const_int_operand" "i")
   (match_operand 5 "const_int_operand" "i")] UNSPEC_VSETVL))
   (set (reg:SI VL_REGNUM)
(unspec:SI [(match_dup 1)
    (match_dup 2)
    (match_dup 3)] UNSPEC_VSETVL))
   (set (reg:SI VTYPE_REGNUM)
(unspec:SI [(match_dup 2)
    (match_dup 3)
    (match_dup 4)
    (match_dup 5)] UNSPEC_VSETVL))]
  "TARGET_VECTOR"
  "vset%i1vli\t%0,%1,e%2,%m3,t%p4,m%p5"
  [(set_attr "type" "vsetvl")
   (set_attr "mode" "<MODE>")
   (set (attr "sew") (symbol_ref "INTVAL (operands[2])"))
   (set (attr "vlmul") (symbol_ref "INTVAL (operands[3])"))
   (set (attr "ta") (symbol_ref "INTVAL (operands[4])"))
   (set (attr "ma") (symbol_ref "INTVAL (operands[5])"))])
 
 
 
 
 
 
 
--
From: juzhe.zh...@rivai.ai
Sent: Friday, December 29, 2023 10:22
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector
 
 
Why add vector_csr_operand?
Why not use vector_length_operand?
 
 
juzhe.zh...@rivai.ai
 
 
From: joshua
Sent: 2023-12-29 10:17
To: juzhe.zh...@rivai.ai; gcc-patches
Cc: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu
Subject: Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector
 
Hi Juzhe,
 
For vector_csr_operand, please refer to
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641124.html.
 
Joshua
 
 
 
 
 
--
From: juzhe.zh...@rivai.ai
Sent: Friday, December 29, 2023 10:14
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector
 
 
No, we should handle this carefully step by step.
 
 
First, after the first kind of theadvector is merged, we can talk
about the second kind of theadvector later.
 
 
I am confused by this patch for example:
 
 
 (define_predicate "vector_csr_operand"
-  (ior (match_operand 0 "const_csr_operand")
-       (match_operand 0 "register_operand")))
+  (ior (and (match_test "!TARGET_XTHEADVECTOR || rtx_equal_p (op, const0_rtx)")
+	    (match_operand 0 "const_csr_operand"))
+       (match_operand 0 "register_operand")))
 
 
I just checked upstream code, we don't have vector_csr_operand.
 
 
So, to make it easy for me to review and trace the code, please send
the patch better organized.
 
 
Thanks.
juzhe.zh...@rivai.ai
 
 
From: joshua
Sent: 2023-12-29 10:09
To: juzhe.zh...@rivai.ai; gcc-patches
Cc: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu
Subject: Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector
 
Hi Juzhe,
 
This patch "RISC-V: Handle differences between XTheadvector and
Vector" addresses some code generation issues for RVV1.0
instructions that xtheadvector does not have; it is not about intrinsics.
 
BTW, what about the following patch "RISC-V: Add support for
xtheadvector-specific intrinsics"? It adds support for new xtheadvector
instructions. Is it OK to be merged?
 
Joshua
 
 
 
 
 
 
--
From: juzhe.zh...@rivai.ai
Sent: Friday, December 29, 2023 09:58
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; "cooper.joshua"; jinma; "cooper.qu"
Subject: Re: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector
 
 
I am confused by this patch series.
 
 
I thought this patch:

Re: [PATCH] MIPS: Implement TARGET_INSN_COSTS

2023-12-28 Thread YunQiang Su
Roger Sayle wrote on Friday, December 29, 2023 at 00:54:
>
> The current (default) behavior is that when the target doesn’t define
> TARGET_INSN_COST the middle-end uses the backend’s
> TARGET_RTX_COSTS, so multiplications are slower than additions,
> but about the same size when optimizing for size (with -Os or -Oz).
>
> All of this gets disabled with your proposed patch.
> [If you don’t check speed, you probably shouldn’t touch insn_cost].
>
> I agree that a backend can fine tune the (speed and size) costs of
> instructions (especially complex !single_set instructions) via
> attributes in the machine description, but these should be used
> to override/fine-tune rtx_costs, not override/replace/duplicate them.
>
> Having accurate rtx_costs also helps RTL expansion and the earlier
> optimizers, but insn_cost is used by combine and the later RTL
> optimization passes, once instructions have been recognized.
>

Yes. I found this problem when trying to combine sign_extend and zero_extract.
When I try to add a new define_insn for
(set (reg/v:DI 200 [ val ])
     (sign_extend:DI
       (ior:SI (and:SI (subreg:SI (reg/v:DI 200 [ val ]) 0)
                       (const_int 16777215 [0xffffff]))
               (ashift:SI (subreg:SI (reg:QI 205 [ MEM[(const unsigned char *)buf_8(D) + 3B] ]) 0)
                          (const_int 24 [0x18])))))

to generate an `ins` instruction.
It is refused by `combine_validate_cost`.
`combine_validate_cost` considers our RTX to have cost COSTS_N_INSNS(3)
instead of COSTS_N_INSNS(1).
So we need a method to do so.

I guess for all ports, we need a framework.
`rtx_cost` should also tell me how many instructions it believes this RTX has.
It may help us to accept some more complex RTX_INSNs, and convert them
to 1 or 2 instructions.
We can combine INSNs more aggressively.

If so, we can calculate a ratio: total / insn_count.
For MUL/DIV, the ratio may be a number > COSTS_N_INSNS (1).
For our example above, the ratio will be COSTS_N_INSNS (1).
So we can decide if we should accept this new RTX.

>
> Might I also recommend that instead of insn_count*perf_ratio*4,
> or even the slightly better COSTS_N_INSNS (insn_count*perf_ratio),
> that encode the relative cost in the attribute, avoiding the multiplication
> (at runtime), and allowing fine tuning like “COSTS_N_INSNS(2) – 1”.
> Likewise, COSTS_N_BYTES is a very useful macro for a backend to
> define/use in rtx_costs.  Conveniently for many RISC machines,
> 1 instruction takes about 4 bytes, for COSTS_N_INSNS (1) is
> (approximately) comparable to COSTS_N_BYTES (4).
>
> I hope this helps.  Perhaps something like:
>
> static int
> mips_insn_cost (rtx_insn *insn, bool speed)
> {
>   int cost;
>   if (recog_memoized (insn) >= 0)
>     {
>       if (speed)
>         {
>           /* Use cost if provided.  */
>           cost = get_attr_cost (insn);
>           if (cost > 0)
>             return cost;
>         }
>       else
>         {
>           /* If optimizing for size, we want the insn size.  */
>           return get_attr_length (insn);
>         }
>     }
>
>   if (rtx set = single_set (insn))
>     cost = set_rtx_cost (set, speed);
>   else
>     cost = pattern_cost (PATTERN (insn), speed);
>
>   /* If the cost is zero, then it's likely a complex insn.  We don't
>      want the cost of these to be less than something we know about.  */
>   return cost ? cost : COSTS_N_INSNS (2);
> }
>


Re:Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector

2023-12-28 Thread joshua
We do not have vector_length_operand in vsetvl patterns.

(define_insn "@vsetvl<mode>"
  [(set (match_operand:P 0 "register_operand" "=r")
(unspec:P [(match_operand:P 1 "vector_csr_operand" "rK")
   (match_operand 2 "const_int_operand" "i")
   (match_operand 3 "const_int_operand" "i")
   (match_operand 4 "const_int_operand" "i")
   (match_operand 5 "const_int_operand" "i")] UNSPEC_VSETVL))
   (set (reg:SI VL_REGNUM)
(unspec:SI [(match_dup 1)
(match_dup 2)
(match_dup 3)] UNSPEC_VSETVL))
   (set (reg:SI VTYPE_REGNUM)
(unspec:SI [(match_dup 2)
(match_dup 3)
(match_dup 4)
(match_dup 5)] UNSPEC_VSETVL))]
  "TARGET_VECTOR"
  "vset%i1vli\t%0,%1,e%2,%m3,t%p4,m%p5"
  [(set_attr "type" "vsetvl")
   (set_attr "mode" "<MODE>")
   (set (attr "sew") (symbol_ref "INTVAL (operands[2])"))
   (set (attr "vlmul") (symbol_ref "INTVAL (operands[3])"))
   (set (attr "ta") (symbol_ref "INTVAL (operands[4])"))
   (set (attr "ma") (symbol_ref "INTVAL (operands[5])"))])







--
From: juzhe.zh...@rivai.ai
Sent: Friday, December 29, 2023 10:22
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector


Why add vector_csr_operand?
Why not use vector_length_operand?


juzhe.zh...@rivai.ai

 
From: joshua
Sent: 2023-12-29 10:17
To: juzhe.zh...@rivai.ai; gcc-patches
Cc: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu
Subject: Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector

Hi Juzhe,
 
For vector_csr_operand, please refer to
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641124.html.
 
Joshua
 
 
 
 
 
--
From: juzhe.zh...@rivai.ai
Sent: Friday, December 29, 2023 10:14
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector
 
 
No, we should handle this carefully step by step.
 
 
First, after the first kind of theadvector is merged, we can talk
about the second kind of theadvector later.
 
 
I am confused by this patch for example:
 
 
 (define_predicate "vector_csr_operand"
-  (ior (match_operand 0 "const_csr_operand")
-       (match_operand 0 "register_operand")))
+  (ior (and (match_test "!TARGET_XTHEADVECTOR || rtx_equal_p (op, const0_rtx)")
+	    (match_operand 0 "const_csr_operand"))
+       (match_operand 0 "register_operand")))
 
 
I just checked upstream code, we don't have vector_csr_operand.
 
 
So, to make it easy for me to review and trace the code, please send
the patch better organized.
 
 
Thanks.
juzhe.zh...@rivai.ai
 
 
From: joshua
Sent: 2023-12-29 10:09
To: juzhe.zh...@rivai.ai; gcc-patches
Cc: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu
Subject: Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector
 
Hi Juzhe,
 
This patch "RISC-V: Handle differences between XTheadvector and
Vector" addresses some code generation issues for RVV1.0
instructions that xtheadvector does not have; it is not about intrinsics.
 
BTW, what about the following patch "RISC-V: Add support for
xtheadvector-specific intrinsics"? It adds support for new xtheadvector
instructions. Is it OK to be merged?
 
Joshua
 
 
 
 
 
 
--
From: juzhe.zh...@rivai.ai
Sent: Friday, December 29, 2023 09:58
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; "cooper.joshua"; jinma; "cooper.qu"
Subject: Re: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector
 
 
I am confused by this patch series.
 
 
I thought this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641417.html 
is enough to support the partial theadvector that can directly leverage RVV1.0?
 
 
Could you clean up and resend the patches based on the patch above
(supposing it is merged already)?
 
 
juzhe.zh...@rivai.ai
 
 
From: Jun Sha (Joshua)
Date: 2023-12-29 09:46
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and 
Vector
 
This patch is to handle the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
partial xtheadvector instructions that leverage directly from current
RVV1.0 with simple adding "th." prefix. For different name xtheadvector
instructions but share 

Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector

2023-12-28 Thread joshua
Hi Juzhe,

For vector_csr_operand, please refer to
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641124.html.

Joshua





--
From: juzhe.zh...@rivai.ai
Sent: Friday, December 29, 2023 10:14
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector


No, we should handle this carefully step by step.


First, after the first kind of theadvector is merged, we can talk
about the second kind of theadvector later.


I am confused by this patch for example:


 (define_predicate "vector_csr_operand"
-  (ior (match_operand 0 "const_csr_operand")
-       (match_operand 0 "register_operand")))
+  (ior (and (match_test "!TARGET_XTHEADVECTOR || rtx_equal_p (op, const0_rtx)")
+	    (match_operand 0 "const_csr_operand"))
+       (match_operand 0 "register_operand")))


I just checked upstream code, we don't have vector_csr_operand.


So, to make it easy for me to review and trace the code, please send
the patch better organized.


Thanks.
juzhe.zh...@rivai.ai

 
From: joshua
Sent: 2023-12-29 10:09
To: juzhe.zh...@rivai.ai; gcc-patches
Cc: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu
Subject: Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector

Hi Juzhe,
 
This patch "RISC-V: Handle differences between XTheadvector and
Vector" addresses some code generation issues for RVV1.0
instructions that xtheadvector does not have; it is not about intrinsics.
 
BTW, what about the following patch "RISC-V: Add support for
xtheadvector-specific intrinsics"? It adds support for new xtheadvector
instructions. Is it OK to be merged?
 
Joshua
 
 
 
 
 
 
--
From: juzhe.zh...@rivai.ai
Sent: Friday, December 29, 2023 09:58
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; "cooper.joshua"; jinma; "cooper.qu"
Subject: Re: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector
 
 
I am confused by this patch series.
 
 
I thought this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641417.html 
is enough to support the partial theadvector that can directly leverage RVV1.0?
 
 
Could you clean up and resend the patches based on the patch above
(supposing it is merged already)?
 
 
juzhe.zh...@rivai.ai
 
 
From: Jun Sha (Joshua)
Date: 2023-12-29 09:46
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and 
Vector
 
This patch is to handle the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of xtheadvector instructions that map directly onto current
RVV1.0 instructions by simply adding the "th." prefix. For xtheadvector
instructions that have different names but share the same patterns as
RVV1.0 instructions, we will use the ASM_OUTPUT_OPCODE target hook to
rewrite the whole instruction string in the following patches.
 
For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md in order
not to generate instructions that xtheadvector does not support,
like vmv1r and vsext.vf2.
 
gcc/ChangeLog:
 
* config.gcc:  Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
* config/riscv/riscv-v.cc (legitimize_move):
New expansion.
(get_prefer_tail_policy): Give specific value for tail.
(get_prefer_mask_policy): Give specific value for mask.
(vls_mode_valid_p): Avoid autovec.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type)
(build_one): New functions.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION)
(DEF_THEAD_RVV_FUNCTION): Add new macros.
(check_required_extensions):
(handle_pragma_vector):
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
(RVV_REQUIRE_XTHEADVECTOR):
Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
(struct function_group_info):
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
Guard XTheadVector.
(riscv_v_adjust_bytesize): Likewise.
(riscv_preferred_simd_mode): Likewise.
(riscv_autovectorize_vector_modes): Likewise.
(riscv_vector_mode_supported_any_target_p): Likewise.

[PATCH v1 8/8] LoongArch: testsuite:Modify the result check in the FMA file.

2023-12-28 Thread chenxiaolong
When GCC enabled the common-layer vectorization tests, some FAIL items
appeared in the regression results, such as gcc.dg/fma-{3,4,6,7}.c. On the
LoongArch architecture, for example, the fmsub.s instruction computes
a*b-c; its negation can differ from the expected c-a*b in the sign of a
zero result (+0.0 versus -0.0), so these checks need to be marked as
unsupported on LoongArch.

gcc/testsuite/ChangeLog:

* gcc.dg/fma-3.c: The intermediate dump for this function does not
contain the expected .FNMA call on LoongArch, so skip the scan rule
there.
* gcc.dg/fma-4.c: The intermediate dump for this function does not
contain the expected .FNMS call on LoongArch, so skip the scan rule
there.
* gcc.dg/fma-6.c: Same cause as fma-3.c.
* gcc.dg/fma-7.c: Same cause as fma-4.c.
---
 gcc/testsuite/gcc.dg/fma-3.c | 2 +-
 gcc/testsuite/gcc.dg/fma-4.c | 2 +-
 gcc/testsuite/gcc.dg/fma-6.c | 2 +-
 gcc/testsuite/gcc.dg/fma-7.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/fma-3.c b/gcc/testsuite/gcc.dg/fma-3.c
index 699aa2c9530..6649b54b6f9 100644
--- a/gcc/testsuite/gcc.dg/fma-3.c
+++ b/gcc/testsuite/gcc.dg/fma-3.c
@@ -12,4 +12,4 @@ f2 (double a, double b, double c)
   return c - a * b;
 }
 
-/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 2 "widening_mul" { target 
scalar_all_fma } } } */
+/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 2 "widening_mul" { target 
{ scalar_all_fma && { ! loongarch*-*-* } } } } } */
diff --git a/gcc/testsuite/gcc.dg/fma-4.c b/gcc/testsuite/gcc.dg/fma-4.c
index bff928f1fac..f1701c1961a 100644
--- a/gcc/testsuite/gcc.dg/fma-4.c
+++ b/gcc/testsuite/gcc.dg/fma-4.c
@@ -12,4 +12,4 @@ f2 (double a, double b, double c)
   return -(a * b) - c;
 }
 
-/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 2 "widening_mul" { target 
scalar_all_fma } } } */
+/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 2 "widening_mul" { target 
{ scalar_all_fma && { ! loongarch*-*-* } } } } } */
diff --git a/gcc/testsuite/gcc.dg/fma-6.c b/gcc/testsuite/gcc.dg/fma-6.c
index 87258cec4a2..9e49b62b6de 100644
--- a/gcc/testsuite/gcc.dg/fma-6.c
+++ b/gcc/testsuite/gcc.dg/fma-6.c
@@ -64,4 +64,4 @@ f10 (double a, double b, double c)
   return -__builtin_fma (a, b, -c);
 }
 
-/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 14 "optimized" { target 
scalar_all_fma } } } */
+/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 14 "optimized" { target { 
scalar_all_fma && { ! loongarch*-*-* } } } } } */
diff --git a/gcc/testsuite/gcc.dg/fma-7.c b/gcc/testsuite/gcc.dg/fma-7.c
index f409cc8ee3c..86aacad7b90 100644
--- a/gcc/testsuite/gcc.dg/fma-7.c
+++ b/gcc/testsuite/gcc.dg/fma-7.c
@@ -64,4 +64,4 @@ f10 (double a, double b, double c)
   return -__builtin_fma (a, b, c);
 }
 
-/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 14 "optimized" { target 
scalar_all_fma } } } */
+/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 14 "optimized" { target { 
scalar_all_fma && { ! loongarch*-*-* } } } } } */
-- 
2.20.1



[PATCH v1 4/8] LoongArch: testsuite:Fix FAIL in file bind_c_array_params_2.f90.

2023-12-28 Thread chenxiaolong
The GCC regression results show that the bind_c_array_params_2.f90 test
fails. Analysis shows that it fails because the generic regular expression
in the test does not match the assembly actually generated on LoongArch
(such as bl %plt(myBindC)); it was written for call sequences like the
one generated on x86 (call myBindC).

gcc/testsuite/ChangeLog:

* gfortran.dg/bind_c_array_params_2.f90: Add scan rules to support
testing on the LoongArch architecture.
---
 gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90 | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90 
b/gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90
index 0825efc7a2f..aa6a37b4850 100644
--- a/gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90
+++ b/gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90
@@ -2,6 +2,7 @@
 ! { dg-options "-std=f2008ts -fdump-tree-original" }
 ! { dg-additional-options "-mno-explicit-relocs" { target alpha*-*-* } }
 ! { dg-additional-options "-mno-relax-pic-calls" { target mips*-*-* } }
+! { dg-additional-options "-fplt -mcmodel=normal" { target loongarch*-*-* } }
 !
 ! Check that assumed-shape variables are correctly passed to BIND(C)
 ! as defined in TS 29913
@@ -16,7 +17,8 @@ integer :: aa(4,4)
 call test(aa)
 end
 
-! { dg-final { scan-assembler-times "\[ \t\]\[$,_0-9\]*myBindC" 1 { target { ! 
{ hppa*-*-* s390*-*-* *-*-cygwin* amdgcn*-*-* powerpc-ibm-aix* *-*-ming* } } } 
} }
+! { dg-final { scan-assembler-times "\[ \t\]\[$,_0-9\]*myBindC" 1 { target { ! 
{ hppa*-*-* s390*-*-* *-*-cygwin* amdgcn*-*-* powerpc-ibm-aix* *-*-ming* 
loongarch*-*-* } } } } }
+! { dg-final { scan-assembler-times "bl\t%plt\\(myBindC\\)" 1 { target 
loongarch*-*-* } } }
 ! { dg-final { scan-assembler-times "myBindC,%r2" 1 { target { hppa*-*-* } } } 
}
 ! { dg-final { scan-assembler-times "call\tmyBindC" 1 { target { *-*-cygwin* 
*-*-ming* } } } }
 ! { dg-final { scan-assembler-times "brasl\t%r\[0-9\]*,myBindC" 1 { target { 
s390*-*-* } } } }
-- 
2.20.1



[PATCH v1 3/8] LoongArch: testsuite:Added test support for vect-{82, 83}.c.

2023-12-28 Thread chenxiaolong
When the tests under gcc.dg/vect are run, vect-{82,83}.c are reported as
unsupported. Analysis shows that the LoongArch architecture does support
the functionality these test cases check, so LoongArch is added to the
target selectors so that the tests are no longer skipped.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-82.c: Add the LoongArch architecture to the
target selector.
* gcc.dg/vect/vect-83.c: Ditto.
---
 gcc/testsuite/gcc.dg/vect/vect-82.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-83.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-82.c 
b/gcc/testsuite/gcc.dg/vect/vect-82.c
index 4b2d5a8a464..5c761e92a3a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-82.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-82.c
@@ -1,4 +1,4 @@
-/* { dg-skip-if "powerpc and integer vectorization only" { ! { powerpc*-*-* && 
vect_int } }  } */
+/* { dg-skip-if "powerpc/loongarch and integer vectorization only" { ! { { 
powerpc*-*-* || loongarch*-*-* } && vect_int } }  } */
 /* { dg-additional-options "-fdump-tree-optimized-details-blocks" } */
 
 #include 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-83.c 
b/gcc/testsuite/gcc.dg/vect/vect-83.c
index 1a173daa140..7fe1b050cee 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-83.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-83.c
@@ -1,4 +1,4 @@
-/* { dg-skip-if "powerpc and integer vectorization only" { ! { powerpc*-*-* && 
vect_int } }  } */
+/* { dg-skip-if "powerpc/loongarch and integer vectorization only" { ! { { 
powerpc*-*-* || loongarch*-*-* } && vect_int } }  } */
 /* { dg-additional-options "-fdump-tree-optimized-details-blocks" } */
 
 #include 
-- 
2.20.1



[PATCH v1 6/8] LoongArch: testsuite:Added additional vectorization "-mlasx" compilation option.

2023-12-28 Thread chenxiaolong
After the tests under gcc.dg/vect were enabled for LoongArch, FAIL
entries appeared in the regression results for widening vector
multiplications of various types. Debugging showed that the 128-bit (LSX)
vector support on LoongArch does not implement these operations, so the
"-mlasx" option is added to enable the 256-bit (LASX) implementation.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-pattern-1.c: Add the "-mlasx" compilation
option on LoongArch so that vectorized code is generated.
* gcc.dg/vect/slp-widen-mult-half.c: Ditto.
* gcc.dg/vect/vect-widen-mult-const-s16.c: Ditto.
* gcc.dg/vect/vect-widen-mult-const-u16.c: Ditto.
* gcc.dg/vect/vect-widen-mult-half-u8.c: Ditto.
* gcc.dg/vect/vect-widen-mult-half.c: Ditto.
* gcc.dg/vect/vect-widen-mult-u16.c: Ditto.
* gcc.dg/vect/vect-widen-mult-u8-s16-s32.c: Ditto.
* gcc.dg/vect/vect-widen-mult-u8-u32.c: Ditto.
* gcc.dg/vect/vect-widen-mult-u8.c: Ditto.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c   | 1 +
 gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c| 1 +
 gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c  | 1 +
 gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c  | 1 +
 gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c| 1 +
 gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c   | 1 +
 gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c| 1 +
 gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c | 1 +
 gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c | 1 +
 gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c | 1 +
 10 files changed, 10 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c
index a3ff0f5b3da..5ae99225273 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-mlasx" { target loongarch*-*-* } } */
 
 #include 
 #include "tree-vect.h"
diff --git a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c 
b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c
index 72811eb852e..b69ade33886 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c
@@ -1,6 +1,7 @@
 /* Disabling epilogues until we find a better way to deal with scans.  */
 /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
 /* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-mlasx" { target loongarch*-*-* } } */
 
 #include "tree-vect.h"
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c 
b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c
index dfbb2171c00..53c9b84ca01 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c
@@ -2,6 +2,7 @@
 /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
 /* { dg-require-effective-target vect_int } */
 /* { dg-additional-options "-fno-ipa-icf" } */
+/* { dg-additional-options "-mlasx" { target loongarch*-*-*} } */
 
 #include "tree-vect.h"
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c 
b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c
index c2ad58f69e7..e9db8285b66 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c
@@ -2,6 +2,7 @@
 /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
 /* { dg-require-effective-target vect_int } */
 /* { dg-additional-options "-fno-ipa-icf" } */
+/* { dg-additional-options "-mlasx" { target loongarch*-*-*} } */
 
 #include "tree-vect.h"
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c 
b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c
index bfdcbaa09fb..607f3178f90 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c
@@ -2,6 +2,7 @@
 /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
 /* { dg-require-effective-target vect_int } */
 /* { dg-additional-options "-fno-ipa-icf" } */
+/* { dg-additional-options "-mlasx" { target loongarch*-*-*} } */
 
 #include "tree-vect.h"
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c 
b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c
index e46b0cc3135..cd13d826937 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c
@@ -1,6 +1,7 @@
 /* Disabling epilogues until we find a better way to deal with scans.  */
 /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
 /* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-mlasx" { target loongarch*-*-*} } */
 
 #include "tree-vect.h"
 
diff --git 

[PATCH v1 7/8] LoongArch: testsuite:Added additional vectorization "-mlsx" compilation option.

2023-12-28 Thread chenxiaolong
After GCC became able to run the common-layer vectorization test cases,
FAIL entries appeared in some of them during regression testing. The
cause is that no vectorization option was set when compiling, so
vectorized code could not be generated; support for the "-mlsx" option
therefore needs to be added on the LoongArch architecture.

gcc/testsuite/ChangeLog:

* gcc.dg/signbit-2.c: Add the additional "-mlsx" compilation
option.
* gcc.dg/tree-ssa/scev-16.c: Ditto.
* gfortran.dg/graphite/vect-pr40979.f90: Ditto.
* gfortran.dg/vect/fast-math-mgrid-resid.f: Ditto.
---
 gcc/testsuite/gcc.dg/signbit-2.c   | 1 +
 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c| 1 +
 gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90| 1 +
 gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f | 1 +
 4 files changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c
index 62bb4047d74..2f65df16e43 100644
--- a/gcc/testsuite/gcc.dg/signbit-2.c
+++ b/gcc/testsuite/gcc.dg/signbit-2.c
@@ -5,6 +5,7 @@
 /* { dg-additional-options "-msse2 -mno-avx512f" { target { i?86-*-* 
x86_64-*-* } } } */
 /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */
 /* { dg-additional-options "-maltivec" { target powerpc_altivec_ok } } */
+/* { dg-additional-options "-mlsx" { target loongarch*-*-* } } */
 /* { dg-skip-if "no fallback for MVE" { arm_mve } } */
 
 #include 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c 
b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
index 120f40c0b6c..acaa1156419 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_int } */
 /* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
+/* { dg-additional-options "-mlsx" { target loongarch*-*-* } } */
 
 int A[1024 * 2];
 
diff --git a/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90 
b/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
index a42290948c4..4c251aacbe3 100644
--- a/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
+++ b/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
@@ -1,6 +1,7 @@
 ! { dg-do compile }
 ! { dg-require-effective-target vect_double }
 ! { dg-additional-options "-msse2" { target { { i?86-*-* x86_64-*-* } && ilp32 
} } }
+! { dg-additional-options "-mlsx" { target loongarch*-*-* } }
 
 module mqc_m
 integer, parameter, private :: longreal = selected_real_kind(15,90)
diff --git a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f 
b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
index 08965cc5e20..97b88821731 100644
--- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
+++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
@@ -2,6 +2,7 @@
 ! { dg-require-effective-target vect_double }
 ! { dg-options "-O3 --param vect-max-peeling-for-alignment=0 
-fpredictive-commoning -fdump-tree-pcom-details -std=legacy" }
 ! { dg-additional-options "-mprefer-avx128" { target { i?86-*-* x86_64-*-* } } 
}
+! { dg-additional-options "-mlsx" { target { loongarch*-*-* } } }
 ! { dg-additional-options "-mzarch" { target { s390*-*-* } } }
 
 *** RESID COMPUTES THE RESIDUAL:  R = V - AU
-- 
2.20.1



[PATCH v1 5/8] LoongArch: testsuite:Modify the test behavior in file pr60510.f.

2023-12-28 Thread chenxiaolong
When using a binutils without vector support together with a GCC
toolchain that does support vectorization, regression testing found a
FAIL entry for pr60510.f. The test's action defaults to run, which causes
a failure at the assembly stage when the assembler cannot recognize the
vector instructions. To solve this, remove the explicit run directive so
that the test behavior depends on whether the environment actually
supports vectorization.

gcc/testsuite/ChangeLog:

* gfortran.dg/vect/pr60510.f: Delete the explicit run behavior of
the test.
---
 gcc/testsuite/gfortran.dg/vect/pr60510.f | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/testsuite/gfortran.dg/vect/pr60510.f 
b/gcc/testsuite/gfortran.dg/vect/pr60510.f
index 6cae82acece..d4fd42a664a 100644
--- a/gcc/testsuite/gfortran.dg/vect/pr60510.f
+++ b/gcc/testsuite/gfortran.dg/vect/pr60510.f
@@ -1,4 +1,3 @@
-! { dg-do run }
 ! { dg-require-effective-target vect_double }
 ! { dg-require-effective-target vect_intdouble_cvt }
 ! { dg-additional-options "-fno-inline -ffast-math" }
-- 
2.20.1



[PATCH v1 2/8] LoongArch: testsuite:Modify the test behavior of the vect-bic-bitmask-{12, 23}.c file.

2023-12-28 Thread chenxiaolong
When the toolchain is built with a binutils that does not support
vectorization and a GCC that does, the GCC regression results show that
the vect-bic-bitmask-{12,23}.c tests fail. The reason is that they go
through both the compilation and the assembly stage, and the assembler
does not recognize the vector instructions, although only the compilation
stage is actually needed. To solve this, change the default action from
assemble to compile, so that other architectures do not hit similar
problems.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-bic-bitmask-12.c: Change the default dg-do
action from assemble to compile.
* gcc.dg/vect/vect-bic-bitmask-23.c: Ditto.
---
 gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-12.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-23.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-12.c 
b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-12.c
index 36ec5a8b19b..213e4c2a418 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-12.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-12.c
@@ -1,5 +1,5 @@
 /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
-/* { dg-do assemble } */
+/* { dg-do compile } */
 /* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
 
 #include 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-23.c 
b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-23.c
index 5b4c3b6e19b..5dceb4bbcb6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-23.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-23.c
@@ -1,5 +1,5 @@
 /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
-/* { dg-do assemble } */
+/* { dg-do compile } */
 /* { dg-additional-options "-O1 -fdump-tree-dce -w" } */
 
 #include 
-- 
2.20.1



[PATCH v1 1/8] LoongArch: testsuite:Add detection procedures supported by the target.

2023-12-28 Thread chenxiaolong
In order to improve and verify the vectorization functionality of the
LoongArch architecture, add checks for its vector instruction sets to
target-supports.exp.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add LoongArch to the list of supported
targets.
---
 gcc/testsuite/lib/target-supports.exp | 219 +++---
 1 file changed, 161 insertions(+), 58 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 14e3e119792..b90aaf8cabe 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3811,7 +3811,11 @@ proc add_options_for_bfloat16 { flags } {
 # (fma, fms, fnma, and fnms) for both float and double.
 
 proc check_effective_target_scalar_all_fma { } {
-return [istarget aarch64*-*-*]
+if { [istarget aarch64*-*-*] 
+|| [istarget loongarch*-*-*]} {
+   return 1
+}
+return 0
 }
 
 # Return 1 if the target supports compiling fixed-point,
@@ -4017,7 +4021,7 @@ proc check_effective_target_vect_cmdline_needed { } {
 || ([istarget arm*-*-*] && [check_effective_target_arm_neon])
 || [istarget aarch64*-*-*]
 || [istarget amdgcn*-*-*]
-|| [istarget riscv*-*-*]} {
+|| [istarget riscv*-*-*] } {
return 0
} else {
return 1
@@ -4047,6 +4051,8 @@ proc check_effective_target_vect_int { } {
 && [check_effective_target_s390_vx])
 || ([istarget riscv*-*-*]
 && [check_effective_target_riscv_v])
+|| ([istarget loongarch*-*-*]
+&& [check_effective_target_loongarch_sx])
}}]
 }
 
@@ -4176,7 +4182,9 @@ proc check_effective_target_vect_intfloat_cvt { } {
 || ([istarget s390*-*-*]
 && [check_effective_target_s390_vxe2])
 || ([istarget riscv*-*-*]
-&& [check_effective_target_riscv_v]) }}]
+&& [check_effective_target_riscv_v])
+|| ([istarget loongarch*-*-*]
+&& [check_effective_target_loongarch_sx]) }}]
 }
 
 # Return 1 if the target supports signed double->int conversion
@@ -4197,7 +4205,9 @@ proc check_effective_target_vect_doubleint_cvt { } {
 || ([istarget s390*-*-*]
 && [check_effective_target_s390_vx])
 || ([istarget riscv*-*-*]
-&& [check_effective_target_riscv_v]) }}]
+&& [check_effective_target_riscv_v])
+|| ([istarget loongarch*-*-*]
+&& [check_effective_target_loongarch_sx]) }}]
 }
 
 # Return 1 if the target supports signed int->double conversion
@@ -4218,7 +4228,9 @@ proc check_effective_target_vect_intdouble_cvt { } {
 || ([istarget s390*-*-*]
 && [check_effective_target_s390_vx])
 || ([istarget riscv*-*-*]
-&& [check_effective_target_riscv_v]) }}]
+&& [check_effective_target_riscv_v])
+|| ([istarget loongarch*-*-*]
+&& [check_effective_target_loongarch_sx]) }}]
 }
 
 #Return 1 if we're supporting __int128 for target, 0 otherwise.
@@ -4251,7 +4263,9 @@ proc check_effective_target_vect_uintfloat_cvt { } {
 || ([istarget s390*-*-*]
 && [check_effective_target_s390_vxe2])
 || ([istarget riscv*-*-*]
-&& [check_effective_target_riscv_v]) }}]
+&& [check_effective_target_riscv_v])
+|| ([istarget loongarch*-*-*]
+&& [check_effective_target_loongarch_sx]) }}]
 }
 
 
@@ -4270,7 +4284,9 @@ proc check_effective_target_vect_floatint_cvt { } {
 || ([istarget s390*-*-*]
 && [check_effective_target_s390_vxe2])
 || ([istarget riscv*-*-*]
-&& [check_effective_target_riscv_v]) }}]
+&& [check_effective_target_riscv_v])
+|| ([istarget loongarch*-*-*]
+&& [check_effective_target_loongarch_sx]) }}]
 }
 
 # Return 1 if the target supports unsigned float->int conversion
@@ -4287,7 +4303,9 @@ proc check_effective_target_vect_floatuint_cvt { } {
|| ([istarget s390*-*-*]
&& [check_effective_target_s390_vxe2])
|| ([istarget riscv*-*-*]
-   && [check_effective_target_riscv_v]) }}]
+   && [check_effective_target_riscv_v])
+   || ([istarget loongarch*-*-*]
+   && [check_effective_target_loongarch_sx]) }}]
 }
 
 # Return 1 if the target supports vector integer char -> long long extend optab
@@ -4296,7 +4314,9 @@ proc check_effective_target_vect_floatuint_cvt { } {
 proc check_effective_target_vect_ext_char_longlong { } {
 return [check_cached_effective_target_indexed vect_ext_char_longlong {
   expr { ([istarget riscv*-*-*]
- && [check_effective_target_riscv_v]) }}]
+ && [check_effective_target_riscv_v])
+ || ([istarget 

Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector

2023-12-28 Thread joshua
Hi Juzhe,

This patch "RISC-V: Handle differences between XTheadvector and
Vector" addresses code generation issues for RVV1.0 instructions
that xtheadvector does not have; it does not deal with intrinsics.

BTW, what about the following patch "RISC-V: Add support for
xtheadvector-specific intrinsics"? It adds support for new xtheadvector
instructions. Is it OK to be merged?

Joshua






--
From: juzhe.zh...@rivai.ai
Sent: Friday, December 29, 2023, 09:58
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw;
"christoph.muellner"; "cooper.joshua"; jinma; "cooper.qu"
Subject: Re: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and
Vector


I am confused by the series patches.


I thought this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641417.html 
is enough to support partial theadvector that can leverage directly RVV1.0 ?


Could you clean up and resend the patches based on the patch above
(assuming it is merged already)?


juzhe.zh...@rivai.ai

 
From: Jun Sha (Joshua)
Date: 2023-12-29 09:46
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and 
Vector

This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of xtheadvector instructions that can be derived directly
from current RVV1.0 instructions by simply adding the "th." prefix.
For xtheadvector instructions that have different names but share the
same patterns as RVV1.0 instructions, we will use the ASM target hook
to rewrite the whole instruction string in follow-up patches.
 
For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md in order
not to generate instructions that xtheadvector does not support,
like vmv1r and vsext.vf2.
 
gcc/ChangeLog:
 
* config.gcc:  Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
* config/riscv/riscv-v.cc (legitimize_move):
New expansion.
(get_prefer_tail_policy): Give specific value for tail.
(get_prefer_mask_policy): Give specific value for mask.
(vls_mode_valid_p): Avoid autovec.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
(build_one): New function.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
(DEF_THEAD_RVV_FUNCTION): Add new macros.
(check_required_extensions):
(handle_pragma_vector):
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
(RVV_REQUIRE_XTHEADVECTOR):
Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
(struct function_group_info):
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
Guard XTheadVector.
(riscv_v_adjust_bytesize): Likewise.
(riscv_preferred_simd_mode): Likewise.
(riscv_autovectorize_vector_modes): Likewise.
(riscv_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/riscv/vector-iterators.md: Remove fractional LMUL.
* config/riscv/vector.md: Include thead-vector.md.
* config/riscv/riscv_th_vector.h: New file.
* config/riscv/thead-vector.md: New file.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
* gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
* lib/target-supports.exp: Add target for XTheadVector.
 
Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config.gcc    |   2 +-
 gcc/config/riscv/autovec.md   |   2 +-
 gcc/config/riscv/predicates.md    |   8 +-
 gcc/config/riscv/riscv-string.cc  |   3 +
 gcc/config/riscv/riscv-v.cc   |  13 +-
 .../riscv/riscv-vector-builtins-bases.cc  |   3 +
 .../riscv/riscv-vector-builtins-shapes.cc |  23 +++
 gcc/config/riscv/riscv-vector-switch.def  | 150 +++---
 gcc/config/riscv/riscv-vsetvl.cc  |  10 +
 gcc/config/riscv/riscv.cc |  20 +-
 gcc/config/riscv/riscv_th_vector.h    |  49 +
 gcc/config/riscv/thead-vector.md  | 142 +
 gcc/config/riscv/vector-iterators.md  | 186 +-
 gcc/config/riscv/vector.md    |  36 +++-
 .../gcc.target/riscv/rvv/base/abi-1.c |   2 +-
 


[PATCH v1 0/8] LoongArch:Enable testing for common

2023-12-28 Thread chenxiaolong
When binutils does not support vectorization but the GCC compiler
toolchain does, the following two kinds of problems occur in GCC
regression testing.

1. Failure of common tests in the gcc.dg/vect directory:

GCC regression testing has found that vect-bic-bitmask-{12,23}.c fails at
compile time, and similar problems exist on various architectures (e.g. x86,
aarch64, riscv). The reason is that these tests run to the assembly stage,
where the non-vector binutils cannot recognize the vector instructions, so
an error occurs.

2. FAILs in common vectorization tests once they are supported:

When the LoongArch architecture enables the common vector test cases, GCC
regression testing shows many failures. The reasons include missing
target-detection rules, missing vectorization options, missing
test-specific compilation options, instruction-set differences that the
checks do not account for, and test behavior that needs per-target
settings. For details, see the following patches:

chenxiaolong (8):
  LoongArch: testsuite:Add detection procedures supported by the target.
  LoongArch: testsuite:Modify the test behavior of the
vect-bic-bitmask-{12,23}.c file.
  LoongArch: testsuite:Added test support for vect-{82,83}.c.
  LoongArch: testsuite:Fix FAIL in file bind_c_array_params_2.f90.
  LoongArch: testsuite:Modify the test behavior in file pr60510.f.
  LoongArch: testsuite:Added additional vectorization "-mlasx"
compilation option.
  LoongArch: testsuite:Added additional vectorization "-mlsx"
compilation option.
  LoongArch: testsuite:Modify the result check in the FMA file.

 gcc/testsuite/gcc.dg/fma-3.c  |   2 +-
 gcc/testsuite/gcc.dg/fma-4.c  |   2 +-
 gcc/testsuite/gcc.dg/fma-6.c  |   2 +-
 gcc/testsuite/gcc.dg/fma-7.c  |   2 +-
 gcc/testsuite/gcc.dg/signbit-2.c  |   1 +
 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c   |   1 +
 gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c  |   1 +
 .../gcc.dg/vect/slp-widen-mult-half.c |   1 +
 gcc/testsuite/gcc.dg/vect/vect-82.c   |   2 +-
 gcc/testsuite/gcc.dg/vect/vect-83.c   |   2 +-
 .../gcc.dg/vect/vect-bic-bitmask-12.c |   2 +-
 .../gcc.dg/vect/vect-bic-bitmask-23.c |   2 +-
 .../gcc.dg/vect/vect-widen-mult-const-s16.c   |   1 +
 .../gcc.dg/vect/vect-widen-mult-const-u16.c   |   1 +
 .../gcc.dg/vect/vect-widen-mult-half-u8.c |   1 +
 .../gcc.dg/vect/vect-widen-mult-half.c|   1 +
 .../gcc.dg/vect/vect-widen-mult-u16.c |   1 +
 .../gcc.dg/vect/vect-widen-mult-u8-s16-s32.c  |   1 +
 .../gcc.dg/vect/vect-widen-mult-u8-u32.c  |   1 +
 .../gcc.dg/vect/vect-widen-mult-u8.c  |   1 +
 .../gfortran.dg/bind_c_array_params_2.f90 |   4 +-
 .../gfortran.dg/graphite/vect-pr40979.f90 |   1 +
 .../gfortran.dg/vect/fast-math-mgrid-resid.f  |   1 +
 gcc/testsuite/gfortran.dg/vect/pr60510.f  |   1 -
 gcc/testsuite/lib/target-supports.exp | 219 +-
 25 files changed, 186 insertions(+), 68 deletions(-)

-- 
2.20.1



Re: [PATCH] Improved RTL expansion of field assignments into promoted registers.

2023-12-28 Thread YunQiang Su
In general, I agree with this change.
With GCC 12 on RV64, more than one `sext.w` is produced in our test.
(Note: use -O1.)

>
> There are two things that help here.  The first is that the most significant
> bit never appears in the middle of a field, so we don't have to worry about
> overlapping, nor writes to the paradoxical bits of the SUBREG.  And secondly,
> bits are numbered from zero for least significant, to MODE_BITSIZE (mode) - 1
> for most significant, irrespective of the endian-ness.  So the code only needs

I am worried that bits higher than MODE_BITSIZE (mode) - 1 may also be
modified. In that case, we also need to truncate/sign_extend.
Though I cannot produce C code that triggers this yet.

> to check the highest value bitpos + bitsize is the maximum value for the mode.
> The above logic stays the same, but which byte insert requires extension will
> change between mips64be and mips64le.  i.e. we test that the most significant
> bit of the field/byte being written in the most significant bit of the SUBREG
> target. [That's my understanding/rationalization, I could wrong].
>

Bits higher than MODE_BITSIZE (mode) - 1 also matter, since the MIPS ISA
requires the source registers of SImode instructions to be sign-extended;
otherwise the behavior is UNPREDICTABLE.
It means

   li   $r2, 0xfff00001   # upper 32 bits not sign-extended
   addu $r1, $r0, $r2

is not allowed.
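For readers unfamiliar with this MIPS64 rule, the required representation can be demonstrated in plain C. The helper below is illustrative only (not from the patch): it computes the 64-bit register image that an SImode value must have when held in a 64-bit register.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the 64-bit register image MIPS64 requires for an SImode
   value: the low 32 bits, sign-extended through the upper 32 bits.  */
int64_t simode_in_reg (uint32_t v)
{
  return (int64_t) (int32_t) v;
}
```

So 0xfff00001 must live in the register as 0xfffffffffff00001; the zero-extended form produced by the li above is exactly what makes the following addu UNPREDICTABLE.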

> One thing I could be more cautious about is using maybe_eq instead of
> known_eq, but the rest of the code (including truly_noop_truncation) assumes
> scalar integer modes, so variable length vectors aren't (yet) a concern.
> Would using maybe_eq be better coding style?
>
>
> Cheers,
> Roger
> --
>
>


Re: [PATCH] Improved RTL expansion of field assignments into promoted registers.

2023-12-28 Thread YunQiang Su
Jeff Law wrote on Friday, December 29, 2023, 02:23:
>
>
>
> On 12/28/23 07:59, Roger Sayle wrote:
> >
> > This patch fixes PR rtl-optimization/104914 by tweaking/improving the way
> > that fields are written into a pseudo register that needs to be kept sign
> > extended.
> Well, I think "fixes" is a bit of a stretch.  We're avoiding the issue
> by changing the early RTL generation, but if I understand what's going
> on in the RTL optimizers and MIPS backend correctly, the core bug still
> remains.  Admittedly I haven't put it under a debugger, but that MIPS
> definition of NOOP_TRUNCATION just seems badly wrong and is just waiting
to pop its ugly head up again.
>

Yes. I am trying to get rid of it from MIPS64.
It may reduce our maintenance workload.

>
>
> >
> > The motivating example from the bugzilla PR is:
> >
> > extern void ext(int);
> > void foo(const unsigned char *buf) {
> >int val;
> >    ((unsigned char*)&val)[0] = *buf++;
> >    ((unsigned char*)&val)[1] = *buf++;
> >    ((unsigned char*)&val)[2] = *buf++;
> >    ((unsigned char*)&val)[3] = *buf++;
> >if(val > 0)
> >  ext(1);
> >else
> >  ext(0);
> > }
> >
> > which at the end of the tree optimization passes looks like:
> >
> > void foo (const unsigned char * buf)
> > {
> >int val;
> >unsigned char _1;
> >unsigned char _2;
> >unsigned char _3;
> >unsigned char _4;
> >int val.5_5;
> >
> >   <bb 2> [local count: 1073741824]:
> >   _1 = *buf_7(D);
> >   MEM[(unsigned char *)&val] = _1;
> >   _2 = MEM[(const unsigned char *)buf_7(D) + 1B];
> >   MEM[(unsigned char *)&val + 1B] = _2;
> >   _3 = MEM[(const unsigned char *)buf_7(D) + 2B];
> >   MEM[(unsigned char *)&val + 2B] = _3;
> >   _4 = MEM[(const unsigned char *)buf_7(D) + 3B];
> >   MEM[(unsigned char *)&val + 3B] = _4;
> >   val.5_5 = val;
> >   if (val.5_5 > 0)
> >     goto <bb 3>; [59.00%]
> >   else
> >     goto <bb 4>; [41.00%]
> >
> >   <bb 3> [local count: 633507681]:
> >   ext (1);
> >   goto <bb 5>; [100.00%]
> >
> >   <bb 4> [local count: 440234144]:
> >   ext (0);
> >
> >   <bb 5> [local count: 1073741824]:
> >   val ={v} {CLOBBER(eol)};
> >   return;
> >
> > }
> >
> > Here four bytes are being sequentially written into the SImode value
> > val.  On some platforms, such as MIPS64, this SImode value is kept in
> > a 64-bit register, suitably sign-extended.  The function expand_assignment
> > contains logic to handle this via SUBREG_PROMOTED_VAR_P (around line 6264
> > in expr.cc) which outputs an explicit extension operation after each
> > store_field (typically insv) to such promoted/extended pseudos.
> >
> > The first observation is that there's no need to perform sign extension
> > after each byte in the example above; the extension is only required
> > after changes to the most significant byte (i.e. to a field that overlaps
> > the most significant bit).
True.
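The point being agreed to can be paraphrased as a tiny predicate. This is an illustrative sketch with hypothetical names, not the GCC code (which compares poly_ints with known_eq): with bits numbered from 0 for the least significant, a store needs the trailing extension only when the field ends exactly at the mode boundary.

```c
#include <assert.h>
#include <stdbool.h>

/* True iff a field of BITSIZE bits at BITPOS covers the most
   significant bit of a mode that is MODE_BITSIZE bits wide, in
   which case the promoted register must be re-extended.  */
bool field_overlaps_msb (unsigned bitpos, unsigned bitsize,
                         unsigned mode_bitsize)
{
  return bitpos + bitsize == mode_bitsize;
}
```

For the four byte stores in the example, only the store of the final byte (bitpos 24, bitsize 8, in 32-bit SImode) would need re-extension.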
>
>
> >
> > The bug fix is actually a bit more subtle, but at this point during
> > code expansion it's not safe to use a SUBREG when sign-extending this
> > field.  Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) 0))
> > but combine (and other RTL optimizers) later realize that because SImode
> > values are always sign-extended in their 64-bit hard registers that
> > this is a no-op and eliminates it.  The trouble is that it's unsafe to
> > refer to the SImode lowpart of a 64-bit register using SUBREG at those
> > critical points when temporarily the value isn't correctly sign-extended,
> > and the usual backend invariants don't hold.  At these critical points,
> > the middle-end needs to use an explicit TRUNCATE rtx (as this isn't a
> > TRULY_NOOP_TRUNCATION), so that the explicit sign-extension looks like
> > (sign_extend:DI (truncate:SI (reg:DI))), which avoids the problem.
>
>
> >
> > Note that MODE_REP_EXTENDED (NARROW, WIDE) != UNKNOWN implies (or should
> > imply) !TRULY_NOOP_TRUNCATION (NARROW, WIDE).  I've another (independent)
> > patch that I'll post in a few minutes.
> >
> >
> > This middle-end patch has been tested on x86_64-pc-linux-gnu with
> > make bootstrap and make -k check, both with and without
> > --target_board=unix{-m32} with no new failures.  The cc1 from a
> > cross-compiler to mips64 appears to generate much better code for
> > the above test case.  Ok for mainline?
> >
> >
> > 2023-12-28  Roger Sayle  
> >
> > gcc/ChangeLog
> >  PR rtl-optimization/104914
> >  * expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P
> >  a sign or zero extension is only required if the modified field
> >  overlaps the SUBREG's most significant bit.  On MODE_REP_EXTENDED
> >  targets, don't refer to the temporarily incorrectly extended value
> >  using a SUBREG, but instead generate an explicit TRUNCATE rtx.
> [ ... ]
>
>
> > +   /* Check if the field overlaps the MSB, requiring extension.  */
> > +   else if (known_eq (bitpos + bitsize,
> > +  GET_MODE_BITSIZE (GET_MODE (to_rtx
> Do you need to look at the size of the 

[PATCH v4 6/6] RISC-V: Add support for xtheadvector-specific intrinsics.

2023-12-28 Thread Jun Sha (Joshua)
This patch only involves the generation of xtheadvector
special load/store instructions and vext instructions.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class th_loadstore_width): Define new builtin bases.
(BASE): Define new builtin bases.
* config/riscv/riscv-vector-builtins-bases.h:
Define new builtin class.
* config/riscv/riscv-vector-builtins-functions.def (vlsegff):
Include thead-vector-builtins-functions.def.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct th_loadstore_width_def): Define new builtin shapes.
(struct th_indexed_loadstore_width_def):
Define new builtin shapes.
(SHAPE): Define new builtin shapes.
* config/riscv/riscv-vector-builtins-shapes.h:
Define new builtin shapes.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_I8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I32_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U32_OPS): Add datatypes for XTheadVector.
(vint8m1_t): Add datatypes for XTheadVector.
(vint8m2_t): Likewise.
(vint8m4_t): Likewise.
(vint8m8_t): Likewise.
(vint16m1_t): Likewise.
(vint16m2_t): Likewise.
(vint16m4_t): Likewise.
(vint16m8_t): Likewise.
(vint32m1_t): Likewise.
(vint32m2_t): Likewise.
(vint32m4_t): Likewise.
(vint32m8_t): Likewise.
(vint64m1_t): Likewise.
(vint64m2_t): Likewise.
(vint64m4_t): Likewise.
(vint64m8_t): Likewise.
(vuint8m1_t): Likewise.
(vuint8m2_t): Likewise.
(vuint8m4_t): Likewise.
(vuint8m8_t): Likewise.
(vuint16m1_t): Likewise.
(vuint16m2_t): Likewise.
(vuint16m4_t): Likewise.
(vuint16m8_t): Likewise.
(vuint32m1_t): Likewise.
(vuint32m2_t): Likewise.
(vuint32m4_t): Likewise.
(vuint32m8_t): Likewise.
(vuint64m1_t): Likewise.
(vuint64m2_t): Likewise.
(vuint64m4_t): Likewise.
(vuint64m8_t): Likewise.
* config/riscv/riscv-vector-builtins.cc
(DEF_RVV_I8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I32_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U32_OPS): Add datatypes for XTheadVector.
* config/riscv/thead-vector-builtins-functions.def: New file.
* config/riscv/thead-vector.md: Add new patterns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c: New test.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config.gcc|   2 +-
 .../riscv/riscv-vector-builtins-shapes.cc | 126 +++
 .../riscv/riscv-vector-builtins-shapes.h  |   3 +
 .../riscv/riscv-vector-builtins-types.def | 120 +++
 gcc/config/riscv/riscv-vector-builtins.cc | 313 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   3 +
 gcc/config/riscv/t-riscv  |  16 +
 .../riscv/thead-vector-builtins-functions.def |  39 +++
 gcc/config/riscv/thead-vector-builtins.cc | 200 +++
 gcc/config/riscv/thead-vector-builtins.h  |  64 
 gcc/config/riscv/thead-vector.md  | 253 ++
 .../riscv/rvv/xtheadvector/vlb-vsb.c  |  68 
 .../riscv/rvv/xtheadvector/vlbu-vsb.c |  68 
 .../riscv/rvv/xtheadvector/vlh-vsh.c  |  68 
 .../riscv/rvv/xtheadvector/vlhu-vsh.c |  68 
 .../riscv/rvv/xtheadvector/vlw-vsw.c  |  68 
 .../riscv/rvv/xtheadvector/vlwu-vsw.c |  68 
 17 files changed, 1545 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/thead-vector-builtins-functions.def
 create mode 100644 gcc/config/riscv/thead-vector-builtins.cc
 create mode 100644 gcc/config/riscv/thead-vector-builtins.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c
 

[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector

2023-12-28 Thread Jun Sha (Joshua)
This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the xtheadvector instructions that can be leveraged directly from
current RVV1.0 by simply adding the "th." prefix. For xtheadvector
instructions that have different names but share the same patterns
as RVV1.0 instructions, we will use the ASM target hook to rewrite
the whole instruction string in the following patches.

For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md in order
not to generate instructions that xtheadvector does not support,
like vmv1r and vsext.vf2.

gcc/ChangeLog:

* config.gcc:  Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
* config/riscv/riscv-v.cc (legitimize_move):
New expansion.
(get_prefer_tail_policy): Give specific value for tail.
(get_prefer_mask_policy): Give specific value for mask.
(vls_mode_valid_p): Avoid autovec.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
(build_one): New function.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
(DEF_THEAD_RVV_FUNCTION): Add new macros.
(check_required_extensions):
(handle_pragma_vector):
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
(RVV_REQUIRE_XTHEADVECTOR):
Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
(struct function_group_info):
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
Guard XTheadVector.
(riscv_v_adjust_bytesize): Likewise.
(riscv_preferred_simd_mode): Likewise.
(riscv_autovectorize_vector_modes): Likewise.
(riscv_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/riscv/vector-iterators.md: Remove fractional LMUL.
* config/riscv/vector.md: Include thead-vector.md.
* config/riscv/riscv_th_vector.h: New file.
* config/riscv/thead-vector.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
* gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
* lib/target-supports.exp: Add target for XTheadVector.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config.gcc|   2 +-
 gcc/config/riscv/autovec.md   |   2 +-
 gcc/config/riscv/predicates.md|   8 +-
 gcc/config/riscv/riscv-string.cc  |   3 +
 gcc/config/riscv/riscv-v.cc   |  13 +-
 .../riscv/riscv-vector-builtins-bases.cc  |   3 +
 .../riscv/riscv-vector-builtins-shapes.cc |  23 +++
 gcc/config/riscv/riscv-vector-switch.def  | 150 +++---
 gcc/config/riscv/riscv-vsetvl.cc  |  10 +
 gcc/config/riscv/riscv.cc |  20 +-
 gcc/config/riscv/riscv_th_vector.h|  49 +
 gcc/config/riscv/thead-vector.md  | 142 +
 gcc/config/riscv/vector-iterators.md  | 186 +-
 gcc/config/riscv/vector.md|  36 +++-
 .../gcc.target/riscv/rvv/base/abi-1.c |   2 +-
 .../gcc.target/riscv/rvv/base/pragma-1.c  |   2 +-
 gcc/testsuite/lib/target-supports.exp |  12 ++
 17 files changed, 474 insertions(+), 189 deletions(-)
 create mode 100644 gcc/config/riscv/riscv_th_vector.h
 create mode 100644 gcc/config/riscv/thead-vector.md

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f0676c830e8..1445d98c147 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -549,7 +549,7 @@ riscv*)
extra_objs="${extra_objs} riscv-vector-builtins.o 
riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o riscv-target-attr.o"
d_target_objs="riscv-d.o"
-   extra_headers="riscv_vector.h"
+   extra_headers="riscv_vector.h riscv_th_vector.h"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/riscv/riscv-vector-builtins.cc"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/riscv/riscv-vector-builtins.h"
;;
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 8b8a92f10a1..1fac56c7095 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2579,7 +2579,7 @@
   [(match_operand  0 "register_operand")
(match_operand  1 "memory_operand")
(match_operand:ANYI 2 "const_int_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && !TARGET_XTHEADVECTOR"
   {
 

[PATCH v1] LoongArch: testsuite:Fix FAIL in lasx-xvstelm.c file.

2023-12-28 Thread chenxiaolong
After the cost model was implemented for the LoongArch architecture, GCC
turns it on by default, which causes the lasx-xvstelm.c test to fail.
Analysis shows that this test case generates the vector instructions the
scan checks for only when the cost model is disabled with the
"-fno-vect-cost-model" compilation option.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-xvstelm.c: Add compile
option "-fno-vect-cost-model" to dg-options.
---
 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvstelm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvstelm.c 
b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvstelm.c
index 1a7b0e86f8b..4b846204a65 100644
--- a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvstelm.c
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvstelm.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -mlasx" } */
+/* { dg-options "-O3 -mlasx -fno-vect-cost-model" } */
 /* { dg-final { scan-assembler-times "xvstelm.w" 8} } */
 
 #define LEN 256
-- 
2.20.1



Re: [PATCH v3 1/6] RISC-V: Refactor riscv-vector-builtins-bases.cc

2023-12-28 Thread joshua
Hi Jeff,

Perhaps fold_fault_load cannot be moved to riscv-protos.h, since
gimple_folder is declared in riscv-vector-builtins.h and it is not
reasonable to include riscv-vector-builtins.h in riscv-protos.h.

In fact, fold_fault_load is defined specially for some builtin functions,
and it would be better to just prototype it in riscv-vector-builtins-bases.h.

Joshua






--
From: Jeff Law
Sent: Thursday, December 21, 2023, 02:14
To: "Jun Sha (Joshua)"; "gcc-patches"
Cc: "jim.wilson.gcc"; palmer; andrew; "philipp.tomsich";
"christoph.muellner"; "juzhe.zhong"; Jin Ma; Xianmiao Qu
Subject: Re: [PATCH v3 1/6] RISC-V: Refactor riscv-vector-builtins-bases.cc




On 12/20/23 05:25, Jun Sha (Joshua) wrote:
> This patch moves the definition of the enums lst_type and
> frm_op_type into riscv-vector-builtins-bases.h and removes
> the static visibility of fold_fault_load(), so these
> can be used in other compile units.
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv-vector-builtins-bases.cc (enum lst_type):
>  (enum frm_op_type): move to riscv-vector-builtins-bases.h
>  * config/riscv/riscv-vector-builtins-bases.h
>  (GCC_RISCV_VECTOR_BUILTINS_BASES_H): Add header files.
>  (enum lst_type): move from
>  (enum frm_op_type): riscv-vector-builtins-bases.cc
>  (fold_fault_load): riscv-vector-builtins-bases.cc
I'm largely hoping to leave the heavy review lifting here to Juzhe who 
knows GCC's RV vector bits as well as anyone.

Just one small issue.  Would it be better to prototype fold_fault_load 
elsewhere and avoid the gimple.h inclusion in 
riscv-vector-builtins-bases.h?  Perhaps riscv-protos.h?

You might consider prefixing the function name with riscv_.  It's not 
strictly necessary, but it appears to be relatively common in the RISC-V port.

Thanks,
Jeff

[Committed] RISC-V: Robustify testcase pr113112-1.c

2023-12-28 Thread Juzhe-Zhong
The redundant dump check is fragile, easily invalidated by unrelated changes, and not necessary.

Tested on both RV32/RV64 no regression.

Remove it and committed.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: Remove redundant checks.

---
 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c
index 95df7809d49..2dc39ad8e8b 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c
@@ -24,6 +24,3 @@ foo (int n){
 /* { dg-final { scan-assembler-not {jr} } } */
 /* { dg-final { scan-assembler-times {ret} 1 } } */
 /* { dg-final { scan-tree-dump-times "Preferring smaller LMUL loop because it 
has unexpected spills" 1 "vect" } } */
-/* { dg-final { scan-tree-dump "At most 8 number of live V_REG at program 
point 1 for bb 4" "vect" } } */
-/* { dg-final { scan-tree-dump "At most 40 number of live V_REG at program 
point 1 for bb 3" "vect" } } */
-/* { dg-final { scan-tree-dump "At most 8 number of live V_REG at program 
point 1 for bb 5" "vect" } } */
-- 
2.36.3



[PATCH] RISC-V: Count pointer type SSA into RVV regs liveness for dynamic LMUL cost model

2023-12-28 Thread Juzhe-Zhong
This patch fixes the following case of choosing an unexpectedly big LMUL,
which causes register spills.

Before this patch, choosing LMUL = 4:

addi    sp,sp,-160
addiw   t1,a2,-1
li      a5,7
bleu    t1,a5,.L16
vsetivli        zero,8,e64,m4,ta,ma
vmv.v.x v4,a0
vs4r.v  v4,0(sp)        ---> spill to the stack.
vmv.v.x v4,a1
addi    a5,sp,64
vs4r.v  v4,0(a5)        ---> spill to the stack.

The root cause is the following codes:

  if (poly_int_tree_p (var)
  || (is_gimple_val (var)
 && !POINTER_TYPE_P (TREE_TYPE (var

We count the variable as consuming an RVV reg group when it is not POINTER_TYPE.

It is right for load/store STMT for example:

_1 = (MEM)*addr -->  addr won't be allocated an RVV vector group.

However, we find it is not right for non-load/store STMT:

_3 = _1 == x_8(D);

_1 has pointer type too, but we do allocate an RVV register group for it.
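For illustration, this kind of non-load/store pointer statement falls out of ordinary code such as the following sketch (hypothetical, not the pr113112-4.c testcase added by this patch): the pointers loaded from a[] feed a vector comparison, so their SSA names really do occupy RVV register groups, unlike an address that is only dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Each a[i] load produces a pointer-typed value that is compared
   against KEY; when vectorized, those pointer values live in RVV
   register groups and must be counted by the liveness model.  */
size_t count_matches (void **a, const void *key, size_t n)
{
  size_t hits = 0;
  for (size_t i = 0; i < n; i++)
    if (a[i] == key)
      hits++;
  return hits;
}
```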

So after this patch, we are choosing the perfect LMUL for the testcase in this 
patch:

ble     a2,zero,.L17
addiw   a7,a2,-1
li      a5,3
bleu    a7,a5,.L15
srliw   a5,a7,2
slli    a6,a5,1
add     a6,a6,a5
lui     a5,%hi(replacements)
addi    t1,a5,%lo(replacements)
slli    a6,a6,5
lui     t4,%hi(.LANCHOR0)
lui     t3,%hi(.LANCHOR0+8)
lui     a3,%hi(.LANCHOR0+16)
lui     a4,%hi(.LC1)
vsetivli        zero,4,e16,mf2,ta,ma
addi    t4,t4,%lo(.LANCHOR0)
addi    t3,t3,%lo(.LANCHOR0+8)
addi    a3,a3,%lo(.LANCHOR0+16)
addi    a4,a4,%lo(.LC1)
add     a6,t1,a6
addi    a5,a5,%lo(replacements)
vle16.v v18,0(t4)
vle16.v v17,0(t3)
vle16.v v16,0(a3)
vmsgeu.vi       v25,v18,4
vadd.vi v24,v18,-4
vmsgeu.vi       v23,v17,4
vadd.vi v22,v17,-4
vlm.v   v21,0(a4)
vmsgeu.vi       v20,v16,4
vadd.vi v19,v16,-4
vsetvli zero,zero,e64,m2,ta,mu
vmv.v.x v12,a0
vmv.v.x v14,a1
.L4:
vlseg3e64.v     v6,(a5)
vmseq.vv        v2,v6,v12
vmseq.vv        v0,v8,v12
vmsne.vv        v1,v8,v12
vmand.mm        v1,v1,v2
vmerge.vvm      v2,v8,v14,v0
vmv1r.v v0,v1
addi    a4,a5,24
vmerge.vvm      v6,v6,v14,v0
vmerge.vim      v2,v2,0,v0
vrgatherei16.vv v4,v6,v18
vmv1r.v v0,v25
vrgatherei16.vv v4,v2,v24,v0.t
vs1r.v  v4,0(a5)
addi    a3,a5,48
vmv1r.v v0,v21
vmv2r.v v4,v2
vcompress.vm    v4,v6,v0
vs1r.v  v4,0(a4)
vmv1r.v v0,v23
addi    a4,a5,72
vrgatherei16.vv v4,v6,v17
vrgatherei16.vv v4,v2,v22,v0.t
vs1r.v  v4,0(a3)
vmv1r.v v0,v20
vrgatherei16.vv v4,v6,v16
addi    a5,a5,96
vrgatherei16.vv v4,v2,v19,v0.t
vs1r.v  v4,0(a4)
bne     a6,a5,.L4

No spills, and no use of the "sp" register.

Tested on both RV32 and RV64, no regression.

Ok for trunk ?

PR target/113112

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (compute_nregs_for_mode): Fix 
pointer type liveness count.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr113112-4.c: New test.

---
 gcc/config/riscv/riscv-vector-costs.cc| 12 ++--
 .../vect/costmodel/riscv/rvv/pr113112-4.c | 28 +++
 2 files changed, 37 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-4.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index 0c485dc4f29..b41a79429d4 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -277,9 +277,12 @@ compute_local_live_ranges (
{
  unsigned int point = program_point.point;
  gimple *stmt = program_point.stmt;
+ stmt_vec_info stmt_info = program_point.stmt_info;
  tree lhs = gimple_get_lhs (stmt);
  if (lhs != NULL_TREE && is_gimple_reg (lhs)
- && !POINTER_TYPE_P (TREE_TYPE (lhs)))
+ && (!POINTER_TYPE_P (TREE_TYPE (lhs))
+ || STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
+  != store_vec_info_type))
{
  biggest_mode = get_biggest_mode (biggest_mode,
   TYPE_MODE (TREE_TYPE (lhs)));
@@ -305,7 +308,10 @@ compute_local_live_ranges (
 the future.  */
  if (poly_int_tree_p (var)
  || (is_gimple_val (var)
- && !POINTER_TYPE_P (TREE_TYPE (var
+ && (!POINTER_TYPE_P (TREE_TYPE (var))
+ || STMT_VINFO_TYPE (
+ 

Re: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor

2023-12-28 Thread Jeff Law




On 12/28/23 17:42, Li, Pan2 wrote:

Thanks Jeff for comments, and Happy new year!


Interesting.  So I'd actually peel one more layer off this onion.  Why
do the aarch64 and riscv targets generate different constants (0.0 vs
-0.0)?


Yeah, it surprised me too when debugging the foo function. But I didn't dig
into it previously, as it may be unrelated to vectorization.


Is it possible that the aarch64 is generating 0.0 when asked for -0.0
and -fno-signed-zeros is in effect?  That's a valid thing to do when
-fno-signed-zeros is on.  Look for HONOR_SIGNED_ZEROs in the aarch64
backend.


Sure, I will have a try at making the -0.0 happen on aarch64.
I would first look at the .optimized dump, then I'd look at the .final 
dump alongside the resulting assembly for aarch64.


I bet we're going to find that the aarch64 target internally converts 
-0.0 to 0.0 when we're not honoring signed zeros.


jeff


RE: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor

2023-12-28 Thread Li, Pan2
Thanks Jeff for comments, and Happy new year!

> Interesting.  So I'd actually peel one more layer off this onion.  Why 
> do the aarch64 and riscv targets generate different constants (0.0 vs 
> -0.0)?

Yeah, it surprised me too when debugging the foo function. But I didn't dig
into it previously, as it may be unrelated to vectorization.

> Is it possible that the aarch64 is generating 0.0 when asked for -0.0 
> and -fno-signed-zeros is in effect?  That's a valid thing to do when 
> -fno-signed-zeros is on.  Look for HONOR_SIGNED_ZEROs in the aarch64 
> backend.

Sure, I will have a try at making the -0.0 happen on aarch64.

Pan


-Original Message-
From: Jeff Law  
Sent: Friday, December 29, 2023 12:39 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; Wang, Yanzhang ; 
kito.ch...@gmail.com; richard.guent...@gmail.com
Subject: Re: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with 
variable factor



On 12/26/23 02:34, pan2...@intel.com wrote:
> From: Pan Li 
> 
> This patch would like to XFAIL the test case pr30957-1.c for RVV when
> building the ELF with some configurations (listed at the end of the log).
> It will be vectorized during vect_transform_loop with a variable factor.
> It won't benefit from unrolling/peeling, and loop->unroll is marked as 1.
> Of course, it will do nothing during unroll_loops when loop->unroll is 1.
> 
> The aarch64_sve target may have a similar issue, but it initializes the
> const `0.0 / -5.0` in the test file to `+0.0` before passing it to the
> function foo, so it passes the execution test.
> 
> aarch64:
> movi    v0.2s, #0x0
> stp x29, x30, [sp, #-16]!
> mov w0, #0xa
> mov x29, sp
> bl  400280  <== s0 is +0.0
> 
> Unfortunately, the riscv target initializes the const `0.0 / -5.0` to
> `-0.0` and then passes it to the function foo, so of course the
> execution test fails.
> 
> riscv:
> flw fa0,388(gp) # 1299c <__SDATA_BEGIN__+0x4>
> addi    sp,sp,-16
> li  a0,10
> sd  ra,8(sp)
> jal 101fc   <== fa0 is -0.0
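For reference, `0.0 / -5.0` is negative zero under IEEE 754, so the riscv code above materializes the mathematically expected constant; the difference only shows up through signbit, since the two zeros compare equal. A quick illustrative check (not part of the patch):

```c
#include <assert.h>
#include <math.h>

/* -0.0 == +0.0 is true, so signbit() is the only portable way to
   tell the two zeros apart.  */
int is_negative_zero (double d)
{
  return d == 0.0 && signbit (d) != 0;
}
```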
> 
> After this patch, loops vectorized with a variable factor on RVV will
> be treated as XFAIL by the tree dump checks when riscv_v and
> variable_vect_length hold.
> 
> The below configurations are validated as XFAIL for RV64.
Interesting.  So I'd actually peel one more layer off this onion.  Why 
do the aarch64 and riscv targets generate different constants (0.0 vs 
-0.0)?

Is it possible that the aarch64 is generating 0.0 when asked for -0.0 
and -fno-signed-zeros is in effect?  That's a valid thing to do when 
-fno-signed-zeros is on.  Look for HONOR_SIGNED_ZEROs in the aarch64 
backend.



Jeff


Re: Fortran: Use non conflicting file extensions for intermediates [PR81615]

2023-12-28 Thread Harald Anlauf

Hi Rimvydas!

Am 28.12.23 um 08:09 schrieb Rimvydas Jasinskas:

On Wed, Dec 27, 2023 at 10:34 PM Harald Anlauf  wrote:

The patch is almost fine, except for a strange wording here:

+@smallexample
+gfortran -save-temps -c foo.F90
+@end smallexample
+
+preprocesses to in @file{foo.fii}, compiles to an intermediate
+@file{foo.s}, and then assembles to the (implied) output file
+@file{foo.o}, whereas:

I understand the formulation is copied from gcc/doc/invoke.texi,
where it does not fully make sense to me either.

How about:

"preprocesses input file @file{foo.F90} to @file{foo.fii}, ..."

Furthermore,

+@smallexample
+gfortran -save-temps -S foo.F
+@end smallexample
+
+saves the (no longer) temporary preprocessed file in @file{foo.fi}, and
+then compiles to the (implied) output file @file{foo.s}.

Even if this is copied from the gcc texinfo file, how about:

"saves the preprocessor output in @file{foo.fi}, ..."

which I find easier to read.

Can you also add a reference to the PR number in the commit message?

I agree, the wording sounds a lot better; included in v2 together with the PR number.


Yes, this is OK.

Pushed: https://gcc.gnu.org/g:2cb93e6686e4af5725d8c919cf19f535a7f3aa33

Thanks for the patch!


Is there a specific reason why -fc-prototypes (Interoperability
Options section) is excluded from the manpage?


Can you be more specific?  I get here (since gcc-9):

% man /opt/gcc/14/share/man/man1/gfortran.1 |grep -A 1 "Interoperability
Options"
 Interoperability Options
 -fc-prototypes -fc-prototypes-external

although no detailed explanation (-> gfortran.info).

The https://gcc.gnu.org/onlinedocs/gfortran/Invoking-GNU-Fortran.html
does contain a working link to
https://gcc.gnu.org/onlinedocs/gfortran/Interoperability-Options.html
However, the manpage has the Interoperability section explicitly disabled
with "@c man end" ... "@c man begin ENVIRONMENT".
After digging into git log it seems that Interoperability section was
unintentionally added after this comment mark in
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=e655a6cc43


Yes, that might have been unintentional.

Can you open a PR, and if you have a fix, attach it there?

Thanks,
Harald


Best regards,
Rimvydas




RE: [PATCH] Improved RTL expansion of field assignments into promoted registers.

2023-12-28 Thread Roger Sayle


Hi Jeff,
Thanks for the speedy review.

> On 12/28/23 07:59, Roger Sayle wrote:
> > This patch fixes PR rtl-optimization/104914 by tweaking/improving the
> > way that fields are written into a pseudo register that needs to be
> > kept sign extended.
> Well, I think "fixes" is a bit of a stretch.  We're avoiding the issue by 
> changing the
> early RTL generation, but if I understand what's going on in the RTL 
> optimizers
> and MIPS backend correctly, the core bug still remains.  Admittedly I haven't 
> put it
> under a debugger, but that MIPS definition of NOOP_TRUNCATION just seems
> badly wrong and is just waiting to pop its ugly head up again.

I think this really is the/a correct fix. The MIPS backend defines 
NOOP_TRUNCATION
to false, so it's not correct to use a SUBREG to convert from DImode to SImode.
The problem then is where in the compiler (middle-end or backend) is this 
invalid
SUBREG being created and how can it be fixed.  In this particular case, the 
fault
is in RTL expansion.  There may be other places where a SUBREG is 
inappropriately
used instead of a TRUNCATE, but this is the place where things go wrong for
PR rtl-optimization/104914.

Once an inappropriate SImode SUBREG is in the RTL stream, it can remain
harmlessly latent (most of the time), unless it gets split, simplified or 
spilled.
Copying this SImode expression into its own pseudo results in incorrect code.
One approach might be to use an UNSPEC for places where backend
invariants are temporarily invalid, but in this case it's machine independent
middle-end code that's using SUBREGs as though the target was an x86/pdp11.

So I agree that on the surface, both of these appear to be identical:
> (set (reg:DI) (sign_extend:DI (truncate:SI (reg:DI))))
> (set (reg:DI) (sign_extend:DI (subreg:SI (reg:DI))))

But should they get split or spilled by reload:

(set (reg_tmp:SI) (subreg:SI (reg:DI)))
(set (reg:DI) (sign_extend:DI (reg_tmp:SI)))

is invalid as the reg_tmp isn't correctly sign-extended for SImode.
But,

(set (reg_tmp:SI) (truncate:SI (reg:DI)))
(set (reg:DI) (sign_extend:DI (reg_tmp:SI)))

is fine.  The difference is the instant in time, when the SUBREG's invariants
aren't yet valid (and its contents shouldn't be thought of as SImode).

On nvptx, where truly_noop_truncation is always "false", it'd show the same
bug/failure, if it were not for the fact that nvptx doesn't attempt to store
values in "mode extended" (SUBREG_PROMOTED_VAR_P) registers.
The bug is really in MODE_REP_EXTENDED support.

> > The motivating example from the bugzilla PR is:
> >
> > extern void ext(int);
> > void foo(const unsigned char *buf) {
> >int val;
> >((unsigned char*)&val)[0] = *buf++;
> >((unsigned char*)&val)[1] = *buf++;
> >((unsigned char*)&val)[2] = *buf++;
> >((unsigned char*)&val)[3] = *buf++;
> >if(val > 0)
> >  ext(1);
> >else
> >  ext(0);
> > }
> >
> > which at the end of the tree optimization passes looks like:
> >
> > void foo (const unsigned char * buf)
> > {
> >int val;
> >unsigned char _1;
> >unsigned char _2;
> >unsigned char _3;
> >unsigned char _4;
> >int val.5_5;
> >
> > <bb 2> [local count: 1073741824]:
> >_1 = *buf_7(D);
> >MEM[(unsigned char *)&val] = _1;
> >_2 = MEM[(const unsigned char *)buf_7(D) + 1B];
> >MEM[(unsigned char *)&val + 1B] = _2;
> >_3 = MEM[(const unsigned char *)buf_7(D) + 2B];
> >MEM[(unsigned char *)&val + 2B] = _3;
> >_4 = MEM[(const unsigned char *)buf_7(D) + 3B];
> >MEM[(unsigned char *)&val + 3B] = _4;
> >val.5_5 = val;
> >if (val.5_5 > 0)
> >  goto <bb 3>; [59.00%]
> >else
> >  goto <bb 4>; [41.00%]
> >
> > <bb 3> [local count: 633507681]:
> >ext (1);
> >goto <bb 5>; [100.00%]
> >
> > <bb 4> [local count: 440234144]:
> >ext (0);
> >
> > <bb 5> [local count: 1073741824]:
> >val ={v} {CLOBBER(eol)};
> >return;
> >
> > }
> >
> > Here four bytes are being sequentially written into the SImode value
> > val.  On some platforms, such as MIPS64, this SImode value is kept in
> > a 64-bit register, suitably sign-extended.  The function
> > expand_assignment contains logic to handle this via
> > SUBREG_PROMOTED_VAR_P (around line 6264 in expr.cc) which outputs an
> > explicit extension operation after each store_field (typically insv) to such
> promoted/extended pseudos.
> >
> > The first observation is that there's no need to perform sign
> > extension after each byte in the example above; the extension is only
> > required after changes to the most significant byte (i.e. to a field
> > that overlaps the most significant bit).
> True.
> 
> > The bug fix is actually a bit more subtle, but at this point during
> > code expansion it's not safe to use a SUBREG when sign-extending this
> > field.  Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI)
> > 0)) but combine (and other RTL optimizers) later realize that because
> > SImode values are always sign-extended in their 64-bit hard registers
> > that this is a no-op and eliminates it.

Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode

2023-12-28 Thread Jeff Law




On 12/24/23 05:24, Roger Sayle wrote:



What's exceedingly weird is T_N_T_M_P (DImode, SImode) isn't
actually a truncation!  The output precision is first, the input
precision is second.  The docs explicitly state the output precision
should be smaller than the input precision (which makes sense for truncation).

That's where I'd start with trying to untangle this mess.


Thanks (both) for correcting my misunderstanding.
At the very least might I suggest that we introduce a new
TRULY_NOOP_EXTENSION_MODES_P target hook that MIPS can use for this
purpose?  It'd help reduce confusion, and keep the
documentation/function naming correct.



Yes, that is good for me.
T_N_T_M_P is a really confusing name.


Ignore my suggestion for a new target hook.  GCC already has one.
You shouldn't be using TRULY_NOOP_TRUNCATION_MODES_P
with incorrectly ordered arguments. The correct target hook is
TARGET_MODE_REP_EXTENDED, which the MIPS backend correctly
defines via mips_mode_rep_extended.

It's the MIPS definition of (and interpretation of) mips_truly_noop_truncation
that's suspect.

My latest theory is that these sign extensions should be:
(set (reg:DI) (sign_extend:DI (truncate:SI (reg:DI))))
and not
(set (reg:DI) (sign_extend:DI (subreg:SI (reg:DI))))
In isolation these are the same.  I think the fact that the MIPS backend 
wipes out the sign extension turning the result into a NOP is what makes 
them different.


Of course that's kind of the point behind the TRULY_NOOP_TRUNCATION 
macro.  That's what allows the MIPS target to wipe out the sign extension.


ISTM this might be worth noting in the docs for TRULY_NOOP_TRUNCATION.

Jeff


Re: [PATCH] Improved RTL expansion of field assignments into promoted registers.

2023-12-28 Thread Jeff Law




On 12/28/23 07:59, Roger Sayle wrote:


This patch fixes PR rtl-optimization/104914 by tweaking/improving the way
that fields are written into a pseudo register that needs to be kept sign
extended.
Well, I think "fixes" is a bit of a stretch.  We're avoiding the issue 
by changing the early RTL generation, but if I understand what's going 
on in the RTL optimizers and MIPS backend correctly, the core bug still 
remains.  Admittedly I haven't put it under a debugger, but that MIPS 
definition of NOOP_TRUNCATION just seems badly wrong and is just waiting 
to pop its ugly head up again.






The motivating example from the bugzilla PR is:

extern void ext(int);
void foo(const unsigned char *buf) {
   int val;
   ((unsigned char*)&val)[0] = *buf++;
   ((unsigned char*)&val)[1] = *buf++;
   ((unsigned char*)&val)[2] = *buf++;
   ((unsigned char*)&val)[3] = *buf++;
   if(val > 0)
 ext(1);
   else
 ext(0);
}

which at the end of the tree optimization passes looks like:

void foo (const unsigned char * buf)
{
   int val;
   unsigned char _1;
   unsigned char _2;
   unsigned char _3;
   unsigned char _4;
   int val.5_5;

 <bb 2> [local count: 1073741824]:
   _1 = *buf_7(D);
   MEM[(unsigned char *)&val] = _1;
   _2 = MEM[(const unsigned char *)buf_7(D) + 1B];
   MEM[(unsigned char *)&val + 1B] = _2;
   _3 = MEM[(const unsigned char *)buf_7(D) + 2B];
   MEM[(unsigned char *)&val + 2B] = _3;
   _4 = MEM[(const unsigned char *)buf_7(D) + 3B];
   MEM[(unsigned char *)&val + 3B] = _4;
   val.5_5 = val;
   if (val.5_5 > 0)
     goto <bb 3>; [59.00%]
   else
     goto <bb 4>; [41.00%]

 <bb 3> [local count: 633507681]:
   ext (1);
   goto <bb 5>; [100.00%]

 <bb 4> [local count: 440234144]:
   ext (0);

 <bb 5> [local count: 1073741824]:
   val ={v} {CLOBBER(eol)};
   return;

}

Here four bytes are being sequentially written into the SImode value
val.  On some platforms, such as MIPS64, this SImode value is kept in
a 64-bit register, suitably sign-extended.  The function expand_assignment
contains logic to handle this via SUBREG_PROMOTED_VAR_P (around line 6264
in expr.cc) which outputs an explicit extension operation after each
store_field (typically insv) to such promoted/extended pseudos.

The first observation is that there's no need to perform sign extension
after each byte in the example above; the extension is only required
after changes to the most significant byte (i.e. to a field that overlaps
the most significant bit).

True.




The bug fix is actually a bit more subtle, but at this point during
code expansion it's not safe to use a SUBREG when sign-extending this
field.  Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) 0))
but combine (and other RTL optimizers) later realize that because SImode
values are always sign-extended in their 64-bit hard registers that
this is a no-op and eliminates it.  The trouble is that it's unsafe to
refer to the SImode lowpart of a 64-bit register using SUBREG at those
critical points when temporarily the value isn't correctly sign-extended,
and the usual backend invariants don't hold.  At these critical points,
the middle-end needs to use an explicit TRUNCATE rtx (as this isn't a
TRULY_NOOP_TRUNCATION), so that the explicit sign-extension looks like
(sign_extend:DI (truncate:SI (reg:DI)), which avoids the problem.





Note that MODE_REP_EXTENDED (NARROW, WIDE) != UNKNOWN implies (or should
imply) !TRULY_NOOP_TRUNCATION (NARROW, WIDE).  I've another (independent)
patch that I'll post in a few minutes.


This middle-end patch has been tested on x86_64-pc-linux-gnu with
make bootstrap and make -k check, both with and without
--target_board=unix{-m32} with no new failures.  The cc1 from a
cross-compiler to mips64 appears to generate much better code for
the above test case.  Ok for mainline?


2023-12-28  Roger Sayle  

gcc/ChangeLog
 PR rtl-optimization/104914
 * expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P
 a sign or zero extension is only required if the modified field
 overlaps the SUBREG's most significant bit.  On MODE_REP_EXTENDED
 targets, don't refer to the temporarily incorrectly extended value
 using a SUBREG, but instead generate an explicit TRUNCATE rtx.

[ ... ]



+ /* Check if the field overlaps the MSB, requiring extension.  */
+ else if (known_eq (bitpos + bitsize,
+GET_MODE_BITSIZE (GET_MODE (to_rtx
Do you need to look at the size of the field as well?  ie, the starting 
position might be before the sign bit, but the width of the field might 
cover the mode's sign bit?


I'm not real good in the RTL expansion code, so if I'm offbase on this, 
just let me know.


jeff


Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode

2023-12-28 Thread Jeff Law




On 12/24/23 01:11, YunQiang Su wrote:

Yes. I also guess so.  Any new ideas?

Well, I see multiple intertwined issues and I think MIPS has largely
mucked this up.

At a high level DI -> SI truncation is not a nop on MIPS64.  We must
explicitly sign extend the value from SI->DI to preserve the invariant
that SI mode objects are extended to DImode.  If we fail to do that,
then the SImode conditional branch patterns simply aren't going to work.



MIPS64 never claims DI -> SI is a nop; instead it claims SI -> DI is a nop.
And that just seems wrong, at least for truncation which implies the 
input precision must be larger than the output precision.


If you adjust the mips implementation of TARGET_TRULY_NOOP_TRUNCATION to 
return false when the input precision is smaller than the output 
precision, does that fix this problem?




And for MIPS64, it has only one type of branch; it works for both SI and DI.
Agreed, but the SImode variant is really just a DImode comparison that 
relies on the sign extending property of the MIPS architecture.  I'm not 
100% sure that's safe in the presence of bit manipulation instructions 
which do not preserve the sign extending property.  We actually don't 
allow some bit manipulations on RV64 for a similar underlying reason.





Converting from 32 to 64 bits is indeed a nop, IF the 32-bit value is properly sign-extended.

But that's not a *truncation*, that's an *extension*.

Jeff


[PATCH] MIPS: Implement TARGET_INSN_COSTS

2023-12-28 Thread Roger Sayle
 

The current (default) behavior is that when the target doesn't define
TARGET_INSN_COST the middle-end uses the backend's TARGET_RTX_COSTS,
so multiplications are slower than additions, but about the same size
when optimizing for size (with -Os or -Oz).

All of this gets disabled with your proposed patch.
[If you don't check speed, you probably shouldn't touch insn_cost].

I agree that a backend can fine-tune the (speed and size) costs of
instructions (especially complex !single_set instructions) via
attributes in the machine description, but these should be used to
override/fine-tune rtx_costs, not override/replace/duplicate them.

Having accurate rtx_costs also helps RTL expansion and the earlier
optimizers, but insn_cost is used by combine and the later RTL
optimization passes, once instructions have been recognized.

Might I also recommend that instead of insn_count*perf_ratio*4, or
even the slightly better COSTS_N_INSNS (insn_count*perf_ratio), you
encode the relative cost in the attribute, avoiding the multiplication
(at runtime) and allowing fine tuning like "COSTS_N_INSNS (2) - 1".
Likewise, COSTS_N_BYTES is a very useful macro for a backend to
define/use in rtx_costs.  Conveniently for many RISC machines,
1 instruction takes about 4 bytes, so COSTS_N_INSNS (1) is
(approximately) comparable to COSTS_N_BYTES (4).

I hope this helps.  Perhaps something like:

static int
mips_insn_cost (rtx_insn *insn, bool speed)
{
  int cost;
  if (recog_memoized (insn) >= 0)
    {
      if (speed)
        {
          /* Use cost if provided.  */
          cost = get_attr_cost (insn);
          if (cost > 0)
            return cost;
        }
      else
        {
          /* If optimizing for size, we want the insn size.  */
          return get_attr_length (insn);
        }
    }

  if (rtx set = single_set (insn))
    cost = set_rtx_cost (set, speed);
  else
    cost = pattern_cost (PATTERN (insn), speed);

  /* If the cost is zero, then it's likely a complex insn.  We don't
     want the cost of these to be less than something we know about.  */
  return cost ? cost : COSTS_N_INSNS (2);
}

 




Re: [PATCH V2] RISC-V: Disallow transformation into VLMAX AVL for cond_len_xxx when length is in range [0,31]

2023-12-28 Thread Jeff Law




On 12/26/23 19:38, Juzhe-Zhong wrote:

Notice we have this following situation:

 vsetivlizero,4,e32,m1,ta,ma
 vlseg4e32.v v4,(a5)
 vlseg4e32.v v12,(a3)
 vsetvli a5,zero,e32,m1,tu,ma ---> This is redundant since 
VLMAX AVL = 4 when it is fixed-vlmax
 vfadd.vfv3,v13,fa0
 vfadd.vfv1,v12,fa1
 vfmul.vvv17,v3,v5
 vfmul.vvv16,v1,v5

The root cause is that we blindly transform COND_LEN_xxx into VLMAX AVL
when len == NUNITS.  However, we don't need to transform all of them:
when len is in the range [0,31], we don't need to consume a scalar
register.

After this patch:

vsetivlizero,4,e32,m1,tu,ma
addia4,a5,400
vlseg4e32.v v12,(a3)
vfadd.vfv3,v13,fa0
vfadd.vfv1,v12,fa1
vlseg4e32.v v4,(a4)
vfadd.vfv2,v14,fa1
vfmul.vvv17,v3,v5
vfmul.vvv16,v1,v5

Tested on both RV32 and RV64 no regression.
So it looks like the two fragments above are from different sources, 
though I guess it's also possible one of the cut-n-pastes just got 
truncated.  Note the differing number of vfadd instructions.  That 
doesn't invalidate the patch, but does make it slightly harder to reason 
about what you're doing.





Ok for trunk ?

gcc/ChangeLog:

* config/riscv/riscv-v.cc (is_vlmax_len_p): New function.
(expand_load_store): Disallow transformation into VLMAX when len is in 
range of [0,31]
(expand_cond_len_op): Ditto.
(expand_gather_scatter): Ditto.
(expand_lanes_load_store): Ditto.
(expand_fold_extract_last): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/post-ra-avl.c: Adapt test.
* gcc.target/riscv/rvv/base/vf_avl-2.c: New test.

---
  gcc/config/riscv/riscv-v.cc   | 21 +--
  .../riscv/rvv/autovec/post-ra-avl.c   |  2 +-
  .../gcc.target/riscv/rvv/base/vf_avl-2.c  | 21 +++
  3 files changed, 37 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-2.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 038ab084a37..0cc7af58da6 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -68,6 +68,16 @@ imm_avl_p (machine_mode mode)
   : false;
  }
  
+/* Return true if LEN is equal to NUNITS that outbounds range of [0, 31].  */

Perhaps "that is out of the range [0, 31]."?

OK with the comment nit fixed.
jeff


[PATCH v3] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-28 Thread Xi Ruoyao
The problem with peephole2 is it uses a naive sliding-window algorithm
and misses many cases.  For example:

float a[1];
float t() { return a[0] + a[8000]; }

is compiled to:

la.local$r13,a
la.local$r12,a+32768
fld.s   $f1,$r13,0
fld.s   $f0,$r12,-768
fadd.s  $f0,$f1,$f0

by trunk.  But as we've explained in r14-4851, the following would be
better with -mexplicit-relocs=auto:

pcalau12i   $r13,%pc_hi20(a)
pcalau12i   $r12,%pc_hi20(a+32000)
fld.s   $f1,$r13,%pc_lo12(a)
fld.s   $f0,$r12,%pc_lo12(a+32000)
fadd.s  $f0,$f1,$f0

However, the sliding-window algorithm just won't detect the pcalau12i/fld
pair to be optimized.  Using a define_insn_and_split in the combine pass
works around the issue.

gcc/ChangeLog:

* config/loongarch/predicates.md
(symbolic_pcrel_offset_operand): New define_predicate.
(mem_simple_ldst_operand): Likewise.
* config/loongarch/loongarch-protos.h
(loongarch_rewrite_mem_for_simple_ldst): Declare.
* config/loongarch/loongarch.cc
(loongarch_rewrite_mem_for_simple_ldst): Implement.
* config/loongarch/loongarch.md (simple_load): New
define_insn_and_rewrite.
(simple_load_ext): Likewise.
(simple_store): Likewise.
(define_peephole2): Remove la.local/[f]ld peepholes.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c:
New test.
* gcc.target/loongarch/explicit-relocs-auto-single-load-store-3.c:
New test.
---

Changes from [v2]:
- Match (mem (symbol_ref ...)) instead of (symbol_ref ...) to retain the
  attributes of the MEM.
- Add a test to make sure the attributes of the MEM are retained.

[v2]:https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641430.html

Bootstrapped & regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch-protos.h   |   1 +
 gcc/config/loongarch/loongarch.cc |  16 +++
 gcc/config/loongarch/loongarch.md | 114 +-
 gcc/config/loongarch/predicates.md|  13 ++
 ...explicit-relocs-auto-single-load-store-2.c |  11 ++
 ...explicit-relocs-auto-single-load-store-3.c |  18 +++
 6 files changed, 86 insertions(+), 87 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c
 create mode 100644 
gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-3.c

diff --git a/gcc/config/loongarch/loongarch-protos.h 
b/gcc/config/loongarch/loongarch-protos.h
index 7bf21a45c69..024f3117604 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -163,6 +163,7 @@ extern bool loongarch_use_ins_ext_p (rtx, HOST_WIDE_INT, 
HOST_WIDE_INT);
 extern bool loongarch_check_zero_div_p (void);
 extern bool loongarch_pre_reload_split (void);
 extern int loongarch_use_bstrins_for_ior_with_mask (machine_mode, rtx *);
+extern rtx loongarch_rewrite_mem_for_simple_ldst (rtx);
 
 union loongarch_gen_fn_ptrs
 {
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 1d4d8f0b256..9f2b3e98bf0 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5717,6 +5717,22 @@ loongarch_use_bstrins_for_ior_with_mask (machine_mode 
mode, rtx *op)
   return 0;
 }
 
+/* Rewrite a MEM for simple load/store under -mexplicit-relocs=auto
+   -mcmodel={normal/medium}.  */
+rtx
+loongarch_rewrite_mem_for_simple_ldst (rtx mem)
+{
+  rtx addr = XEXP (mem, 0);
+  rtx hi = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr),
+  UNSPEC_PCALAU12I_GR);
+  rtx new_mem;
+
+  addr = gen_rtx_LO_SUM (Pmode, force_reg (Pmode, hi), addr);
+  new_mem = gen_rtx_MEM (GET_MODE (mem), addr);
+  MEM_COPY_ATTRIBUTES (new_mem, mem);
+  return new_mem;
+}
+
 /* Print the text for PRINT_OPERAND punctation character CH to FILE.
The punctuation characters are:
 
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index ce8fcd5b572..0de7e516d56 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -4135,101 +4135,41 @@ (define_insn "loongarch_crcc_w__w"
 ;;
 ;; And if the pseudo op cannot be relaxed, we'll get a worse result (with
 ;; 3 instructions).
-(define_peephole2
-  [(set (match_operand:P 0 "register_operand")
-   (match_operand:P 1 "symbolic_pcrel_operand"))
-   (set (match_operand:LD_AT_LEAST_32_BIT 2 "register_operand")
-   (mem:LD_AT_LEAST_32_BIT (match_dup 0)))]
-  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
-   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
-   && (peep2_reg_dead_p (2, operands[0]) \
-   || REGNO (operands[0]) == REGNO (operands[2]))"
-  [(set (match_dup 2)
-   (mem:LD_AT_LEAST_32_BIT (lo_sum:P (match_dup 0) (match_dup 1))))]
-  {
-emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
-  })
-

Re: [ARC PATCH] Table-driven ashlsi implementation for better code/rtx_costs.

2023-12-28 Thread Jeff Law




On 12/23/23 16:37, Roger Sayle wrote:


One of the cool features of the H8 backend is its use of tables to select
optimal shift implementations for different CPU variants.  This patch
borrows (plagiarizes) that idiom for SImode left shifts in the ARC backend
(for CPUs without a barrel-shifter).  This provides a convenient mechanism
for both selecting the best implementation strategy (for speed vs. size),
and providing accurate rtx_costs [without duplicating a lot of logic].
Left shift RTX costs are especially important for use in synth_mult.

An example improvement is:

int foo(int x) { return 32768*x; }

which is now generated with -O2 -mcpu=em -mswap as:

foo:bmsk_s  r0,r0,16
 swapr0,r0
 j_s.d   [blink]
 ror r0,r0

where previously the ARC backend would generate a loop:

foo:mov lp_count,15
 lp  2f
 add r0,r0,r0
 nop
2:  # end single insn loop
 j_s [blink]


Tested with a cross-compiler to arc-linux hosted on x86_64,
with no new (compile-only) regressions from make -k check.
Ok for mainline if this passes Claudiu's and/or Jeff's testing?
[Thanks again to Jeff for finding the typo in my last ARC patch]
So just an FYI.  There's no upstream gdbsim for the arc, so my tester 
just uses a dummy simulator which says everything passes.


So I could include your patch to test that the compiler doesn't ICE, 
produces results that will assemble/link, but it won't test the 
correctness of the resulting code.


Jeff


Re: [PATCH] RISC-V: Fix misaligned stack offset for interrupt function

2023-12-28 Thread Jeff Law




On 12/25/23 01:45, Kito Cheng wrote:

An `interrupt` function will back up the fcsr register, but the save is
fixed to SImode; that's not a big issue since fcsr only uses 8 bits so
far. However, the offset should still use UNITS_PER_WORD to prevent the
stack offset becoming non-8-byte-aligned, which causes problems for RV64.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_for_each_saved_reg): Adjust the
offset of fcsr.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/interrupt-misaligned.c: New.

OK
jeff


Re: [PATCH] RISC-V: Add crypto machine descriptions

2023-12-28 Thread Jeff Law




On 12/26/23 19:47, Kito Cheng wrote:

Thanks Feng, the patch is LGTM from my side, I am happy to accept
vector crypto stuffs for GCC 14, it's mostly intrinsic stuff, and the
only few non-intrinsic stuff also low risk enough (e.g. vrol, vctz)
I won't object.  I'm disappointed that we're in a similar situation as 
last year, but at least the scope is smaller.


jeff


Re: 回复:[PATCH v3 2/6] RISC-V: Split csr_operand in predicates.md for vector patterns.

2023-12-28 Thread Jeff Law




On 12/26/23 19:49, joshua wrote:

Hi Jeff,

Yes, I will change something in vector_csr_operand in the following
patches.




Constraints will be added that the AVL cannot be encoded as an
immediate for xtheadvecotr vsetvl.

Ah.  Thanks.  Makes sense.

jeff


[middle-end PATCH] Only call targetm.truly_noop_truncation for truncations.

2023-12-28 Thread Roger Sayle

The truly_noop_truncation target hook is documented, in target.def, as
"true if it is safe to convert a value of inprec bits to one of outprec
bits (where outprec is smaller than inprec) by merely operating on it
as if it had only outprec bits", i.e. the middle-end can use a SUBREG
instead of a TRUNCATE.

What's perhaps potentially a little ambiguous in the above description is
whether it is the caller or the callee that's responsible for ensuring or
checking whether "outprec < inprec".  The name TRULY_NOOP_TRUNCATION_P,
like SUBREG_PROMOTED_P, may be prone to being understood as a predicate
that confirms that something is a no-op truncation or a promoted subreg,
when in fact the caller must first confirm this is a truncation/subreg and
only then call the "classification" macro.

Alas making the following minor tweak (for testing) to the i386 backend:

static bool
ix86_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec)
{
  gcc_assert (outprec < inprec);
  return true;
}

#undef TARGET_TRULY_NOOP_TRUNCATION
#define TARGET_TRULY_NOOP_TRUNCATION ix86_truly_noop_truncation

reveals that there are numerous callers in middle-end that rely on the
default behaviour of silently returning true for any (invalid) input.
These are fixed below.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-12-28  Roger Sayle  

gcc/ChangeLog
* combine.cc (make_extraction): Confirm that OUTPREC is less than
INPREC before calling TRULY_NOOP_TRUNCATION_MODES_P.
* expmed.cc (store_bit_field_using_insv): Likewise.
(extract_bit_field_using_extv): Likewise.
(extract_bit_field_as_subreg): Likewise.
* optabs-query.cc (get_best_extraction_insn): Likewise.
* optabs.cc (expand_parity): Likewise.
* rtlhooks.cc (gen_lowpart_general): Likewise.
* simplify-rtx.cc (simplify_truncation): Disallow truncations
to the same precision.
(simplify_unary_operation_1) : Move optimization
of truncations to the same mode earlier.


Thanks in advance,
Roger
--

diff --git a/gcc/combine.cc b/gcc/combine.cc
index f2c64a9..5aa2f57 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -7613,7 +7613,8 @@ make_extraction (machine_mode mode, rtx inner, 
HOST_WIDE_INT pos,
   && (pos == 0 || REG_P (inner))
   && (inner_mode == tmode
   || !REG_P (inner)
-  || TRULY_NOOP_TRUNCATION_MODES_P (tmode, inner_mode)
+  || (known_lt (GET_MODE_SIZE (tmode), GET_MODE_SIZE (inner_mode))
+  && TRULY_NOOP_TRUNCATION_MODES_P (tmode, inner_mode))
   || reg_truncated_to_mode (tmode, inner))
   && (! in_dest
   || (REG_P (inner)
@@ -7856,6 +7857,8 @@ make_extraction (machine_mode mode, rtx inner, 
HOST_WIDE_INT pos,
   /* On the LHS, don't create paradoxical subregs implicitely truncating
 the register unless TARGET_TRULY_NOOP_TRUNCATION.  */
   if (in_dest
+ && known_lt (GET_MODE_SIZE (GET_MODE (inner)),
+  GET_MODE_SIZE (wanted_inner_mode))
  && !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (inner),
 wanted_inner_mode))
return NULL_RTX;
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 0bba93f..8940d47 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -26707,6 +26707,16 @@ ix86_libm_function_max_error (unsigned cfn, 
machine_mode mode,
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
 #endif /* #if CHECKING_P */
 
+static bool
+ix86_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec)
+{
+  gcc_assert (outprec < inprec);
+  return true;
+}
+
+#undef TARGET_TRULY_NOOP_TRUNCATION
+#define TARGET_TRULY_NOOP_TRUNCATION ix86_truly_noop_truncation
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-i386.h"
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 05331dd..6398bf9 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -651,6 +651,7 @@ store_bit_field_using_insv (const extraction_insn *insv, 
rtx op0,
  X) 0)) is (reg:N X).  */
   if (GET_CODE (xop0) == SUBREG
   && REG_P (SUBREG_REG (xop0))
+  && paradoxical_subreg_p (xop0)
   && !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (SUBREG_REG (xop0)),
 op_mode))
 {
@@ -1585,7 +1586,11 @@ extract_bit_field_using_extv (const extraction_insn 
*extv, rtx op0,
 mode.  Instead, create a temporary and use convert_move to set
 the target.  */
   if (REG_P (target)
- && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)
+ && (known_lt (GET_MODE_SIZE (GET_MODE (target)),
+   GET_MODE_SIZE (ext_mode))
+ ? TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)
+ : known_eq (GET_MODE_SIZE 

[PATCH] Improved RTL expansion of field assignments into promoted registers.

2023-12-28 Thread Roger Sayle

This patch fixes PR rtl-optimization/104914 by tweaking/improving the way
that fields are written into a pseudo register that needs to be kept sign
extended.

The motivating example from the bugzilla PR is:

extern void ext(int);
void foo(const unsigned char *buf) {
  int val;
  ((unsigned char*)&val)[0] = *buf++;
  ((unsigned char*)&val)[1] = *buf++;
  ((unsigned char*)&val)[2] = *buf++;
  ((unsigned char*)&val)[3] = *buf++;
  if(val > 0)
ext(1);
  else
ext(0);
}

which at the end of the tree optimization passes looks like:

void foo (const unsigned char * buf)
{
  int val;
  unsigned char _1;
  unsigned char _2;
  unsigned char _3;
  unsigned char _4;
  int val.5_5;

  <bb 2> [local count: 1073741824]:
  _1 = *buf_7(D);
  MEM[(unsigned char *)&val] = _1;
  _2 = MEM[(const unsigned char *)buf_7(D) + 1B];
  MEM[(unsigned char *)&val + 1B] = _2;
  _3 = MEM[(const unsigned char *)buf_7(D) + 2B];
  MEM[(unsigned char *)&val + 2B] = _3;
  _4 = MEM[(const unsigned char *)buf_7(D) + 3B];
  MEM[(unsigned char *)&val + 3B] = _4;
  val.5_5 = val;
  if (val.5_5 > 0)
    goto <bb 3>; [59.00%]
  else
    goto <bb 4>; [41.00%]

  <bb 3> [local count: 633507681]:
  ext (1);
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 440234144]:
  ext (0);

  <bb 5> [local count: 1073741824]:
  val ={v} {CLOBBER(eol)};
  return;

}

Here four bytes are being sequentially written into the SImode value
val.  On some platforms, such as MIPS64, this SImode value is kept in
a 64-bit register, suitably sign-extended.  The function expand_assignment
contains logic to handle this via SUBREG_PROMOTED_VAR_P (around line 6264
in expr.cc) which outputs an explicit extension operation after each
store_field (typically insv) to such promoted/extended pseudos.

The first observation is that there's no need to perform sign extension
after each byte in the example above; the extension is only required
after changes to the most significant byte (i.e. to a field that overlaps
the most significant bit).
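The condition can be sketched as a tiny predicate (an illustrative helper, not code from the patch; the name `field_touches_msb` is made up here):

```c
#include <stdbool.h>

/* Sketch: after storing a field into a promoted/extended pseudo, a fresh
   sign/zero extension is only needed when the field ends at the mode's
   most significant bit.  This mirrors the
   known_eq (bitpos + bitsize, GET_MODE_BITSIZE (...)) test in the patch.  */
static bool
field_touches_msb (unsigned bitpos, unsigned bitsize, unsigned mode_bits)
{
  return bitpos + bitsize == mode_bits;
}
```

For the example above, only the store to byte 3 (bitpos 24, bitsize 8 in SImode) satisfies this, so three of the four extensions can be dropped.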

The bug fix is actually a bit more subtle, but at this point during
code expansion it's not safe to use a SUBREG when sign-extending this
field.  Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) 0))
but combine (and other RTL optimizers) later realize that because SImode
values are always sign-extended in their 64-bit hard registers that
this is a no-op and eliminates it.  The trouble is that it's unsafe to
refer to the SImode lowpart of a 64-bit register using SUBREG at those
critical points when temporarily the value isn't correctly sign-extended,
and the usual backend invariants don't hold.  At these critical points,
the middle-end needs to use an explicit TRUNCATE rtx (as this isn't a
TRULY_NOOP_TRUNCATION), so that the explicit sign-extension looks like
(sign_extend:DI (truncate:SI (reg:DI))), which avoids the problem.
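To make the hazard concrete, here is a small arithmetic model (plain C with hypothetical names, not GCC internals) of a 32-bit value kept sign-extended in a 64-bit register:

```c
#include <stdint.h>

/* Model: write byte 3 of the low 32 bits of a 64-bit register.  Between
   this store and the explicit extension, the register temporarily violates
   the "SImode values are sign-extended in DImode registers" invariant.  */
static int64_t
store_top_byte (int64_t reg, uint8_t byte)
{
  uint64_t u = (uint64_t) reg;
  return (int64_t) ((u & ~(uint64_t) 0xff000000) | ((uint64_t) byte << 24));
}

/* The (sign_extend:DI (truncate:SI ...)) step that restores the invariant.  */
static int64_t
resign_extend_32 (int64_t reg)
{
  return (int64_t) (int32_t) reg;
}
```

After `store_top_byte (0, 0x80)` the register holds 0x80000000, which is not a validly extended SImode value until `resign_extend_32` restores the invariant; an optimizer that assumed the invariant held could wrongly delete that step.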

Note that MODE_REP_EXTENDED (NARROW, WIDE) != UNKNOWN implies (or should
imply) !TRULY_NOOP_TRUNCATION (NARROW, WIDE).  I've another (independent)
patch that I'll post in a few minutes.


This middle-end patch has been tested on x86_64-pc-linux-gnu with
make bootstrap and make -k check, both with and without
--target_board=unix{-m32} with no new failures.  The cc1 from a
cross-compiler to mips64 appears to generate much better code for
the above test case.  Ok for mainline?


2023-12-28  Roger Sayle  

gcc/ChangeLog
PR rtl-optimization/104914
* expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P
a sign or zero extension is only required if the modified field
overlaps the SUBREG's most significant bit.  On MODE_REP_EXTENDED
targets, don't refer to the temporarily incorrectly extended value
using a SUBREG, but instead generate an explicit TRUNCATE rtx.


Thanks in advance,
Roger
--

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 9fef2bf6585..1a34b48e38f 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6272,19 +6272,32 @@ expand_assignment (tree to, tree from, bool nontemporal)
  && known_eq (bitpos, 0)
  && known_eq (bitsize, GET_MODE_BITSIZE (GET_MODE (to_rtx
result = store_expr (from, to_rtx, 0, nontemporal, false);
- else
+ /* Check if the field overlaps the MSB, requiring extension.  */
+ else if (known_eq (bitpos + bitsize,
+GET_MODE_BITSIZE (GET_MODE (to_rtx
{
- rtx to_rtx1
-   = lowpart_subreg (subreg_unpromoted_mode (to_rtx),
- SUBREG_REG (to_rtx),
- subreg_promoted_mode (to_rtx));
+ scalar_int_mode imode = subreg_unpromoted_mode (to_rtx);
+ scalar_int_mode omode = subreg_promoted_mode (to_rtx);
+ rtx to_rtx1 = lowpart_subreg (imode, SUBREG_REG (to_rtx),
+   omode);
  result = store_field (to_rtx1, bitsize, bitpos,
bitregion_start, 

[PATCH] MIPS: Implement TARGET_INSN_COSTS

2023-12-28 Thread YunQiang Su
The MIPS backend already records some per-insn information, including
length, insn count, etc.

Since some instructions are more costly than others, let's add a new
attr `perf_ratio`.  Its default value is (const_int 1).

The return value of mips_insn_cost is
  insn_count * perf_ratio * 4.

The magic `4` here comes from `rtx_cost`, which returns 4
for simple instructions.
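The formula can be sanity-checked with a trivial stand-alone sketch (illustrative only; the real implementation reads the attr values from the insn):

```c
/* Sketch of mips_insn_cost's SET path: insn_count * perf_ratio * 4,
   where 4 is what rtx_cost returns for a simple instruction.  */
static int
sketch_insn_cost (int insn_count, int perf_ratio)
{
  return insn_count * perf_ratio * 4;
}
```

So a single simple insn costs 4, and e.g. a two-insn sequence with perf_ratio 3 costs 24.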

gcc
* config/mips/mips.cc (mips_insn_cost): New function.
TARGET_INSN_COST: defined to mips_insn_cost.
* config/mips/mips.md (perf_ratio): New attr.
---
 gcc/config/mips/mips.cc | 14 ++
 gcc/config/mips/mips.md |  4 
 2 files changed, 18 insertions(+)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 9180dbbf843..fddb1519d76 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -4170,6 +4170,18 @@ mips_set_reg_reg_cost (machine_mode mode)
 }
 }
 
+/* Implement TARGET_INSN_COSTS.  */
+
+static int
+mips_insn_cost (rtx_insn *x, bool speed ATTRIBUTE_UNUSED)
+{
+  if (GET_CODE (PATTERN (x)) != SET)
+return pattern_cost (PATTERN (x), speed);
+  return get_attr_insn_count (x)
+ * get_attr_perf_ratio (x)
+ * 4;
+}
+
 /* Implement TARGET_RTX_COSTS.  */
 
 static bool
@@ -23069,6 +23081,8 @@ mips_bit_clear_p (enum machine_mode mode, unsigned 
HOST_WIDE_INT m)
 #define TARGET_RTX_COSTS mips_rtx_costs
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST mips_address_cost
+#undef  TARGET_INSN_COST
+#define TARGET_INSN_COST mips_insn_cost
 
 #undef TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
 #define TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P 
mips_no_speculation_in_delay_slots_p
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 0666310734e..d6c4ba13f47 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -312,6 +312,10 @@ (define_attr "sync_insn2" "nop,and,xor,not"
 ;; "11" specifies MEMMODEL_ACQUIRE.
 (define_attr "sync_memmodel" "" (const_int 10))
 
+;; Performance ratio. Used by mips_insn_cost: it returns 
insn_count*perf_ratio*4.
+;; Add this attr to the slow INSNs.
+(define_attr "perf_ratio" "" (const_int 1))
+
 ;; Accumulator operand for madd patterns.
 (define_attr "accum_in" "none,0,1,2,3,4,5" (const_string "none"))
 
-- 
2.39.2



[PATCH] aarch64: fortran: Adjust vect-8.f90 for libmvec

2023-12-28 Thread Szabolcs Nagy
With new glibc one more loop can be vectorized via simd exp in libmvec.

Found by the Linaro TCWG CI.

gcc/testsuite/ChangeLog:

* gfortran.dg/vect/vect-8.f90: Accept more vectorized loops.
---
 gcc/testsuite/gfortran.dg/vect/vect-8.f90 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/vect/vect-8.f90 
b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
index ca72ddcffca..938dfc29754 100644
--- a/gcc/testsuite/gfortran.dg/vect/vect-8.f90
+++ b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
@@ -704,7 +704,7 @@ CALL track('KERNEL  ')
 RETURN
 END SUBROUTINE kernel
 
-! { dg-final { scan-tree-dump-times "vectorized 25 loops" 1 "vect" { target 
aarch64_sve } } }
-! { dg-final { scan-tree-dump-times "vectorized 24 loops" 1 "vect" { target { 
aarch64*-*-* && { ! aarch64_sve } } } } }
+! { dg-final { scan-tree-dump-times "vectorized 2\[56\] loops" 1 "vect" { 
target aarch64_sve } } }
+! { dg-final { scan-tree-dump-times "vectorized 2\[45\] loops" 1 "vect" { 
target { aarch64*-*-* && { ! aarch64_sve } } } } }
 ! { dg-final { scan-tree-dump-times "vectorized 2\[234\] loops" 1 "vect" { 
target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
 ! { dg-final { scan-tree-dump-times "vectorized 17 loops" 1 "vect" { target { 
{ ! vect_intdouble_cvt } && { ! aarch64*-*-* } } } } }
-- 
2.25.1



Re: Re: [PATCH v1] LoongArch: Merge constant vector permutation implementations.

2023-12-28 Thread 李威
I also have the same doubts about vector instructions.
Sorry, I can't prove it, so I used simplify_gen_subreg instead to make sure 
there won't be problems (I submitted the v2 version); my oversight.

> -----Original Message-----
> From: "Xi Ruoyao" 
> Sent: 2023-12-28 18:55:01 (Thursday)
> To: "Li Wei" , gcc-patches@gcc.gnu.org
> Cc: i...@xen0n.name, xucheng...@loongson.cn, chengl...@loongson.cn
> Subject: Re: [PATCH v1] LoongArch: Merge constant vector permutation 
> implementations.
> 
> On Thu, 2023-12-28 at 14:59 +0800, Li Wei wrote:
> > There are currently two versions of the implementations of constant
> > vector permutation: loongarch_expand_vec_perm_const_1 and
> > loongarch_expand_vec_perm_const_2.  The implementations of the two
> > versions are different. Currently, only the implementation of
> > loongarch_expand_vec_perm_const_1 is used for 256-bit vectors.  We
> > hope to streamline the code as much as possible while retaining the
> > better-performing implementation of the two.  By repeatedly testing
> > spec2006 and spec2017, we got the following Merged version.
> > Compared with the pre-merger version, the number of lines of code
> > in loongarch.cc has been reduced by 888 lines.  At the same time,
> > the performance of SPECint2006 under Ofast has been improved by 0.97%,
> > and the performance of SPEC2017 fprate has been improved by 0.27%.
> 
> /* snip */
> 
> > - * 3. What LASX permutation instruction does:
> > - * In short, it just execute two independent 128bit vector permuatation, 
> > and
> > - * it's the reason that we need to do the jobs below.  We will explain it.
> > - * op0, op1, target, and selector will be separate into high 128bit and low
> > - * 128bit, and do permutation as the description below:
> > - *
> > - *  a) op0's low 128bit and op1's low 128bit "combines" into a 256bit temp
> > - * vector storage (TVS1), elements are indexed as below:
> > - *     0 ~ nelt / 2 - 1  nelt / 2 ~ nelt - 1
> > - * |-|-| TVS1
> > - *     op0's low 128bit  op1's low 128bit
> > - *    op0's high 128bit and op1's high 128bit are "combined" into TVS2 in 
> > the
> > - *    same way.
> > - *     0 ~ nelt / 2 - 1  nelt / 2 ~ nelt - 1
> > - * |-|-| TVS2
> > - *     op0's high 128bit   op1's high 128bit
> > - *  b) Selector's low 128bit describes which elements from TVS1 will fit 
> > into
> > - *  target vector's low 128bit.  No TVS2 elements are allowed.
> > - *  c) Selector's high 128bit describes which elements from TVS2 will fit 
> > into
> > - *  target vector's high 128bit.  No TVS1 elements are allowed.
> 
> Just curious: why did the hardware engineers create such a bizarre
> instruction? :)
> 
> /* snip */
> 
> > +     rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, d->op1, 0);
> > +     rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, d->op0, 0);
> 
> Can we prove d->op0, d->op1, and d->target are never SUBREGs?  Otherwise
> I'd use lowpart_subreg (E_V4DImode, d->xxx, d->vmode) here to avoid
> creating a nested SUBREG (nested SUBREG will cause an ICE and it has
> happened several times before).
> 
> /* snip */
> 
> > +     switch (d->vmode)
> >         {
> > -     remapped[i] = d->perm[i];
> > +       case E_V4DFmode:
> > +     sel = gen_rtx_CONST_VECTOR (E_V4DImode, gen_rtvec_v (d-
> > >nelt,
> > +     
> > rperm));
> > +     tmp = gen_rtx_SUBREG (E_V4DImode, d->target, 0);
> 
> Likewise.
> 
> > +     emit_move_insn (tmp, sel);
> > +     break;
> > +       case E_V8SFmode:
> > +     sel = gen_rtx_CONST_VECTOR (E_V8SImode, gen_rtvec_v (d-
> > >nelt,
> > +     
> > rperm));
> > +     tmp = gen_rtx_SUBREG (E_V8SImode, d->target, 0);
> 
> Likewise.
> 
> -- 
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University




[PATCH v2] LoongArch: Merge constant vector permutation implementations.

2023-12-28 Thread Li Wei
There are currently two versions of the implementations of constant
vector permutation: loongarch_expand_vec_perm_const_1 and
loongarch_expand_vec_perm_const_2.  The implementations of the two
versions are different. Currently, only the implementation of
loongarch_expand_vec_perm_const_1 is used for 256-bit vectors.  We
hope to streamline the code as much as possible while retaining the
better-performing implementation of the two.  After repeated testing
with spec2006 and spec2017, we arrived at the following merged version.
Compared with the pre-merger version, the number of lines of code
in loongarch.cc has been reduced by 888 lines.  At the same time,
the performance of SPECint2006 under Ofast has been improved by 0.97%,
and the performance of SPEC2017 fprate has been improved by 0.27%.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_is_odd_extraction):
Remove useless forward declaration.
(loongarch_is_even_extraction): Remove useless forward declaration.
(loongarch_try_expand_lsx_vshuf_const): Removed.
(loongarch_expand_vec_perm_const_1): Merged.
(loongarch_is_double_duplicate): Removed.
(loongarch_is_center_extraction): Ditto.
(loongarch_is_reversing_permutation): Ditto.
(loongarch_is_di_misalign_extract): Ditto.
(loongarch_is_si_misalign_extract): Ditto.
(loongarch_is_lasx_lowpart_extract): Ditto.
(loongarch_is_op_reverse_perm): Ditto.
(loongarch_is_single_op_perm): Ditto.
(loongarch_is_divisible_perm): Ditto.
(loongarch_is_triple_stride_extract): Ditto.
(loongarch_expand_vec_perm_const_2): Merged.
(loongarch_expand_vec_perm_const): New.
(loongarch_vectorize_vec_perm_const): Adjust.
---
 gcc/config/loongarch/loongarch.cc | 1308 +
 1 file changed, 210 insertions(+), 1098 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 1d4d8f0b256..d5bf6a02a12 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -8769,143 +8769,6 @@ loongarch_expand_vec_perm (rtx target, rtx op0, rtx 
op1, rtx sel)
 }
 }
 
-static bool
-loongarch_is_odd_extraction (struct expand_vec_perm_d *);
-
-static bool
-loongarch_is_even_extraction (struct expand_vec_perm_d *);
-
-static bool
-loongarch_try_expand_lsx_vshuf_const (struct expand_vec_perm_d *d)
-{
-  int i;
-  rtx target, op0, op1, sel, tmp;
-  rtx rperm[MAX_VECT_LEN];
-
-  if (d->vmode == E_V2DImode || d->vmode == E_V2DFmode
-   || d->vmode == E_V4SImode || d->vmode == E_V4SFmode
-   || d->vmode == E_V8HImode || d->vmode == E_V16QImode)
-{
-  target = d->target;
-  op0 = d->op0;
-  op1 = d->one_vector_p ? d->op0 : d->op1;
-
-  if (GET_MODE (op0) != GET_MODE (op1)
- || GET_MODE (op0) != GET_MODE (target))
-   return false;
-
-  if (d->testing_p)
-   return true;
-
-  /* If match extract-even and extract-odd permutations pattern, use
-   * vselect much better than vshuf.  */
-  if (loongarch_is_odd_extraction (d)
- || loongarch_is_even_extraction (d))
-   {
- if (loongarch_expand_vselect_vconcat (d->target, d->op0, d->op1,
-   d->perm, d->nelt))
-   return true;
-
- unsigned char perm2[MAX_VECT_LEN];
- for (i = 0; i < d->nelt; ++i)
-   perm2[i] = (d->perm[i] + d->nelt) & (2 * d->nelt - 1);
-
- if (loongarch_expand_vselect_vconcat (d->target, d->op1, d->op0,
-   perm2, d->nelt))
-   return true;
-   }
-
-  for (i = 0; i < d->nelt; i += 1)
-   {
- rperm[i] = GEN_INT (d->perm[i]);
-   }
-
-  if (d->vmode == E_V2DFmode)
-   {
- sel = gen_rtx_CONST_VECTOR (E_V2DImode, gen_rtvec_v (d->nelt, rperm));
- tmp = simplify_gen_subreg (E_V2DImode, d->target, d->vmode, 0);
- emit_move_insn (tmp, sel);
-   }
-  else if (d->vmode == E_V4SFmode)
-   {
- sel = gen_rtx_CONST_VECTOR (E_V4SImode, gen_rtvec_v (d->nelt, rperm));
- tmp = simplify_gen_subreg (E_V4SImode, d->target, d->vmode, 0);
- emit_move_insn (tmp, sel);
-   }
-  else
-   {
- sel = gen_rtx_CONST_VECTOR (d->vmode, gen_rtvec_v (d->nelt, rperm));
- emit_move_insn (d->target, sel);
-   }
-
-  switch (d->vmode)
-   {
-   case E_V2DFmode:
- emit_insn (gen_lsx_vshuf_d_f (target, target, op1, op0));
- break;
-   case E_V2DImode:
- emit_insn (gen_lsx_vshuf_d (target, target, op1, op0));
- break;
-   case E_V4SFmode:
- emit_insn (gen_lsx_vshuf_w_f (target, target, op1, op0));
- break;
-   case E_V4SImode:
- emit_insn (gen_lsx_vshuf_w (target, target, op1, op0));
- break;
-   case E_V8HImode:
- emit_insn (gen_lsx_vshuf_h (target, target, op1, op0));
-

[committed] i386: Cleanup ix86_expand_{unary|binary}_operator issues

2023-12-28 Thread Uros Bizjak
Move ix86_expand_unary_operator from i386.cc to i386-expand.cc, re-arrange
prototypes and do some cosmetic changes with the usage of TARGET_APX_NDD.

No functional changes.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_unary_operator_ok): Move from here...
* config/i386/i386-expand.cc (ix86_unary_operator_ok): ... to here.
* config/i386/i386-protos.h: Re-arrange ix86_{unary|binary}_operator_ok
and ix86_expand_{unary|binary}_operator prototypes.
* config/i386/i386.md: Cosmetic changes with the usage of
TARGET_APX_NDD in ix86_expand_{unary|binary}_operator
and ix86_{unary|binary}_operator_ok function calls.

Bootstrapped and regression tested on x86_64-linux-gnu {-m32}.

Uros.
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 57a108ae4a7..fd1b2a9ff36 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1537,6 +1537,23 @@ ix86_expand_unary_operator (enum rtx_code code, 
machine_mode mode,
 emit_move_insn (operands[0], dst);
 }
 
+/* Return TRUE or FALSE depending on whether the unary operator meets the
+   appropriate constraints.  */
+
+bool
+ix86_unary_operator_ok (enum rtx_code,
+   machine_mode,
+   rtx operands[2],
+   bool use_ndd)
+{
+  /* If one of operands is memory, source and destination must match.  */
+  if ((MEM_P (operands[0])
+   || (!use_ndd && MEM_P (operands[1])))
+  && ! rtx_equal_p (operands[0], operands[1]))
+return false;
+  return true;
+}
+
 /* Predict just emitted jump instruction to be taken with probability PROB.  */
 
 static void
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 56349064a6c..9ee08d8ecc0 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -108,15 +108,20 @@ extern void ix86_expand_clear (rtx);
 extern void ix86_expand_move (machine_mode, rtx[]);
 extern void ix86_expand_vector_move (machine_mode, rtx[]);
 extern void ix86_expand_vector_move_misalign (machine_mode, rtx[]);
-extern rtx ix86_fixup_binary_operands (enum rtx_code,
-  machine_mode, rtx[], bool = false);
-extern void ix86_fixup_binary_operands_no_copy (enum rtx_code,
-   machine_mode, rtx[], bool = 
false);
-extern void ix86_expand_binary_operator (enum rtx_code,
-machine_mode, rtx[], bool = false);
+extern rtx ix86_fixup_binary_operands (enum rtx_code, machine_mode,
+  rtx[], bool = false);
+extern void ix86_fixup_binary_operands_no_copy (enum rtx_code, machine_mode,
+   rtx[], bool = false);
+extern void ix86_expand_binary_operator (enum rtx_code, machine_mode,
+rtx[], bool = false);
+extern bool ix86_binary_operator_ok (enum rtx_code, machine_mode,
+rtx[3], bool = false);
+extern void ix86_expand_unary_operator (enum rtx_code, machine_mode,
+   rtx[], bool = false);
+extern bool ix86_unary_operator_ok (enum rtx_code, machine_mode,
+   rtx[2], bool = false);
 extern void ix86_expand_vector_logical_operator (enum rtx_code,
 machine_mode, rtx[]);
-extern bool ix86_binary_operator_ok (enum rtx_code, machine_mode, rtx[3], bool 
= false);
 extern bool ix86_avoid_lea_for_add (rtx_insn *, rtx[]);
 extern bool ix86_use_lea_for_mov (rtx_insn *, rtx[]);
 extern bool ix86_avoid_lea_for_addr (rtx_insn *, rtx[]);
@@ -126,12 +131,9 @@ extern int ix86_last_zero_store_uid;
 extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool high);
 extern bool ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn);
 extern bool ix86_agi_dependent (rtx_insn *set_insn, rtx_insn *use_insn);
-extern void ix86_expand_unary_operator (enum rtx_code, machine_mode,
-   rtx[], bool = false);
 extern rtx ix86_build_const_vector (machine_mode, bool, rtx);
 extern rtx ix86_build_signbit_mask (machine_mode, bool, bool);
-extern HOST_WIDE_INT ix86_convert_const_vector_to_integer (rtx,
-  machine_mode);
+extern HOST_WIDE_INT ix86_convert_const_vector_to_integer (rtx, machine_mode);
 extern void ix86_split_convert_uns_si_sse (rtx[]);
 extern void ix86_expand_convert_uns_didf_sse (rtx, rtx);
 extern void ix86_expand_convert_uns_sixf_sse (rtx, rtx);
@@ -147,8 +149,6 @@ extern void ix86_split_fp_absneg_operator (enum rtx_code, 
machine_mode,
   rtx[]);
 extern void ix86_expand_copysign (rtx []);
 extern void ix86_expand_xorsign (rtx []);
-extern bool ix86_unary_operator_ok (enum rtx_code, machine_mode, rtx[2],
-   bool = false);
 extern bool 

Re: [PATCH v1] LoongArch: Merge constant vector permutation implementations.

2023-12-28 Thread Xi Ruoyao
On Thu, 2023-12-28 at 14:59 +0800, Li Wei wrote:
> There are currently two versions of the implementations of constant
> vector permutation: loongarch_expand_vec_perm_const_1 and
> loongarch_expand_vec_perm_const_2.  The implementations of the two
> versions are different. Currently, only the implementation of
> loongarch_expand_vec_perm_const_1 is used for 256-bit vectors.  We
> hope to streamline the code as much as possible while retaining the
> better-performing implementation of the two.  By repeatedly testing
> spec2006 and spec2017, we got the following Merged version.
> Compared with the pre-merger version, the number of lines of code
> in loongarch.cc has been reduced by 888 lines.  At the same time,
> the performance of SPECint2006 under Ofast has been improved by 0.97%,
> and the performance of SPEC2017 fprate has been improved by 0.27%.

/* snip */

> - * 3. What LASX permutation instruction does:
> - * In short, it just execute two independent 128bit vector permuatation, and
> - * it's the reason that we need to do the jobs below.  We will explain it.
> - * op0, op1, target, and selector will be separate into high 128bit and low
> - * 128bit, and do permutation as the description below:
> - *
> - *  a) op0's low 128bit and op1's low 128bit "combines" into a 256bit temp
> - * vector storage (TVS1), elements are indexed as below:
> - *       0 ~ nelt / 2 - 1  nelt / 2 ~ nelt - 1
> - *   |-|-| TVS1
> - *       op0's low 128bit  op1's low 128bit
> - *    op0's high 128bit and op1's high 128bit are "combined" into TVS2 in the
> - *    same way.
> - *       0 ~ nelt / 2 - 1  nelt / 2 ~ nelt - 1
> - *   |-|-| TVS2
> - *       op0's high 128bit   op1's high 128bit
> - *  b) Selector's low 128bit describes which elements from TVS1 will fit into
> - *  target vector's low 128bit.  No TVS2 elements are allowed.
> - *  c) Selector's high 128bit describes which elements from TVS2 will fit 
> into
> - *  target vector's high 128bit.  No TVS1 elements are allowed.

Just curious: why did the hardware engineers create such a bizarre
instruction? :)

/* snip */

> +   rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, d->op1, 0);
> +   rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, d->op0, 0);

Can we prove d->op0, d->op1, and d->target are never SUBREGs?  Otherwise
I'd use lowpart_subreg (E_V4DImode, d->xxx, d->vmode) here to avoid
creating a nested SUBREG (nested SUBREG will cause an ICE and it has
happened several times before).

/* snip */

> +   switch (d->vmode)
>       {
> -   remapped[i] = d->perm[i];
> +     case E_V4DFmode:
> +   sel = gen_rtx_CONST_VECTOR (E_V4DImode, gen_rtvec_v (d-
> >nelt,
> +   
> rperm));
> +   tmp = gen_rtx_SUBREG (E_V4DImode, d->target, 0);

Likewise.

> +   emit_move_insn (tmp, sel);
> +   break;
> +     case E_V8SFmode:
> +   sel = gen_rtx_CONST_VECTOR (E_V8SImode, gen_rtvec_v (d-
> >nelt,
> +   
> rperm));
> +   tmp = gen_rtx_SUBREG (E_V8SImode, d->target, 0);

Likewise.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [x86_PATCH] peephole2 to resolve failure of gcc.target/i386/pr43644-2.c

2023-12-28 Thread Uros Bizjak
On Fri, Dec 22, 2023 at 11:14 AM Roger Sayle  wrote:
>
>
> This patch resolves the failure of pr43644-2.c in the testsuite, a code
> quality test I added back in July, that started failing as the code GCC
> generates for 128-bit values (and their parameter passing) has been in
> flux.  After a few attempts at tweaking pattern constraints in the hope
> of convincing reload to produce a more aggressive (but potentially
> unsafe) register allocation, I think the best solution is to use a
> peephole2 to catch/clean-up this specific case.
>
> Specifically, the function:
>
> unsigned __int128 foo(unsigned __int128 x, unsigned long long y) {
>   return x+y;
> }
>
> currently generates:
>
> foo:movq%rdx, %rcx
> movq%rdi, %rax
> movq%rsi, %rdx
> addq%rcx, %rax
> adcq$0, %rdx
> ret
>
> and with this patch/peephole2 now generates:
>
> foo:movq%rdx, %rax
> movq%rsi, %rdx
> addq%rdi, %rax
> adcq$0, %rdx
> ret
>
> which I believe is optimal.

How about simply moving the assignment to the MSB in the split pattern
after the LSB calculation:

  [(set (match_dup 0) (match_dup 4))
-   (set (match_dup 5) (match_dup 2))
   (parallel [(set (reg:CCC FLAGS_REG)
  (compare:CCC
(plus:DWIH (match_dup 0) (match_dup 1))
(match_dup 0)))
 (set (match_dup 0)
  (plus:DWIH (match_dup 0) (match_dup 1)))])
+   (set (match_dup 5) (match_dup 2))
   (parallel [(set (match_dup 5)
  (plus:DWIH
(plus:DWIH

There is an earlyclobber on the output operand, so we are sure that
assignments to (op 0) and (op 5) won't clobber anything.
cprop_hardreg pass will then do the cleanup for us, resulting in:

foo: movq%rdi, %rax
   addq%rdx, %rax
   movq%rsi, %rdx
   adcq$0, %rdx

Uros.

>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-12-21  Roger Sayle  
>
> gcc/ChangeLog
> PR target/43644
> * config/i386/i386.md (define_peephole2): Tweak register allocation
> of *add3_doubleword_concat_zext.
>
> gcc/testsuite/ChangeLog
> PR target/43644
> * gcc.target/i386/pr43644-2.c: Expect 2 movq instructions.
>
>
> Thanks in advance, and for your patience with this testsuite noise.
> Roger
> --
>
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4c6368bf3b7..9f97d407975 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6411,13 +6411,13 @@ (define_insn_and_split 
"*add3_doubleword_concat_zext"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 4))
-   (set (match_dup 5) (match_dup 2))
(parallel [(set (reg:CCC FLAGS_REG)
   (compare:CCC
 (plus:DWIH (match_dup 0) (match_dup 1))
 (match_dup 0)))
  (set (match_dup 0)
   (plus:DWIH (match_dup 0) (match_dup 1)))])
+   (set (match_dup 5) (match_dup 2))
(parallel [(set (match_dup 5)
   (plus:DWIH
 (plus:DWIH


[PATCH v1] LoongArch: Merge constant vector permutation implementations.

2023-12-28 Thread Li Wei
There are currently two versions of the implementations of constant
vector permutation: loongarch_expand_vec_perm_const_1 and
loongarch_expand_vec_perm_const_2.  The implementations of the two
versions are different. Currently, only the implementation of
loongarch_expand_vec_perm_const_1 is used for 256-bit vectors.  We
hope to streamline the code as much as possible while retaining the
better-performing implementation of the two.  After repeated testing
with spec2006 and spec2017, we arrived at the following merged version.
Compared with the pre-merger version, the number of lines of code
in loongarch.cc has been reduced by 888 lines.  At the same time,
the performance of SPECint2006 under Ofast has been improved by 0.97%,
and the performance of SPEC2017 fprate has been improved by 0.27%.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_is_odd_extraction):
Remove useless forward declaration.
(loongarch_is_even_extraction): Remove useless forward declaration.
(loongarch_try_expand_lsx_vshuf_const): Removed.
(loongarch_expand_vec_perm_const_1): Merged.
(loongarch_is_double_duplicate): Removed.
(loongarch_is_center_extraction): Ditto.
(loongarch_is_reversing_permutation): Ditto.
(loongarch_is_di_misalign_extract): Ditto.
(loongarch_is_si_misalign_extract): Ditto.
(loongarch_is_lasx_lowpart_extract): Ditto.
(loongarch_is_op_reverse_perm): Ditto.
(loongarch_is_single_op_perm): Ditto.
(loongarch_is_divisible_perm): Ditto.
(loongarch_is_triple_stride_extract): Ditto.
(loongarch_expand_vec_perm_const_2): Merged.
(loongarch_expand_vec_perm_const): New.
(loongarch_vectorize_vec_perm_const): Adjust.
---
 gcc/config/loongarch/loongarch.cc | 1302 +
 1 file changed, 207 insertions(+), 1095 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 1d4d8f0b256..12408042d48 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -8769,143 +8769,6 @@ loongarch_expand_vec_perm (rtx target, rtx op0, rtx 
op1, rtx sel)
 }
 }
 
-static bool
-loongarch_is_odd_extraction (struct expand_vec_perm_d *);
-
-static bool
-loongarch_is_even_extraction (struct expand_vec_perm_d *);
-
-static bool
-loongarch_try_expand_lsx_vshuf_const (struct expand_vec_perm_d *d)
-{
-  int i;
-  rtx target, op0, op1, sel, tmp;
-  rtx rperm[MAX_VECT_LEN];
-
-  if (d->vmode == E_V2DImode || d->vmode == E_V2DFmode
-   || d->vmode == E_V4SImode || d->vmode == E_V4SFmode
-   || d->vmode == E_V8HImode || d->vmode == E_V16QImode)
-{
-  target = d->target;
-  op0 = d->op0;
-  op1 = d->one_vector_p ? d->op0 : d->op1;
-
-  if (GET_MODE (op0) != GET_MODE (op1)
- || GET_MODE (op0) != GET_MODE (target))
-   return false;
-
-  if (d->testing_p)
-   return true;
-
-  /* If match extract-even and extract-odd permutations pattern, use
-   * vselect much better than vshuf.  */
-  if (loongarch_is_odd_extraction (d)
- || loongarch_is_even_extraction (d))
-   {
- if (loongarch_expand_vselect_vconcat (d->target, d->op0, d->op1,
-   d->perm, d->nelt))
-   return true;
-
- unsigned char perm2[MAX_VECT_LEN];
- for (i = 0; i < d->nelt; ++i)
-   perm2[i] = (d->perm[i] + d->nelt) & (2 * d->nelt - 1);
-
- if (loongarch_expand_vselect_vconcat (d->target, d->op1, d->op0,
-   perm2, d->nelt))
-   return true;
-   }
-
-  for (i = 0; i < d->nelt; i += 1)
-   {
- rperm[i] = GEN_INT (d->perm[i]);
-   }
-
-  if (d->vmode == E_V2DFmode)
-   {
- sel = gen_rtx_CONST_VECTOR (E_V2DImode, gen_rtvec_v (d->nelt, rperm));
- tmp = simplify_gen_subreg (E_V2DImode, d->target, d->vmode, 0);
- emit_move_insn (tmp, sel);
-   }
-  else if (d->vmode == E_V4SFmode)
-   {
- sel = gen_rtx_CONST_VECTOR (E_V4SImode, gen_rtvec_v (d->nelt, rperm));
- tmp = simplify_gen_subreg (E_V4SImode, d->target, d->vmode, 0);
- emit_move_insn (tmp, sel);
-   }
-  else
-   {
- sel = gen_rtx_CONST_VECTOR (d->vmode, gen_rtvec_v (d->nelt, rperm));
- emit_move_insn (d->target, sel);
-   }
-
-  switch (d->vmode)
-   {
-   case E_V2DFmode:
- emit_insn (gen_lsx_vshuf_d_f (target, target, op1, op0));
- break;
-   case E_V2DImode:
- emit_insn (gen_lsx_vshuf_d (target, target, op1, op0));
- break;
-   case E_V4SFmode:
- emit_insn (gen_lsx_vshuf_w_f (target, target, op1, op0));
- break;
-   case E_V4SImode:
- emit_insn (gen_lsx_vshuf_w (target, target, op1, op0));
- break;
-   case E_V8HImode:
- emit_insn (gen_lsx_vshuf_h (target, target, op1, op0));
-