[PATCH] LoongArch: Remove unused code

2024-04-02 Thread Jiahao Xu
Machines that satisfy ISA_HAS_LSX && !TARGET_64BIT are not supported now and will not be in the future, so this patch removes the unused code for them. gcc/ChangeLog: * config/loongarch/lasx.md: Remove unused code. * config/loongarch/loongarch-protos.h (loongarch_split_lsx_copy_d):

[PATCH] LoongArch: Remove unused code and add sign/zero-extend for vpickve2gr.d

2024-03-22 Thread Jiahao Xu
Machines that satisfy ISA_HAS_LSX && !TARGET_64BIT are not supported now and will not be in the future, so this patch removes the unused code for them. This patch also adds sign/zero-extend operations to vpickve2gr.d to match the actual instruction behavior, and integrates the template definition of

Re: [PATCH] Loongarch: Remove vec_concatz pattern

2024-01-24 Thread Jiahao Xu
On 2024/1/25 3:46 PM, chenglulu wrote: Jiahao: Note that the 'a' in LoongArch in the title needs to be capitalized. I modified this patch and incorporated it first. Thanks, I'll pay attention next time. On 2024/1/24 5:19 PM, Jiahao Xu wrote: It is incorrect to use vld/vori to implement

Re: [PATCH] LoongArch: Fix incorrect return type for frecipe/frsqrte intrinsic functions

2024-01-24 Thread Jiahao Xu
On 2024/1/24 5:48 PM, Xi Ruoyao wrote: On Wed, 2024-01-24 at 17:19 +0800, Jiahao Xu wrote: gcc/ChangeLog: * config/loongarch/larchintrin.h (__frecipe_s): Update function return type. (__frecipe_d): Ditto. (__frsqrte_s): Ditto. (__frsqrte_d): Ditto. gcc

[PATCH] Loongarch: Remove vec_concatz pattern

2024-01-24 Thread Jiahao Xu
It is incorrect to use vld/vori to implement vec_concatz, because when an LSX instruction updates the value of a vector register, the upper 128 bits of that register are not zeroed. gcc/ChangeLog: * config/loongarch/lasx.md (@vec_concatz): Remove this

[PATCH] LoongArch: Fix incorrect return type for frecipe/frsqrte intrinsic functions

2024-01-24 Thread Jiahao Xu
gcc/ChangeLog: * config/loongarch/larchintrin.h (__frecipe_s): Update function return type. (__frecipe_d): Ditto. (__frsqrte_s): Ditto. (__frsqrte_d): Ditto. gcc/testsuite/ChangeLog: * gcc.target/loongarch/larch-frecipe-intrinsic.c: New test.

Re: [PATCH v3] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT

2024-01-19 Thread Jiahao Xu
-tree-dump-times ch2 "Will duplicate bb" 2 +FAIL: gcc.dg/tree-ssa/update-threading.c scan-tree-dump-times optimized "Invalid sum" 0 On 2024/1/16 10:32 AM, Jiahao Xu wrote: Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0 so that a short-circuit branch uses the short-circuit operation in

[PATCH v3] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT

2024-01-15 Thread Jiahao Xu
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0 so that a short-circuit branch uses the short-circuit operation instead of the non-short-circuit operation. SPEC2017 performance evaluation shows a 1% improvement in the fprate GEOMEAN and no obvious regression elsewhere. In particular, 526.blender_r
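
For readers unfamiliar with the macro, the two forms it chooses between can be sketched at the source level in C. This is an illustration only, not the test case from the patch; the function names are hypothetical. When the macro is nonzero, GCC may turn the first form into something like the second if the operands have no side effects; defining it as 0 keeps the branchy form.

```c
#include <assert.h>
#include <stdbool.h>

/* Short-circuit form: the second comparison runs only if the first holds.
   Hypothetical helper, not taken from the patch. */
bool in_range_short_circuit(float x, float lo, float hi)
{
    assert(lo <= hi);
    if (x >= lo && x <= hi)
        return true;
    return false;
}

/* Non-short-circuit form: both comparisons always run and their
   results are combined with a bitwise AND. */
bool in_range_non_short_circuit(float x, float lo, float hi)
{
    assert(lo <= hi);
    return (x >= lo) & (x <= hi);
}
```

Both functions return the same result for every input; they differ only in how many comparisons execute, which is the cost trade-off the benchmark numbers above measure.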

[PATCH] LoongArch: Fix pattern vec_concatz

2024-01-15 Thread Jiahao Xu
I implemented the vec_concatz pattern in r14-7022-34d339bbd0c1f5b4ad9587e7ae8387c912cb028b, but it does not support the reg+reg addressing mode. This patch fixes that. gcc/ChangeLog: * config/loongarch/lasx.md (vec_concatz): Fix pattern to support reg+reg addressing mode.

[PATCH] LoongArch: Split vec_selects of bottom elements into simple move

2024-01-15 Thread Jiahao Xu
The pattern below can be treated as a simple move, because floating-point and vector values share the same registers on loongarch64. (set (reg/v:SF 32 $f0 [orig:93 res ] [93]) (vec_select:SF (reg:V8SF 32 $f0 [115]) (parallel [ (const_int 0 [0]) ])))

[PATCH] LoongArch: Implement vec_init where N is a LSX vector mode

2024-01-04 Thread Jiahao Xu
This patch implements more vec_init optabs that can handle two LSX vectors, producing a LASX vector by concatenating them. When an LSX vector is concatenated with an LSX const_vector of zeroes, the vec_concatz pattern can be used effectively. For example as below typedef short v8hi

[PATCH] LoongArch: Optimize zero_extendqisi2 and zero_extendqidi2 patterns

2024-01-04 Thread Jiahao Xu
For zero_extendqisi2 and zero_extendqidi2, use andi instead of bstrpick.w, because andi is 6 times faster than bstrpick.w. gcc/ChangeLog: * config/loongarch/loongarch.md: (zero_extend2): Rename to .. (zero_extendhi2): .. this, use hi. (zero_extendqihi2): Rename to

[PATCH] LoongArch: Improve lasx_xvpermi_q_ insn pattern

2024-01-04 Thread Jiahao Xu
For the xvpermi.q instruction, unused bits in operands[3] need to be set to 0 to avoid causing undefined behavior on LA464. gcc/ChangeLog: * config/loongarch/lasx.md: Set the unused bits in operand[3] to 0. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vector/lasx/lasx-xvpremi.c:

Re: [PATCH v2] LoongArch: Implement FCCmode reload and cstore4

2023-12-21 Thread Jiahao Xu
SPECCPU 2017 and SPECCPU 2006 built and tested successfully, and this patch gives a 1.3% improvement in SPECCPU 2017 fprate on 3A6000; no performance regression was found. This is an effective optimization and looks good. On 2023/12/15 4:57 PM, Xi Ruoyao wrote: We used a branch to load

[PATCH v2] LoongArch: Fix incorrect code generation for sad pattern

2023-12-14 Thread Jiahao Xu
When I attempt to enable the vect_usad_char effective target for LoongArch, the slp-reduc-sad.c and vect-reduc-sad*.c tests fail. These tests fail because the sad pattern generates bad code. This patch fixes them: for sad patterns, use zero extension instead of sign extension for the reduction.
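
A scalar sketch of why the kind of extension matters for an unsigned sum of absolute differences (SAD). It does not reproduce the RTL from the patch; it assumes the extension in question is applied to the per-byte absolute differences as they are widened for the reduction, and the function names are hypothetical.

```c
#include <assert.h>

/* Correct unsigned SAD: each 8-bit absolute difference is
   zero-extended before it is added to the wider accumulator. */
int sad_zero_extend(const unsigned char *a, const unsigned char *b, int n)
{
    assert(n >= 0);
    int sum = 0;
    for (int i = 0; i < n; i++) {
        unsigned char d = (unsigned char)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
        sum += d;                    /* zero extension: 0..255 */
    }
    return sum;
}

/* Deliberately wrong variant: the 8-bit difference is reinterpreted as
   signed before widening, so differences above 127 become negative.
   (The unsigned-to-signed char conversion is implementation-defined;
   GCC wraps modulo 256.) */
int sad_sign_extend(const unsigned char *a, const unsigned char *b, int n)
{
    assert(n >= 0);
    int sum = 0;
    for (int i = 0; i < n; i++) {
        unsigned char d = (unsigned char)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
        sum += (signed char) d;      /* sign extension: -128..127 */
    }
    return sum;
}
```

With a single pair of bytes 200 and 10, the absolute difference is 190, which only the zero-extending variant accumulates correctly.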

[PATCH] LoongArch: Fix incorrect code generation for sad pattern

2023-12-14 Thread Jiahao Xu
When I attempt to enable the vect_usad_char effective target for LoongArch, some tests fail. These tests fail because the sad pattern generates bad code. This patch fixes them: for sad patterns, use zero extension instead of sign extension for the reduction. Currently, we are fixing failed vectorized

Re: [PATCH] LoongArch: Use the movcf2gr instruction to implement cstore4

2023-12-13 Thread Jiahao Xu
The implementation of this patch has some issues. When I compile 521.wrf with -Ofast -mlasx -flto -muse-movcf2gr, it results in an ICE: during RTL pass: reload module_mp_fast_sbm.fppized.f90: In function 'fast_sbm.constprop': module_mp_fast_sbm.fppized.f90:1369:25: internal compiler error:

Re: [PATCH v2] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT.

2023-12-12 Thread Jiahao Xu
On 2023/12/13 2:21 PM, Xi Ruoyao wrote: On Wed, 2023-12-13 at 14:17 +0800, Jiahao Xu wrote: This test was extracted from the hot functions of 526.blender_r. Setting LOGICAL_OP_NON_SHORT_CIRCUIT to 0 resulted in a 26% decrease in dynamic instruction count and a 13.4% performance improvement. After

Re: [PATCH v2] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT.

2023-12-12 Thread Jiahao Xu
On 2023/12/13 2:27 AM, Xi Ruoyao wrote: On Tue, 2023-12-12 at 20:39 +0800, Xi Ruoyao wrote: On Tue, 2023-12-12 at 19:59 +0800, Jiahao Xu wrote: I guess the problem here is that the floating-point compare instruction is much more costly than other instructions, but that fact is not correctly modeled yet

Re: [PATCH v2] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT.

2023-12-12 Thread Jiahao Xu
On 2023/12/12 7:26 PM, Xi Ruoyao wrote: On Tue, 2023-12-12 at 19:14 +0800, Jiahao Xu wrote: Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0 so that a short-circuit branch uses the short-circuit operation instead of the non-short-circuit operation. This gives a 1.8% improvement in SPECCPU 2017 fprate

[PATCH v2] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT.

2023-12-12 Thread Jiahao Xu
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0 so that a short-circuit branch uses the short-circuit operation instead of the non-short-circuit operation. This gives a 1.8% improvement in SPECCPU 2017 fprate on 3A6000. gcc/ChangeLog: * config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT):

Re: [PATCH] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT.

2023-12-12 Thread Jiahao Xu
On 2023/12/12 6:05 PM, Xi Ruoyao wrote: On Tue, 2023-12-12 at 17:50 +0800, Jiahao Xu wrote: diff --git a/gcc/testsuite/gcc.target/loongarch/short-circuit.c b/gcc/testsuite/gcc.target/loongarch/short-circuit.c new file mode 100644 index 000..2cef0193466 --- /dev/null +++ b/gcc/testsuite

[PATCH] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT.

2023-12-12 Thread Jiahao Xu
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0 so that a short-circuit branch uses the short-circuit operation instead of the non-short-circuit operation. This gives a 1.8% improvement in SPECCPU 2017 fprate on 3A6000. gcc/ChangeLog: * config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT):

Re: [PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations.

2023-12-06 Thread Jiahao Xu
On 2023/12/6 3:04 PM, Jiahao Xu wrote: LoongArch V1.1 adds support for approximate instructions, which are used along with additional Newton-Raphson steps to implement single precision floating-point division, square root and reciprocal square root operations for better throughput. The patches

[PATCH v3 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math.

2023-12-05 Thread Jiahao Xu
When both the -mrecip and -mfrecipe options are enabled, use approximate reciprocal instructions and approximate reciprocal square root instructions with additional Newton-Raphson steps to implement single precision floating-point division, square root and reciprocal square root operations, for
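
The Newton-Raphson refinement mentioned here can be sketched in scalar C. This is a minimal illustration, not the instruction sequence the patch emits: crude_recip is a hypothetical stand-in for a hardware estimate such as frecipe, modeled as the exact reciprocal with an injected relative error of about 1e-3, and all function names are assumptions.

```c
#include <assert.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: the exact
   reciprocal with a relative error of about 1e-3 injected on purpose. */
float crude_recip(float a)
{
    assert(a != 0.0f);
    return (1.0f / a) * 1.001f;
}

/* One Newton-Raphson step for f(x) = 1/x - a:
   x1 = x0 * (2 - a * x0).  The relative error is roughly squared. */
float refine_recip(float a, float x0)
{
    return x0 * (2.0f - a * x0);
}

/* Division n / d implemented as n * (refined estimate of 1/d). */
float approx_div(float n, float d)
{
    assert(d != 0.0f);
    return n * refine_recip(d, crude_recip(d));
}
```

One step takes the injected error of about 1e-3 down to about 1e-6, which is why a low-precision estimate plus a few multiply-add operations can stand in for a full-precision divide when -ffast-math permits the small accuracy loss.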

[PATCH v3 2/5] LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions.

2023-12-05 Thread Jiahao Xu
Rename lasx_xvfrsqrt*/lsx_vfrsqrt* to rsqrt2 to align with the standard pattern name. Define the function use_rsqrt_p to decide when to use the rsqrt optab. gcc/ChangeLog: * config/loongarch/lasx.md (lasx_xvfrsqrt_): Renamed to .. (rsqrt2): .. this. *

[PATCH v3 5/5] LoongArch: Vectorized loop unrolling is disabled for divf/sqrtf/rsqrtf when -mrecip is enabled.

2023-12-05 Thread Jiahao Xu
Using -mrecip generates a sequence of instructions to replace divf, sqrtf and rsqrtf. The number of generated instructions is close to or exceeds the maximum number of instructions LoongArch can issue per cycle, so vectorized loop unrolling is not performed on them. gcc/ChangeLog: *

[PATCH v3 3/5] LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.

2023-12-05 Thread Jiahao Xu
Redefine the pattern for [x]vfrecip instructions to use rtx code instead of unspec, and enable [x]vfrecip instructions to be generated during auto-vectorization. gcc/ChangeLog: * config/loongarch/lasx.md (lasx_xvfrecip_): Renamed to .. (recip3): .. this. *

[PATCH v3 1/5] LoongArch: Add support for LoongArch V1.1 approximate instructions.

2023-12-05 Thread Jiahao Xu
This patch adds define_insn/builtins/intrinsics for these instructions, and adds the option -mfrecipe to control instruction generation. gcc/ChangeLog: * config/loongarch/genopts/isa-evolution.in (fecipe): Add. * config/loongarch/larchintrin.h (__frecipe_s): New intrinsic.

[PATCH v3 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations.

2023-12-05 Thread Jiahao Xu
on the patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639243.html Jiahao Xu (5): LoongArch: Add support for LoongArch V1.1 approximate instructions. LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions. LoongArch: Redefine pattern for xvfrecip/vfrecip instructions

[PATCH v2 0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrt operations.

2023-12-04 Thread Jiahao Xu
on the patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639243.html. Jiahao Xu (5): LoongArch: Add support for LoongArch V1.1 approximate instructions. LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions. LoongArch: Redefine pattern for xvfrecip/vfrecip

[PATCH v2 5/5] LoongArch: Vectorized loop unrolling is disabled for divf/sqrtf/rsqrtf when -mrecip is enabled.

2023-12-04 Thread Jiahao Xu
Using -mrecip generates a sequence of instructions to replace divf, sqrtf and rsqrtf. The number of generated instructions is close to or exceeds the maximum number of instructions LoongArch can issue per cycle, so vectorized loop unrolling is not performed on them. gcc/ChangeLog: *

[PATCH v2 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math.

2023-12-04 Thread Jiahao Xu
When both the -mrecip and -mfrecipe options are enabled, use approximate reciprocal instructions and approximate reciprocal square root instructions with additional Newton-Raphson steps to implement single precision floating-point division, square root and reciprocal square root operations, for

[PATCH v2 3/5] LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.

2023-12-04 Thread Jiahao Xu
Redefine the pattern for [x]vfrecip instructions to use rtx code instead of unspec, and enable [x]vfrecip instructions to be generated during auto-vectorization. gcc/ChangeLog: * config/loongarch/lasx.md (lasx_xvfrecip_): Renamed to .. (recip3): .. this. *

[PATCH v2 1/5] LoongArch: Add support for LoongArch V1.1 approximate instructions.

2023-12-04 Thread Jiahao Xu
This patch adds define_insn/builtins/intrinsics for these instructions, and adds the option -mfrecipe to control instruction generation. gcc/ChangeLog: * config/loongarch/genopts/isa-evolution.in (fecipe): Add. * config/loongarch/larchintrin.h (__frecipe_s): New intrinsic.

[PATCH v2 2/5] LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions.

2023-12-04 Thread Jiahao Xu
Rename lasx_xvfrsqrt*/lsx_vfrsqrt* to rsqrt2 to align with the standard pattern name. Define the function use_rsqrt_p to decide when to use the rsqrt optab. gcc/ChangeLog: * config/loongarch/lasx.md (lasx_xvfrsqrt_): Renamed to .. (rsqrt2): .. this. *

[PATCH] LoongArch: Fix ICE and use simplify_gen_subreg instead of gen_rtx_SUBREG directly.

2023-11-28 Thread Jiahao Xu
loongarch_expand_vec_cond_mask_expr generates 'subreg's of 'subreg's, which GCC does not support; this causes an ICE: ice.c:55:1: error: unrecognizable insn: 55 | } | ^ (insn 63 62 64 8 (set (reg:V4DI 278) (subreg:V4DI (subreg:V4DF (reg:V4DI 273 [ vect__53.26 ]) 0) 0)) -1

[PATCH] LoongArch: Fix lsx-vshuf.c and lasx-xvshuf_b.c tests fail on LA664 [PR112611]

2023-11-28 Thread Jiahao Xu
For [x]vshuf instructions, if the index value in the selector exceeds 63, it triggers undefined behavior on LA464, but not on LA664. To ensure compatibility of these two tests on both LA464 and LA664, we have modified both tests to ensure that the index value in the selector does not exceed 63.

Re: [PATCH 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math.

2023-11-28 Thread Jiahao Xu
On 2023/11/29 10:33 AM, Xi Ruoyao wrote: On Wed, 2023-11-29 at 10:23 +0800, Jiahao Xu wrote: On 2023/11/29 10:08 AM, Xi Ruoyao wrote: On Tue, 2023-11-28 at 11:29 +0800, Jiahao Xu wrote: diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md index f7796da10b2..9e9ce58cb53

Re: [PATCH 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math.

2023-11-28 Thread Jiahao Xu
On 2023/11/29 10:08 AM, Xi Ruoyao wrote: On Tue, 2023-11-28 at 11:29 +0800, Jiahao Xu wrote: diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md index f7796da10b2..9e9ce58cb53 100644 --- a/gcc/config/loongarch/predicates.md +++ b/gcc/config/loongarch/predicates.md

[PATCH 0/5] LoongArch: Add -mrecip option support

2023-11-27 Thread Jiahao Xu
instructions by implementing '-mrecip' and '-mrecip='. Jiahao Xu (5): LoongArch: Add support for approximate instructions. LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions. LoongArch: Redefine pattern for xvfrecip/vfrecip instructions. LoongArch: New options

[PATCH 3/5] LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.

2023-11-27 Thread Jiahao Xu
Redefine the pattern for [x]vfrecip instructions to use rtx code instead of unspec, and enable [x]vfrecip instructions to be generated during auto-vectorization. gcc/ChangeLog: * config/loongarch/lasx.md (lasx_xvfrecip_): Renamed to .. (recip3): .. this. *

[PATCH 5/5] LoongArch: Vectorized loop unrolling is not performed on divf/sqrtf/rsqrtf when -mrecip is turned on.

2023-11-27 Thread Jiahao Xu
Using -mrecip generates a sequence of instructions to replace divf, sqrtf and rsqrtf. The number of generated instructions is close to or exceeds the maximum number of instructions LoongArch can issue per cycle, so vectorized loop unrolling is not performed on them. gcc/ChangeLog: * config/loongarch/loongarch.cc

[PATCH 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math.

2023-11-27 Thread Jiahao Xu
When the -mrecip option is turned on, use approximate reciprocal instructions and approximate reciprocal square root instructions with additional Newton-Raphson steps to implement single precision floating-point division, square root and reciprocal square root operations for better throughput.

[PATCH 2/5] LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions.

2023-11-27 Thread Jiahao Xu
Rename lasx_xvfrsqrt*/lsx_vfrsqrt* to rsqrt2 to align with the standard pattern name. gcc/ChangeLog: * config/loongarch/lasx.md (lasx_xvfrsqrt_): Renamed to .. (*rsqrt2): .. this. * config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vfrsqrt_d): Redefine to standard

[PATCH 1/5] LoongArch: Add support for approximate instructions.

2023-11-27 Thread Jiahao Xu
LA664 introduces new instructions for reciprocal approximation and reciprocal square root approximation. It includes the scalar instructions frecipe and frsqrte, as well as their corresponding vector instructions [x]vfrecipe and [x]vfrsqrte. This patch adds define_insn/builtins/intrinsics for

Re: [pushed][PATCH] LoongArch: Increase cost of vector aligned store/load.

2023-11-23 Thread Jiahao Xu
On 2023/11/19 2:25 AM, Xi Ruoyao wrote: On Fri, 2023-11-17 at 10:21 +0800, chenglulu wrote: Pushed to r14-5545. On 2023/11/16 4:44 PM, Jiahao Xu wrote: Based on SPEC2017 performance evaluation results, it's better to make them equal to the cost of unaligned store/load so as to avoid odd alignment

[PATCH] LoongArch: Add support for xorsign.

2023-11-19 Thread Jiahao Xu
This patch adds support for the xorsign pattern for scalar fp and vector modes. The new expanders uniformly use vector bitwise logical operations to handle xorsign. On LoongArch64, floating-point registers and vector registers share the same registers, so this patch also allows conversion between LSX
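
As a scalar illustration of the bitwise approach described here: xorsign(x, y) equals x * copysign(1, y), and it can be computed by XORing the sign bit of y into x. The sketch below assumes IEEE-754 binary32 floats; the function name is hypothetical and the code is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* xorsign for a single float: flip the sign of x when y is negative.
   Assumes float is IEEE-754 binary32 with the sign in bit 31. */
float xorsign_f32(float x, float y)
{
    uint32_t xb, yb;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&xb, &x, sizeof xb);          /* reinterpret bits without aliasing issues */
    memcpy(&yb, &y, sizeof yb);
    xb ^= yb & UINT32_C(0x80000000);     /* XOR in only the sign bit of y */
    memcpy(&x, &xb, sizeof x);
    return x;
}
```

The operation is a mask and an XOR with no floating-point arithmetic, which is why vector bitwise logical instructions can implement it for both scalar and vector operands.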

[PATCH] LoongArch: Fix scan-assembler-times of lasx/lsx test case.

2023-11-16 Thread Jiahao Xu
These tests failed when they were first added; this patch adjusts the scan-assembler-times counts to fix them. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: Adjust assembler times. * gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: Ditto. *

Re: [PATCH] LoongArch: Fix scan-assembler-times of lasx/lsx test case.

2023-11-16 Thread Jiahao Xu
after they were added?) On Thu, 2023-11-16 at 20:08 +0800, Jiahao Xu wrote: gcc/testsuite/ChangeLog: * gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: Adjust assembler times. * gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: Ditto. * gcc.target/loongarch/vector/lsx/lsx

[PATCH] LoongArch: Fix scan-assembler-times of lasx/lsx test case.

2023-11-16 Thread Jiahao Xu
gcc/testsuite/ChangeLog: * gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: Adjust assembler times. * gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: Ditto. * gcc.target/loongarch/vector/lsx/lsx-vcond-1.c: Ditto. * gcc.target/loongarch/vector/lsx/lsx-vcond-2.c:

[PATCH] LoongArch: Increase cost of vector aligned store/load.

2023-11-16 Thread Jiahao Xu
Based on SPEC2017 performance evaluation results, it's better to make them equal to the cost of unaligned store/load so as to avoid odd alignment peeling. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_builtin_vectorization_cost): Adjust. diff --git

[PATCH 1/2] LoongArch: Increase cost of vector aligned store/load.

2023-11-16 Thread Jiahao Xu
Based on SPEC2017 performance evaluation results, it's better to make them equal to the cost of unaligned store/load so as to avoid odd alignment peeling. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_builtin_vectorization_cost): Adjust. diff --git

[PATCH] LoongArch: Increase cost of vector aligned store/load.

2023-11-15 Thread Jiahao Xu
Based on SPEC2017 performance evaluation results, making them equal to the cost of unaligned store/load to avoid odd alignment peeling is better. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_builtin_vectorization_cost): Adjust. diff --git

[PATCH] LoongArch: Enable vcond_mask_mn expanders for SF/DF modes.

2023-10-23 Thread Jiahao Xu
If the vcond_mask patterns don't support fp modes, the vector FP comparison instructions will not be generated. gcc/ChangeLog: * config/loongarch/lasx.md (vcond_mask_): Change to (vcond_mask_): this. * config/loongarch/lsx.md (vcond_mask_): Change to

[PATCH 2/3] LoongArch: Implement vec_widen standard names.

2023-10-15 Thread Jiahao Xu
Add support for vec_widen lo/hi patterns. These do not directly match LoongArch LASX instructions but can be emulated with even/odd + vector merge. gcc/ChangeLog: * config/loongarch/lasx.md (vec_widen_add_hi_, vec_widen_add_lo_, vec_widen_sub_hi_, vec_widen_sub_lo_,

[PATCH 1/3] LoongArch: Implement avg and sad standard names.

2023-10-15 Thread Jiahao Xu
gcc/ChangeLog: * config/loongarch/lasx.md (avg3_floor, uavg3_floor, avg3_ceil, uavg3_ceil, ssadv16qi, usadv16qi): New patterns. * config/loongarch/lsx.md (avg3_floor, uavg3_floor, avg3_ceil, uavg3_ceil, ssadv16qi, usadv16qi): New patterns. gcc/testsuite/ChangeLog:

[PATCH 3/3] LoongArch: Implement the new vector cost model framework.

2023-10-15 Thread Jiahao Xu
This patch makes LoongArch use the new vector hooks and implements the costing function determine_suggested_unroll_factor, so that it can suggest an unroll factor for a loop being vectorized, based on vec_ops analysis during vector costing and the available issue information. Referring to

[PATCH 0/3] Optimize LoongArch vector implementation.

2023-10-15 Thread Jiahao Xu
determine_suggested_unroll_factor, so that it can suggest an unroll factor for a loop being vectorized, based on vec_ops analysis during vector costing and the available issue information. The patch also adjusts the cost model through performance analysis. Jiahao Xu (3): LoongArch: Implement