Machines that satisfy ISA_HAS_LSX && !TARGET_64BIT are not supported now and will not be supported in the future, so this patch removes the code for them, which is unused.
gcc/ChangeLog:
* config/loongarch/lasx.md: Remove unused code.
* config/loongarch/loongarch-protos.h (loongarch_split_lsx_copy_d):
This patch also adds sign/zero-extend operations to vpickve2gr.d to match the actual instruction behavior, and integrates the template definition of
On 2024/1/25 3:46 PM, chenglulu wrote:
Jiahao:
Note that the LoongArch 'a' in the title needs to be capitalized.
I modified this patch and incorporated it first.
Thanks, I'll pay attention next time.
On 2024/1/24 5:19 PM, Jiahao Xu wrote:
It is incorrect to use vld/vori to implement
On 2024/1/24 5:48 PM, Xi Ruoyao wrote:
On Wed, 2024-01-24 at 17:19 +0800, Jiahao Xu wrote:
gcc/ChangeLog:
* config/loongarch/larchintrin.h
(__frecipe_s): Update function return type.
(__frecipe_d): Ditto.
(__frsqrte_s): Ditto.
(__frsqrte_d): Ditto.
gcc
It is incorrect to use vld/vori to implement the vec_concatz pattern. When an LSX instruction updates the value of a vector register, the upper 128 bits of that register are not zeroed.
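The following toy C model makes the problem concrete. It rests on one assumption for illustration only: a 256-bit LASX register is treated as four 64-bit words, and all helper names are hypothetical. An LSX-width write touches only the low 128 bits. vec_concatz, by contrast, must guarantee zeros in the upper 128 bits.

```c
#include <stdint.h>

/* Toy model of one 256-bit LASX register as four 64-bit words
   (hypothetical; the real register file is not modelled). */
typedef struct {
  uint64_t w[4];
} xreg_model;

/* Models an LSX (128-bit) write such as vld/vori: only the low half
   is updated, the upper 128 bits keep their previous contents. */
static void model_lsx_write(xreg_model *r, uint64_t lo0, uint64_t lo1) {
  r->w[0] = lo0;
  r->w[1] = lo1;
}

/* The result vec_concatz must produce: the 128-bit value in the low
   half and zeros in the upper half. */
static void model_vec_concatz(xreg_model *r, uint64_t lo0, uint64_t lo1) {
  r->w[0] = lo0;
  r->w[1] = lo1;
  r->w[2] = 0;
  r->w[3] = 0;
}
```

Starting from a register that holds stale data in its upper half, `model_lsx_write` leaves that data in place. `model_vec_concatz` clears it.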
gcc/ChangeLog:
* config/loongarch/lasx.md (@vec_concatz): Remove this
gcc/ChangeLog:
* config/loongarch/larchintrin.h
(__frecipe_s): Update function return type.
(__frecipe_d): Ditto.
(__frsqrte_s): Ditto.
(__frsqrte_d): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/larch-frecipe-intrinsic.c: New test.
-tree-dump-times ch2 "Will duplicate bb" 2
+FAIL: gcc.dg/tree-ssa/update-threading.c scan-tree-dump-times optimized "Invalid sum" 0
On 2024/1/16 10:32 AM, Jiahao Xu wrote:
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0 so that, for a short-circuit branch, the short-circuit operation is used instead of the non-short-circuit operation.
SPEC2017 performance evaluation shows a 1% performance improvement for the fprate GEOMEAN and no obvious regression for the others. In particular, 526.blender_r
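The difference between the two evaluation strategies can be seen at source level. This is only an illustration: the counter and helper names are hypothetical, and GCC applies the real transformation internally to side-effect-free operands, not to code like this. With short-circuit evaluation, the second floating-point compare is skipped whenever the first one already decides the result.

```c
/* Counts how many floating-point comparisons are actually evaluated
   (illustrative instrumentation only). */
static int fcmp_count;

static int greater_than(float x, float y) {
  fcmp_count++;
  return x > y;
}

/* Short-circuit form: branch after the first compare and skip the
   second one when the first is false. */
static int both_positive_sc(float a, float b) {
  return greater_than(a, 0.0f) && greater_than(b, 0.0f);
}

/* Non-short-circuit form: evaluate both compares unconditionally and
   combine the results with a bitwise AND, trading a branch for an
   extra compare. */
static int both_positive_nsc(float a, float b) {
  return greater_than(a, 0.0f) & greater_than(b, 0.0f);
}
```

Both functions return the same truth value. For `a = -1.0f`, though, the short-circuit form performs one compare and the non-short-circuit form performs two. That matters when floating-point compares are expensive.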
In r14-7022-34d339bbd0c1f5b4ad9587e7ae8387c912cb028b I implemented the vec_concatz pattern, but it did not support the reg+reg addressing mode. This patch fixes that.
gcc/ChangeLog:
* config/loongarch/lasx.md (vec_concatz): Fix pattern to
support reg+reg addressing mode.
The pattern below can be treated as a simple move, because floating-point and vector registers share a common register file on LoongArch64.
  (set (reg/v:SF 32 $f0 [orig:93 res ] [93])
       (vec_select:SF (reg:V8SF 32 $f0 [115])
                      (parallel [
                        (const_int 0 [0])
                      ])))
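A rough C analogy follows, assuming a union can stand in for the shared register file (the type and function names are hypothetical). $f0 is the low part of the same vector register. Element 0 of the V8SF value and the SF value therefore occupy the same bits, and the vec_select needs no data movement.

```c
/* Hypothetical model of a register shared between the FP and vector
   units: the SF view aliases lane 0 of the V8SF view. */
typedef union {
  float v8sf[8]; /* vector (LASX) view */
  float sf;      /* scalar FP view, same storage as v8sf[0] */
} shared_reg_model;

/* vec_select:SF of lane 0: reading the scalar view is enough. */
static float select_lane0(const shared_reg_model *r) {
  return r->sf;
}
```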
This patch implements more vec_init optabs that can produce a LASX vector by concatenating two LSX vectors. When an LSX vector is concatenated with an LSX const_vector of zeroes, the vec_concatz pattern can be used effectively. For example:
typedef short v8hi
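The kind of source this targets can be sketched with GCC vector extensions. This is not the example from the original mail: the v16hi typedef and concat_zero are hypothetical names. A 128-bit vector is widened into a 256-bit vector whose upper half is a constant zero vector. Whether a particular compiler version actually selects vec_concatz for it is not shown here.

```c
/* GCC vector extension types: 8 x 16-bit (LSX width) and
   16 x 16-bit (LASX width). */
typedef short v8hi __attribute__((vector_size(16)));
typedef short v16hi __attribute__((vector_size(32)));

/* Concatenate A with a zero vector: A fills the low 128 bits and the
   upper 128 bits are zero, which is what vec_concatz computes. */
static void concat_zero(v8hi a, v16hi *out) {
  *out = (v16hi){a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7],
                 0,    0,    0,    0,    0,    0,    0,    0};
}
```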
For zero_extendqisi2 and zero_extendqidi2, use andi instead of bstrpick.w,
because andi is 6 times faster than bstrpick.w.
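Both instructions compute the same value for an 8-bit zero-extension, so the choice comes down to cost. The small C model below shows the equivalence. It assumes 32-bit semantics and ignores how the result is extended into a 64-bit GPR; the function names are hypothetical.

```c
#include <stdint.h>

/* andi rd, rj, 0xff: keep the low 8 bits (zero-extension from QImode). */
static uint32_t zext_qi_andi(uint32_t rj) {
  return rj & 0xffu;
}

/* bstrpick.w rd, rj, msb, lsb: extract bits [msb:lsb] and zero-fill
   the rest (32-bit model only). */
static uint32_t bstrpick_w(uint32_t rj, unsigned msb, unsigned lsb) {
  unsigned width = msb - lsb + 1;
  uint32_t mask = (width >= 32) ? 0xffffffffu : ((1u << width) - 1u);
  return (rj >> lsb) & mask;
}
```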
gcc/ChangeLog:
* config/loongarch/loongarch.md:
(zero_extend2): Rename to ..
(zero_extendhi2): .. this, use hi.
(zero_extendqihi2): Rename to
For the xvpermi.q instruction, the unused bits in operands[3] need to be set to 0 to avoid undefined behavior on LA464.
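Here is a minimal sketch of the masking. It assumes, per the LoongArch ISA description of xvpermi.q, that only bits [1:0] and [5:4] of the 8-bit immediate select 128-bit halves, so clearing the other bits amounts to AND-ing with 0x33. The helper name is hypothetical.

```c
#include <stdint.h>

/* Keep only the selector fields of the xvpermi.q immediate
   (assumed to be bits [1:0] and [5:4]) and clear the unused bits,
   which may otherwise cause undefined behavior on LA464. */
static uint8_t xvpermi_q_clean_imm(uint8_t imm) {
  return (uint8_t)(imm & 0x33u);
}
```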
gcc/ChangeLog:
* config/loongarch/lasx.md: Set the unused bits in operand[3] to 0.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-xvpremi.c:
SPECCPU 2017 and SPECCPU 2006 were successfully built and tested. This patch gives a 1.3% improvement in SPECCPU 2017 fprate on 3A6000, and no performance regression was found. This is an effective optimization and looks good.
On 2023/12/15 4:57 PM, Xi Ruoyao wrote:
We used a branch to load
When I tried to enable the vect_usad_char effective target for LoongArch, the slp-reduc-sad.c and vect-reduc-sad*.c tests failed, because the sad pattern generates bad code. This patch fixes them by using zero extension instead of sign extension for the reduction in the sad patterns.
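The effect of the wrong extension is easy to reproduce in scalar code. In this sketch (function names are hypothetical), a byte value of 200 is read as -56 once it is sign-extended, which changes every absolute difference that involves it.

```c
#include <stdint.h>
#include <stdlib.h>

/* Correct unsigned SAD: zero-extend each byte before subtracting. */
static int sad_zext(const uint8_t *a, const uint8_t *b, int n) {
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += abs((int)a[i] - (int)b[i]);
  return sum;
}

/* Wrong variant: sign-extending the same bytes turns 200 into -56. */
static int sad_sext(const uint8_t *a, const uint8_t *b, int n) {
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += abs((int)(int8_t)a[i] - (int)(int8_t)b[i]);
  return sum;
}
```

For a = {200, 10} and b = {10, 200}, the zero-extended SAD is 380. The sign-extended version yields 132.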
When I tried to enable the vect_usad_char effective target for LoongArch, some tests failed, because the sad pattern generates bad code. This patch fixes them by using zero extension instead of sign extension for the reduction in the sad patterns.
Currently, we are fixing failed vectorized
The implementation of this patch has some issues. When I compile 521.wrf
with -Ofast -mlasx -flto -muse-movcf2gr, it results in an ICE:
during RTL pass: reload
module_mp_fast_sbm.fppized.f90: In function 'fast_sbm.constprop':
module_mp_fast_sbm.fppized.f90:1369:25: internal compiler error:
On 2023/12/13 2:21 PM, Xi Ruoyao wrote:
On Wed, 2023-12-13 at 14:17 +0800, Jiahao Xu wrote:
This test was extracted from the hot functions of 526.blender_r. Setting
LOGICAL_OP_NON_SHORT_CIRCUIT to 0 resulted in a 26% decrease in dynamic
instruction count and a 13.4% performance improvement. After
On 2023/12/13 2:27 AM, Xi Ruoyao wrote:
On Tue, 2023-12-12 at 20:39 +0800, Xi Ruoyao wrote:
On Tue, 2023-12-12 at 19:59 +0800, Jiahao Xu wrote:
I guess the problem here is that the floating-point compare instruction is much more costly than other instructions, but this fact is not correctly modeled yet.
On 2023/12/12 7:26 PM, Xi Ruoyao wrote:
On Tue, 2023-12-12 at 19:14 +0800, Jiahao Xu wrote:
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0 so that, for a short-circuit branch, the short-circuit operation is used instead of the non-short-circuit operation.
This gives a 1.8% improvement in SPECCPU 2017 fprate on 3A6000.
gcc/ChangeLog:
* config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT):
On 2023/12/12 6:05 PM, Xi Ruoyao wrote:
On Tue, 2023-12-12 at 17:50 +0800, Jiahao Xu wrote:
diff --git a/gcc/testsuite/gcc.target/loongarch/short-circuit.c b/gcc/testsuite/gcc.target/loongarch/short-circuit.c
new file mode 100644
index 000..2cef0193466
--- /dev/null
+++ b/gcc/testsuite
On 2023/12/6 3:04 PM, Jiahao Xu wrote:
LoongArch V1.1 adds support for approximate instructions. Combined with additional Newton-Raphson steps, they are used to implement single-precision floating-point division, square root and reciprocal square root operations for better throughput.
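The Newton-Raphson refinement can be sketched as follows, using the textbook update formulas. A hand-picked estimate stands in for the output of an approximate instruction such as frecipe or frsqrte. The instruction sequence GCC actually emits may arrange the arithmetic differently, and the function names are hypothetical.

```c
/* One Newton-Raphson step towards 1/a: x1 = x0 * (2 - a * x0).
   If x0 = (1/a)(1 + e), then x1 = (1/a)(1 - e*e): the error is squared. */
static float recip_refine(float a, float x0) {
  return x0 * (2.0f - a * x0);
}

/* One Newton-Raphson step towards 1/sqrt(a):
   y1 = y0 * (1.5 - 0.5 * a * y0 * y0). */
static float rsqrt_refine(float a, float y0) {
  return y0 * (1.5f - 0.5f * a * y0 * y0);
}

/* Division n / d rewritten as n * (refined reciprocal of d). */
static float div_via_recip(float n, float d, float recip_estimate) {
  return n * recip_refine(d, recip_estimate);
}

/* sqrt(a) rewritten as a * (refined reciprocal square root of a). */
static float sqrt_via_rsqrt(float a, float rsqrt_estimate) {
  return a * rsqrt_refine(a, rsqrt_estimate);
}
```

For example, starting from the estimate 0.2502f for 1/4 (a relative error of 8e-4), one step gives a value within about 1e-6 of 0.25.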
The patches
When both the -mrecip and -mfrecipe options are enabled, use approximate reciprocal instructions and approximate reciprocal square root instructions with additional Newton-Raphson steps to implement single-precision floating-point division, square root and reciprocal square root operations, for
Rename lasx_xvfrsqrt*/lsx_vfrsqrt* to rsqrt2 to align with the standard pattern name. Define the function use_rsqrt_p to decide when to use the rsqrt optab.
gcc/ChangeLog:
* config/loongarch/lasx.md (lasx_xvfrsqrt_): Renamed to ..
(rsqrt2): .. this.
*
Using -mrecip generates a sequence of instructions to replace divf, sqrtf and rsqrtf. The number of generated instructions is close to or exceeds the maximum number of instructions LoongArch can issue per cycle, so vectorized loop unrolling is not performed on them.
gcc/ChangeLog:
*
Redefine the pattern for [x]vfrecip instructions to use rtx code instead of unspec, and enable [x]vfrecip instructions to be generated during auto-vectorization.
gcc/ChangeLog:
* config/loongarch/lasx.md (lasx_xvfrecip_): Renamed to ..
(recip3): .. this.
*
This patch adds define_insn/builtins/intrinsics for these instructions, and adds the option -mfrecipe to control instruction generation.
gcc/ChangeLog:
* config/loongarch/genopts/isa-evolution.in (fecipe): Add.
* config/loongarch/larchintrin.h (__frecipe_s): New intrinsic.
on the patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639243.html
Jiahao Xu (5):
LoongArch: Add support for LoongArch V1.1 approximate instructions.
LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions.
LoongArch: Redefine pattern for xvfrecip/vfrecip instructions
loongarch_expand_vec_cond_mask_expr generates 'subreg's of 'subreg's, which are not supported in GCC, and this causes an ICE:
ice.c:55:1: error: unrecognizable insn:
55 | }
| ^
(insn 63 62 64 8 (set (reg:V4DI 278)
(subreg:V4DI (subreg:V4DF (reg:V4DI 273 [ vect__53.26 ]) 0) 0)) -1
For [x]vshuf instructions, an index value greater than 63 in the selector triggers undefined behavior on LA464, but not on LA664. To make these two tests work on both LA464 and LA664, we have modified both tests so that no index value in the selector exceeds 63.
On 2023/11/29 10:33 AM, Xi Ruoyao wrote:
On Wed, 2023-11-29 at 10:23 +0800, Jiahao Xu wrote:
On 2023/11/29 10:08 AM, Xi Ruoyao wrote:
On Tue, 2023-11-28 at 11:29 +0800, Jiahao Xu wrote:
diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md
index f7796da10b2..9e9ce58cb53 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
instructions by implementing '-mrecip' and '-mrecip='.
Jiahao Xu (5):
LoongArch: Add support for approximate instructions.
LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions.
LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.
LoongArch: New options
Using -mrecip generates a sequence of instructions to replace divf, sqrtf and rsqrtf. The number of generated instructions is close to or exceeds the maximum issue width of LoongArch, so vectorized loop unrolling is not performed on them.
gcc/ChangeLog:
* config/loongarch/loongarch.cc
When the -mrecip option is turned on, use approximate reciprocal instructions and approximate reciprocal square root instructions with additional Newton-Raphson steps to implement single-precision floating-point division, square root and reciprocal square root operations for better throughput.
Rename lasx_xvfrsqrt*/lsx_vfrsqrt* to rsqrt2 to align with the standard pattern name.
gcc/ChangeLog:
* config/loongarch/lasx.md (lasx_xvfrsqrt_): Renamed to ..
(*rsqrt2): .. this.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vfrsqrt_d): Redefine to standard
LA664 introduces new instructions for reciprocal approximation and reciprocal square root approximation. These include the scalar instructions frecipe and frsqrte, as well as their corresponding vector instructions [x]vfrecipe and [x]vfrsqrte. This patch adds define_insn/builtins/intrinsics for
On 2023/11/19 2:25 AM, Xi Ruoyao wrote:
On Fri, 2023-11-17 at 10:21 +0800, chenglulu wrote:
Pushed to r14-5545.
On 2023/11/16 4:44 PM, Jiahao Xu wrote:
Based on SPEC2017 performance evaluation results, it's better to make them equal
to the cost of unaligned store/load so as to avoid odd alignment
This patch adds support for the xorsign pattern for scalar FP and vector modes. The new expanders uniformly use vector bitwise logical operations to handle xorsign.
On LoongArch64, floating-point registers and vector registers share the same register file, so this patch also allows conversion between LSX
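xorsign(x, y) computes x * copysign(1.0, y): it flips the sign of x when y is negative. In bitwise terms, that means AND-ing y with the sign-bit mask and XOR-ing the result into x. Below is a scalar C sketch of the same bit manipulation (the function name is hypothetical).

```c
#include <stdint.h>
#include <string.h>

/* xorsign for single precision: flip the sign of x iff y is negative.
   Equivalent to x * copysignf(1.0f, y), but done with an AND and an XOR. */
static float xorsign_sf(float x, float y) {
  uint32_t ux, uy;
  memcpy(&ux, &x, sizeof ux); /* reinterpret the float bits */
  memcpy(&uy, &y, sizeof uy);
  ux ^= uy & 0x80000000u;     /* isolate y's sign bit, XOR it into x */
  memcpy(&x, &ux, sizeof x);
  return x;
}
```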
These tests failed when they were first added; this patch adjusts the scan-assembler-times counts to fix them.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: Adjust assembler
times.
* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: Ditto.
*
after they were added?)
On Thu, 2023-11-16 at 20:08 +0800, Jiahao Xu wrote:
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: Adjust assembler
times.
* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: Ditto.
* gcc.target/loongarch/vector/lsx/lsx-vcond-1.c: Ditto.
* gcc.target/loongarch/vector/lsx/lsx-vcond-2.c:
Based on SPEC2017 performance evaluation results, it's better to make them equal
to the cost of unaligned store/load so as to avoid odd alignment peeling.
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_builtin_vectorization_cost): Adjust.
diff --git
Based on SPEC2017 performance evaluation results, making them equal to the
cost of unaligned store/load to avoid odd alignment peeling is better.
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_builtin_vectorization_cost): Adjust.
diff --git
If the vcond_mask patterns don't support fp modes, the vector
FP comparison instructions will not be generated.
gcc/ChangeLog:
* config/loongarch/lasx.md
(vcond_mask_): Change to
(vcond_mask_): this.
* config/loongarch/lsx.md
(vcond_mask_): Change to
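vcond_mask picks each element from one of two vectors according to a mask produced by a comparison. For FP modes, the mask comes from a vector FP compare, and the selection itself is bitwise. The sketch below uses GCC vector extensions; the type and function names are hypothetical.

```c
typedef float v4sf __attribute__((vector_size(16)));
typedef int v4si __attribute__((vector_size(16)));

/* Per-lane select: result[i] = (a[i] < b[i]) ? x[i] : y[i].
   The vector FP compare yields -1 (all ones) or 0 in each lane, and
   the select is done on the bit patterns with AND/ANDN/OR. */
static v4sf select_lt(v4sf a, v4sf b, v4sf x, v4sf y) {
  v4si mask = a < b;
  v4si xi = (v4si)x;
  v4si yi = (v4si)y;
  return (v4sf)((xi & mask) | (yi & ~mask));
}
```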
Add support for vec_widen lo/hi patterns. These do not directly match LoongArch LASX instructions, but can be emulated with even/odd + vector merge.
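The even/odd + merge emulation can be modelled in scalar C. This is a simplified sketch: the real LASX even/odd and interleave instructions operate within 128-bit lanes, which this toy model ignores, and all names are hypothetical. The widened even-lane and odd-lane sums are interleaved so that "lo" and "hi" come out in the original element order.

```c
#include <stdint.h>

#define NELTS 8 /* number of 16-bit input lanes in this toy model */

/* "Add widen even": widen and add the even-indexed lanes. */
static void addw_even(const int16_t *a, const int16_t *b, int32_t *ev) {
  for (int i = 0; i < NELTS / 2; i++)
    ev[i] = (int32_t)a[2 * i] + (int32_t)b[2 * i];
}

/* "Add widen odd": widen and add the odd-indexed lanes. */
static void addw_odd(const int16_t *a, const int16_t *b, int32_t *od) {
  for (int i = 0; i < NELTS / 2; i++)
    od[i] = (int32_t)a[2 * i + 1] + (int32_t)b[2 * i + 1];
}

/* vec_widen_add lo/hi: interleave the even and odd results so that
   lo holds lanes 0..3 and hi holds lanes 4..7 in their original order. */
static void widen_add_lo_hi(const int16_t *a, const int16_t *b,
                            int32_t *lo, int32_t *hi) {
  int32_t ev[NELTS / 2], od[NELTS / 2];
  addw_even(a, b, ev);
  addw_odd(a, b, od);
  for (int i = 0; i < NELTS / 4; i++) {
    lo[2 * i] = ev[i];
    lo[2 * i + 1] = od[i];
    hi[2 * i] = ev[NELTS / 4 + i];
    hi[2 * i + 1] = od[NELTS / 4 + i];
  }
}
```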
gcc/ChangeLog:
* config/loongarch/lasx.md (vec_widen_add_hi_,
vec_widen_add_lo_,
vec_widen_sub_hi_, vec_widen_sub_lo_,
gcc/ChangeLog:
* config/loongarch/lasx.md (avg3_floor, uavg3_floor,
avg3_ceil, uavg3_ceil, ssadv16qi, usadv16qi): New patterns.
* config/loongarch/lsx.md (avg3_floor, uavg3_floor,
avg3_ceil, uavg3_ceil, ssadv16qi, usadv16qi): New patterns.
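For reference, here is a scalar sketch of what the new avg patterns compute on byte elements: the average is formed in a wider type so the intermediate sum cannot overflow. The floor variants round down and the ceil variants add 1 before shifting. The function names are hypothetical.

```c
#include <stdint.h>

/* uavg3_floor on bytes: (a + b) >> 1, computed in a wider type so the
   intermediate sum cannot overflow. */
static uint8_t uavg_floor_qi(uint8_t a, uint8_t b) {
  return (uint8_t)(((unsigned)a + (unsigned)b) >> 1);
}

/* uavg3_ceil on bytes: (a + b + 1) >> 1, rounding halves up. */
static uint8_t uavg_ceil_qi(uint8_t a, uint8_t b) {
  return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* avg3_floor on signed bytes: rounds towards negative infinity.
   Assumes >> on a negative int is an arithmetic shift, as in GCC. */
static int8_t avg_floor_qi(int8_t a, int8_t b) {
  return (int8_t)(((int)a + (int)b) >> 1);
}
```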
gcc/testsuite/ChangeLog:
This patch makes LoongArch use the new vector hooks and implements the costing function determine_suggested_unroll_factor. This lets it suggest an unroll factor for a given loop being vectorized, based on the vec_ops analysis during vector costing and the available issue information. Referring to
determine_suggested_unroll_factor, so that it can suggest the unroll factor for a given loop being vectorized, based on the vec_ops analysis during vector costing and the available issue information. The patch also adjusts the cost model based on performance analysis.
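The idea can be illustrated with a deliberately simplified, hypothetical heuristic. It is not the formula used by the patch, and every name and parameter below is made up for illustration. If a vectorized loop body issues few vector operations relative to the core's issue width, unrolling lets more of them issue per cycle. If the body already saturates the issue width, as with the -mrecip sequences, no unrolling is suggested.

```c
/* Hypothetical heuristic, not GCC's actual code: how many copies of a
   vectorized loop body fit in the issue width, rounded down to a power
   of two and capped at MAX_UNROLL. */
static unsigned suggested_unroll_factor(unsigned issue_width,
                                        unsigned vec_ops_per_iter,
                                        unsigned max_unroll) {
  /* Body already fills (or exceeds) the issue width: do not unroll. */
  if (vec_ops_per_iter == 0 || vec_ops_per_iter >= issue_width)
    return 1;
  unsigned uf = issue_width / vec_ops_per_iter;
  unsigned p = 1;
  while (p * 2 <= uf)
    p *= 2;
  return p > max_unroll ? max_unroll : p;
}
```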
Jiahao Xu (3):
LoongArch: Implement