Re: [PATCH] LoongArch: Increase cost of vector aligned store/load.
On 11/16/23 14:17, Jiahao Xu wrote: Based on SPEC2017 performance evaluation results, making them equal to the cost of unaligned store/load to avoid odd alignment peeling is better. Paraphrasing a bit to shorten the subject of the sentence: "it's better to make them equal to ... so as to avoid odd-alignment peeling" gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_builtin_vectorization_cost): Adjust. diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 738911661d7..d05743bec87 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -3893,11 +3893,9 @@ loongarch_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost, case scalar_stmt: case scalar_load: case vector_stmt: - case vector_load: case vec_to_scalar: case scalar_to_vec: case scalar_store: - case vector_store: return 1; case vec_promote_demote: @@ -3905,6 +3903,8 @@ loongarch_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost, return LASX_SUPPORTED_MODE_P (mode) && !LSX_SUPPORTED_MODE_P (mode) ? 2 : 1; + case vector_load: + case vector_store: case unaligned_load: case unaligned_store: return 2;
Re: [PATCH] LoongArch: Use fcmp.caf.s instead of movgr2cf for zeroing a fcc
On 10/17/23 22:06, Xi Ruoyao wrote: During the review of a LLVM change [1], on LA464 we found that zeroing "an" LLVM change (because the word LLVM is pronounced letter-by-letter) a fcc with fcmp.caf.s is much faster than a movgr2cf from $r0. Similarly, "an" fcc [1]: https://github.com/llvm/llvm-project/pull/69300 gcc/ChangeLog: * config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for zeroing a fcc. --- Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? gcc/config/loongarch/loongarch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 68897799505..743e75907a6 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -2151,7 +2151,7 @@ (define_insn "movfcc" [(set (match_operand:FCC 0 "register_operand" "=z") (const_int 0))] "" - "movgr2cf\t%0,$r0") + "fcmp.caf.s\t%0,$f0,$f0") ;; Conditional move instructions. Trivial enough, so this LGTM apart from the grammatical nits. (Whoever pushes this patch could simply amend it themselves, so maybe there's no need for a v2.) Thanks!
Re: [PATCH] LoongArch: Fix lo_sum rtx cost
Hi, On 9/16/23 17:16, mengqinggang wrote: The cost of a lo_sum rtx for the addi.d instruction may be a very big number if computed by the common function. It may cause some symbols to be saved to and loaded from the stack if there are not enough registers during loop optimization. Thanks for the patch! It seems this change is done in order to optimize some previously pathetic codegen, am I right? If so, it would be appreciated to have a minimal test case attached, in order to ensure that codegen never regresses. (You can have your teammates help you if you're not familiar with that.) gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_rtx_costs): Add lo_sum cost. --- gcc/config/loongarch/loongarch.cc | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 845fad5a8e8..0e57f09379c 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -3648,6 +3648,10 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code, *total = COSTS_N_INSNS (4); return false; +case LO_SUM: + *total = set_src_cost (XEXP (x, 0), mode, speed); + return true; + In order for the code to be more maintainable, it may be better to duplicate some of the change reasons here, just in case someone in the future questions this piece of code that's without any explanation, and regresses things (because there's no test case). case LT: case LTU: case LE:
Re: [PATCH 1/2] LoongArch: Optimize switch with sign-extended index.
On 9/2/23 14:24, Lulu Cheng wrote: The patch refers to the submission of RISCV 7bbce9b50302959286381d9177818642bceaf301. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_extend_comparands): In unsigned QImode test, check for sign extended subreg and/or constant operands, and do a sign extend in that case. "do a sign extension" * config/loongarch/loongarch.md (TARGET_64BIT): Define template cbranchqi4. gcc/testsuite/ChangeLog: * gcc.target/loongarch/switch-qi.c: New test. --- gcc/config/loongarch/loongarch.cc | 14 -- gcc/config/loongarch/loongarch.md | 8 ++-- gcc/testsuite/gcc.target/loongarch/switch-qi.c | 16 3 files changed, 34 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/switch-qi.c diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index c72229cad87..7e300c826cf 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -4228,8 +4228,18 @@ loongarch_extend_comparands (rtx_code code, rtx *op0, rtx *op1) /* Comparisons consider all XLEN bits, so extend sub-XLEN values. */ if (GET_MODE_SIZE (word_mode) > GET_MODE_SIZE (GET_MODE (*op0))) { - /* TODO: checkout It is more profitable to zero-extend QImode values. */ - if (unsigned_condition (code) == code && GET_MODE (*op0) == QImode) + /* It is more profitable to zero-extend QImode values. But not if the +first operand has already been sign-extended, and the second one is +is a constant or has already been sign-extended also. */ + if (unsigned_condition (code) == code + && (GET_MODE (*op0) == QImode + && ! 
(GET_CODE (*op0) == SUBREG + && SUBREG_PROMOTED_VAR_P (*op0) + && SUBREG_PROMOTED_SIGNED_P (*op0) + && (CONST_INT_P (*op1) + || (GET_CODE (*op1) == SUBREG + && SUBREG_PROMOTED_VAR_P (*op1) + && SUBREG_PROMOTED_SIGNED_P (*op1)) { *op0 = gen_rtx_ZERO_EXTEND (word_mode, *op0); if (CONST_INT_P (*op1)) diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index b37e070660f..1bb4e461b38 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -2733,11 +2733,15 @@ (define_insn "*branch_equality_inverted" [(set_attr "type" "branch")]) +;; Branches operate on XLEN-sized quantities, but for LoongArch64 we accept LoongArch literature refers to "XLEN" as "GRLEN". Otherwise the patch is fine, thanks ;-) +;; QImode values so we can force zero-extension. +(define_mode_iterator BR [(QI "TARGET_64BIT") SI (DI "TARGET_64BIT")]) + (define_expand "cbranch4" [(set (pc) (if_then_else (match_operator 0 "comparison_operator" - [(match_operand:GPR 1 "register_operand") -(match_operand:GPR 2 "nonmemory_operand")]) + [(match_operand:BR 1 "register_operand") +(match_operand:BR 2 "nonmemory_operand")]) (label_ref (match_operand 3 "")) (pc)))] "" diff --git a/gcc/testsuite/gcc.target/loongarch/switch-qi.c b/gcc/testsuite/gcc.target/loongarch/switch-qi.c new file mode 100644 index 000..dd192fd497f --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/switch-qi.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-march=loongarch64 -mabi=lp64d" } */ +/* { dg-final { scan-assembler-not "bstrpick" } } */ + +/* Test for loongarch_extend_comparands patch. */ +extern void asdf (int); +void +foo (signed char x) { +switch (x) { + case 0: asdf (10); break; + case 1: asdf (11); break; + case 2: asdf (12); break; + case 3: asdf (13); break; + case 4: asdf (14); break; +} +}
Re: [PATCH v1] LoongArch: Remove the symbolic extension instruction due to the SLT directive.
On 8/25/23 12:01, Lulu Cheng wrote: Since the slt instruction does not distinguish between 32-bit and 64-bit operations under the LoongArch 64-bit architecture, if the operands of slt are of SImode, symbol expansion is required before operation. Hint: “符号扩展” is "sign extension" (as noun) or "sign-extend" (as verb), not "symbol expansion". But similar to the following test case, symbol expansion can be omitted: extern int src1, src2, src3; int test (void) { int data1 = src1 + src2; int data2 = src1 + src3; return data1 > data2 ? data1 : data2; } Assembly code before optimization: ... add.w $r4,$r4,$r14 add.w $r13,$r13,$r14 slli.w $r12,$r4,0 slli.w $r14,$r13,0 slt $r12,$r12,$r14 masknez $r4,$r4,$r12 maskeqz $r12,$r13,$r12 or $r4,$r4,$r12 slli.w $r4,$r4,0 ... After optimization: ... add.w $r12,$r12,$r14 add.w $r13,$r13,$r14 slt $r4,$r12,$r13 masknez $r12,$r12,$r4 maskeqz $r4,$r13,$r4 or $r4,$r12,$r4 ... Similar to this test example, the two operands of SLT are obtained by the addition operation, and the addition operation "add.w" is an implicit symbolic extension function, so the two operands of SLT do not require more naturally: "and add.w implicitly sign-extends" -- brevity is often desired and clearer ;-) symbolic expansion. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_expand_conditional_move): Optimize the function implementation. gcc/testsuite/ChangeLog: * gcc.target/loongarch/slt-sign-extend.c: New test.
--- gcc/config/loongarch/loongarch.cc | 53 +-- .../gcc.target/loongarch/slt-sign-extend.c| 14 + 2 files changed, 63 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/slt-sign-extend.c diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 86d58784113..1905599b9e8 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -4384,14 +4384,30 @@ loongarch_expand_conditional_move (rtx *operands) enum rtx_code code = GET_CODE (operands[1]); rtx op0 = XEXP (operands[1], 0); rtx op1 = XEXP (operands[1], 1); + rtx op0_extend = op0; + rtx op1_extend = op1; + + /* Record whether operands[2] and operands[3] modes are promoted to word_mode. */ + bool promote_p = false; + machine_mode mode = GET_MODE (operands[0]); if (FLOAT_MODE_P (GET_MODE (op1))) loongarch_emit_float_compare (, , ); else { + if ((REGNO (op0) == REGNO (operands[2]) + || (REGNO (op1) == REGNO (operands[3]) && (op1 != const0_rtx))) + && (GET_MODE_SIZE (GET_MODE (op0)) < word_mode)) + { + mode = word_mode; + promote_p = true; + } + loongarch_extend_comparands (code, , ); op0 = force_reg (word_mode, op0); + op0_extend = op0; + op1_extend = force_reg (word_mode, op1); if (code == EQ || code == NE) { @@ -4418,23 +4434,52 @@ loongarch_expand_conditional_move (rtx *operands) && register_operand (operands[2], VOIDmode) && register_operand (operands[3], VOIDmode)) { - machine_mode mode = GET_MODE (operands[0]); + rtx op2 = operands[2]; + rtx op3 = operands[3]; + + if (promote_p) + { + if (REGNO (XEXP (operands[1], 0)) == REGNO (operands[2])) + op2 = op0_extend; + else + { + loongarch_extend_comparands (code, , _rtx); + op2 = force_reg (mode, op2); + } + + if (REGNO (XEXP (operands[1], 1)) == REGNO (operands[3])) + op3 = op1_extend; + else + { + loongarch_extend_comparands (code, , _rtx); + op3 = force_reg (mode, op3); + } + } + rtx temp = gen_reg_rtx (mode); rtx temp2 = gen_reg_rtx (mode); emit_insn (gen_rtx_SET (temp, 
gen_rtx_IF_THEN_ELSE (mode, cond, - operands[2], const0_rtx))); + op2, const0_rtx))); /* Flip the test for the second operand. */ cond = gen_rtx_fmt_ee ((code == EQ) ? NE : EQ, GET_MODE (op0), op0, op1); emit_insn (gen_rtx_SET (temp2, gen_rtx_IF_THEN_ELSE (mode, cond, - operands[3], const0_rtx))); + op3, const0_rtx))); /* Merge the two results, at least one is guaranteed to be zero. */ - emit_insn (gen_rtx_SET (operands[0], gen_rtx_IOR (mode, temp, temp2))); + if (promote_p) + { + rtx temp3 = gen_reg_rtx (mode); +
Re: [PATCH v1 2/6] LoongArch: Added Loongson SX base instruction support.
On 2023/6/30 10:16, Chenghui Pan wrote: [snip] --- gcc/config/loongarch/constraints.md| 128 +- gcc/config/loongarch/loongarch-builtins.cc | 10 + gcc/config/loongarch/loongarch-modes.def | 38 + gcc/config/loongarch/loongarch-protos.h| 31 + gcc/config/loongarch/loongarch.cc | 2235 +- gcc/config/loongarch/loongarch.h | 65 +- gcc/config/loongarch/loongarch.md | 44 +- gcc/config/loongarch/lsx.md| 4490 gcc/config/loongarch/predicates.md | 333 +- 9 files changed, 7184 insertions(+), 190 deletions(-) create mode 100644 gcc/config/loongarch/lsx.md diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md index 7a38cd07ae9..1dd56af07c4 100644 --- a/gcc/config/loongarch/constraints.md +++ b/gcc/config/loongarch/constraints.md @@ -30,8 +30,7 @@ ;; "h" <-unused ;; "i" "Matches a general integer constant." (Global non-architectural) ;; "j" SIBCALL_REGS -;; "k" "A memory operand whose address is formed by a base register and -;; (optionally scaled) index register." +;; "k" <-unused ;; "l" "A signed 16-bit constant." ;; "m" "A memory operand whose address is formed by a base register and offset ;; that is suitable for use in instructions with the same addressing mode @@ -80,13 +79,14 @@ ;; "N" <-unused ;; "O" <-unused ;; "P" <-unused -;; "Q" <-unused +;; "Q" "A signed 12-bit constant" ;; "R" <-unused ;; "S" <-unused ;; "T" <-unused ;; "U" <-unused ;; "V" "Matches a non-offsettable memory reference." (Global non-architectural) -;; "W" <-unused +;; "W" "A memory address based on a member of @code{BASE_REG_CLASS}. This is +;; true for all references." ;; "X" "Matches anything." (Global non-architectural) ;; "Y" - ;;"Yd" @@ -214,6 +214,63 @@ (define_constraint "Le" (and (match_code "const_int") (match_test "loongarch_addu16i_imm12_operand_p (ival, SImode)"))) +(define_constraint "M" + "A constant that cannot be loaded using @code{lui}, @code{addiu} + or @code{ori}." 
+ (and (match_code "const_int") + (not (match_test "IMM12_OPERAND (ival)")) + (not (match_test "IMM12_OPERAND_UNSIGNED (ival)")) + (not (match_test "LU12I_OPERAND (ival)")))) + +(define_constraint "N" + "A constant in the range -65535 to -1 (inclusive)." + (and (match_code "const_int") + (match_test "ival >= -0xffff && ival < 0"))) + +(define_constraint "O" + "A signed 15-bit constant." + (and (match_code "const_int") + (match_test "ival >= -0x4000 && ival < 0x4000"))) + +(define_constraint "P" + "A constant in the range 1 to 65535 (inclusive)." + (and (match_code "const_int") + (match_test "ival > 0 && ival < 0x10000"))) These constraints are meant to be exposed for developers to use, right? If not, they should probably be marked "@internal", and if so you should update the docs as well. Also these are not documented in the comment block at the top of the file. + +;; General constraints + +(define_memory_constraint "R" + "An address that can be used in a non-macro load or store." + (and (match_code "mem") + (match_test "loongarch_address_insns (XEXP (op, 0), mode, false) == 1"))) Similarly, is this "R" constraint meant to be exposed as well? Sure one's free to choose letters but "R" IMO strongly implies something related to registers, not addresses... +(define_constraint "S" + "@internal + A constant call address." + (and (match_operand 0 "call_insn_operand") + (match_test "CONSTANT_P (op)"))) Additionally, IMO we probably should minimize our use of single-letter constraints that don't overlap with other architectures' similar usage. (I know that several projects have accepted LSX/LASX code well ahead of this series, but I don't know off the top of my head if their code used any inline asm instead of C intrinsics. Intuitively this shouldn't be a concern though.) Overall, I'd recommend moving all single-letter constraints added here to a two-letter namespace, so everything is better namespaced and easier to remember (e.g.
if we choose something like "Vx" or "Yx" for everything vector-related, it'd be a lot easier to mentally associate the two-letter incantations with correct semantics.) + +(define_constraint "YG" + "@internal + A vector zero." + (and (match_code "const_vector") + (match_test "op == CONST0_RTX (mode)"))) + +(define_constraint "YA" + "@internal + An unsigned 6-bit constant." + (and (match_code "const_int") + (match_test "UIMM6_OPERAND (ival)"))) + +(define_constraint "YB" + "@internal + A signed 10-bit constant." + (and (match_code "const_int") + (match_test "IMM10_OPERAND (ival)"))) + +(define_constraint "Yb" + "@internal" + (match_operand 0 "qi_mask_operand")) + (define_constraint "Yd" "@internal A constant @code{move_operand} that can be safely
Re: [pushed][PATCH v3] LoongArch: Avoid non-returning indirect jumps through $ra [PR110136]
Hi, On 6/15/23 17:03, Xi Ruoyao wrote: Xuerui: I guess this makes it sensible to show "ret" instead of "jirl $zero, $ra, 0" in objdump -d output, but I don't know how to implement it. Do you have any ideas? Thanks for the suggestion! Actually I have previously made this patch series [1] which included just that. But the Loongson maintainers said they were working on linker relaxation at the time, so they would have to postpone processing it, and I've never had a review since then; it's expected to conflict with the relaxation patches so some rebasing would be needed, but IIRC all review comments should be addressed. You can take the series if you'd like to ;-) [1]: https://sourceware.org/pipermail/binutils/2023-February/126088.html On Thu, 2023-06-15 at 16:27 +0800, Lulu Cheng wrote: Pushed to trunk and gcc-12 gcc-13. r14-1866 r13-7448 r12-9698 On 2023/6/15 at 9:30 AM, Lulu Cheng wrote: Micro-architecture unconditionally treats a "jr $ra" as "return from subroutine", hence doing "jr $ra" would interfere with both subroutine return prediction and the more general indirect branch prediction. Therefore, a problem like PR110136 can cause a significant increase in branch error prediction rate and affect performance. The same problem exists with "indirect_jump". gcc/ChangeLog: * config/loongarch/loongarch.md: Modify the register constraints for template "jumptable" and "indirect_jump" from "r" to "e". Co-authored-by: Andrew Pinski --- v1 -> v2: 1. Modify the description. 2. Modify the register constraints of the template "indirect_jump". v2 -> v3: 1. Modify the description.
--- gcc/config/loongarch/loongarch.md | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 816a943d155..b37e070660f 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -2895,6 +2895,10 @@ (define_insn "*jump_pic" } [(set_attr "type" "branch")]) +;; Micro-architecture unconditionally treats a "jr $ra" as "return from subroutine", +;; non-returning indirect jumps through $ra would interfere with both subroutine +;; return prediction and the more general indirect branch prediction. + (define_expand "indirect_jump" [(set (pc) (match_operand 0 "register_operand"))] "" @@ -2905,7 +2909,7 @@ (define_expand "indirect_jump" }) (define_insn "@indirect_jump" - [(set (pc) (match_operand:P 0 "register_operand" "r"))] + [(set (pc) (match_operand:P 0 "register_operand" "e"))] "" "jr\t%0" [(set_attr "type" "jump") @@ -2928,7 +2932,7 @@ (define_expand "tablejump" (define_insn "@tablejump" [(set (pc) - (match_operand:P 0 "register_operand" "r")) + (match_operand:P 0 "register_operand" "e")) (use (label_ref (match_operand 1 "" "")))] "" "jr\t%0"
Re: [PATCH v2] LoongArch: Modify the register constraints for template "jumptable" and "indirect_jump" from "r" to "e" [PR110136]
On 2023/6/8 10:27, Lulu Cheng wrote: Micro-architecture unconditionally treats a "jr $ra" as "return from subroutine", hence doing "jr $ra" would interfere with both subroutine return prediction and the more general indirect branch prediction. Therefore, a problem like PR110136 can cause a significant increase in branch error prediction rate and affect performance. The same problem exists with "indirect_jump". gcc/ChangeLog: * config/loongarch/loongarch.md: Modify the register constraints for template "jumptable" and "indirect_jump" from "r" to "e". Co-authored-by: Andrew Pinski --- v1 -> v2: 1. Modify the description 2. Modify the register constraints of the template "indirect_jump". --- gcc/config/loongarch/loongarch.md | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 816a943d155..43a2ecc8957 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -2895,6 +2895,10 @@ (define_insn "*jump_pic" } [(set_attr "type" "branch")]) +;; Micro-architecture unconditionally treats a "jr $ra" as "return from subroutine", +;; hence doing "jr $ra" would interfere with both subroutine return prediction and Not all cases of "doing 'jr $ra'" are harmful, obviously. Paraphrasing it like e.g. "non-returning indirect jumps through $ra" would be better. You could simplify the patch title a lot with this too: "Avoid non-returning indirect jumps through $ra" is shorter and does not duplicate the ChangeLog message. +;; the more general indirect branch prediction. 
+ (define_expand "indirect_jump" [(set (pc) (match_operand 0 "register_operand"))] "" @@ -2905,7 +2909,7 @@ (define_expand "indirect_jump" }) (define_insn "@indirect_jump" - [(set (pc) (match_operand:P 0 "register_operand" "r"))] + [(set (pc) (match_operand:P 0 "register_operand" "e"))] "" "jr\t%0" [(set_attr "type" "jump") @@ -2928,7 +2932,7 @@ (define_expand "tablejump" (define_insn "@tablejump" [(set (pc) - (match_operand:P 0 "register_operand" "r")) + (match_operand:P 0 "register_operand" "e")) (use (label_ref (match_operand 1 "" "")))] "" "jr\t%0"
Re: [PATCH] LoongArch: Change jumptable's register constraint to 'q' [PR110136]
On 2023/6/7 11:36, Lulu Cheng wrote: On 2023/6/7 at 11:26 AM, WANG Xuerui wrote: Hi, On 2023/6/7 10:31, Lulu Cheng wrote: If the $ra register is modified during the jump to the jump table, the hardware branch prediction function will be broken, resulting in a significant increase in the branch false prediction rate and affecting performance. Thanks for the insight! This is the kind of improvement that will probably become a lot harder to even *sight* without uarch details available. However, I think it's better to also include a minimized test case to ensure the compiled code doesn't regress. (Comparison of relevant statistics, e.g. output of perf stat, would be even nicer to have!) There was no way I could find a small test case that would replicate this problem. This occurs when compiling spec2006 400.perlbench. And it only appears when '-flto' is added. :-( But I have put reproducible programs and compilation methods at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110136. Ah okay. I missed the bug number in the title initially. After reading through the context, I think the reason for avoiding $ra (in general) should be that the micro-architecture unconditionally treats a "jr $ra" as "return from subroutine", hence doing "jr $ra" would interfere with *both* subroutine return prediction *and* the more general indirect branch prediction; am I right? This could be more informative than merely saying "HW branch prediction will be broken". gcc/ChangeLog: * config/loongarch/loongarch.md: Change register constraint to 'q'.
--- gcc/config/loongarch/loongarch.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 816a943d155..f9b64173104 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -2926,9 +2926,11 @@ (define_expand "tablejump" DONE; }) +;; Jump to the jump table Avoid using the $r1 register to prevent +;; affecting hardware branch prediction. (define_insn "@tablejump" [(set (pc) - (match_operand:P 0 "register_operand" "r")) + (match_operand:P 0 "register_operand" "q")) (use (label_ref (match_operand 1 "" "")))] "" "jr\t%0"
Re: [PATCH] LoongArch: Change jumptable's register constraint to 'q' [PR110136]
Hi, On 2023/6/7 10:31, Lulu Cheng wrote: If the $ra register is modified during the jump to the jump table, the hardware branch prediction function will be broken, resulting in a significant increase in the branch false prediction rate and affecting performance. Thanks for the insight! This is the kind of improvement that will probably become a lot harder to even *sight* without uarch details available. However, I think it's better to also include a minimized test case to ensure the compiled code doesn't regress. (Comparison of relevant statistics, e.g. output of perf stat, would be even nicer to have!) gcc/ChangeLog: * config/loongarch/loongarch.md: Change register constraint to 'q'. --- gcc/config/loongarch/loongarch.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 816a943d155..f9b64173104 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -2926,9 +2926,11 @@ (define_expand "tablejump" DONE; }) +;; Jump to the jump table Avoid using the $r1 register to prevent +;; affecting hardware branch prediction. (define_insn "@tablejump" [(set (pc) - (match_operand:P 0 "register_operand" "r")) + (match_operand:P 0 "register_operand" "q")) (use (label_ref (match_operand 1 "" "")))] "" "jr\t%0"
Re: [PATCH] LoongArch: Fix the problem of structure parameter passing in C++. This structure has empty structure members and less than three floating point members.
On 2023/5/25 10:46, Lulu Cheng wrote: On 2023/5/25 at 4:15 AM, Jason Merrill wrote: On Wed, May 24, 2023 at 5:00 AM Jonathan Wakely via Gcc-patches wrote: On Wed, 24 May 2023 at 09:41, Xi Ruoyao wrote: > Wang Lei raised some concerns about Itanium C++ ABI, so let's ask a C++ > expert here... > > Jonathan: AFAIK the standard and the Itanium ABI treats an empty class > as size 1 Only as a complete object, not as a subobject. Also as a data member subobject. > in order to guarantee unique address, so for the following: > > class Empty {}; > class Test { Empty empty; double a, b; }; There is no need to have a unique address here, so Test::empty and Test::a have the same address. It's a potentially-overlapping subobject. For the Itanium ABI, sizeof(Test) == 2 * sizeof(double). That would be true if Test::empty were marked [[no_unique_address]], but without that attribute, sizeof(Test) is actually 3 * sizeof(double). > When we pass "Test" via registers, we may only allocate the registers > for Test::a and Test::b, and complete ignore Test::empty because there > is no addresses of registers. Is this correct or not? I think that's a decision for the loongarch psABI. In principle, there's no reason a register has to be used to pass Test::empty, since you can't read from it or write to it. Agreed. The Itanium C++ ABI has nothing to say about how registers are allocated for parameter passing; this is a matter for the psABI. And there is no need for a psABI to allocate a register for Test::empty because it contains no data. In the x86_64 psABI, Test above is passed in memory because of its size ("the size of the aggregate exceeds two eightbytes..."). But struct Test2 { Empty empty; double a; }; is passed in a single floating-point register; the Test2::empty subobject is not passed anywhere, because its eightbyte is classified as NO_CLASS, because there is no actual data there.
I know nothing about the LoongArch psABI, but going out of your way to assign a register to an empty class seems like a mistake. MIPS64 and ARM64 also allocate parameter registers for empty structs. https://godbolt.org/z/jT4cY3T5o Our original intention is not to pass this empty structure member, but to make the following two structures treat empty structure members in the same way in the process of passing parameters. struct st1 { struct empty {} e1; long a; long b; }; struct st2 { struct empty {} e1; double f0; double f1; }; Then shouldn't we try to avoid the extra register in all cases, instead of wasting one regardless? ;-)
Re: [PATCH] LoongArch: Enable shrink wrapping
On 2023/4/26 18:14, Lulu Cheng wrote: On 2023/4/26 at 6:02 PM, WANG Xuerui wrote: On 2023/4/26 17:53, Lulu Cheng wrote: Hi, ruoyao: The performance of spec2006 is finished. The fixed-point 400.perlbench has about 3% performance improvement, and the other basics have not changed, and the floating-point tests have basically remained the same. Nice to know! Do you have any questions about the test cases mentioned by Guo Jie? If there is no problem, modify the test case, I think the code can be merged into the main branch. BTW what about the previous function/loop alignment patches? The LLVM changes are also waiting for such results. ;-) Well, there are many combinations in this align test, so the test time will be very long. I will reply with the results as soon as the tests come out. :-) Oh, got it. Thanks very much for all the tests and take your time!
Re: [PATCH] LoongArch: Enable shrink wrapping
On 2023/4/26 17:53, Lulu Cheng wrote: Hi, ruoyao: The performance of spec2006 is finished. The fixed-point 400.perlbench has about 3% performance improvement, and the other basics have not changed, and the floating-point tests have basically remained the same. Nice to know! Do you have any questions about the test cases mentioned by Guo Jie? If there is no problem, modify the test case, I think the code can be merged into the main branch. BTW what about the previous function/loop alignment patches? The LLVM changes are also waiting for such results. ;-)
Re: [PATCH] LoongArch: Set 4 * (issue rate) as the default for -falign-functions and -falign-loops
On 2023/4/18 20:45, Xi Ruoyao wrote: On Tue, 2023-04-18 at 20:39 +0800, WANG Xuerui wrote: Hi, Thanks for helping confirm this on GCC and porting this! I'd never have known even GCC lacked this adaptation without someone actually checking... Too many things are taken for granted these days. On 2023/4/18 20:17, Xi Ruoyao wrote: According to Xuerui's LLVM changeset [1], doing so can make a significant performance gain. "doing so can gain significant performance" or "significant performance can be gained by ..."? Also the other important thing is, guaranteeing alignment also makes performance *more predictable* in addition to generally making things faster. You may want to mention this too somewhere. Bootstrapped and regtested on loongarch64-linux-gnu. Ok for GCC 14? [1]:https://reviews.llvm.org/D148622 nit: one space after the colon. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_option_override_internal): If -falign-functions is used but the alignment is not explicitly specified, set it to 4 * loongarch_issue_rate (). Likewise for -falign-loops. --- gcc/config/loongarch/loongarch.cc | 11 +++ 1 file changed, 11 insertions(+) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 06fc1cd0604..6552484de7c 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -6236,6 +6236,17 @@ loongarch_option_override_internal (struct gcc_options *opts) && !opts->x_optimize_size) opts->x_flag_prefetch_loop_arrays = 1; + /* Align functions and loops to (issue rate) * (insn size) to improve + the throughput of the fetching units. */ What about gating all of these on !opts->x_optimize_size, similar to what aarch64 does? opts->x_flag_align_functions and opts->x_flag_align_loops are only set with -O2 or above unless the user manually uses -falign-functions or - falign-loops. If the user uses "-Os -falign-functions" as CFLAGS I'd assume s(he) wants to optimize for size but keep the optimized function alignment.
Ah, okay. Fine then. BTW I've also added some comments to the commit message that I forgot to review earlier. + char *align = XNEWVEC (char, 16); + sprintf (align, "%d", loongarch_issue_rate () * 4); + + if (opts->x_flag_align_functions && !opts->x_str_align_functions) + opts->x_str_align_functions = align; + + if (opts->x_flag_align_loops && !opts->x_str_align_loops) + opts->x_str_align_loops = align; + if (TARGET_DIRECT_EXTERN_ACCESS && flag_shlib) error ("%qs cannot be used for compiling a shared library", "-mdirect-extern-access"); Otherwise LGTM, thanks!
Re: [PATCH] LoongArch: Set 4 * (issue rate) as the default for -falign-functions and -falign-loops
Hi, Thanks for helping confirm this on GCC and porting this! I'd never have known even GCC lacked this adaptation without someone actually checking... Too many things are taken for granted these days. On 2023/4/18 20:17, Xi Ruoyao wrote: According to Xuerui's LLVM changeset [1], doing so can make a significant performance gain. Bootstrapped and regtested on loongarch64-linux-gnu. Ok for GCC 14? [1]:https://reviews.llvm.org/D148622 gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_option_override_internal): If -falign-functions is used but the alignment is not explicitly specified, set it to 4 * loongarch_issue_rate (). Likewise for -falign-loops. --- gcc/config/loongarch/loongarch.cc | 11 +++ 1 file changed, 11 insertions(+) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 06fc1cd0604..6552484de7c 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -6236,6 +6236,17 @@ loongarch_option_override_internal (struct gcc_options *opts) && !opts->x_optimize_size) opts->x_flag_prefetch_loop_arrays = 1; + /* Align functions and loops to (issue rate) * (insn size) to improve + the throughput of the fetching units. */ What about gating all of these on !opts->x_optimize_size, similar to what aarch64 does? + char *align = XNEWVEC (char, 16); + sprintf (align, "%d", loongarch_issue_rate () * 4); + + if (opts->x_flag_align_functions && !opts->x_str_align_functions) +opts->x_str_align_functions = align; + + if (opts->x_flag_align_loops && !opts->x_str_align_loops) +opts->x_str_align_loops = align; + if (TARGET_DIRECT_EXTERN_ACCESS && flag_shlib) error ("%qs cannot be used for compiling a shared library", "-mdirect-extern-access"); Otherwise LGTM, thanks!
Re: [PATCH] gcc-13: Add changelog for LoongArch.
Hi, Just some minor fixes ;-) On 2023/4/18 14:15, Lulu Cheng wrote: --- htdocs/gcc-13/changes.html | 39 ++ 1 file changed, 39 insertions(+) diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index f3b9afed..c75e341b 100644 --- a/htdocs/gcc-13/changes.html +++ b/htdocs/gcc-13/changes.html @@ -563,6 +563,45 @@ a work-in-progress. +LoongArch + +New features + +The new command-line option -mexplicit-relocs decides whether + to use or not use the assembler relocation operator when dealing with "provides control over whether ..."? + symbolic addresses. -mexplicit-relocs is enabled by default + when GCC is configured to use a compatible assembler. "if a compatible assembler (binutils 2.40 or later) is present at GCC build time"? + +Avoid using the GOT to access external symbols when the new + -mdirect-extern-access command-line option is specified. "The new command-line option -mdirect-extern-access can be used to prevent accessing external symbols through GOT"? + +The https://gcc.gnu.org/onlinedocs/gcc/LoongArch-Variable-Attributes.html#LoongArch-Variable-Attributes;>model + variable attributes has been added. "A new variable attribute ... has been added." + + +Built-in functions support Improvements "Improvements to built-in functions" or just "Built-in functions" as Gerald pointed out. + +The rint and copysign mathematical builtins + (and their float variants) are now inplemented as inline LoongArch intrinsics. + +The lrint, logb, scalbln, + scalbn and ldexp mathematical builtins (and their + float variants) are now inplemented as inline LoongArch intrinsics when using + -fno-math-errno. + +The lceil and lfloor mathematical builtins + (and their float variants) are now inplemented as inline LoongArch intrinsics + when using -ffp-int-builtin-inexact. For this part I agree with what Gerald has suggested, the expressions are kinda natural after you fix the misspelled "implemented"'s. + + +Subprojects Support + +libvtv now supports LoongArch architecture. 
+libitm now supports LoongArch architecture. Remove "architecture"? Because "LoongArch" practically just means "Loongson Architecture". + LoongArch supports address sanitizers other than hwasan and tsan. + "Address sanitizers other than HWASan and TSan are now supported on LoongArch"? + +
Re: [PATCH] LoongArch: Control all __crc* __crcc* builtin functions with macro __loongarch64.
On 2023/3/13 13:14, Xi Ruoyao wrote: On Mon, 2023-03-13 at 12:58 +0800, Lulu Cheng wrote: On 2023/3/13 12:54 PM, Xi Ruoyao wrote: On Mon, 2023-03-13 at 12:40 +0800, WANG Xuerui wrote: This is ugly. The fact all current LA32 models don't support CRC ops is just a coincidence; it's entirely possible for a future product iteration to introduce such functionality. It's not like the CRC*.W.W.W ops require anything wider than 32 bits, after all. Maybe the correct way would be adding a switch like MIPS -mcrc or x86 -mcrc32. Because these instructions are part of the LA64 base instruction set, there are no control options here. I'd postpone the change until we add LA32 support because there are no details about LA32 now and it's hard to determine the best way to gate this... There is, actually; the "reduced" LA32 ISA manual was released together with the LA64 one. And CRC ops are certainly absent there. However I agree that currently it's not as rigorously defined as LA64, partly in that no LA32 micro-architectures have respective user manuals available. And Loongson doesn't seem to actively push for LA32 Primary support either. (IMO LA32 Primary support should be *prioritized* because (a) it's trivial and (b) it's beneficial to bring students aboard as early as possible, especially given RISC-V's significant presence now.)
Re: [PATCH] LoongArch: Control all __crc* __crcc* builtin functions with macro __loongarch64.
On 2023/3/13 11:52, Lulu Cheng wrote: LoongArch 32-bit instruction set does not support crc* and crcc* instructions. gcc/ChangeLog: * config/loongarch/larchintrin.h (__crc_w_b_w): Add macros for control. (__crc_w_h_w): Likewise. (__crc_w_w_w): Likewise. (__crcc_w_b_w): Likewise. (__crcc_w_h_w): Likewise. (__crcc_w_w_w): Likewise. * config/loongarch/loongarch.md: Add condition TARGET_64BIT to loongarch_crc_* loongarch_crcc_* instruction template. --- gcc/config/loongarch/larchintrin.h | 4 +--- gcc/config/loongarch/loongarch.md | 4 ++-- 2 files changed, 3 insertions(+), 5 deletions(-) diff --git a/gcc/config/loongarch/larchintrin.h b/gcc/config/loongarch/larchintrin.h index e571ed27b37..09f9a5db846 100644 --- a/gcc/config/loongarch/larchintrin.h +++ b/gcc/config/loongarch/larchintrin.h @@ -145,6 +145,7 @@ __asrtgt_d (long int _1, long int _2) #error "Unsupported ABI." #endif +#ifdef __loongarch64 This is ugly. The fact all current LA32 models don't support CRC ops is just a coincidence; it's entirely possible for a future product iteration to introduce such functionality. It's not like the CRC*.W.W.W ops require anything wider than 32 bits, after all. But do we have to ifdef these out, after all? The only difference would be the error description ("name undefined" vs "intrinsic not supported" or something like that). IMO we could simply remove every ifdef like this and be done with it... /* Assembly instruction format: rd, rj, rk. */ /* Data types in instruction templates: SI, QI, SI. */ extern __inline int @@ -172,7 +173,6 @@ __crc_w_w_w (int _1, int _2) return (int) __builtin_loongarch_crc_w_w_w ((int) _1, (int) _2); } -#ifdef __loongarch64 /* Assembly instruction format: rd, rj, rk. */ /* Data types in instruction templates: SI, DI, SI. */ extern __inline int @@ -181,7 +181,6 @@ __crc_w_d_w (long int _1, int _2) { return (int) __builtin_loongarch_crc_w_d_w ((long int) _1, (int) _2); } -#endif /* Assembly instruction format: rd, rj, rk. 
*/ /* Data types in instruction templates: SI, QI, SI. */ @@ -210,7 +209,6 @@ __crcc_w_w_w (int _1, int _2) return (int) __builtin_loongarch_crcc_w_w_w ((int) _1, (int) _2); } -#ifdef __loongarch64 /* Assembly instruction format: rd, rj, rk. */ /* Data types in instruction templates: SI, DI, SI. */ extern __inline int diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 3509c3c21c1..227f3c6899c 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -3539,7 +3539,7 @@ (define_insn "loongarch_crc_w__w" (unspec:SI [(match_operand:QHSD 1 "register_operand" "r") (match_operand:SI 2 "register_operand" "r")] UNSPEC_CRC))] - "" + "TARGET_64BIT" Gating on micro-architectures instead of bitness would be better, per the reasoning above. "crc.w..w\t%0,%1,%2" [(set_attr "type" "unknown") (set_attr "mode" "")]) @@ -3549,7 +3549,7 @@ (define_insn "loongarch_crcc_w__w" (unspec:SI [(match_operand:QHSD 1 "register_operand" "r") (match_operand:SI 2 "register_operand" "r")] UNSPEC_CRCC))] - "" + "TARGET_64BIT" "crcc.w..w\t%0,%1,%2" [(set_attr "type" "unknown") (set_attr "mode" "")])
Re: [PATCH] LoongArch: Change the value of macro TRY_EMPTY_VM_SPACE from 0x8000000000 to 0x1000000000.
On 2023/2/22 17:30, Lulu Cheng wrote: On 2023/2/21 9:56 PM, WANG Xuerui wrote: Hi, On 2023/2/21 21:03, Lulu Cheng wrote: On 2023/2/21 3:41 PM, Xi Ruoyao wrote: On Tue, 2023-02-21 at 15:20 +0800, Lulu Cheng wrote: Like la264 only has 40 effective bits of virtual address space. I'm OK with the change. But the VA length is configurable when building the kernel. Is there any specific reason LA264 has to use the 40-bit configuration, or should we reword the commit message like "for supporting the configuration with less page table level or smaller page size"? I consulted with my colleagues who are working on the kernel; it looks like this: The la264 chip design physically has a 40-bit virtual address. User mode and kernel mode each account for half: User mode: 0x0 - 0x7f_ffff_ffff; Kernel mode: 0xffff_ff80_0000_0000 - 0xffff_ffff_ffff_ffff. The high bits are the sign extension of bit 39. Looking at the comments around the TRY_EMPTY_VM_SPACE definitions, they all indicate that the guessed range should be "likely free" -- that implies "usable". Given the common VM allocation behavior, we want TRY_EMPTY_VM_SPACE to point at a reasonably high place in the VA so it's "likely free". So IMO the point is, will there be any LoongArch HW in the foreseeable future, with less than maybe 40 bits of VA? If the answer is "no" then a TRY_EMPTY_VM_SPACE near the 40-bit VA ceiling would be appropriate. Otherwise you may have to choose a value near or even below a 32-bit VA's upper limit: according to the ISA manual Volume 1, Section 2.1.5, "typical VALEN is in the range of [40, 48]"; also see Section 5.2.3, RDVA can be as large as 8, so the actual VA space could theoretically be as narrow as 40-8=32 bits. Yes, I agree with your point of view that this is looking for a "likely free" virtual memory space. But if I want to support chips with less than 40-bit virtual addresses, then the value of this macro needs to be set small. I think setting this value small will increase the probability of virtual address mapping failure.
Not exactly; in case the TRY_EMPTY_VM_SPACE address happens to be occupied, the mmap will still return something else that's nonzero (consult mmap's man page for details), and will not just blow the process up straight away. But... Chips with less than 40-bit virtual address space are small chips for embedded use. The purpose of pch is to make compilation faster, but I think we rarely compile on embedded systems. So this situation may not be within our consideration. Everything makes more sense with this context. Now put this justification into the commit message (after a little bit of rewording maybe) and I think we're good to go then ;-)
Re: [PATCH] LoongArch: Change the value of macro TRY_EMPTY_VM_SPACE from 0x8000000000 to 0x1000000000.
Hi, On 2023/2/21 21:03, Lulu Cheng wrote: On 2023/2/21 3:41 PM, Xi Ruoyao wrote: On Tue, 2023-02-21 at 15:20 +0800, Lulu Cheng wrote: Like la264 only has 40 effective bits of virtual address space. I'm OK with the change. But the VA length is configurable when building the kernel. Is there any specific reason LA264 has to use the 40-bit configuration, or should we reword the commit message like "for supporting the configuration with less page table level or smaller page size"? I consulted with my colleagues who are working on the kernel; it looks like this: The la264 chip design physically has a 40-bit virtual address. User mode and kernel mode each account for half: User mode: 0x0 - 0x7f_ffff_ffff; Kernel mode: 0xffff_ff80_0000_0000 - 0xffff_ffff_ffff_ffff. The high bits are the sign extension of bit 39. Looking at the comments around the TRY_EMPTY_VM_SPACE definitions, they all indicate that the guessed range should be "likely free" -- that implies "usable". Given the common VM allocation behavior, we want TRY_EMPTY_VM_SPACE to point at a reasonably high place in the VA so it's "likely free". So IMO the point is, will there be any LoongArch HW in the foreseeable future, with less than maybe 40 bits of VA? If the answer is "no" then a TRY_EMPTY_VM_SPACE near the 40-bit VA ceiling would be appropriate. Otherwise you may have to choose a value near or even below a 32-bit VA's upper limit: according to the ISA manual Volume 1, Section 2.1.5, "typical VALEN is in the range of [40, 48]"; also see Section 5.2.3, RDVA can be as large as 8, so the actual VA space could theoretically be as narrow as 40-8=32 bits.
Re: [PATCH] LoongArch: Fix multiarch tuple canonization
Hi, On 2023/2/13 18:38, Xi Ruoyao wrote: The multiarch tuple will be encoded in file or directory names in multiarch-aware distros, so one ABI should have only one multiarch tuple. For example, "--target=loongarch64-linux-gnu --with-abi=lp64s" and "--target=loongarch64-linux-gnusf" should both set the multiarch tuple to "loongarch64-linux-gnusf". Before this commit, "--target=loongarch64-linux-gnu --with-abi=lp64s --disable-multilib" will produce the wrong result (loongarch64-linux-gnu). A recent LoongArch psABI revision mandates "loongarch64-linux-gnu" to be used for -mabi=lp64d (instead of "loongarch64-linux-gnuf64") for some non-technical reason [1]. Note that we cannot make "loongarch64-linux-gnuf64" an alias for "loongarch64-linux-gnu" because to implement such an alias, we must create thousands of symlinks in the distro and doing so would be completely impractical. This commit also aligns GCC with the revision. Tested by building cross compilers with --enable-multiarch and multiple combinations of --target=loongarch64-linux-gnu*, --with-abi=lp64{s,f,d}, and --{enable,disable}-multilib; and running "xgcc --print-multiarch", then manually verifying the result by eye. Ok for trunk and backport to releases/gcc-12? [1]: https://github.com/loongson/LoongArch-Documentation/pull/80 gcc/ChangeLog: * config.gcc (triplet_abi): Set its value based on $with_abi, instead of $target. (la_canonical_triplet): Set it after $triplet_abi is set correctly. * config/loongarch/t-linux (MULTILIB_OSDIRNAMES): Make the multiarch tuple for lp64d "loongarch64-linux-gnu" (without "f64" suffix).
--- gcc/config.gcc | 14 +++--- gcc/config/loongarch/t-linux | 2 +- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/gcc/config.gcc b/gcc/config.gcc index 067720ac795..c070e6ecd2e 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -4889,20 +4889,16 @@ case "${target}" in case ${target} in loongarch64-*-*-*f64) abi_pattern="lp64d" - triplet_abi="f64" ;; loongarch64-*-*-*f32) abi_pattern="lp64f" - triplet_abi="f32" ;; loongarch64-*-*-*sf) abi_pattern="lp64s" - triplet_abi="sf" ;; loongarch64-*-*-*) abi_pattern="lp64[dfs]" abi_default="lp64d" - triplet_abi="" ;; *) echo "Unsupported target ${target}." 1>&2 @@ -4923,9 +4919,6 @@ case "${target}" in ;; esac - la_canonical_triplet="loongarch64-${triplet_os}${triplet_abi}" - - # Perform initial sanity checks on --with-* options. case ${with_arch} in "" | loongarch64 | la464) ;; # OK, append here. @@ -4996,6 +4989,13 @@ case "${target}" in ;; esac + case ${with_abi} in + "lp64d") triplet_abi="";; + "lp64f") triplet_abi="f32";; + "lp64s") triplet_abi="sf";; + esac + la_canonical_triplet="loongarch64-${triplet_os}${triplet_abi}" + # Set default value for with_abiext (internal) case ${with_abiext} in "") diff --git a/gcc/config/loongarch/t-linux b/gcc/config/loongarch/t-linux index 131c45fdced..e40da179203 100644 --- a/gcc/config/loongarch/t-linux +++ b/gcc/config/loongarch/t-linux @@ -40,7 +40,7 @@ ifeq ($(filter LA_DISABLE_MULTILIB,$(tm_defines)),) MULTILIB_OSDIRNAMES = \ mabi.lp64d=../lib64$\ - $(call if_multiarch,:loongarch64-linux-gnuf64) + $(call if_multiarch,:loongarch64-linux-gnu) MULTILIB_OSDIRNAMES += \ mabi.lp64f=../lib64/f32$\ Thanks for the quick patch; however Revy told me offline yesterday that this might conflict with things on the Debian side once this gets merged. He may have more details to share. Adding him to CC -- you could keep him CC-ed on future changes that may impact distro packaging.
Re: [PATCH v3] LoongArch: Add prefetch instructions.
On 2022/11/16 10:10, Lulu Cheng wrote: v2 -> v3: 1. Remove preldx support. --- Enable sw prefetching at -O3 and higher. Co-Authored-By: xujiahao gcc/ChangeLog: * config/loongarch/constraints.md (ZD): New constraint. * config/loongarch/loongarch-def.c: Initial number of parallel prefetch. * config/loongarch/loongarch-tune.h (struct loongarch_cache): Define number of parallel prefetch. * config/loongarch/loongarch.cc (loongarch_option_override_internal): Set up parameters to be used in prefetching algorithm. * config/loongarch/loongarch.md (prefetch): New template. --- gcc/config/loongarch/constraints.md | 10 ++ gcc/config/loongarch/loongarch-def.c | 2 ++ gcc/config/loongarch/loongarch-tune.h | 1 + gcc/config/loongarch/loongarch.cc | 28 +++ gcc/config/loongarch/loongarch.md | 14 ++ 5 files changed, 55 insertions(+) diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md index 43cb7b5f0f5..46f7f63ae31 100644 --- a/gcc/config/loongarch/constraints.md +++ b/gcc/config/loongarch/constraints.md @@ -86,6 +86,10 @@ ;;"ZB" ;; "An address that is held in a general-purpose register. ;; The offset is zero" +;;"ZD" ;; "An address operand whose address is formed by a base register +;; and offset that is suitable for use in instructions with the same +;; addressing mode as @code{preld}." ;; "<" "Matches a pre-dec or post-dec operand." (Global non-architectural) ;; ">" "Matches a pre-inc or post-inc operand." (Global non-architectural) @@ -190,3 +194,9 @@ (define_memory_constraint "ZB" The offset is zero" (and (match_code "mem") (match_test "REG_P (XEXP (op, 0))"))) + +(define_address_constraint "ZD" + "An address operand whose address is formed by a base register + and offset that is suitable for use in instructions with the same + addressing mode as @code{preld}." + (match_test "loongarch_12bit_offset_address_p (op, mode)")) How is this different from the "m" constraint? AFAIK preld and ld share the same addressing mode (i.e.
base register + 12-bit signed immediate offset). diff --git a/gcc/config/loongarch/loongarch-def.c b/gcc/config/loongarch/loongarch-def.c index cbf995d81b5..80ab10a52a8 100644 --- a/gcc/config/loongarch/loongarch-def.c +++ b/gcc/config/loongarch/loongarch-def.c @@ -62,11 +62,13 @@ loongarch_cpu_cache[N_TUNE_TYPES] = { .l1d_line_size = 64, .l1d_size = 64, .l2d_size = 256, + .simultaneous_prefetches = 4, }, [CPU_LA464] = { .l1d_line_size = 64, .l1d_size = 64, .l2d_size = 256, + .simultaneous_prefetches = 4, }, }; diff --git a/gcc/config/loongarch/loongarch-tune.h b/gcc/config/loongarch/loongarch-tune.h index 6f3530f5c02..8e3eb29472b 100644 --- a/gcc/config/loongarch/loongarch-tune.h +++ b/gcc/config/loongarch/loongarch-tune.h @@ -45,6 +45,7 @@ struct loongarch_cache { int l1d_line_size; /* bytes */ int l1d_size; /* KiB */ int l2d_size; /* kiB */ +int simultaneous_prefetches; /* number of parallel prefetch */ nit: "prefetches" or "prefetch ops" or "int prefetch_width"? }; #endif /* LOONGARCH_TUNE_H */ diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 8d5d8d965dd..8ee32c90573 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -63,6 +63,7 @@ along with GCC; see the file COPYING3. If not see #include "context.h" #include "builtins.h" #include "rtl-iter.h" +#include "opts.h" /* This file should be included last. */ #include "target-def.h" @@ -6100,6 +6101,33 @@ loongarch_option_override_internal (struct gcc_options *opts) if (loongarch_branch_cost == 0) loongarch_branch_cost = loongarch_cost->branch_cost; + /* Set up parameters to be used in prefetching algorithm. 
*/ + int simultaneous_prefetches + = loongarch_cpu_cache[LARCH_ACTUAL_TUNE].simultaneous_prefetches; + + SET_OPTION_IF_UNSET (opts, _options_set, + param_simultaneous_prefetches, + simultaneous_prefetches); + + SET_OPTION_IF_UNSET (opts, _options_set, + param_l1_cache_line_size, + loongarch_cpu_cache[LARCH_ACTUAL_TUNE].l1d_line_size); + + SET_OPTION_IF_UNSET (opts, _options_set, + param_l1_cache_size, + loongarch_cpu_cache[LARCH_ACTUAL_TUNE].l1d_size); + + SET_OPTION_IF_UNSET (opts, _options_set, + param_l2_cache_size, + loongarch_cpu_cache[LARCH_ACTUAL_TUNE].l2d_size); + + + /* Enable sw prefetching at -O3 and higher. */ + if (opts->x_flag_prefetch_loop_arrays < 0 + && (opts->x_optimize >= 3 || opts->x_flag_profile_use) + && !opts->x_optimize_size) +
Re: [PATCH v3] LoongArch: Libvtv add loongarch support.
Hi, The code change seems good but a few grammatical nits. Patch subject should be a verb phrase, something like "libvtv: add LoongArch support" could be better. On 2022/10/28 16:01, Lulu Cheng wrote: After several considerations, I decided to set VTV_PAGE_SIZE to 16KB under loongarch64. v1 -> v2: 1. When the macro __loongarch_lp64 is defined, the VTV_PAGE_SIZE is set to 64K. 2. In the vtv_malloc.cc file __vtv_malloc_init function, it does not check whether VTV_PAGE_SIZE is equal to the system page size, if the macro __loongarch_lp64 is defined. v2 -> v3: Set VTV_PAGE_SIZE to 16KB under loongarch64. All regression tests of libvtv passed. === libvtv Summary === # of expected passes 176 - Are the monologue and changelog supposed to be a part of the actual commit? If not, conventionally they should be placed *after* the "---" line separating the commit message and diffstat/patch content. The loongarch64 kernel supports 4KB,16KB, or 64KB pages, but only 16k pages are currently supported in this code. This sentence feels a little bit unnatural. I suggest just "The LoongArch specification permits page sizes of 4KiB, 16KiB and 64KiB, but only 16KiB pages are supported for now". Co-Authored-By: qijingwen include/ChangeLog: * vtv-change-permission.h (defined): (VTV_PAGE_SIZE): Set VTV_PAGE_SIZE to 16KB under loongarch64. "for loongarch64" feels more natural. libvtv/ChangeLog: * configure.tgt: Add loongarch support. --- include/vtv-change-permission.h | 5 + libvtv/configure.tgt| 3 +++ 2 files changed, 8 insertions(+) diff --git a/include/vtv-change-permission.h b/include/vtv-change-permission.h index 70bdad92bca..f61d8b68ef6 100644 --- a/include/vtv-change-permission.h +++ b/include/vtv-change-permission.h @@ -48,6 +48,11 @@ extern void __VLTChangePermission (int); #else #if defined(__sun__) && defined(__svr4__) && defined(__sparc__) #define VTV_PAGE_SIZE 8192 +#elif defined(__loongarch_lp64) +/* The page size can be configured to 4, 16, or 64KB configuring the kernel.
"The page size is configurable by the kernel to be 4, 16 or 64 KiB." + However, only 16KB pages are supported here. Please modify this macro if you + want to support other page sizes. */ Are we actually encouraging the users to modify the sources themselves if they decide to run with non-16KiB page size? This might not even be feasible, as you're essentially telling them to recompile part of the toolchain, which they may not want to / cannot do. I think the message you want to convey here is for them to voice their need upstream so we can then discuss. In that case, the 2 sentences here could be: "For now, only the default page size of 16KiB is supported. If you have a need for other page sizes, please get in touch." Although I'm not sure if the vague "get in touch" wording is appropriate. What do others think? +#define VTV_PAGE_SIZE 16384 #else #define VTV_PAGE_SIZE 4096 #endif diff --git a/libvtv/configure.tgt b/libvtv/configure.tgt index aa2a3f675b8..6cdd1e97ab1 100644 --- a/libvtv/configure.tgt +++ b/libvtv/configure.tgt @@ -50,6 +50,9 @@ case "${target}" in ;; x86_64-*-darwin[1]* | i?86-*-darwin[1]*) ;; + loongarch*-*-linux*) + VTV_SUPPORTED=yes + ;; *) ;; esac
Re: [PATCH] Libvtv-test: Fix the problem that scansarif.exp cannot be found in libvtv regression test.
On 2022/9/27 11:16, Lulu Cheng wrote: r13-967 add ARRIF output format. However libvtv does not add support. "SARIF support was added in r13-967 but libvtv wasn't updated." (Tip: always remember that English, unlike Chinese, isn't a "topic-prominent" language, meaning you should almost never put the "topic" at subject position of the sentence. IOW, if you find your English to be a perfect 1:1 mapping to some Chinese sentence, which is the case here, it's highly likely you need to improve it somehow. This is by no means personal, but the same pattern of broken English has been appearing in your and your teammates' commits since forever, so I'm afraid I have to point out.) commit 6cf276ddf22066af780335cd0072d2c27aabe468 Author: David Malcolm Date: Thu Jun 2 15:40:22 2022 -0400 diagnostics: add SARIF output format And I don't think this reference is necessary, r13-967 is already a precise description. libvtv/ChangeLog: * testsuite/lib/libvtv-dg.exp: Add load_gcc_lib of scansarif.exp. "Load scansarif.exp." -- another example of redundant expression (no pun intended on "expression"). --- libvtv/testsuite/lib/libvtv-dg.exp | 2 ++ 1 file changed, 2 insertions(+) diff --git a/libvtv/testsuite/lib/libvtv-dg.exp b/libvtv/testsuite/lib/libvtv-dg.exp index b140c194cdc..454d916e556 100644 --- a/libvtv/testsuite/lib/libvtv-dg.exp +++ b/libvtv/testsuite/lib/libvtv-dg.exp @@ -12,6 +12,8 @@ # along with this program; if not, write to the Free Software # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. +load_gcc_lib scansarif.exp + proc libvtv-dg-test { prog do_what extra_tool_flags } { return [gcc-dg-test-1 libvtv_target_compile $prog $do_what $extra_tool_flags] } Otherwise LGTM.
Re: Re: [PATCH v5] LoongArch: add movable attribute
On 2022/8/5 15:19, Lulu Cheng wrote: On 2022/8/5 2:03 PM, Xi Ruoyao wrote: On Fri, 2022-08-05 at 12:01 +0800, Lulu Cheng wrote: On 2022/8/5 11:45 AM, Xi Ruoyao wrote: On Fri, 2022-08-05 at 11:34 +0800, Xi Ruoyao via Gcc-patches wrote: Or maybe we should just use PC-relative addressing with 4 instructions instead of GOT for -fno-PIC? Not possible, Glibc does not support R_LARCH_PCALA* relocations in ld.so. So we still need a -mno-got (or something) option to disable GOT for special cases like the kernel. Both ways consume 16 bytes (4 instructions for PC-relative, 2 instructions and a 64-bit GOT entry for GOT) and PC-relative may be more cache friendly. But such a major change cannot be backported for 12.2 IMO. I'm very sorry, my understanding of the percpu variable was wrong. I just read the code of the kernel you submitted; this percpu variable not only has a large offset but also has an uncertain address when compiling, so no matter whether it is addressed with pcrel or got, dynamic relocation is still needed when loading. It seems that accessing through the got table is a better choice. The name movable is also quite vivid for describing this function in the kernel, indicating that the address of the variable can be changed at will. But this name is more difficult to understand in gcc. I have no opinion on the rest; can this name be changed? Yes, we don't need to be compatible with the old vendor compiler IMO. "force_got_access" as Xuerui suggested? Compared with these names, I think addr_global is better. Actually if "model(...)" can be implemented I'd prefer a descriptive word/phrase inside model(). Because it may well be the case that more peculiar ways of accessing some special data will have to be supported in the future, and all of them are kind of "data models" so we'd be able to nicely group them with model(...). Otherwise I actually don't have a particularly strong opinion, aside from "movable" which IMO should definitely not be taken.
Re: [PATCH v5] LoongArch: add movable attribute
On 2022/8/3 09:36, Xi Ruoyao wrote: Is it OK for trunk or do I need to change something? By the way, I'm seeking a possibility to include this into 12.2. That leaves only 12.1 without this attribute, and we can just say "building the kernel needs GCC 12.2 or later". On Mon, 2022-08-01 at 18:07 +0800, Xi Ruoyao wrote: Changes v4 -> v5: Fix changelog. No code change. Changes v3 -> v4: * Use "movable" as the attribute name as Huacai says it's already used in a downstream GCC fork. I don't think mindlessly caring for vendor forks is always correct. In fact I find the name "movable" too generic, and something like "force_got_access" could be better. I don't currently have time to test this, unfortunately, due to day job. Might be able to give it a whirl one or two weeks later though... * Remove an inaccurate line from the doc. (Initially I tried to implement a "model(...)" like IA64 or M32R. Then I changed my mind but forgot to remove the line copied from the M32R doc.) -- >8 -- A linker script and/or a section attribute may locate a local object in some way unexpected by the code model, leading to a link failure. This happens when the Linux kernel loads a module with "local" per-CPU variables. Add an attribute to explicitly mark a variable whose address is not limited by the code model so we would be able to work around such problems. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_attribute_table): New attribute table. (TARGET_ATTRIBUTE_TABLE): Define the target hook. (loongarch_handle_addr_global_attribute): New static function. (loongarch_classify_symbol): Return SYMBOL_GOT_DISP for SYMBOL_REF_DECL with addr_global attribute. (loongarch_use_anchors_for_symbol_p): New static function. (TARGET_USE_ANCHORS_FOR_SYMBOL_P): Define the target hook. * doc/extend.texi (Variable Attributes): Document new LoongArch specific attribute. gcc/testsuite/ChangeLog: * gcc.target/loongarch/addr-global.c: New test.
--- gcc/config/loongarch/loongarch.cc | 63 +++ gcc/doc/extend.texi | 16 + .../gcc.target/loongarch/attr-movable.c | 29 + 3 files changed, 108 insertions(+) create mode 100644 gcc/testsuite/gcc.target/loongarch/attr-movable.c diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 79687340dfd..6b6026700a6 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -1643,6 +1643,15 @@ loongarch_classify_symbol (const_rtx x) && !loongarch_symbol_binds_local_p (x)) return SYMBOL_GOT_DISP; + if (SYMBOL_REF_P (x)) + { + tree decl = SYMBOL_REF_DECL (x); + /* A movable symbol may be moved away from the +/- 2GiB range around + the PC, so we have to use GOT. */ + if (decl && lookup_attribute ("movable", DECL_ATTRIBUTES (decl))) + return SYMBOL_GOT_DISP; + } + return SYMBOL_PCREL; } @@ -6068,6 +6077,54 @@ loongarch_starting_frame_offset (void) return crtl->outgoing_args_size; } +static tree +loongarch_handle_movable_attribute (tree *node, tree name, tree, int, + bool *no_add_attrs) +{ + tree decl = *node; + if (TREE_CODE (decl) == VAR_DECL) + { + if (DECL_CONTEXT (decl) + && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL + && !TREE_STATIC (decl)) + { + error_at (DECL_SOURCE_LOCATION (decl), + "%qE attribute cannot be specified for local " + "variables", name); + *no_add_attrs = true; + } + } + else + { + warning (OPT_Wattributes, "%qE attribute ignored", name); + *no_add_attrs = true; + } + return NULL_TREE; +} + +static const struct attribute_spec loongarch_attribute_table[] = +{ + /* { name, min_len, max_len, decl_req, type_req, fn_type_req, + affects_type_identity, handler, exclude } */ + { "movable", 0, 0, true, false, false, false, + loongarch_handle_movable_attribute, NULL }, + /* The last attribute spec is set to be NULL. 
*/ + {} +}; + +bool +loongarch_use_anchors_for_symbol_p (const_rtx symbol) +{ + tree decl = SYMBOL_REF_DECL (symbol); + + /* A movable attribute indicates the linker may move the symbol away, + so the use of anchor may cause relocation overflow. */ + if (decl && lookup_attribute ("movable", DECL_ATTRIBUTES (decl))) + return false; + + return default_use_anchors_for_symbol_p (symbol); +} + /* Initialize the GCC target structure. */ #undef TARGET_ASM_ALIGNED_HI_OP #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t" @@ -6256,6 +6313,12 @@ loongarch_starting_frame_offset (void) #undef TARGET_HAVE_SPECULATION_SAFE_VALUE #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed +#undef TARGET_ATTRIBUTE_TABLE +#define TARGET_ATTRIBUTE_TABLE
Re: [PATCH] LoongArch: document -m[no-]explicit-relocs
On 2022/7/27 17:28, Lulu Cheng wrote: On 2022/7/27 5:15 PM, Xi Ruoyao wrote: On Wed, 2022-07-27 at 16:47 +0800, Lulu Cheng wrote: "Use or do not use assembler relocation operators when dealing with symbolic addresses. The alternative is to use assembler macros instead, which may limit optimization. The default value for the option is determined during GCC build-time by detecting corresponding assembler support: @code{-mexplicit-relocs} if said support is present, @code{-mno-explicit-relocs} otherwise. This option is mostly useful for debugging, or interoperation with assemblers different from the build-time one." I agree with wangxuerui's idea. The same parameter and the same description information can reduce the user's time to learn how to use this parameter. I agree it's better than my original paragraph. If you agree I'll commit it with Xuerui as the commit author. I have no opinion if wangxuerui agrees. Either is OK (if you really think the commit is effectively rewritten by me), thanks.
Re: [PATCH] LoongArch: document -m[no-]explicit-relocs
Hi, On 2022/7/27 15:06, Xi Ruoyao wrote: Document newly introduced -m[no-]explicit-relocs options. Ok for trunk? -- >8 -- gcc/ChangeLog: * doc/invoke.texi: Document -m[no-]explicit-relocs for LoongArch. --- gcc/doc/invoke.texi | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 9a3f2d14c5a..04418f80428 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -24939,6 +24939,18 @@ global symbol: The data GOT table must be within +/-8EiB addressing space. @end itemize @end table The default code model is @code{normal}. + +@item -mexplicit-relocs +@itemx -mno-explicit-relocs +@opindex mexplicit-relocs +@opindex mno-explicit-relocs +Generate (do not generate) explicit symbol relocations instead of +assembler macros. Using explicit relocations can improve code generation. +GCC detects the capabilities of the assembler when it is built and sets +the default to @code{-mexplicit-relocs} if the assembler supports the +syntax for explicit specification of relocations, and +@code{-mno-explicit-relocs} otherwise. This option is mostly useful for +debugging or using an assembler different from build-time. Some text massaging, along with some shameful copying from other (read: RISC-V) -mexplicit-relocs docs... "Use or do not use assembler relocation operators when dealing with symbolic addresses. The alternative is to use assembler macros instead, which may limit optimization. The default value for the option is determined during GCC build-time by detecting corresponding assembler support: @code{-mexplicit-relocs} if said support is present, @code{-mno-explicit-relocs} otherwise. This option is mostly useful for debugging, or interoperation with assemblers different from the build-time one." What do you think? @end table @node M32C Options
Re: [PATCH] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method, when frame->mask or frame->fmask is zero.
Hi, On 2022/7/7 16:04, Lulu Cheng wrote: gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_compute_frame_info): Modify the calculation of fp_sp_offset and gp_sp_offset: when frame->mask or frame->fmask is zero, do not subtract UNITS_PER_WORD or UNITS_PER_FP_REG. IMO it's better to also state which problem this change is meant to solve (i.e. your intent), better yet, with an appropriate bugzilla link. --- gcc/config/loongarch/loongarch.cc | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index d72b256df51..5c9a33c14f7 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -917,8 +917,12 @@ loongarch_compute_frame_info (void) frame->frame_pointer_offset = offset; /* Next are the callee-saved FPRs. */ if (frame->fmask) -offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); - frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +{ + offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); + frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +} + else +frame->fp_sp_offset = offset; /* Next are the callee-saved GPRs. */ if (frame->mask) { @@ -931,8 +935,10 @@ loongarch_compute_frame_info (void) frame->save_libcall_adjustment = x_save_size; offset += x_save_size; + frame->gp_sp_offset = offset - UNITS_PER_WORD; } - frame->gp_sp_offset = offset - UNITS_PER_WORD; + else +frame->gp_sp_offset = offset; /* The hard frame pointer points above the callee-saved GPRs. */ frame->hard_frame_pointer_offset = offset; /* Above the hard frame pointer is the callee-allocated varags save area. */
Re: [PATCH] loongarch: ignore zero-size fields in calling convention
On 4/25/22 13:57, Xi Ruoyao wrote: Ping. Normally we shouldn't ping a patch after only a few days, but we're running out of time to catch the GCC 12 milestone. And once GCC 12 is released, the patch will become far more complicated because it will then need a psABI warning. And please note that the ABI difference between GCC and G++ should be considered a bug, and it has to be fixed anyway. If you don't like the idea of this patch, please develop another solution and apply it *before GCC 12*. On Wed, 2022-04-20 at 14:23 +0800, Xi Ruoyao via Gcc-patches wrote: Currently, the LoongArch ELF psABI is not clear on the handling of zero-sized fields in aggregate arguments or return values [1]. The behavior of GCC trunk is puzzling, considering the following cases: struct test1 { double a[0]; float x; }; struct test2 { float a[0]; float x; }; GCC trunk passes test1::x via GPR, but test2::x via FPR. I believe no rational Homo sapiens can understand (or even expect) this. And, to make things even worse, test1 behaves differently in C and C++. GCC trunk passes test1::x via GPR, but G++ trunk passes test1::x via FPR. I've written a paragraph about the current GCC behavior for the psABI [2], but I think it's cleaner to just ignore all zero-sized fields in the ABI. This will require only a two-line change in GCC (this patch), and a one-line change in the ABI doc. If there is no better idea I'd like to see this reviewed and applied ASAP. If we end up applying this patch after the GCC 12 release, we'll need to add a lot more boring code to emit a -Wpsabi inform [3]. That will be an unnecessary burden for both us and the users of the compiler (as the compiler will spend CPU time just checking whether a warning should be emitted). [1]: https://github.com/loongson/LoongArch-Documentation/issues/48 [2]: https://github.com/loongson/LoongArch-Documentation/pull/49 [3]: https://gcc.gnu.org/PR102024 gcc/ * config/loongarch/loongarch.cc (loongarch_flatten_aggregate_field): Ignore empty fields for RECORD_TYPE. 
gcc/testsuite/ * gcc.target/loongarch/zero-size-field-pass.c: New test. * gcc.target/loongarch/zero-size-field-ret.c: New test. --- gcc/config/loongarch/loongarch.cc | 3 ++ .../loongarch/zero-size-field-pass.c | 30 +++ .../loongarch/zero-size-field-ret.c | 28 + 3 files changed, 61 insertions(+) create mode 100644 gcc/testsuite/gcc.target/loongarch/zero-size-field-pass.c create mode 100644 gcc/testsuite/gcc.target/loongarch/zero-size-field-ret.c Agreed; this is urgent. While I haven't personally got around to testing this yet, I have looked at the linked LoongArch psABI spec change and agree with the present approach.