Re: [PATCH] LoongArch: Increase cost of vector aligned store/load.

2023-11-15 Thread WANG Xuerui

On 11/16/23 14:17, Jiahao Xu wrote:

Based on SPEC2017 performance evaluation results, making them equal to the
cost of unaligned store/load to avoid odd alignment peeling is better.


Paraphrasing a bit to shorten the subject of the sentence:

"it's better to make them equal to ... so as to avoid odd-alignment peeling"



gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_builtin_vectorization_cost): Adjust.

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 738911661d7..d05743bec87 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3893,11 +3893,9 @@ loongarch_builtin_vectorization_cost (enum 
vect_cost_for_stmt type_of_cost,
case scalar_stmt:
case scalar_load:
case vector_stmt:
-  case vector_load:
case vec_to_scalar:
case scalar_to_vec:
case scalar_store:
-  case vector_store:
return 1;
  
case vec_promote_demote:

@@ -3905,6 +3903,8 @@ loongarch_builtin_vectorization_cost (enum 
vect_cost_for_stmt type_of_cost,
return LASX_SUPPORTED_MODE_P (mode)
  && !LSX_SUPPORTED_MODE_P (mode) ? 2 : 1;
  
+  case vector_load:
+  case vector_store:
case unaligned_load:
case unaligned_store:
return 2;


Re: [PATCH] LoongArch: Use fcmp.caf.s instead of movgr2cf for zeroing a fcc

2023-10-17 Thread WANG Xuerui



On 10/17/23 22:06, Xi Ruoyao wrote:

During the review of a LLVM change [1], on LA464 we found that zeroing

"an" LLVM change (because the word LLVM is pronounced letter-by-letter)

a fcc with fcmp.caf.s is much faster than a movgr2cf from $r0.

Similarly, "an" fcc


[1]: https://github.com/llvm/llvm-project/pull/69300

gcc/ChangeLog:

* config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for
zeroing a fcc.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/loongarch.md | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 68897799505..743e75907a6 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2151,7 +2151,7 @@ (define_insn "movfcc"
[(set (match_operand:FCC 0 "register_operand" "=z")
(const_int 0))]
""
-  "movgr2cf\t%0,$r0")
+  "fcmp.caf.s\t%0,$f0,$f0")
  
  ;; Conditional move instructions.
  
Trivial enough, so this LGTM apart from the grammatical nits. (Whoever 
pushes this patch could simply amend it themselves, so maybe there's no 
need for a v2.) Thanks!


Re: [PATCH] LoongArch: Fix lo_sum rtx cost

2023-09-16 Thread WANG Xuerui

Hi,

On 9/16/23 17:16, mengqinggang wrote:

The cost of a lo_sum rtx for the addi.d instruction may be a very big
number if computed by the common function. It may cause some symbols to
be saved to and loaded from the stack if there are not enough registers
during loop optimization.


Thanks for the patch! It seems, though, that this change is done in order to 
optimize some previously pathetic codegen, am I right? If so, it would be 
appreciated to have a minimal test case attached, in order to ensure 
that the codegen never regresses. (You can have your teammates help you if 
you're not familiar with that.)
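For illustration only, a minimal test case might look something like the sketch below (hypothetical, not part of the submission; the options and the scan pattern would need tuning against the actual pathological codegen):

    /* { dg-do compile } */
    /* { dg-options "-O2" } */
    /* Symbolic addresses formed with lo_sum should stay in registers
       instead of being spilled to and reloaded from the stack.  */
    extern int a, b, c, d;

    int
    sum (void)
    {
      return a + b + c + d;
    }
    /* { dg-final { scan-assembler-not "st.d" } } */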




gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_rtx_costs): Add LO_SUM cost.
---
  gcc/config/loongarch/loongarch.cc | 4 ++++
  1 file changed, 4 insertions(+)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 845fad5a8e8..0e57f09379c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3648,6 +3648,10 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
*total = COSTS_N_INSNS (4);
return false;
  
+case LO_SUM:
+  *total = set_src_cost (XEXP (x, 0), mode, speed);
+  return true;
+
To make the code more maintainable, it may be better to duplicate some 
of the rationale here as a comment, just in case someone in the future 
questions this piece of code that's without any explanation, and 
regresses things (because there's no test case).

  case LT:
  case LTU:
  case LE:
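Concretely, the duplicated rationale might read something like this (illustrative wording only, not from the patch):

    case LO_SUM:
      /* The generic cost of a LO_SUM (the low part of a symbolic
         address, as consumed by addi.d) can be very high, which makes
         the loop optimizer spill symbol addresses to the stack under
         register pressure; cost it as its source operand instead.  */
      *total = set_src_cost (XEXP (x, 0), mode, speed);
      return true;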


Re: [PATCH 1/2] LoongArch: Optimize switch with sign-extended index.

2023-09-02 Thread WANG Xuerui

On 9/2/23 14:24, Lulu Cheng wrote:

The patch refers to the RISC-V submission
7bbce9b50302959286381d9177818642bceaf301.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_extend_comparands):
In unsigned QImode test, check for sign extended subreg and/or
constant operands, and do a sign extend in that case.

"do a sign extension"

* config/loongarch/loongarch.md (TARGET_64BIT): Define
template cbranchqi4.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/switch-qi.c: New test.
---
  gcc/config/loongarch/loongarch.cc              | 14 ++++++++++++--
  gcc/config/loongarch/loongarch.md              |  8 ++++++--
  gcc/testsuite/gcc.target/loongarch/switch-qi.c | 16 ++++++++++++++++
  3 files changed, 34 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/switch-qi.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index c72229cad87..7e300c826cf 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4228,8 +4228,18 @@ loongarch_extend_comparands (rtx_code code, rtx *op0, 
rtx *op1)
/* Comparisons consider all XLEN bits, so extend sub-XLEN values.  */
if (GET_MODE_SIZE (word_mode) > GET_MODE_SIZE (GET_MODE (*op0)))
  {
-  /* TODO: checkout It is more profitable to zero-extend QImode values.  */
-  if (unsigned_condition (code) == code && GET_MODE (*op0) == QImode)
+  /* It is more profitable to zero-extend QImode values.  But not if the
+     first operand has already been sign-extended, and the second one is
+     a constant or has already been sign-extended also.  */
+  if (unsigned_condition (code) == code
+ && (GET_MODE (*op0) == QImode
+ && ! (GET_CODE (*op0) == SUBREG
+   && SUBREG_PROMOTED_VAR_P (*op0)
+   && SUBREG_PROMOTED_SIGNED_P (*op0)
+   && (CONST_INT_P (*op1)
+   || (GET_CODE (*op1) == SUBREG
+   && SUBREG_PROMOTED_VAR_P (*op1)
+   && SUBREG_PROMOTED_SIGNED_P (*op1))
{
  *op0 = gen_rtx_ZERO_EXTEND (word_mode, *op0);
  if (CONST_INT_P (*op1))
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index b37e070660f..1bb4e461b38 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2733,11 +2733,15 @@ (define_insn "*branch_equality<mode>_inverted"
[(set_attr "type" "branch")])
  
  
+;; Branches operate on XLEN-sized quantities, but for LoongArch64 we accept


LoongArch literature refers to "XLEN" as "GRLEN".

Otherwise the patch is fine, thanks ;-)


+;; QImode values so we can force zero-extension.
+(define_mode_iterator BR [(QI "TARGET_64BIT") SI (DI "TARGET_64BIT")])
+
  (define_expand "cbranch4"
[(set (pc)
(if_then_else (match_operator 0 "comparison_operator"
-   [(match_operand:GPR 1 "register_operand")
-(match_operand:GPR 2 "nonmemory_operand")])
+   [(match_operand:BR 1 "register_operand")
+(match_operand:BR 2 "nonmemory_operand")])
  (label_ref (match_operand 3 ""))
  (pc)))]
""
diff --git a/gcc/testsuite/gcc.target/loongarch/switch-qi.c 
b/gcc/testsuite/gcc.target/loongarch/switch-qi.c
new file mode 100644
index 000..dd192fd497f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/switch-qi.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=loongarch64 -mabi=lp64d" } */
+/* { dg-final { scan-assembler-not "bstrpick" } } */
+
+/* Test for loongarch_extend_comparands patch.  */
+extern void asdf (int);
+void
+foo (signed char x) {
+switch (x) {
+  case 0: asdf (10); break;
+  case 1: asdf (11); break;
+  case 2: asdf (12); break;
+  case 3: asdf (13); break;
+  case 4: asdf (14); break;
+}
+}


Re: [PATCH v1] LoongArch: Remove the symbolic extension instruction due to the SLT directive.

2023-08-24 Thread WANG Xuerui

On 8/25/23 12:01, Lulu Cheng wrote:

Since the slt instruction does not distinguish between 32-bit and 64-bit
operations under the LoongArch 64-bit architecture, if the operands of slt
are of SImode, symbol expansion is required before operation.

Hint: “符号扩展” is "sign extension" (as noun) or "sign-extend" (as verb),
not "symbol expansion".


But similar to the following test case, symbol expansion can be omitted:

extern int src1, src2, src3;

int
test (void)
{
  int data1 = src1 + src2;
  int data2 = src1 + src3;
  return data1 > data2 ? data1 : data2;
}
Assembly code before optimization:
...
add.w   $r4,$r4,$r14
add.w   $r13,$r13,$r14
slli.w  $r12,$r4,0
slli.w  $r14,$r13,0
slt $r12,$r12,$r14
masknez $r4,$r4,$r12
maskeqz $r12,$r13,$r12
or  $r4,$r4,$r12
slli.w  $r4,$r4,0
...

After optimization:
...
add.w   $r12,$r12,$r14
add.w   $r13,$r13,$r14
slt $r4,$r12,$r13
masknez $r12,$r12,$r4
maskeqz $r4,$r13,$r4
or  $r4,$r12,$r4
...

Similar to this test example, the two operands of SLT are obtained by the
addition operation, and the addition operation "add.w" is an implicit
symbolic extension function, so the two operands of SLT do not require


more naturally: "and add.w implicitly sign-extends" -- brevity are often 
desired and clearer ;-)



symbolic expansion.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_expand_conditional_move):
Optimize the function implementation.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/slt-sign-extend.c: New test.
---
  gcc/config/loongarch/loongarch.cc | 53 +--
  .../gcc.target/loongarch/slt-sign-extend.c| 14 +
  2 files changed, 63 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/slt-sign-extend.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 86d58784113..1905599b9e8 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4384,14 +4384,30 @@ loongarch_expand_conditional_move (rtx *operands)
enum rtx_code code = GET_CODE (operands[1]);
rtx op0 = XEXP (operands[1], 0);
rtx op1 = XEXP (operands[1], 1);
+  rtx op0_extend = op0;
+  rtx op1_extend = op1;
+
+  /* Record whether operands[2] and operands[3] modes are promoted to 
word_mode.  */
+  bool promote_p = false;
+  machine_mode mode = GET_MODE (operands[0]);
  
if (FLOAT_MODE_P (GET_MODE (op1)))
  loongarch_emit_float_compare (&code, &op0, &op1);
else
  {
+  if ((REGNO (op0) == REGNO (operands[2])
+  || (REGNO (op1) == REGNO (operands[3]) && (op1 != const0_rtx)))
+ && (GET_MODE_SIZE (GET_MODE (op0)) < word_mode))
+   {
+ mode = word_mode;
+ promote_p = true;
+   }
+
loongarch_extend_comparands (code, &op0, &op1);
  
op0 = force_reg (word_mode, op0);

+  op0_extend = op0;
+  op1_extend = force_reg (word_mode, op1);
  
if (code == EQ || code == NE)

{
@@ -4418,23 +4434,52 @@ loongarch_expand_conditional_move (rtx *operands)
&& register_operand (operands[2], VOIDmode)
&& register_operand (operands[3], VOIDmode))
  {
-  machine_mode mode = GET_MODE (operands[0]);
+  rtx op2 = operands[2];
+  rtx op3 = operands[3];
+
+  if (promote_p)
+   {
+ if (REGNO (XEXP (operands[1], 0)) == REGNO (operands[2]))
+   op2 = op0_extend;
+ else
+   {
+ loongarch_extend_comparands (code, &op2, &const0_rtx);
+ op2 = force_reg (mode, op2);
+   }
+
+ if (REGNO (XEXP (operands[1], 1)) == REGNO (operands[3]))
+   op3 = op1_extend;
+ else
+   {
+ loongarch_extend_comparands (code, &op3, &const0_rtx);
+ op3 = force_reg (mode, op3);
+   }
+   }
+
rtx temp = gen_reg_rtx (mode);
rtx temp2 = gen_reg_rtx (mode);
  
emit_insn (gen_rtx_SET (temp,

  gen_rtx_IF_THEN_ELSE (mode, cond,
-   operands[2], const0_rtx)));
+   op2, const0_rtx)));
  
/* Flip the test for the second operand.  */

cond = gen_rtx_fmt_ee ((code == EQ) ? NE : EQ, GET_MODE (op0), op0, 
op1);
  
emit_insn (gen_rtx_SET (temp2,

  gen_rtx_IF_THEN_ELSE (mode, cond,
-   operands[3], const0_rtx)));
+   op3, const0_rtx)));
  
/* Merge the two results, at least one is guaranteed to be zero.  */

-  emit_insn (gen_rtx_SET (operands[0], gen_rtx_IOR (mode, temp, temp2)));
+  if (promote_p)
+   {
+ rtx temp3 = gen_reg_rtx (mode);
+  

Re: [PATCH v1 2/6] LoongArch: Added Loongson SX base instruction support.

2023-06-30 Thread WANG Xuerui




On 2023/6/30 10:16, Chenghui Pan wrote:

[snip]
---
  gcc/config/loongarch/constraints.md|  128 +-
  gcc/config/loongarch/loongarch-builtins.cc |   10 +
  gcc/config/loongarch/loongarch-modes.def   |   38 +
  gcc/config/loongarch/loongarch-protos.h|   31 +
  gcc/config/loongarch/loongarch.cc  | 2235 +-
  gcc/config/loongarch/loongarch.h   |   65 +-
  gcc/config/loongarch/loongarch.md  |   44 +-
  gcc/config/loongarch/lsx.md| 4490 
  gcc/config/loongarch/predicates.md |  333 +-
  9 files changed, 7184 insertions(+), 190 deletions(-)
  create mode 100644 gcc/config/loongarch/lsx.md

diff --git a/gcc/config/loongarch/constraints.md 
b/gcc/config/loongarch/constraints.md
index 7a38cd07ae9..1dd56af07c4 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -30,8 +30,7 @@
  ;; "h" <-unused
  ;; "i" "Matches a general integer constant." (Global non-architectural)
  ;; "j" SIBCALL_REGS
-;; "k" "A memory operand whose address is formed by a base register and
-;;  (optionally scaled) index register."
+;; "k" <-unused
  ;; "l" "A signed 16-bit constant."
  ;; "m" "A memory operand whose address is formed by a base register and offset
  ;;  that is suitable for use in instructions with the same addressing mode
@@ -80,13 +79,14 @@
  ;; "N" <-unused
  ;; "O" <-unused
  ;; "P" <-unused
-;; "Q" <-unused
+;; "Q" "A signed 12-bit constant"
  ;; "R" <-unused
  ;; "S" <-unused
  ;; "T" <-unused
  ;; "U" <-unused
  ;; "V" "Matches a non-offsettable memory reference." (Global 
non-architectural)
-;; "W" <-unused
+;; "W" "A memory address based on a member of @code{BASE_REG_CLASS}.  This is
+;; true for all references."
  ;; "X" "Matches anything." (Global non-architectural)
  ;; "Y" -
  ;;"Yd"
@@ -214,6 +214,63 @@ (define_constraint "Le"
(and (match_code "const_int")
 (match_test "loongarch_addu16i_imm12_operand_p (ival, SImode)")))
  
+(define_constraint "M"

+  "A constant that cannot be loaded using @code{lui}, @code{addiu}
+   or @code{ori}."
+  (and (match_code "const_int")
+   (not (match_test "IMM12_OPERAND (ival)"))
+   (not (match_test "IMM12_OPERAND_UNSIGNED (ival)"))
+   (not (match_test "LU12I_OPERAND (ival)"
+
+(define_constraint "N"
+  "A constant in the range -65535 to -1 (inclusive)."
+  (and (match_code "const_int")
+   (match_test "ival >= -0x && ival < 0")))
+
+(define_constraint "O"
+  "A signed 15-bit constant."
+  (and (match_code "const_int")
+   (match_test "ival >= -0x4000 && ival < 0x4000")))
+
+(define_constraint "P"
+  "A constant in the range 1 to 65535 (inclusive)."
+  (and (match_code "const_int")
+   (match_test "ival > 0 && ival < 0x1")))


These constraints are meant to be exposed for developers to use, right? 
If not, they should probably be marked "@internal"; and if so, you 
should update the docs as well.


Also, these are not documented in the comment block at the top of the file.


+
+;; General constraints
+
+(define_memory_constraint "R"
+  "An address that can be used in a non-macro load or store."
+  (and (match_code "mem")
+   (match_test "loongarch_address_insns (XEXP (op, 0), mode, false) == 
1")))


Similarly, is this "R" constraint meant to be exposed as well? Sure 
one's free to choose letters but "R" IMO strongly implies something 
related to registers, not addresses...



+(define_constraint "S"
+  "@internal
+   A constant call address."
+  (and (match_operand 0 "call_insn_operand")
+   (match_test "CONSTANT_P (op)")))


Additionally, IMO we probably should minimize our use of single-letter 
constraints that don't overlap with other architectures' similar usage. 
(I know that several projects have accepted LSX/LASX code well ahead of 
this series, but I don't know off the top of my head if their code used any inline 
asm instead of C intrinsics. Intuitively this shouldn't be a concern 
though.)


Overall, I'd recommend moving all single-letter constraints added here 
to a two-letter namespace, so everything is better namespaced and easier 
to remember (e.g. if we choose something like "Vx" or "Yx" for 
everything vector-related, it'd be a lot easier to mentally associate 
the two-letter incantations with correct semantics.)



+
+(define_constraint "YG"
+  "@internal
+   A vector zero."
+  (and (match_code "const_vector")
+   (match_test "op == CONST0_RTX (mode)")))
+
+(define_constraint "YA"
+  "@internal
+   An unsigned 6-bit constant."
+  (and (match_code "const_int")
+   (match_test "UIMM6_OPERAND (ival)")))
+
+(define_constraint "YB"
+  "@internal
+   A signed 10-bit constant."
+  (and (match_code "const_int")
+   (match_test "IMM10_OPERAND (ival)")))
+
+(define_constraint "Yb"
+   "@internal"
+   (match_operand 0 "qi_mask_operand"))
+
  (define_constraint "Yd"
"@internal
 A constant @code{move_operand} that can be safely 

Re: [pushed][PATCH v3] LoongArch: Avoid non-returning indirect jumps through $ra [PR110136]

2023-06-18 Thread WANG Xuerui

Hi,

On 6/15/23 17:03, Xi Ruoyao wrote:

Xuerui: I guess this makes it sensible to show "ret" instead of "jirl
$zero, $ra, 0" in objdump -d output, but I don't know how to implement
it.  Do you have any ideas?


Thanks for the suggestion! Actually, I previously made this patch 
series [1], which included just that. But the Loongson maintainers said 
they were working on linker relaxation at the time, so they would have to 
postpone processing it, and I've never had a review since then; it's 
expected to conflict with the relaxation patches so some rebasing would 
be needed, but IIRC all review comments have been addressed. You can 
take the series if you'd like to ;-)


[1]: https://sourceware.org/pipermail/binutils/2023-February/126088.html



On Thu, 2023-06-15 at 16:27 +0800, Lulu Cheng wrote:

Pushed to trunk and gcc-12 gcc-13.
r14-1866
r13-7448
r12-9698

On 2023/6/15 09:30, Lulu Cheng wrote:

Micro-architecture unconditionally treats a "jr $ra" as "return from
subroutine",
hence doing "jr $ra" would interfere with both subroutine return
prediction and
the more general indirect branch prediction.

Therefore, a problem like PR110136 can cause a significant increase
in the branch misprediction rate and affect performance. The same
problem exists with "indirect_jump".

gcc/ChangeLog:

 * config/loongarch/loongarch.md: Modify the register
constraints for template
 "jumptable" and "indirect_jump" from "r" to "e".

Co-authored-by: Andrew Pinski 
---
v1 -> v2:
    1. Modify the description.
    2. Modify the register constraints of the template
"indirect_jump".
v2 -> v3:
    1. Modify the description.
---
   gcc/config/loongarch/loongarch.md | 8 ++++++--
   1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md
b/gcc/config/loongarch/loongarch.md
index 816a943d155..b37e070660f 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2895,6 +2895,10 @@ (define_insn "*jump_pic"
   }
     [(set_attr "type" "branch")])
   
+;; Micro-architecture unconditionally treats a "jr $ra" as "return

from subroutine",
+;; non-returning indirect jumps through $ra would interfere with
both subroutine
+;; return prediction and the more general indirect branch
prediction.
+
   (define_expand "indirect_jump"
     [(set (pc) (match_operand 0 "register_operand"))]
     ""
@@ -2905,7 +2909,7 @@ (define_expand "indirect_jump"
   })
   
   (define_insn "@indirect_jump"

-  [(set (pc) (match_operand:P 0 "register_operand" "r"))]
+  [(set (pc) (match_operand:P 0 "register_operand" "e"))]
     ""
     "jr\t%0"
     [(set_attr "type" "jump")
@@ -2928,7 +2932,7 @@ (define_expand "tablejump"
   
   (define_insn "@tablejump"

     [(set (pc)
-   (match_operand:P 0 "register_operand" "r"))
+   (match_operand:P 0 "register_operand" "e"))
  (use (label_ref (match_operand 1 "" "")))]
     ""
     "jr\t%0"


Re: [PATCH v2] LoongArch: Modify the register constraints for template "jumptable" and "indirect_jump" from "r" to "e" [PR110136]

2023-06-08 Thread WANG Xuerui

On 2023/6/8 10:27, Lulu Cheng wrote:

Micro-architecture unconditionally treats a "jr $ra" as "return from 
subroutine",
hence doing "jr $ra" would interfere with both subroutine return prediction and
the more general indirect branch prediction.

Therefore, a problem like PR110136 can cause a significant increase in
the branch misprediction rate and affect performance. The same problem
exists with "indirect_jump".

gcc/ChangeLog:

* config/loongarch/loongarch.md: Modify the register constraints for 
template
"jumptable" and "indirect_jump" from "r" to "e".

Co-authored-by: Andrew Pinski 
---
v1 -> v2:
   1. Modify the description
   2. Modify the register constraints of the template "indirect_jump".
---
  gcc/config/loongarch/loongarch.md | 8 ++++++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 816a943d155..43a2ecc8957 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2895,6 +2895,10 @@ (define_insn "*jump_pic"
  }
[(set_attr "type" "branch")])
  
+;; Micro-architecture unconditionally treats a "jr $ra" as "return from subroutine",

+;; hence doing "jr $ra" would interfere with both subroutine return prediction 
and


Not all cases of "doing 'jr $ra'" are harmful, obviously. Paraphrasing 
it like e.g. "non-returning indirect jumps through $ra" would be better.


You could simplify the patch title a lot with this too: "Avoid 
non-returning indirect jumps through $ra" is shorter and does not 
duplicate the ChangeLog message.



+;; the more general indirect branch prediction.
+
  (define_expand "indirect_jump"
[(set (pc) (match_operand 0 "register_operand"))]
""
@@ -2905,7 +2909,7 @@ (define_expand "indirect_jump"
  })
  
  (define_insn "@indirect_jump"

-  [(set (pc) (match_operand:P 0 "register_operand" "r"))]
+  [(set (pc) (match_operand:P 0 "register_operand" "e"))]
""
"jr\t%0"
[(set_attr "type" "jump")
@@ -2928,7 +2932,7 @@ (define_expand "tablejump"
  
  (define_insn "@tablejump"

[(set (pc)
-   (match_operand:P 0 "register_operand" "r"))
+   (match_operand:P 0 "register_operand" "e"))
 (use (label_ref (match_operand 1 "" "")))]
""
"jr\t%0"


Re: [PATCH] LoongArch: Change jumptable's register constraint to 'q' [PR110136]

2023-06-07 Thread WANG Xuerui

On 2023/6/7 11:36, Lulu Cheng wrote:


On 2023/6/7 11:26, WANG Xuerui wrote:

Hi,

On 2023/6/7 10:31, Lulu Cheng wrote:
If the $ra register is modified during the jump to the jump table, the 
hardware branch prediction will be broken, resulting in a significant 
increase in the branch misprediction rate and affecting performance.


Thanks for the insight! This is the kind of improvement that will 
probably become a lot harder to even *sight* without uarch details 
available.


However, I think it's better to also include a minimized test case to 
ensure the compiled code doesn't regress. (Comparison of relevant 
statistics, e.g. output of perf stat, would be even nicer to have!)


I could not find a small test case that replicates this problem. It 
occurs when compiling SPEC2006 400.perlbench, and it only appears when 
'-flto' is added. :-(


But I have provided reproducible programs and compilation steps at 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110136.


Ah okay. I missed the bug number in the title initially.

After reading through the context, I think the reason for avoiding $ra 
(in general) should be that the micro-architecture unconditionally 
treats a "jr $ra" as "return from subroutine", hence doing "jr $ra" 
would interfere with *both* subroutine return prediction *and* the more 
general indirect branch prediction; am I right? This could be more 
informative than merely saying "HW branch prediction will be broken".






gcc/ChangeLog:

* config/loongarch/loongarch.md: Change register constraint to 'q'.
---
  gcc/config/loongarch/loongarch.md | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md

index 816a943d155..f9b64173104 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2926,9 +2926,11 @@ (define_expand "tablejump"
    DONE;
  })
  +;; Jump to the jump table.  Avoid using the $r1 register to prevent
+;; affecting hardware branch prediction.
  (define_insn "@tablejump"
    [(set (pc)
-    (match_operand:P 0 "register_operand" "r"))
+    (match_operand:P 0 "register_operand" "q"))
 (use (label_ref (match_operand 1 "" "")))]
    ""
    "jr\t%0"




Re: [PATCH] LoongArch: Change jumptable's register constraint to 'q' [PR110136]

2023-06-06 Thread WANG Xuerui

Hi,

On 2023/6/7 10:31, Lulu Cheng wrote:

If the $ra register is modified during the jump to the jump table, the
hardware branch prediction will be broken, resulting in a significant
increase in the branch misprediction rate and affecting performance.


Thanks for the insight! This is the kind of improvement that will 
probably become a lot harder to even *sight* without uarch details 
available.


However, I think it's better to also include a minimized test case to 
ensure the compiled code doesn't regress. (Comparison of relevant 
statistics, e.g. output of perf stat, would be even nicer to have!)




gcc/ChangeLog:

* config/loongarch/loongarch.md: Change register constraint to 'q'.
---
  gcc/config/loongarch/loongarch.md | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 816a943d155..f9b64173104 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2926,9 +2926,11 @@ (define_expand "tablejump"
DONE;
  })
  
+;; Jump to the jump table.  Avoid using the $r1 register to prevent
+;; affecting hardware branch prediction.
  (define_insn "@tablejump"
[(set (pc)
-   (match_operand:P 0 "register_operand" "r"))
+   (match_operand:P 0 "register_operand" "q"))
 (use (label_ref (match_operand 1 "" "")))]
""
"jr\t%0"


Re: [PATCH] LoongArch: Fix the problem of structure parameter passing in C++. This structure has empty structure members and less than three floating point members.

2023-05-24 Thread WANG Xuerui



On 2023/5/25 10:46, Lulu Cheng wrote:


On 2023/5/25 04:15, Jason Merrill wrote:
On Wed, May 24, 2023 at 5:00 AM Jonathan Wakely via Gcc-patches 
<gcc-patches@gcc.gnu.org> wrote:


On Wed, 24 May 2023 at 09:41, Xi Ruoyao  wrote:

> Wang Lei raised some concerns about Itanium C++ ABI, so let's
ask a C++
> expert here...
>
> Jonathan: AFAIK the standard and the Itanium ABI treats an empty
class
> as size 1

Only as a complete object, not as a subobject.


Also as a data member subobject.

> in order to guarantee unique address, so for the following:
>
> class Empty {};
> class Test { Empty empty; double a, b; };

There is no need to have a unique address here, so Test::empty and
Test::a
have the same address. It's a potentially-overlapping subobject.

For the Itanium ABI, sizeof(Test) == 2 * sizeof(double).


That would be true if Test::empty were marked [[no_unique_address]], 
but without that attribute, sizeof(Test) is actually 3 * sizeof(double).


> When we pass "Test" via registers, we may only allocate the
registers
> for Test::a and Test::b, and complete ignore Test::empty because
there
> is no addresses of registers.  Is this correct or not?

I think that's a decision for the loongarch psABI. In principle,
there's no
reason a register has to be used to pass Test::empty, since you
can't read
from it or write to it.


Agreed.  The Itanium C++ ABI has nothing to say about how registers 
are allocated for parameter passing; this is a matter for the psABI.


And there is no need for a psABI to allocate a register for 
Test::empty because it contains no data.


In the x86_64 psABI, Test above is passed in memory because of its 
size ("the size of the aggregate exceeds two eightbytes...").  But


struct Test2 { Empty empty; double a; };

is passed in a single floating-point register; the Test2::empty 
subobject is not passed anywhere, because its eightbyte is classified 
as NO_CLASS, because there is no actual data there.






I know nothing about the LoongArch psABI, but going out of your way to 
assign a register to an empty class seems like a mistake.


MIPS64 and ARM64 also allocate parameter registers for empty structs. 
https://godbolt.org/z/jT4cY3T5o


Our original intention is not to pass this empty structure member, but 
to make the following two structures treat the empty structure member 
in the same way in the process of passing parameters:

struct st1
{
     struct empty {} e1;
     long a;
     long b;
};

struct st2
{
     struct empty {} e1;
     double f0;
     double f1;
};


Then shouldn't we try to avoid the extra register in all cases, instead 
of wasting one regardless? ;-)


Re: [PATCH] LoongArch: Enable shrink wrapping

2023-04-26 Thread WANG Xuerui

On 2023/4/26 18:14, Lulu Cheng wrote:


On 2023/4/26 18:02, WANG Xuerui wrote:


On 2023/4/26 17:53, Lulu Cheng wrote:

Hi Ruoyao,

  The SPEC2006 performance run is finished. The fixed-point 
400.perlbench has about a 3% performance improvement, the other integer 
benchmarks have not changed, and the floating-point tests have 
basically remained the same.

Nice to know!


  Do you have any questions about the test cases mentioned by 
Guo Jie? If there is no problem, then after the test case is modified, 
I think the code can be merged into the main branch.


BTW what about the previous function/loop alignment patches? The LLVM 
changes are also waiting for such results. ;-)
Well, there are many combinations in this alignment test, so the test 
time will be very long. I will reply with the results as soon as they 
come out. :-)



Oh, got it. Thanks very much for all the tests, and take your time!


Re: [PATCH] LoongArch: Enable shrink wrapping

2023-04-26 Thread WANG Xuerui



On 2023/4/26 17:53, Lulu Cheng wrote:

Hi Ruoyao,

  The SPEC2006 performance run is finished. The fixed-point 
400.perlbench has about a 3% performance improvement, the other integer 
benchmarks have not changed, and the floating-point tests have 
basically remained the same.

Nice to know!


  Do you have any questions about the test cases mentioned by 
Guo Jie? If there is no problem, then after the test case is modified, 
I think the code can be merged into the main branch.


BTW what about the previous function/loop alignment patches? The LLVM 
changes are also waiting for such results. ;-)


Re: [PATCH] LoongArch: Set 4 * (issue rate) as the default for -falign-functions and -falign-loops

2023-04-18 Thread WANG Xuerui



On 2023/4/18 20:45, Xi Ruoyao wrote:

On Tue, 2023-04-18 at 20:39 +0800, WANG Xuerui wrote:

Hi,

Thanks for helping confirm this on GCC and porting it! I'd never have 
known even GCC lacked this adaptation without someone actually 
checking... Too many things are taken for granted these days.

On 2023/4/18 20:17, Xi Ruoyao wrote:

According to Xuerui's LLVM changeset [1], doing so can make a
significant performace gain.


"doing so can gain significant performance" or "significant performance 
can be gained by ..."?


Also, the other important thing is that guaranteeing alignment makes 
performance *more predictable* in addition to generally making things 
faster. You may want to mention this too somewhere.




Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for GCC 14?

[1]:https://reviews.llvm.org/D148622

nit: one space after the colon.


gcc/ChangeLog:

 * config/loongarch/loongarch.cc
 (loongarch_option_override_internal): If -falign-functions is
 used but the alignment is not explicitly specified, set it to
 4 * loongarch_issue_rate ().  Likewise for -falign-loops.
---
   gcc/config/loongarch/loongarch.cc | 11 +++++++++++
   1 file changed, 11 insertions(+)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 06fc1cd0604..6552484de7c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6236,6 +6236,17 @@ loongarch_option_override_internal (struct gcc_options 
*opts)
     && !opts->x_optimize_size)
   opts->x_flag_prefetch_loop_arrays = 1;
   
+  /* Align functions and loops to (issue rate) * (insn size) to improve

+ the throughput of the fetching units.  */

What about gating all of these on !opts->x_optimize_size, similar to
what aarch64 does?

opts->x_flag_align_functions and opts->x_flag_align_loops are only set
with -O2 or above unless the user manually uses -falign-functions or -
falign-loops.  If the user uses "-Os -falign-functions" as CFLAGS I'd
assume s(he) wants to optimize for size but keep the optimized function
alignment.


Ah, okay. Fine then.

BTW I've also added some comments to the commit message that I forgot to 
review earlier.



+  char *align = XNEWVEC (char, 16);
+  sprintf (align, "%d", loongarch_issue_rate () * 4);
+
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+    opts->x_str_align_functions = align;
+
+  if (opts->x_flag_align_loops && !opts->x_str_align_loops)
+    opts->x_str_align_loops = align;
+
     if (TARGET_DIRECT_EXTERN_ACCESS && flag_shlib)
   error ("%qs cannot be used for compiling a shared library",
    "-mdirect-extern-access");

Otherwise LGTM, thanks!


Re: [PATCH] LoongArch: Set 4 * (issue rate) as the default for -falign-functions and -falign-loops

2023-04-18 Thread WANG Xuerui

Hi,

Thanks for helping confirm this on GCC and porting it! I'd never have 
known even GCC lacked this adaptation without someone actually 
checking... Too many things are taken for granted these days.


On 2023/4/18 20:17, Xi Ruoyao wrote:

According to Xuerui's LLVM changeset [1], doing so can make a
significant performace gain.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for GCC 14?

[1]:https://reviews.llvm.org/D148622

gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_option_override_internal): If -falign-functions is
used but the alignment is not explicitly specified, set it to
4 * loongarch_issue_rate ().  Likewise for -falign-loops.
---
  gcc/config/loongarch/loongarch.cc | 11 +++++++++++
  1 file changed, 11 insertions(+)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 06fc1cd0604..6552484de7c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6236,6 +6236,17 @@ loongarch_option_override_internal (struct gcc_options 
*opts)
&& !opts->x_optimize_size)
  opts->x_flag_prefetch_loop_arrays = 1;
  
+  /* Align functions and loops to (issue rate) * (insn size) to improve

+ the throughput of the fetching units.  */
What about gating all of these on !opts->x_optimize_size, similar to 
what aarch64 does?

+  char *align = XNEWVEC (char, 16);
+  sprintf (align, "%d", loongarch_issue_rate () * 4);
+
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+opts->x_str_align_functions = align;
+
+  if (opts->x_flag_align_loops && !opts->x_str_align_loops)
+opts->x_str_align_loops = align;
+
if (TARGET_DIRECT_EXTERN_ACCESS && flag_shlib)
  error ("%qs cannot be used for compiling a shared library",
   "-mdirect-extern-access");

Otherwise LGTM, thanks!


Re: [PATCH] gcc-13: Add changelog for LoongArch.

2023-04-18 Thread WANG Xuerui

Hi,

Just some minor fixes ;-)

On 2023/4/18 14:15, Lulu Cheng wrote:

---
  htdocs/gcc-13/changes.html | 39 +++++++++++++++++++++++++++++++++++++++
  1 file changed, 39 insertions(+)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index f3b9afed..c75e341b 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -563,6 +563,45 @@ a work-in-progress.

  
  
+LoongArch

+  
+New features
+  
+The new command-line option -mexplicit-relocs decides 
whether
+   to use or not use the assembler relocation operator when dealing 
with

"provides control over whether ..."?

+   symbolic addresses. -mexplicit-relocs is enabled by 
default
+   when GCC is configured to use a compatible assembler.
"if a compatible assembler (binutils 2.40 or later) is present at GCC 
build time"?

+
+Avoid using the GOT to access external symbols when the new
+   -mdirect-extern-access command-line option is 
specified.


"The new command-line option -mdirect-extern-access can be 
used to prevent accessing external symbols through GOT"?



+
+The <a href="https://gcc.gnu.org/onlinedocs/gcc/LoongArch-Variable-Attributes.html#LoongArch-Variable-Attributes">model</a>
+   variable attributes has been added.

"A new variable attribute ... has been added."

+
+  
+Built-in functions support Improvements
"Improvements to built-in functions" or just "Built-in functions" as 
Gerald pointed out.

+  
+The rint and copysign mathematical 
builtins
+   (and their float variants) are now inplemented as inline LoongArch 
intrinsics.
+
+The lrint, logb, scalbln,
+   scalbn and ldexp mathematical builtins 
(and their
+   float variants) are now inplemented as inline LoongArch intrinsics 
when using
+   -fno-math-errno.
+
+The lceil and lfloor mathematical 
builtins
+   (and their float variants) are now inplemented as inline LoongArch 
intrinsics
+   when using -ffp-int-builtin-inexact.
For this part I agree with what Gerald has suggested, the expressions 
are kinda natural after you fix the misspelled "implemented"'s.

+
+  
+Subprojects Support
+  
+libvtv now supports LoongArch architecture.
+libitm now supports LoongArch architecture.
Remove "architecture"? Because "LoongArch" practically just means 
"Loongson Architecture".

+   LoongArch supports address sanitizers other than hwasan and 
tsan.
+  
"Address sanitizers other than HWASan and TSan are now supported on 
LoongArch"?

+  
+
  
  
  


Re: [PATCH] LoongArch: Control all __crc* __crcc* builtin functions with macro __loongarch64.

2023-03-13 Thread WANG Xuerui

On 2023/3/13 13:14, Xi Ruoyao wrote:

On Mon, 2023-03-13 at 12:58 +0800, Lulu Cheng wrote:

On 2023/3/13 12:54, Xi Ruoyao wrote:

On Mon, 2023-03-13 at 12:40 +0800, WANG Xuerui wrote:

This is ugly. The fact all current LA32 models don't support CRC
ops is
just a coincidence; it's entirely possible for a future product
iteration to introduce such functionality. It's not like the
CRC*.W.W.W
ops require anything wider than 32 bits, after all.

Maybe the correct way would be adding a switch like MIPS -mcrc or
x86
-mcrc32.


Because these instructions are part of the LA64 base instruction set,
there are no control options here.

I'd postpone the change until we add LA32 support because there are no
details about LA32 now and it's hard to determine how to gate this in
the best way...


There is, actually; the "reduced" LA32 ISA manual was released together 
with the LA64 one. And CRC ops are certainly absent there.


However I agree that currently it's not as rigorously defined as LA64, 
partly in that no LA32 micro-architectures have respective user manuals 
available. And Loongson doesn't seem to actively push for LA32 Primary 
support either. (IMO LA32 Primary support should be *prioritized* 
because (a) it's trivial and (b) it's beneficial to bring students 
aboard as early as possible, especially given RISC-V's significant 
presence now.)




Re: [PATCH] LoongArch: Control all __crc* __crcc* builtin functions with macro __loongarch64.

2023-03-12 Thread WANG Xuerui



On 2023/3/13 11:52, Lulu Cheng wrote:

The LoongArch 32-bit instruction set does not support the crc* and crcc* instructions.

gcc/ChangeLog:

* config/loongarch/larchintrin.h (__crc_w_b_w): Add macros for control.
(__crc_w_h_w): Likewise.
(__crc_w_w_w): Likewise.
(__crcc_w_b_w): Likewise.
(__crcc_w_h_w): Likewise.
(__crcc_w_w_w): Likewise.
* config/loongarch/loongarch.md: Add condition TARGET_64BIT
to loongarch_crc_* loongarch_crcc_* instruction template.
---
  gcc/config/loongarch/larchintrin.h | 4 +---
  gcc/config/loongarch/loongarch.md  | 4 ++--
  2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/gcc/config/loongarch/larchintrin.h 
b/gcc/config/loongarch/larchintrin.h
index e571ed27b37..09f9a5db846 100644
--- a/gcc/config/loongarch/larchintrin.h
+++ b/gcc/config/loongarch/larchintrin.h
@@ -145,6 +145,7 @@ __asrtgt_d (long int _1, long int _2)
  #error "Unsupported ABI."
  #endif
  
+#ifdef __loongarch64


This is ugly. The fact all current LA32 models don't support CRC ops is 
just a coincidence; it's entirely possible for a future product 
iteration to introduce such functionality. It's not like the CRC*.W.W.W 
ops require anything wider than 32 bits, after all.


But do we have to ifdef these out, after all? The only difference would 
be the error description ("name undefined" vs "intrinsic not supported" 
or something like that). IMO we could simply remove every ifdef like 
this and be done with it...
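For context, this is roughly how the intrinsics in question get used (an illustrative sketch; it assumes, per the declarations quoted below, that the first operand is the message word and the second the running checksum):

    #include <larchintrin.h>

    /* Fold a word-aligned buffer into a running CRC value, one 32-bit
       chunk at a time.  */
    static int
    crc32_buf (int crc, const int *p, unsigned long n)
    {
      for (unsigned long i = 0; i < n; i++)
        crc = __crc_w_w_w (p[i], crc);
      return crc;
    }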



  /* Assembly instruction format:   rd, rj, rk.  */
  /* Data types in instruction templates:  SI, QI, SI.  */
  extern __inline int
@@ -172,7 +173,6 @@ __crc_w_w_w (int _1, int _2)
return (int) __builtin_loongarch_crc_w_w_w ((int) _1, (int) _2);
  }
  
-#ifdef __loongarch64

  /* Assembly instruction format:   rd, rj, rk.  */
  /* Data types in instruction templates:  SI, DI, SI.  */
  extern __inline int
@@ -181,7 +181,6 @@ __crc_w_d_w (long int _1, int _2)
  {
return (int) __builtin_loongarch_crc_w_d_w ((long int) _1, (int) _2);
  }
-#endif
  
  /* Assembly instruction format:	rd, rj, rk.  */

  /* Data types in instruction templates:  SI, QI, SI.  */
@@ -210,7 +209,6 @@ __crcc_w_w_w (int _1, int _2)
return (int) __builtin_loongarch_crcc_w_w_w ((int) _1, (int) _2);
  }
  
-#ifdef __loongarch64

  /* Assembly instruction format:   rd, rj, rk.  */
  /* Data types in instruction templates:  SI, DI, SI.  */
  extern __inline int
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 3509c3c21c1..227f3c6899c 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -3539,7 +3539,7 @@ (define_insn "loongarch_crc_w_<size>_w"
(unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
   (match_operand:SI 2 "register_operand" "r")]
 UNSPEC_CRC))]
-  ""
+  "TARGET_64BIT"
Gating on micro-architectures instead of bitness would be better, per 
the reasoning above.

"crc.w..w\t%0,%1,%2"
[(set_attr "type" "unknown")
 (set_attr "mode" "")])
@@ -3549,7 +3549,7 @@ (define_insn "loongarch_crcc_w_<size>_w"
(unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
   (match_operand:SI 2 "register_operand" "r")]
 UNSPEC_CRCC))]
-  ""
+  "TARGET_64BIT"
"crcc.w..w\t%0,%1,%2"
[(set_attr "type" "unknown")
 (set_attr "mode" "")])


Re: [PATCH] LoongArch: Change the value of macro TRY_EMPTY_VM_SPACE from 0x8000000000 to 0x1000000000.

2023-02-22 Thread WANG Xuerui

On 2023/2/22 17:30, Lulu Cheng wrote:


On 2023/2/21 21:56, WANG Xuerui wrote:

Hi,

On 2023/2/21 21:03, Lulu Cheng wrote:


On 2023/2/21 15:41, Xi Ruoyao wrote:

On Tue, 2023-02-21 at 15:20 +0800, Lulu Cheng wrote:

For example, LA264 only has 40 effective bits of virtual address space.
I'm OK with the change.  But the VA length is configurable when building
the kernel.  Is there any specific reason LA264 has to use the 40-bit
configuration, or should we reword the commit message like "for
supporting the configuration with fewer page table levels or smaller
page size"?


I consulted with my colleagues who are working on the kernel, and it
looks like this:

The LA264 chip design physically has a 40-bit virtual address space.

User mode and kernel mode each account for half:

User mode:    0x0 - 0x7f ffff ffff

Kernel mode:  0xffff ff80 0000 0000 - 0xffff ffff ffff ffff

The high bits are the sign extension of bit 39.

Looking at the comments around the TRY_EMPTY_VM_SPACE definitions, 
they all indicate that the guessed range should be "likely free" -- 
that implies "usable". Given the common VM allocation behavior, we 
want TRY_EMPTY_VM_SPACE to point at a reasonably high place in the VA 
so it's "likely free".


So IMO the point is, will there be any LoongArch HW in the 
foreseeable future, with less than maybe 40 bits of VA? If the answer 
is "no" then a TRY_EMPTY_VM_SPACE near the 40-bit VA ceiling would be 
appropriate. Otherwise you may have to choose a value near or even 
below a 32-bit VA's upper limit: according to the ISA manual Volume 
1, Section 2.1.5, "typical VALEN is in the range of [40, 48]"; also 
see Section 5.2.3, RDVA can be as large as 8, so the actual VA space 
could theoretically be as narrow as 40-8=32 bits.



Yes, I agree with your point of view that this is looking for a "likely 
free" virtual memory space.

But if I want to support chips with less than 40 bits of virtual 
address space, then the value of this macro needs to be set small, and 
I think setting this value small will increase the probability of 
virtual address mapping failure.


Not exactly; in case the TRY_EMPTY_VM_SPACE address happens to be 
occupied, the mmap will still return something else that's nonzero 
(consult mmap's man page for details), and will not just blow the 
process up straight away.
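A minimal sketch of that behavior (generic POSIX, nothing LoongArch-specific; the hint value is illustrative):

    #include <stdio.h>
    #include <sys/mman.h>

    int
    main (void)
    {
      void *hint = (void *) 0x1000000000UL;  /* TRY_EMPTY_VM_SPACE-style guess */
      void *p = mmap (hint, 1 << 20, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (p == MAP_FAILED)
        return 1;
      /* Without MAP_FIXED the hint is only a hint: if it's occupied,
         p simply ends up somewhere else.  */
      printf ("requested %p, got %p\n", hint, p);
      return 0;
    }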


But...



Chips with less than 40-bit virtual address space are small chips for 
embedded use.


The purpose of PCH is to make compilation faster, but I think we 
rarely compile on embedded systems. So this situation may not be 
within our consideration.


Everything makes more sense with this context. Now put this 
justification into the commit message (after a little bit of rewording 
maybe) and I think we're good to go then ;-)


Re: [PATCH] LoongArch: Change the value of macro TRY_EMPTY_VM_SPACE from 0x8000000000 to 0x1000000000.

2023-02-21 Thread WANG Xuerui

Hi,

On 2023/2/21 21:03, Lulu Cheng wrote:


On 2023/2/21 15:41, Xi Ruoyao wrote:

On Tue, 2023-02-21 at 15:20 +0800, Lulu Cheng wrote:

For example, LA264 only has 40 effective bits of virtual address space.

I'm OK with the change.  But the VA length is configurable when building the
kernel.  Is there any specific reason LA264 has to use the 40-bit
configuration, or should we reword the commit message like "for
supporting the configuration with fewer page table levels or smaller page
size"?


I consulted with my colleagues who are working on the kernel, and it
looks like this:

The LA264 chip design physically has a 40-bit virtual address space.

User mode and kernel mode each account for half:

User mode:    0x0 - 0x7f ffff ffff

Kernel mode:  0xffff ff80 0000 0000 - 0xffff ffff ffff ffff

The high bits are the sign extension of bit 39.

Looking at the comments around the TRY_EMPTY_VM_SPACE definitions, they 
all indicate that the guessed range should be "likely free" -- that 
implies "usable". Given the common VM allocation behavior, we want 
TRY_EMPTY_VM_SPACE to point at a reasonably high place in the VA so it's 
"likely free".


So IMO the point is, will there be any LoongArch HW in the foreseeable 
future, with less than maybe 40 bits of VA? If the answer is "no" then a 
TRY_EMPTY_VM_SPACE near the 40-bit VA ceiling would be appropriate. 
Otherwise you may have to choose a value near or even below a 32-bit 
VA's upper limit: according to the ISA manual Volume 1, Section 2.1.5, 
"typical VALEN is in the range of [40, 48]"; also see Section 5.2.3, 
RDVA can be as large as 8, so the actual VA space could theoretically be 
as narrow as 40-8=32 bits.




Re: [PATCH] LoongArch: Fix multiarch tuple canonization

2023-02-15 Thread WANG Xuerui

Hi,

On 2023/2/13 18:38, Xi Ruoyao wrote:

Multiarch tuple will be coded in file or directory names in
multiarch-aware distros, so one ABI should have only one multiarch
tuple.  For example, "--target=loongarch64-linux-gnu --with-abi=lp64s"
and "--target=loongarch64-linux-gnusf" should both set multiarch tuple
to "loongarch64-linux-gnusf".  Before this commit,
"--target=loongarch64-linux-gnu --with-abi=lp64s --disable-multilib"
will produce wrong result (loongarch64-linux-gnu).

A recent LoongArch psABI revision mandates "loongarch64-linux-gnu" to be
used for -mabi=lp64d (instead of "loongarch64-linux-gnuf64") for some
non-technical reason [1].  Note that we cannot make
"loongarch64-linux-gnuf64" an alias for "loongarch64-linux-gnu" because
to implement such an alias, we must create thousands of symlinks in the
distro and doing so would be completely unpractical.  This commit also
aligns GCC with the revision.

Tested by building cross compilers with --enable-multiarch and multiple
combinations of --target=loongarch64-linux-gnu*, --with-abi=lp64{s,f,d},
and --{enable,disable}-multilib; and run "xgcc --print-multiarch" then
manually verify the result with eyesight.

Ok for trunk and backport to releases/gcc-12?

[1]: https://github.com/loongson/LoongArch-Documentation/pull/80

gcc/ChangeLog:

* config.gcc (triplet_abi): Set its value based on $with_abi,
instead of $target.
(la_canonical_triplet): Set it after $triplet_abi is set
correctly.
* config/loongarch/t-linux (MULTILIB_OSDIRNAMES): Make the
multiarch tuple for lp64d "loongarch64-linux-gnu" (without
"f64" suffix).
---
  gcc/config.gcc               | 14 +++++++-------
  gcc/config/loongarch/t-linux |  2 +-
  2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 067720ac795..c070e6ecd2e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4889,20 +4889,16 @@ case "${target}" in
case ${target} in
loongarch64-*-*-*f64)
abi_pattern="lp64d"
-   triplet_abi="f64"
;;
loongarch64-*-*-*f32)
abi_pattern="lp64f"
-   triplet_abi="f32"
;;
loongarch64-*-*-*sf)
abi_pattern="lp64s"
-   triplet_abi="sf"
;;
loongarch64-*-*-*)
abi_pattern="lp64[dfs]"
abi_default="lp64d"
-   triplet_abi=""
;;
*)
echo "Unsupported target ${target}." 1>&2
@@ -4923,9 +4919,6 @@ case "${target}" in
  ;;
esac
  
-		la_canonical_triplet="loongarch64-${triplet_os}${triplet_abi}"

-
-
# Perform initial sanity checks on --with-* options.
case ${with_arch} in
"" | loongarch64 | la464) ;; # OK, append here.
@@ -4996,6 +4989,13 @@ case "${target}" in
;;
esac
  
+		case ${with_abi} in

+ "lp64d") triplet_abi="";;
+ "lp64f") triplet_abi="f32";;
+ "lp64s") triplet_abi="sf";;
+   esac
+   la_canonical_triplet="loongarch64-${triplet_os}${triplet_abi}"
+
# Set default value for with_abiext (internal)
case ${with_abiext} in
"")
diff --git a/gcc/config/loongarch/t-linux b/gcc/config/loongarch/t-linux
index 131c45fdced..e40da179203 100644
--- a/gcc/config/loongarch/t-linux
+++ b/gcc/config/loongarch/t-linux
@@ -40,7 +40,7 @@ ifeq ($(filter LA_DISABLE_MULTILIB,$(tm_defines)),)
  
  MULTILIB_OSDIRNAMES = \

mabi.lp64d=../lib64$\
-  $(call if_multiarch,:loongarch64-linux-gnuf64)
+  $(call if_multiarch,:loongarch64-linux-gnu)
  
  MULTILIB_OSDIRNAMES += \

mabi.lp64f=../lib64/f32$\


Thanks for the quick patch; however Revy told me offline yesterday that 
this might conflict with things on the Debian side once this gets merged. He 
may have more details to share.


Adding him to CC -- you could keep him CC-ed on future changes that may 
impact distro packaging.




Re: [PATCH v3] LoongArch: Add prefetch instructions.

2022-11-15 Thread WANG Xuerui



On 2022/11/16 10:10, Lulu Cheng wrote:

v2 -> v3:
1. Remove preldx support.

---
Enable sw prefetching at -O3 and higher.

Co-Authored-By: xujiahao 

gcc/ChangeLog:

* config/loongarch/constraints.md (ZD): New constraint.
* config/loongarch/loongarch-def.c: Initial number of parallel prefetch.
* config/loongarch/loongarch-tune.h (struct loongarch_cache):
Define number of parallel prefetch.
* config/loongarch/loongarch.cc (loongarch_option_override_internal):
Set up parameters to be used in prefetching algorithm.
* config/loongarch/loongarch.md (prefetch): New template.
---
  gcc/config/loongarch/constraints.md   | 10 ++++++++++
  gcc/config/loongarch/loongarch-def.c  |  2 ++
  gcc/config/loongarch/loongarch-tune.h |  1 +
  gcc/config/loongarch/loongarch.cc     | 28 ++++++++++++++++++++++++++++
  gcc/config/loongarch/loongarch.md     | 14 ++++++++++++++
  5 files changed, 55 insertions(+)

diff --git a/gcc/config/loongarch/constraints.md 
b/gcc/config/loongarch/constraints.md
index 43cb7b5f0f5..46f7f63ae31 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -86,6 +86,10 @@
  ;;"ZB"
  ;;  "An address that is held in a general-purpose register.
  ;;  The offset is zero"
+;;"ZD"
+;; "An address operand whose address is formed by a base register
+;;  and offset that is suitable for use in instructions with the same
+;;  addressing mode as @code{preld}."
  ;; "<" "Matches a pre-dec or post-dec operand." (Global non-architectural)
  ;; ">" "Matches a pre-inc or post-inc operand." (Global non-architectural)
  
@@ -190,3 +194,9 @@ (define_memory_constraint "ZB"

The offset is zero"
(and (match_code "mem")
 (match_test "REG_P (XEXP (op, 0))")))
+
+(define_address_constraint "ZD"
+  "An address operand whose address is formed by a base register
+   and offset that is suitable for use in instructions with the same
+   addressing mode as @code{preld}."
+   (match_test "loongarch_12bit_offset_address_p (op, mode)"))


How is this different from the "m" constraint? AFAIK preld and ld share 
the same addressing mode (i.e. base register + 12-bit signed immediate 
offset).
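For reference, a plain __builtin_prefetch already produces exactly such a base + small-offset operand; a hedged example of source code this pattern should turn into a single preld (the emitted hint and offset in the comment are illustrative):

    /* Expected to emit something like "preld 0,$r4,64".  */
    void
    touch (const char *base)
    {
      __builtin_prefetch (base + 64, 0, 3);  /* read, high locality */
    }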



diff --git a/gcc/config/loongarch/loongarch-def.c 
b/gcc/config/loongarch/loongarch-def.c
index cbf995d81b5..80ab10a52a8 100644
--- a/gcc/config/loongarch/loongarch-def.c
+++ b/gcc/config/loongarch/loongarch-def.c
@@ -62,11 +62,13 @@ loongarch_cpu_cache[N_TUNE_TYPES] = {
.l1d_line_size = 64,
.l1d_size = 64,
.l2d_size = 256,
+  .simultaneous_prefetches = 4,
},
[CPU_LA464] = {
.l1d_line_size = 64,
.l1d_size = 64,
.l2d_size = 256,
+  .simultaneous_prefetches = 4,
},
  };
  
diff --git a/gcc/config/loongarch/loongarch-tune.h b/gcc/config/loongarch/loongarch-tune.h

index 6f3530f5c02..8e3eb29472b 100644
--- a/gcc/config/loongarch/loongarch-tune.h
+++ b/gcc/config/loongarch/loongarch-tune.h
@@ -45,6 +45,7 @@ struct loongarch_cache {
  int l1d_line_size;  /* bytes */
  int l1d_size;   /* KiB */
  int l2d_size;   /* kiB */
+int simultaneous_prefetches; /* number of parallel prefetch */

nit: "prefetches" or "prefetch ops" or "int prefetch_width"?

  };
  
  #endif /* LOONGARCH_TUNE_H */

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 8d5d8d965dd..8ee32c90573 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "context.h"
  #include "builtins.h"
  #include "rtl-iter.h"
+#include "opts.h"
  
  /* This file should be included last.  */

  #include "target-def.h"
@@ -6100,6 +6101,33 @@ loongarch_option_override_internal (struct gcc_options 
*opts)
if (loongarch_branch_cost == 0)
  loongarch_branch_cost = loongarch_cost->branch_cost;
  
+  /* Set up parameters to be used in prefetching algorithm.  */

+  int simultaneous_prefetches
+= loongarch_cpu_cache[LARCH_ACTUAL_TUNE].simultaneous_prefetches;
+
+  SET_OPTION_IF_UNSET (opts, &global_options_set,
+  param_simultaneous_prefetches,
+  simultaneous_prefetches);
+
+  SET_OPTION_IF_UNSET (opts, &global_options_set,
+  param_l1_cache_line_size,
+  loongarch_cpu_cache[LARCH_ACTUAL_TUNE].l1d_line_size);
+
+  SET_OPTION_IF_UNSET (opts, &global_options_set,
+  param_l1_cache_size,
+  loongarch_cpu_cache[LARCH_ACTUAL_TUNE].l1d_size);
+
+  SET_OPTION_IF_UNSET (opts, &global_options_set,
+  param_l2_cache_size,
+  loongarch_cpu_cache[LARCH_ACTUAL_TUNE].l2d_size);
+
+
+  /* Enable sw prefetching at -O3 and higher.  */
+  if (opts->x_flag_prefetch_loop_arrays < 0
+  && (opts->x_optimize >= 3 || opts->x_flag_profile_use)
+  && !opts->x_optimize_size)
+

Re: [PATCH v3] LoongArch: Libvtv add loongarch support.

2022-10-28 Thread WANG Xuerui

Hi,

The code change seems good, but there are a few grammatical nits.

Patch subject should be a verb phrase, something like "libvtv: add 
LoongArch support" could be better.


On 2022/10/28 16:01, Lulu Cheng wrote:

After several considerations, I decided to set VTV_PAGE_SIZE to 16KB under 
loongarch64.


v1 - > v2:

1. When the macro __loongarch_lp64 is defined, the VTV_PAGE_SIZE is set to 64K.
2. In the __vtv_malloc_init function in vtv_malloc.cc, do not check
whether VTV_PAGE_SIZE is equal to the system page size if the macro
__loongarch_lp64 is defined.

v2 -> v3:

Set VTV_PAGE_SIZE to 16KB under loongarch64.



All regression tests of libvtv passed.

 === libvtv Summary ===

# of expected passes            176

-


Are the monologue and changelog supposed to be a part of the actual 
commit? If not, conventionally they should be placed *after* the "---" 
line separating the commit message and diffstat/patch content.




The loongarch64 kernel supports 4KB,16KB, or 64KB pages,
but only 16k pages are currently supported in this code.
This sentence feels a little bit unnatural. I suggest just "The 
LoongArch specification permits page sizes of 4KiB, 16KiB and 64KiB, but 
only 16KiB pages are supported for now".


Co-Authored-By: qijingwen 

include/ChangeLog:

* vtv-change-permission.h (defined):
(VTV_PAGE_SIZE): Set VTV_PAGE_SIZE to 16KB under loongarch64.

"for loongarch64" feels more natural.


libvtv/ChangeLog:

* configure.tgt: Add loongarch support.
---
  include/vtv-change-permission.h | 5 +
  libvtv/configure.tgt| 3 +++
  2 files changed, 8 insertions(+)

diff --git a/include/vtv-change-permission.h b/include/vtv-change-permission.h
index 70bdad92bca..f61d8b68ef6 100644
--- a/include/vtv-change-permission.h
+++ b/include/vtv-change-permission.h
@@ -48,6 +48,11 @@ extern void __VLTChangePermission (int);
  #else
  #if defined(__sun__) && defined(__svr4__) && defined(__sparc__)
  #define VTV_PAGE_SIZE 8192
+#elif defined(__loongarch_lp64)
+/* The page size can be configured to 4, 16, or 64KB configuring the kernel.

"The page size is configurable by the kernel to be 4, 16 or 64 KiB."

+   However, only 16KB pages are supported here. Please modify this macro if you
+   want to support other page sizes.  */


Are we actually encouraging the users to modify the sources themselves 
if they decide to run with a non-16KiB page size? This might not even be 
feasible, as you're essentially telling them to recompile part of the 
toolchain, which they may not want to / cannot do.


I think the message you want to convey here is for them to voice their 
need upstream so we can then discuss. In that case, the 2 sentences here 
could be:


"For now, only the default page size of 16KiB is supported. If you have 
a need for other page sizes, please get in touch."


Although I'm not sure if the vague "get in touch" wording is 
appropriate. What do others think?



+#define VTV_PAGE_SIZE 16384
  #else
  #define VTV_PAGE_SIZE 4096
  #endif
diff --git a/libvtv/configure.tgt b/libvtv/configure.tgt
index aa2a3f675b8..6cdd1e97ab1 100644
--- a/libvtv/configure.tgt
+++ b/libvtv/configure.tgt
@@ -50,6 +50,9 @@ case "${target}" in
;;
x86_64-*-darwin[1]* | i?86-*-darwin[1]*)
;;
+  loongarch*-*-linux*)
+   VTV_SUPPORTED=yes
+   ;;
*)
;;
  esac
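
Incidentally, the runtime check described in the v2 changelog above would
amount to only a few lines.  A minimal sketch, assuming POSIX sysconf is
available (the helper name check_vtv_page_size is hypothetical, and
libvtv's real initialization path differs):

    #include <assert.h>
    #include <unistd.h>
    #include "vtv-change-permission.h"

    /* Sketch only: verify the compile-time page-size constant against
       the kernel's actual page size at startup.  */
    static void
    check_vtv_page_size (void)
    {
      assert ((long) VTV_PAGE_SIZE == sysconf (_SC_PAGESIZE));
    }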


Re: [PATCH] Libvtv-test: Fix the problem that scansarif.exp cannot be found in libvtv regression test.

2022-09-26 Thread WANG Xuerui



On 2022/9/27 11:16, Lulu Cheng wrote:

r13-967 add ARRIF output format. However libvtv does not add support.


"SARIF support was added in r13-967 but libvtv wasn't updated."

(Tip: always remember that English, unlike Chinese, isn't a 
"topic-prominent" language, meaning you should almost never put the 
"topic" at subject position of the sentence. IOW, if you find your 
English to be a perfect 1:1 mapping to some Chinese sentence, which is 
the case here, it's highly likely you need to improve it somehow. This 
is by no means personal, but the same pattern of broken English has been 
appearing in your and your teammates' commits since forever, so I'm 
afraid I have to point it out.)




commit 6cf276ddf22066af780335cd0072d2c27aabe468
Author: David Malcolm 
Date:   Thu Jun 2 15:40:22 2022 -0400

diagnostics: add SARIF output format


And I don't think this reference is necessary; r13-967 is already a 
precise description.




libvtv/ChangeLog:

* testsuite/lib/libvtv-dg.exp: Add load_gcc_lib of scansarif.exp.
"Load scansarif.exp." -- another example of redundant expression (no pun 
intended on "expression").

---
  libvtv/testsuite/lib/libvtv-dg.exp | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/libvtv/testsuite/lib/libvtv-dg.exp 
b/libvtv/testsuite/lib/libvtv-dg.exp
index b140c194cdc..454d916e556 100644
--- a/libvtv/testsuite/lib/libvtv-dg.exp
+++ b/libvtv/testsuite/lib/libvtv-dg.exp
@@ -12,6 +12,8 @@
  # along with this program; if not, write to the Free Software
  # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, 
USA.
  
+load_gcc_lib scansarif.exp

+
  proc libvtv-dg-test { prog do_what extra_tool_flags } {
  return [gcc-dg-test-1 libvtv_target_compile $prog $do_what $extra_tool_flags]
  }

Otherwise LGTM.


Re: Re: [PATCH v5] LoongArch: add movable attribute

2022-08-05 Thread WANG Xuerui

On 2022/8/5 15:19, Lulu Cheng wrote:



On 2022/8/5 at 14:03, Xi Ruoyao wrote:

On Fri, 2022-08-05 at 12:01 +0800, Lulu Cheng wrote:

On 2022/8/5 at 11:45, Xi Ruoyao wrote:


On Fri, 2022-08-05 at 11:34 +0800, Xi Ruoyao via Gcc-patches wrote:


Or maybe we should just use PC-relative addressing with 4 instructions
instead of the GOT for -fno-PIC?

Not possible, Glibc does not support R_LARCH_PCALA* relocations in
ld.so.  So we still need a -mno-got (or something) option to disable GOT
for special cases like the kernel.


Both ways consume 16 bytes (4 instructions for PC-relative, 2
instructions and a 64-bit GOT entry for GOT), and PC-relative may be
more cache-friendly.  But such a major change cannot be backported for
12.2 IMO.

I'm very sorry; my understanding of the per-CPU variable was wrong.
I have just read the kernel code you submitted: this per-CPU variable
not only has a large offset but also an address that is unknown at
compile time, so whether it is addressed with PC-relative or GOT
sequences, a dynamic relocation is needed at load time.  It seems that
accessing it through the GOT is the better choice.

The name "movable" describes this functionality vividly in the kernel,
indicating that the address of the variable can be changed at will.

But this name is more difficult to understand in GCC.  I have no opinion
on the rest; can this name be changed?

Yes, we don't need to be compatible with the old vendor compiler IMO.


"force_got_access" as Xuerui suggested?

Compared with these names, I think addr_global is better.


Actually if "model(...)" can be implemented I'd prefer a descriptive 
word/phrase inside model(). Because it may well be the case that more 
peculiar ways of accessing some special data will have to be supported 
in the future, and all of them are kind of "data models" so we'd be able 
to nicely group them with model(...).


Otherwise I actually don't have a particularly strong opinion, aside 
from "movable" which IMO should definitely not be taken.




Re: [PATCH v5] LoongArch: add movable attribute

2022-08-02 Thread WANG Xuerui

On 2022/8/3 09:36, Xi Ruoyao wrote:

Is it OK for trunk, or do I need to change something?

By the way, I'm looking into the possibility of including this in 12.2.
That would leave only 12.1 without this attribute, and we could just say
"building the kernel needs GCC 12.2 or later".

On Mon, 2022-08-01 at 18:07 +0800, Xi Ruoyao wrote:

Changes v4 -> v5: Fix changelog.  No code change.

Changes v3 -> v4:

  * Use "movable" as the attribute name as Huacai says it's already
used
    in downstream GCC fork.


I don't think mindlessly caring for vendor forks is always correct. In 
fact I find the name "movable" too generic, and something like 
"force_got_access" could be better.


I don't currently have time to test this, unfortunately, due to day job. 
Might be able to give it a whirl one or two weeks later though...



  * Remove an inaccurate line from the doc. (Initially I tried to
    implement a "model(...)" like IA64 or M32R. Then I changed my mind
    but forgot to remove the line copied from M32R doc.)

-- >8 --

A linker script and/or a section attribute may locate a local object in
some way unexpected by the code model, leading to a link failure.  This
happens when the Linux kernel loads a module with "local" per-CPU
variables.

Add an attribute to explicitly mark a variable whose address is not
limited by the code model, so we are able to work around such problems.
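
For readers skimming the archive, intended usage is roughly the
following (a minimal sketch; the variable and function names are made
up, and the attribute name is the one proposed in this revision):

    /* A global whose final address may end up outside the code model's
       +/- 2GiB PC-relative range, e.g. a module's per-CPU variable.  */
    extern int counter __attribute__ ((movable));

    int
    read_counter (void)
    {
      return counter;  /* accessed via the GOT instead of PC-relative */
    }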

gcc/ChangeLog:

 * config/loongarch/loongarch.cc (loongarch_attribute_table):
 New attribute table.
 (TARGET_ATTRIBUTE_TABLE): Define the target hook.
 (loongarch_handle_addr_global_attribute): New static function.
 (loongarch_classify_symbol): Return SYMBOL_GOT_DISP for
 SYMBOL_REF_DECL with addr_global attribute.
 (loongarch_use_anchors_for_symbol_p): New static function.
 (TARGET_USE_ANCHORS_FOR_SYMBOL_P): Define the target hook.
 * doc/extend.texi (Variable Attributes): Document new
 LoongArch specific attribute.

gcc/testsuite/ChangeLog:

 * gcc.target/loongarch/addr-global.c: New test.
---
  gcc/config/loongarch/loongarch.cc | 63 +++
  gcc/doc/extend.texi   | 16 +
  .../gcc.target/loongarch/attr-movable.c   | 29 +
  3 files changed, 108 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/attr-movable.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 79687340dfd..6b6026700a6 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1643,6 +1643,15 @@ loongarch_classify_symbol (const_rtx x)
    && !loongarch_symbol_binds_local_p (x))
  return SYMBOL_GOT_DISP;
  
+  if (SYMBOL_REF_P (x))
+    {
+      tree decl = SYMBOL_REF_DECL (x);
+      /* A movable symbol may be moved away from the +/- 2GiB range
+         around the PC, so we have to use GOT.  */
+      if (decl && lookup_attribute ("movable", DECL_ATTRIBUTES (decl)))
+        return SYMBOL_GOT_DISP;
+    }
+
    return SYMBOL_PCREL;
  }
  
@@ -6068,6 +6077,54 @@ loongarch_starting_frame_offset (void)

    return crtl->outgoing_args_size;
  }
  
+static tree
+loongarch_handle_movable_attribute (tree *node, tree name, tree, int,
+                                    bool *no_add_attrs)
+{
+  tree decl = *node;
+  if (TREE_CODE (decl) == VAR_DECL)
+    {
+      if (DECL_CONTEXT (decl)
+          && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL
+          && !TREE_STATIC (decl))
+        {
+          error_at (DECL_SOURCE_LOCATION (decl),
+                    "%qE attribute cannot be specified for local "
+                    "variables", name);
+          *no_add_attrs = true;
+        }
+    }
+  else
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored", name);
+      *no_add_attrs = true;
+    }
+  return NULL_TREE;
+}
+
+static const struct attribute_spec loongarch_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "movable", 0, 0, true, false, false, false,
+    loongarch_handle_movable_attribute, NULL },
+  /* The last attribute spec is set to be NULL.  */
+  {}
+};
+
+bool
+loongarch_use_anchors_for_symbol_p (const_rtx symbol)
+{
+  tree decl = SYMBOL_REF_DECL (symbol);
+
+  /* A movable attribute indicates the linker may move the symbol away,
+     so the use of anchor may cause relocation overflow.  */
+  if (decl && lookup_attribute ("movable", DECL_ATTRIBUTES (decl)))
+    return false;
+
+  return default_use_anchors_for_symbol_p (symbol);
+}
+
  /* Initialize the GCC target structure.  */
  #undef TARGET_ASM_ALIGNED_HI_OP
  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -6256,6 +6313,12 @@ loongarch_starting_frame_offset (void)
  #undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
  #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
  
+#undef  TARGET_ATTRIBUTE_TABLE
+#define TARGET_ATTRIBUTE_TABLE loongarch_attribute_table

Re: [PATCH] LoongArch: document -m[no-]explicit-relocs

2022-07-27 Thread WANG Xuerui

On 2022/7/27 17:28, Lulu Cheng wrote:



On 2022/7/27 at 17:15, Xi Ruoyao wrote:

On Wed, 2022-07-27 at 16:47 +0800, Lulu Cheng wrote:


  "Use or do not use assembler relocation operators when dealing with
symbolic addresses. The alternative is to use assembler macros
instead, which may limit optimization.
  
  The default value for the option is determined during GCC build-time
by detecting corresponding assembler support: @code{-mexplicit-relocs}
if said support is present, @code{-mno-explicit-relocs} otherwise. This
option is mostly useful for debugging, or interoperation with assemblers
different from the build-time one."
  

I agree with wangxuerui's idea.
Using the same option name and the same description reduces the time
users need to learn how to use this option.

I agree it's better than my original paragraph.

If you agree I'll commit it with Xuerui as the commit author.



I have no opinion if wangxuerui agrees.
Either is OK (if you really think the commit is effectively rewritten by 
me), thanks.


Re: [PATCH] LoongArch: document -m[no-]explicit-relocs

2022-07-27 Thread WANG Xuerui

Hi,

On 2022/7/27 15:06, Xi Ruoyao wrote:

Document newly introduced -m[no-]explicit-relocs options.  Ok for trunk?

-- >8 --

gcc/ChangeLog:

* doc/invoke.texi: Document -m[no-]explicit-relocs for
LoongArch.
---
  gcc/doc/invoke.texi | 12 
  1 file changed, 12 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9a3f2d14c5a..04418f80428 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -24939,6 +24939,18 @@ global symbol: The data got table must be within +/-8EiB addressing space.
  @end itemize
  @end table
  The default code model is @code{normal}.
+
+@item -mexplicit-relocs
+@itemx -mno-explicit-relocs
+@opindex mexplicit-relocs
+@opindex mno-explicit-relocs
+Generate (do not generate) explicit symbol relocations instead of
+assembler macros.  Using explicit relocations can improve code generation.
+GCC detects the capabilities of the assembler when it is built and sets
+the default to @code{-mexplicit-relocs} if the assembler supports the
+syntax for explicit specification of relocations, and
+@code{-mno-explicit-relocs} otherwise.  This option is mostly useful for
+debugging or using an assembler different from build-time.


Some text massaging, along with some shameful copying from other (read: 
RISC-V) -mexplicit-relocs docs...


"Use or do not use assembler relocation operators when dealing with 
symbolic addresses. The alternative is to use assembler macros instead, 
which may limit optimization.


The default value for the option is determined during GCC build-time by 
detecting corresponding assembler support: @code{-mexplicit-relocs} if 
said support is present, @code{-mno-explicit-relocs} otherwise. This 
option is mostly useful for debugging, or interoperation with assemblers 
different from the build-time one."


What do you think?


  @end table
  
  @node M32C Options
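
For the archive, the distinction being documented looks roughly like
this at the assembly level (a hand-written sketch, not compiler output;
the exact registers, relocation operators and macros depend on the code
model and symbol binding):

    extern int x;

    int
    get_x (void)
    {
      return x;
      /* With -mexplicit-relocs the compiler itself emits relocation
         operators:
           pcalau12i  $r4, %pc_hi20(x)
           ld.w       $r4, $r4, %pc_lo12(x)
         With -mno-explicit-relocs it emits an assembler macro and lets
         the assembler expand it:
           la.local   $r4, x
           ld.w       $r4, $r4, 0  */
    }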


Re: [PATCH] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method, when frame->mask or frame->fmask is zero.

2022-07-07 Thread WANG Xuerui

Hi,

On 2022/7/7 16:04, Lulu Cheng wrote:

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_compute_frame_info):
Modify fp_sp_offset and gp_sp_offset's calculation method,
when frame->mask or frame->fmask is zero, do not subtract UNITS_PER_WORD
or UNITS_PER_FP_REG.
IMO it's better to also state which problem this change is meant to 
solve (i.e. your intent), better yet, with an appropriate bugzilla link.

---
  gcc/config/loongarch/loongarch.cc | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index d72b256df51..5c9a33c14f7 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -917,8 +917,12 @@ loongarch_compute_frame_info (void)
frame->frame_pointer_offset = offset;
/* Next are the callee-saved FPRs.  */
if (frame->fmask)
-offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG);
-  frame->fp_sp_offset = offset - UNITS_PER_FP_REG;
+{
+  offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG);
+  frame->fp_sp_offset = offset - UNITS_PER_FP_REG;
+}
+  else
+frame->fp_sp_offset = offset;
/* Next are the callee-saved GPRs.  */
if (frame->mask)
  {
@@ -931,8 +935,10 @@ loongarch_compute_frame_info (void)
frame->save_libcall_adjustment = x_save_size;
  
offset += x_save_size;

+  frame->gp_sp_offset = offset - UNITS_PER_WORD;
  }
-  frame->gp_sp_offset = offset - UNITS_PER_WORD;
+  else
+frame->gp_sp_offset = offset;
/* The hard frame pointer points above the callee-saved GPRs.  */
frame->hard_frame_pointer_offset = offset;
/* Above the hard frame pointer is the callee-allocated varargs save area.  */
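
A quick worked example of the off-by-one being fixed, with illustrative
numbers (assuming LP64, so UNITS_PER_FP_REG == 8): if offset is 16 on
entry and frame->fmask is zero, the old code still computed
fp_sp_offset = 16 - 8 = 8, i.e. a slot below an FPR save area that does
not exist; with this change fp_sp_offset simply equals offset (16).
The gp_sp_offset hunk is the same fix for the GPR side.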


Re: [PATCH] loongarch: ignore zero-size fields in calling convention

2022-04-25 Thread WANG Xuerui

On 4/25/22 13:57, Xi Ruoyao wrote:

Ping.

Normally we shouldn't ping a patch after only a few days, but we're
running out of time to catch GCC 12 milestone.  And once GCC 12 is
released the patch will become far more complicated for a psABI warning.

And please note that the ABI difference between GCC and G++ should be
considered a bug, and it has to be fixed anyway.  If you don't like the
idea of this patch, please develop another solution and apply it *before
GCC 12*.

On Wed, 2022-04-20 at 14:23 +0800, Xi Ruoyao via Gcc-patches wrote:

Currently, the LoongArch ELF psABI is not clear on the handling of
zero-sized fields in aggregate arguments or return values [1].  The behavior
of GCC trunk is puzzling considering the following cases:

struct test1
{
   double a[0];
   float x;
};

struct test2
{
   float a[0];
   float x;
};

GCC trunk passes test1::x via GPR, but test2::x via FPR.  I believe no
rational Homo Sapiens can understand (or even expect) this.

And, to make things even worse, test1 behaves differently in C and C++.
GCC trunk passes test1::x via GPR, but G++ trunk passes test1::x via
FPR.

I've written a paragraph about the current GCC behavior for the psABI [2],
but I think it's cleaner to just ignore all zero-sized fields in the ABI.
This will require only a two-line change in GCC (this patch), and a
one-line change in the ABI doc.
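
Since the hunk itself is not quoted in this archive, a two-line change
of that shape would presumably sit in the field-walking loop of
loongarch_flatten_aggregate_field, along these lines (a sketch under
that assumption, not the literal committed hunk):

    /* Skip zero-sized fields when flattening a RECORD_TYPE for argument
       classification, so test1 and test2 above are treated identically.  */
    if (DECL_SIZE (f) && integer_zerop (DECL_SIZE (f)))
      continue;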

If there is no better idea, I'd like to see this reviewed and applied
ASAP.  If we end up applying this patch after the GCC 12 release, we'll
need to add a lot more boring code to emit a -Wpsabi inform [3].  That
will be an unnecessary burden for both us and the users of the compiler
(as the compiler will spend CPU time just checking whether a warning
should be issued).

[1]: https://github.com/loongson/LoongArch-Documentation/issues/48
[2]: https://github.com/loongson/LoongArch-Documentation/pull/49
[3]: https://gcc.gnu.org/PR102024

gcc/

 * config/loongarch/loongarch.cc
 (loongarch_flatten_aggregate_field): Ignore empty fields for
 RECORD_TYPE.

gcc/testsuite/

 * gcc.target/loongarch/zero-size-field-pass.c: New test.
 * gcc.target/loongarch/zero-size-field-ret.c: New test.
---
  gcc/config/loongarch/loongarch.cc |  3 ++
  .../loongarch/zero-size-field-pass.c  | 30 +++
  .../loongarch/zero-size-field-ret.c   | 28 +
  3 files changed, 61 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/zero-size-field-pass.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/zero-size-field-ret.c

Agreed; this is urgent. While I haven't personally got around to testing 
this yet, I have looked at the linked LoongArch psABI spec change and 
agree with the present approach.