from:"Xi Ruoyao"

[gcc r15-1207] LoongArch: Use bstrins for "value & (-1u << const)"

2024-06-12 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:d0da347a1dd6e57cb0e0c55fd654d81dde545cf8

commit r15-1207-gd0da347a1dd6e57cb0e0c55fd654d81dde545cf8
Author: Xi Ruoyao 
Date:   Sun Jun 9 14:43:48 2024 +0800

LoongArch: Use bstrins for "value & (-1u << const)"

A move/bstrins pair is as fast as a (addi.w|lu12i.w|lu32i.d|lu52i.d)/and
pair, and twice fast as a srli/slli pair.  When the src reg and the dst
reg happens to be the same, the move instruction can be optimized away.

gcc/ChangeLog:

* config/loongarch/predicates.md (high_bitmask_operand): New
predicate.
* config/loongarch/constraints.md (Yy): New constriant.
* config/loongarch/loongarch.md (and3_align): New
define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/bstrins-1.c: New test.
* gcc.target/loongarch/bstrins-2.c: New test.

Diff:
---
 gcc/config/loongarch/constraints.md|  5 +
 gcc/config/loongarch/loongarch.md  | 17 +
 gcc/config/loongarch/predicates.md |  4 
 gcc/testsuite/gcc.target/loongarch/bstrins-1.c |  9 +
 gcc/testsuite/gcc.target/loongarch/bstrins-2.c | 14 ++
 5 files changed, 49 insertions(+)

diff --git a/gcc/config/loongarch/constraints.md 
b/gcc/config/loongarch/constraints.md
index f07d31650d29..12cf5e2924a3 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -94,6 +94,7 @@
 ;;   "A constant @code{move_operand} that can be safely loaded using
 ;;   @code{la}."
 ;;"Yx"
+;;"Yy"
 ;; "Z" -
 ;;"ZC"
 ;;  "A memory operand whose address is formed by a base register and offset
@@ -291,6 +292,10 @@
"@internal"
(match_operand 0 "low_bitmask_operand"))
 
+(define_constraint "Yy"
+   "@internal"
+   (match_operand 0 "high_bitmask_operand"))
+
 (define_constraint "YI"
   "@internal
A replicated vector const in which the replicated value is in the range
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 5c80c169cbf1..25c1d323ba0f 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1542,6 +1542,23 @@
   [(set_attr "move_type" "pick_ins")
(set_attr "mode" "")])
 
+(define_insn_and_split "and3_align"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (and:GPR (match_operand:GPR 1 "register_operand" "r")
+(match_operand:GPR 2 "high_bitmask_operand" "Yy")))]
+  ""
+  "#"
+  ""
+  [(set (match_dup 0) (match_dup 1))
+   (set (zero_extract:GPR (match_dup 0) (match_dup 2) (const_int 0))
+   (const_int 0))]
+{
+  int len;
+
+  len = low_bitmask_len (mode, ~INTVAL (operands[2]));
+  operands[2] = GEN_INT (len);
+})
+
 (define_insn_and_split "*bstrins__for_mask"
   [(set (match_operand:GPR 0 "register_operand" "=r")
(and:GPR (match_operand:GPR 1 "register_operand" "r")
diff --git a/gcc/config/loongarch/predicates.md 
b/gcc/config/loongarch/predicates.md
index eba7f246c84d..58e406ea522b 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -293,6 +293,10 @@
   (and (match_code "const_int")
(match_test "low_bitmask_len (mode, INTVAL (op)) > 12")))
 
+(define_predicate "high_bitmask_operand"
+  (and (match_code "const_int")
+   (match_test "low_bitmask_len (mode, ~INTVAL (op)) > 0")))
+
 (define_predicate "d_operand"
   (and (match_code "reg")
(match_test "GP_REG_P (REGNO (op))")))
diff --git a/gcc/testsuite/gcc.target/loongarch/bstrins-1.c 
b/gcc/testsuite/gcc.target/loongarch/bstrins-1.c
new file mode 100644
index ..7cb3a9523222
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/bstrins-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "bstrins\\.d\t\\\$r4,\\\$r0,4,0" } } */
+
+long
+x (long a)
+{
+  return a & -32;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/bstrins-2.c 
b/gcc/testsuite/gcc.target/loongarch/bstrins-2.c
new file mode 100644
index ..9777f502e5af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/bstrins-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "bstrins\\.d\t\\\$r\[0-9\]+,\\\$r0,4,0" } } */
+
+struct aligned_buffer {
+  _Alignas(32) char x[1024];
+};
+
+extern int f(char *);
+int g(void)
+{
+  struct aligned_buffer buf;
+  return f(buf.x);
+}

[gcc r15-1206] LoongArch: Fix mode size comparision in loongarch_expand_conditional_move

2024-06-12 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:53c703888eb51314f762c8998dc9215871b12722

commit r15-1206-g53c703888eb51314f762c8998dc9215871b12722
Author: Xi Ruoyao 
Date:   Wed Jun 12 11:01:53 2024 +0800

LoongArch: Fix mode size comparision in loongarch_expand_conditional_move

We were comparing a mode size with word_mode, but word_mode is an enum
value thus this does not really make any sense.  (Un)luckily E_DImode
happens to be 8 so this seemed to work, but let's make it correct so it
won't blow up when we add LA32 support or add another machine mode...

gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_expand_conditional_move): Compare mode size with
UNITS_PER_WORD instead of word_mode.

Diff:
---
 gcc/config/loongarch/loongarch.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 1b6df6a43650..6ec3ee625026 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5352,7 +5352,7 @@ loongarch_expand_conditional_move (rtx *operands)
 loongarch_emit_float_compare (, , );
   else
 {
-  if (GET_MODE_SIZE (GET_MODE (op0)) < word_mode)
+  if (GET_MODE_SIZE (GET_MODE (op0)) < UNITS_PER_WORD)
{
  promote_op[0] = (REG_P (op0) && REG_P (operands[2]) &&
   REGNO (op0) == REGNO (operands[2]));

Re: [PATCH v3 2/2] C++: Support constexpr strings for asm statements

2024-06-11 Thread Xi Ruoyao

On Tue, 2024-06-11 at 20:53 -0700, Andi Kleen wrote:
> > > -Some assemblers allow semicolons as a line separator. However,
> > > -note that some assembler dialects use semicolons to start a comment.
> > > +Some assemblers allow semicolons as a line separator. However,
> > > +note that some assembler dialects use semicolons to start a comment.
> > 
> > Why is diff flagging the above as a change?  I don't see any difference.
> 
> Strange, but I don't know. I don't see a white space change.

There was one trailing white space in either lines, and your commit has
removed the spaces.  But those spaces are completely missing in the
patch sent to gcc-patches.  Maybe your mail client has eaten them?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] LoongArch: Fix mode size comparision in loongarch_expand_conditional_move

2024-06-11 Thread Xi Ruoyao

We were comparing a mode size with word_mode, but word_mode is an enum
value thus this does not really make any sense.  (Un)luckily E_DImode
happens to be 8 so this seemed to work, but let's make it correct so it
won't blow up when we add LA32 support or add another machine mode...

gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_expand_conditional_move): Compare mode size with
UNITS_PER_WORD instead of word_mode.
---

I've not fully tested this but it should be obvious.  Ok for trunk?

 gcc/config/loongarch/loongarch.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index eb132f06c2e..bc3dd2b713e 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5371,7 +5371,7 @@ loongarch_expand_conditional_move (rtx *operands)
 loongarch_emit_float_compare (, , );
   else
 {
-  if (GET_MODE_SIZE (GET_MODE (op0)) < word_mode)
+  if (GET_MODE_SIZE (GET_MODE (op0)) < UNITS_PER_WORD)
{
  promote_op[0] = (REG_P (op0) && REG_P (operands[2]) &&
   REGNO (op0) == REGNO (operands[2]));
-- 
2.45.2

Re: [PATCH] jit: Ensure ssize_t is defined.

2024-06-11 Thread Xi Ruoyao

On Sat, 2024-05-11 at 17:16 +0200, FX Coudert wrote:
>   * libgccjit.h: Include 

Per the C standard size_t should be provided by stddef.h.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Ping^3: [PATCH 0/2] Fix two test failures with --enable-default-pie [PR70150]

2024-06-09 Thread Xi Ruoyao

Ping https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650763.html
again, adding more reviewers into CC...

On Mon, 2024-05-06 at 12:45 +0800, Xi Ruoyao wrote:
> In GCC 14.1-rc1, there are two new (comparing to GCC 13) failures if
> the build is configured --enable-default-pie.  Let's fix them.
> 
> Tested on x86_64-linux-gnu.  Ok for trunk and releases/gcc-14?
> 
> Xi Ruoyao (2):
>   i386: testsuite: Add -no-pie for pr113689-1.c [PR70150]
>   i386: testsuite: Adapt fentryname3.c for r14-811 change [PR70150]
> 
>  gcc/testsuite/gcc.target/i386/fentryname3.c | 3 +--
>  gcc/testsuite/gcc.target/i386/pr113689-1.c  | 2 +-
>  2 files changed, 2 insertions(+), 3 deletions(-)

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] LoongArch: Use bstrins for "value & (-1u << const)"

2024-06-09 Thread Xi Ruoyao

A move/bstrins pair is as fast as a (addi.w|lu12i.w|lu32i.d|lu52i.d)/and
pair, and twice fast as a srli/slli pair.  When the src reg and the dst
reg happens to be the same, the move instruction can be optimized away.

gcc/ChangeLog:

* config/loongarch/predicates.md (high_bitmask_operand): New
predicate.
* config/loongarch/constraints.md (Yy): New constriant.
* config/loongarch/loongarch.md (and3_align): New
define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/bstrins-1.c: New test.
* gcc.target/loongarch/bstrins-2.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/constraints.md|  5 +
 gcc/config/loongarch/loongarch.md  | 17 +
 gcc/config/loongarch/predicates.md |  4 
 gcc/testsuite/gcc.target/loongarch/bstrins-1.c |  9 +
 gcc/testsuite/gcc.target/loongarch/bstrins-2.c | 14 ++
 5 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/bstrins-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/bstrins-2.c

diff --git a/gcc/config/loongarch/constraints.md 
b/gcc/config/loongarch/constraints.md
index f07d31650d2..12cf5e2924a 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -94,6 +94,7 @@
 ;;   "A constant @code{move_operand} that can be safely loaded using
 ;;   @code{la}."
 ;;"Yx"
+;;"Yy"
 ;; "Z" -
 ;;"ZC"
 ;;  "A memory operand whose address is formed by a base register and offset
@@ -291,6 +292,10 @@ (define_constraint "Yx"
"@internal"
(match_operand 0 "low_bitmask_operand"))
 
+(define_constraint "Yy"
+   "@internal"
+   (match_operand 0 "high_bitmask_operand"))
+
 (define_constraint "YI"
   "@internal
A replicated vector const in which the replicated value is in the range
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 5c80c169cbf..25c1d323ba0 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1542,6 +1542,23 @@ (define_insn "and3_extended"
   [(set_attr "move_type" "pick_ins")
(set_attr "mode" "")])
 
+(define_insn_and_split "and3_align"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (and:GPR (match_operand:GPR 1 "register_operand" "r")
+(match_operand:GPR 2 "high_bitmask_operand" "Yy")))]
+  ""
+  "#"
+  ""
+  [(set (match_dup 0) (match_dup 1))
+   (set (zero_extract:GPR (match_dup 0) (match_dup 2) (const_int 0))
+   (const_int 0))]
+{
+  int len;
+
+  len = low_bitmask_len (mode, ~INTVAL (operands[2]));
+  operands[2] = GEN_INT (len);
+})
+
 (define_insn_and_split "*bstrins__for_mask"
   [(set (match_operand:GPR 0 "register_operand" "=r")
(and:GPR (match_operand:GPR 1 "register_operand" "r")
diff --git a/gcc/config/loongarch/predicates.md 
b/gcc/config/loongarch/predicates.md
index eba7f246c84..58e406ea522 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -293,6 +293,10 @@ (define_predicate "low_bitmask_operand"
   (and (match_code "const_int")
(match_test "low_bitmask_len (mode, INTVAL (op)) > 12")))
 
+(define_predicate "high_bitmask_operand"
+  (and (match_code "const_int")
+   (match_test "low_bitmask_len (mode, ~INTVAL (op)) > 0")))
+
 (define_predicate "d_operand"
   (and (match_code "reg")
(match_test "GP_REG_P (REGNO (op))")))
diff --git a/gcc/testsuite/gcc.target/loongarch/bstrins-1.c 
b/gcc/testsuite/gcc.target/loongarch/bstrins-1.c
new file mode 100644
index 000..7cb3a952322
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/bstrins-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "bstrins\\.d\t\\\$r4,\\\$r0,4,0" } } */
+
+long
+x (long a)
+{
+  return a & -32;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/bstrins-2.c 
b/gcc/testsuite/gcc.target/loongarch/bstrins-2.c
new file mode 100644
index 000..9777f502e5a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/bstrins-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "bstrins\\.d\t\\\$r\[0-9\]+,\\\$r0,4,0" } } */
+
+struct aligned_buffer {
+  _Alignas(32) char x[1024];
+};
+
+extern int f(char *);
+int g(void)
+{
+  struct aligned_buffer buf;
+  return f(buf.x);
+}
-- 
2.45.2

pushed: wwwdocs: [PATCH] gcc-14/changes: Fix mislocated in RISC-V changes

2024-06-04 Thread Xi Ruoyao

---

Pushed as obvious.

 htdocs/gcc-14/changes.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 6447898e..7a5eb449 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -1218,9 +1218,9 @@ __asm (".global __flmap_lock"  "\n\t"
   configured with --with-tls=[trad|desc].
   Support for the TLS descriptors, this can be enabled by
   -mtls-dialect=desc and the default behavior can be configure
-  by --with-tls=[trad|desc], and this feature require glibc 2.40,
+  by --with-tls=[trad|desc], and this feature require glibc 
2.40,
   thanks to Tatsuyuki Ishi from
-  https://bluewhale.systems/;>Blue Whale Systems
+  https://bluewhale.systems/;>Blue Whale Systems.
   
   Support for the following standard extensions has been added:
 
-- 
2.45.2

gcc-wwwdocs branch master updated. b70633d48603da6175aaa1ba0157bd451d7c0c18

2024-06-04 Thread Xi Ruoyao via Gcc-cvs-wwwdocs

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  b70633d48603da6175aaa1ba0157bd451d7c0c18 (commit)
  from  803c0e440c2f90e432cf7f934df96e19232c62ca (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit b70633d48603da6175aaa1ba0157bd451d7c0c18
Author: Xi Ruoyao 
Date:   Wed Jun 5 13:11:37 2024 +0800

gcc-14/changes: Fix mislocated  in RISC-V changes

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 6447898e..7a5eb449 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -1218,9 +1218,9 @@ __asm (".global __flmap_lock"  "\n\t"
   configured with --with-tls=[trad|desc].
   Support for the TLS descriptors, this can be enabled by
   -mtls-dialect=desc and the default behavior can be configure
-  by --with-tls=[trad|desc], and this feature require glibc 2.40,
+  by --with-tls=[trad|desc], and this feature require glibc 
2.40,
   thanks to Tatsuyuki Ishi from
-  https://bluewhale.systems/;>Blue Whale Systems
+  https://bluewhale.systems/;>Blue Whale Systems.
   
   Support for the following standard extensions has been added:
 

---

Summary of changes:
 htdocs/gcc-14/changes.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


hooks/post-receive
-- 
gcc-wwwdocs

Ping^2: [PATCH 0/2] Fix two test failures with --enable-default-pie [PR70150]

2024-05-29 Thread Xi Ruoyao

Ping again.

On Mon, 2024-05-06 at 12:45 +0800, Xi Ruoyao wrote:
> In GCC 14.1-rc1, there are two new (comparing to GCC 13) failures if
> the build is configured --enable-default-pie.  Let's fix them.
> 
> Tested on x86_64-linux-gnu.  Ok for trunk and releases/gcc-14?
> 
> Xi Ruoyao (2):
>   i386: testsuite: Add -no-pie for pr113689-1.c [PR70150]
>   i386: testsuite: Adapt fentryname3.c for r14-811 change [PR70150]
> 
>  gcc/testsuite/gcc.target/i386/fentryname3.c | 3 +--
>  gcc/testsuite/gcc.target/i386/pr113689-1.c  | 2 +-
>  2 files changed, 2 insertions(+), 3 deletions(-)


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[gcc r14-10249] LoongArch: Guard REGNO with REG_P in loongarch_expand_conditional_move [PR115169]

2024-05-27 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:e78980fdd5e82e09e26f524e98ad9cd90a29c1c4

commit r14-10249-ge78980fdd5e82e09e26f524e98ad9cd90a29c1c4
Author: Xi Ruoyao 
Date:   Wed May 22 09:29:43 2024 +0800

LoongArch: Guard REGNO with REG_P in loongarch_expand_conditional_move 
[PR115169]

gcc/ChangeLog:

PR target/115169
* config/loongarch/loongarch.cc
(loongarch_expand_conditional_move): Guard REGNO with REG_P.

(cherry picked from commit ded91d857772c0183cc342cdc54d9128f6c57fa2)

Diff:
---
 gcc/config/loongarch/loongarch.cc | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index e7835ae34ae..1b6df6a4365 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5344,6 +5344,7 @@ loongarch_expand_conditional_move (rtx *operands)
   rtx op1_extend = op1;
 
   /* Record whether operands[2] and operands[3] modes are promoted to 
word_mode.  */
+  bool promote_op[2] = {false, false};
   bool promote_p = false;
   machine_mode mode = GET_MODE (operands[0]);
 
@@ -5351,9 +5352,15 @@ loongarch_expand_conditional_move (rtx *operands)
 loongarch_emit_float_compare (, , );
   else
 {
-  if ((REGNO (op0) == REGNO (operands[2])
-  || (REGNO (op1) == REGNO (operands[3]) && (op1 != const0_rtx)))
- && (GET_MODE_SIZE (GET_MODE (op0)) < word_mode))
+  if (GET_MODE_SIZE (GET_MODE (op0)) < word_mode)
+   {
+ promote_op[0] = (REG_P (op0) && REG_P (operands[2]) &&
+  REGNO (op0) == REGNO (operands[2]));
+ promote_op[1] = (REG_P (op1) && REG_P (operands[3]) &&
+  REGNO (op1) == REGNO (operands[3]));
+   }
+
+  if (promote_op[0] || promote_op[1])
{
  mode = word_mode;
  promote_p = true;
@@ -5395,7 +5402,7 @@ loongarch_expand_conditional_move (rtx *operands)
 
   if (promote_p)
{
- if (REGNO (XEXP (operands[1], 0)) == REGNO (operands[2]))
+ if (promote_op[0])
op2 = op0_extend;
  else
{
@@ -5403,7 +5410,7 @@ loongarch_expand_conditional_move (rtx *operands)
  op2 = force_reg (mode, op2);
}
 
- if (REGNO (XEXP (operands[1], 1)) == REGNO (operands[3]))
+ if (promote_op[1])
op3 = op1_extend;
  else
{

[gcc r15-856] LoongArch: Guard REGNO with REG_P in loongarch_expand_conditional_move [PR115169]

2024-05-27 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:ded91d857772c0183cc342cdc54d9128f6c57fa2

commit r15-856-gded91d857772c0183cc342cdc54d9128f6c57fa2
Author: Xi Ruoyao 
Date:   Wed May 22 09:29:43 2024 +0800

LoongArch: Guard REGNO with REG_P in loongarch_expand_conditional_move 
[PR115169]

gcc/ChangeLog:

PR target/115169
* config/loongarch/loongarch.cc
(loongarch_expand_conditional_move): Guard REGNO with REG_P.

Diff:
---
 gcc/config/loongarch/loongarch.cc | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index e7835ae34ae..1b6df6a4365 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5344,6 +5344,7 @@ loongarch_expand_conditional_move (rtx *operands)
   rtx op1_extend = op1;
 
   /* Record whether operands[2] and operands[3] modes are promoted to 
word_mode.  */
+  bool promote_op[2] = {false, false};
   bool promote_p = false;
   machine_mode mode = GET_MODE (operands[0]);
 
@@ -5351,9 +5352,15 @@ loongarch_expand_conditional_move (rtx *operands)
 loongarch_emit_float_compare (, , );
   else
 {
-  if ((REGNO (op0) == REGNO (operands[2])
-  || (REGNO (op1) == REGNO (operands[3]) && (op1 != const0_rtx)))
- && (GET_MODE_SIZE (GET_MODE (op0)) < word_mode))
+  if (GET_MODE_SIZE (GET_MODE (op0)) < word_mode)
+   {
+ promote_op[0] = (REG_P (op0) && REG_P (operands[2]) &&
+  REGNO (op0) == REGNO (operands[2]));
+ promote_op[1] = (REG_P (op1) && REG_P (operands[3]) &&
+  REGNO (op1) == REGNO (operands[3]));
+   }
+
+  if (promote_op[0] || promote_op[1])
{
  mode = word_mode;
  promote_p = true;
@@ -5395,7 +5402,7 @@ loongarch_expand_conditional_move (rtx *operands)
 
   if (promote_p)
{
- if (REGNO (XEXP (operands[1], 0)) == REGNO (operands[2]))
+ if (promote_op[0])
op2 = op0_extend;
  else
{
@@ -5403,7 +5410,7 @@ loongarch_expand_conditional_move (rtx *operands)
  op2 = force_reg (mode, op2);
}
 
- if (REGNO (XEXP (operands[1], 1)) == REGNO (operands[3]))
+ if (promote_op[1])
op3 = op1_extend;
  else
{

[PATCH] LoongArch: Guard REGNO with REG_P in loongarch_expand_conditional_move [PR115169]

2024-05-22 Thread Xi Ruoyao

gcc/ChangeLog:

PR target/115169
* config/loongarch/loongarch.cc
(loongarch_expand_conditional_move): Guard REGNO with REG_P.
---

Bootstrapped with --enable-checking=all.  Ok for trunk and 14?

 gcc/config/loongarch/loongarch.cc | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index e7835ae34ae..1b6df6a4365 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5344,6 +5344,7 @@ loongarch_expand_conditional_move (rtx *operands)
   rtx op1_extend = op1;
 
   /* Record whether operands[2] and operands[3] modes are promoted to 
word_mode.  */
+  bool promote_op[2] = {false, false};
   bool promote_p = false;
   machine_mode mode = GET_MODE (operands[0]);
 
@@ -5351,9 +5352,15 @@ loongarch_expand_conditional_move (rtx *operands)
 loongarch_emit_float_compare (, , );
   else
 {
-  if ((REGNO (op0) == REGNO (operands[2])
-  || (REGNO (op1) == REGNO (operands[3]) && (op1 != const0_rtx)))
- && (GET_MODE_SIZE (GET_MODE (op0)) < word_mode))
+  if (GET_MODE_SIZE (GET_MODE (op0)) < word_mode)
+   {
+ promote_op[0] = (REG_P (op0) && REG_P (operands[2]) &&
+  REGNO (op0) == REGNO (operands[2]));
+ promote_op[1] = (REG_P (op1) && REG_P (operands[3]) &&
+  REGNO (op1) == REGNO (operands[3]));
+   }
+
+  if (promote_op[0] || promote_op[1])
{
  mode = word_mode;
  promote_p = true;
@@ -5395,7 +5402,7 @@ loongarch_expand_conditional_move (rtx *operands)
 
   if (promote_p)
{
- if (REGNO (XEXP (operands[1], 0)) == REGNO (operands[2]))
+ if (promote_op[0])
op2 = op0_extend;
  else
{
@@ -5403,7 +5410,7 @@ loongarch_expand_conditional_move (rtx *operands)
  op2 = force_reg (mode, op2);
}
 
- if (REGNO (XEXP (operands[1], 1)) == REGNO (operands[3]))
+ if (promote_op[1])
op3 = op1_extend;
  else
{
-- 
2.45.1

Re: [COMMITTED] Revert "Revert: "Enable prange support.""

2024-05-16 Thread Xi Ruoyao

On Thu, 2024-05-16 at 12:14 +0200, Aldy Hernandez wrote:
> Wait, what's the preferred way of reverting a patch?  I followed what
> I saw in:
> 
> commit 04ee1f788ceaa4c7f777ff3b9441ae076191439c
> Author: Jeff Law 
> Date:   Mon May 13 21:42:38 2024 -0600
> 
>     Revert "[PATCH v2 1/3] RISC-V: movmem for RISCV with V extension"
> 
>     This reverts commit df15eb15b5f820321c81efc75f0af13ff8c0dd5b.

Revert is OK, but revert revert is not.

https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651144.html

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Ping: [PATCH 0/2] Fix two test failures with --enable-default-pie [PR70150]

2024-05-15 Thread Xi Ruoyao

Ping.

On Mon, 2024-05-06 at 12:45 +0800, Xi Ruoyao wrote:
> In GCC 14.1-rc1, there are two new (comparing to GCC 13) failures if
> the build is configured --enable-default-pie.  Let's fix them.
> 
> Tested on x86_64-linux-gnu.  Ok for trunk and releases/gcc-14?
> 
> Xi Ruoyao (2):
>   i386: testsuite: Add -no-pie for pr113689-1.c [PR70150]
>   i386: testsuite: Adapt fentryname3.c for r14-811 change [PR70150]
> 
>  gcc/testsuite/gcc.target/i386/fentryname3.c | 3 +--
>  gcc/testsuite/gcc.target/i386/pr113689-1.c  | 2 +-
>  2 files changed, 2 insertions(+), 3 deletions(-)

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Pushed: [PATCH] driver: Move -fdiagnostics-urls= early like -fdiagnostics-color= [PR114980]

2024-05-09 Thread Xi Ruoyao

On Thu, 2024-05-09 at 20:21 +, Joseph Myers wrote:
> On Wed, 8 May 2024, Xi Ruoyao wrote:
> 
> > In GCC 14 we started to emit URLs for "command-line option  is
> > valid for  but not " and "-Werror= argument
> > '-Werror=' is not valid for " warnings.  So we should
> > have moved -fdiagnostics-urls= early like -fdiagnostics-color=, or
> > -fdiagnostics-urls= wouldn't be able to control URLs in these warnings.
> > 
> > No test cases are added because with TERM=xterm-256colors PR114980
> > already triggers some test failures.
> > 
> > gcc/ChangeLog:
> > 
> > PR driver/114980
> > * opts-common.cc (prune_options): Move -fdiagnostics-urls=
> > early like -fdiagnostics-color=.
> 
> OK.

Pushed r15-355 and r14-10192.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[gcc r14-10192] driver: Move -fdiagnostics-urls= early like -fdiagnostics-color= [PR114980]

2024-05-09 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:21051de4bed3d541804bf965cbdc3e8047698777

commit r14-10192-g21051de4bed3d541804bf965cbdc3e8047698777
Author: Xi Ruoyao 
Date:   Wed May 8 11:25:57 2024 +0800

driver: Move -fdiagnostics-urls= early like -fdiagnostics-color= [PR114980]

In GCC 14 we started to emit URLs for "command-line option  is
valid for  but not " and "-Werror= argument
'-Werror=' is not valid for " warnings.  So we should
have moved -fdiagnostics-urls= early like -fdiagnostics-color=, or
-fdiagnostics-urls= wouldn't be able to control URLs in these warnings.

No test cases are added because with TERM=xterm-256colors PR114980
already triggers some test failures.

gcc/ChangeLog:

PR driver/114980
* opts-common.cc (prune_options): Move -fdiagnostics-urls=
early like -fdiagnostics-color=.

(cherry picked from commit f75806ec63ec1af2d76a194e5fa73e114b2b8857)

Diff:
---
 gcc/opts-common.cc | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/opts-common.cc b/gcc/opts-common.cc
index 4a2dff243b0c..2d1e86ff94fa 100644
--- a/gcc/opts-common.cc
+++ b/gcc/opts-common.cc
@@ -1152,6 +1152,7 @@ prune_options (struct cl_decoded_option **decoded_options,
   unsigned int options_to_prepend = 0;
   unsigned int Wcomplain_wrong_lang_idx = 0;
   unsigned int fdiagnostics_color_idx = 0;
+  unsigned int fdiagnostics_urls_idx = 0;
 
   /* Remove arguments which are negated by others after them.  */
   new_decoded_options_count = 0;
@@ -1185,6 +1186,12 @@ prune_options (struct cl_decoded_option 
**decoded_options,
++options_to_prepend;
  fdiagnostics_color_idx = i;
  continue;
+   case OPT_fdiagnostics_urls_:
+ gcc_checking_assert (i != 0);
+ if (fdiagnostics_urls_idx == 0)
+   ++options_to_prepend;
+ fdiagnostics_urls_idx = i;
+ continue;
 
default:
  gcc_assert (opt_idx < cl_options_count);
@@ -1248,6 +1255,12 @@ keep:
= old_decoded_options[fdiagnostics_color_idx];
  new_decoded_options_count++;
}
+  if (fdiagnostics_urls_idx != 0)
+   {
+ new_decoded_options[argv_0 + options_prepended++]
+   = old_decoded_options[fdiagnostics_urls_idx];
+ new_decoded_options_count++;
+   }
   gcc_checking_assert (options_to_prepend == options_prepended);
 }

[gcc r15-355] driver: Move -fdiagnostics-urls= early like -fdiagnostics-color= [PR114980]

2024-05-09 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:f75806ec63ec1af2d76a194e5fa73e114b2b8857

commit r15-355-gf75806ec63ec1af2d76a194e5fa73e114b2b8857
Author: Xi Ruoyao 
Date:   Wed May 8 11:25:57 2024 +0800

driver: Move -fdiagnostics-urls= early like -fdiagnostics-color= [PR114980]

In GCC 14 we started to emit URLs for "command-line option  is
valid for  but not " and "-Werror= argument
'-Werror=' is not valid for " warnings.  So we should
have moved -fdiagnostics-urls= early like -fdiagnostics-color=, or
-fdiagnostics-urls= wouldn't be able to control URLs in these warnings.

No test cases are added because with TERM=xterm-256colors PR114980
already triggers some test failures.

gcc/ChangeLog:

PR driver/114980
* opts-common.cc (prune_options): Move -fdiagnostics-urls=
early like -fdiagnostics-color=.

Diff:
---
 gcc/opts-common.cc | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/opts-common.cc b/gcc/opts-common.cc
index 4a2dff243b0c..2d1e86ff94fa 100644
--- a/gcc/opts-common.cc
+++ b/gcc/opts-common.cc
@@ -1152,6 +1152,7 @@ prune_options (struct cl_decoded_option **decoded_options,
   unsigned int options_to_prepend = 0;
   unsigned int Wcomplain_wrong_lang_idx = 0;
   unsigned int fdiagnostics_color_idx = 0;
+  unsigned int fdiagnostics_urls_idx = 0;
 
   /* Remove arguments which are negated by others after them.  */
   new_decoded_options_count = 0;
@@ -1185,6 +1186,12 @@ prune_options (struct cl_decoded_option 
**decoded_options,
++options_to_prepend;
  fdiagnostics_color_idx = i;
  continue;
+   case OPT_fdiagnostics_urls_:
+ gcc_checking_assert (i != 0);
+ if (fdiagnostics_urls_idx == 0)
+   ++options_to_prepend;
+ fdiagnostics_urls_idx = i;
+ continue;
 
default:
  gcc_assert (opt_idx < cl_options_count);
@@ -1248,6 +1255,12 @@ keep:
= old_decoded_options[fdiagnostics_color_idx];
  new_decoded_options_count++;
}
+  if (fdiagnostics_urls_idx != 0)
+   {
+ new_decoded_options[argv_0 + options_prepended++]
+   = old_decoded_options[fdiagnostics_urls_idx];
+ new_decoded_options_count++;
+   }
   gcc_checking_assert (options_to_prepend == options_prepended);
 }

[PATCH] driver: Move -fdiagnostics-urls= early like -fdiagnostics-color= [PR114980]

2024-05-07 Thread Xi Ruoyao

In GCC 14 we started to emit URLs for "command-line option  is
valid for  but not " and "-Werror= argument
'-Werror=' is not valid for " warnings.  So we should
have moved -fdiagnostics-urls= early like -fdiagnostics-color=, or
-fdiagnostics-urls= wouldn't be able to control URLs in these warnings.

No test cases are added because with TERM=xterm-256colors PR114980
already triggers some test failures.

gcc/ChangeLog:

PR driver/114980
* opts-common.cc (prune_options): Move -fdiagnostics-urls=
early like -fdiagnostics-color=.
---

Bootstrapped and regtested on x86_64-linux-gnu.  Ok for trunk and
releases/gcc-14?

 gcc/opts-common.cc | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/opts-common.cc b/gcc/opts-common.cc
index 4a2dff243b0..2d1e86ff94f 100644
--- a/gcc/opts-common.cc
+++ b/gcc/opts-common.cc
@@ -1152,6 +1152,7 @@ prune_options (struct cl_decoded_option **decoded_options,
   unsigned int options_to_prepend = 0;
   unsigned int Wcomplain_wrong_lang_idx = 0;
   unsigned int fdiagnostics_color_idx = 0;
+  unsigned int fdiagnostics_urls_idx = 0;
 
   /* Remove arguments which are negated by others after them.  */
   new_decoded_options_count = 0;
@@ -1185,6 +1186,12 @@ prune_options (struct cl_decoded_option 
**decoded_options,
++options_to_prepend;
  fdiagnostics_color_idx = i;
  continue;
+   case OPT_fdiagnostics_urls_:
+ gcc_checking_assert (i != 0);
+ if (fdiagnostics_urls_idx == 0)
+   ++options_to_prepend;
+ fdiagnostics_urls_idx = i;
+ continue;
 
default:
  gcc_assert (opt_idx < cl_options_count);
@@ -1248,6 +1255,12 @@ keep:
= old_decoded_options[fdiagnostics_color_idx];
  new_decoded_options_count++;
}
+  if (fdiagnostics_urls_idx != 0)
+   {
+ new_decoded_options[argv_0 + options_prepended++]
+   = old_decoded_options[fdiagnostics_urls_idx];
+ new_decoded_options_count++;
+   }
   gcc_checking_assert (options_to_prepend == options_prepended);
 }
 
-- 
2.45.0

Re: [pushed] [PATCH v4 1/2] LoongArch: Define ISA versions

2024-05-07 Thread Xi Ruoyao

On Tue, 2024-05-07 at 18:01 +0800, Lulu Cheng wrote:
> 
> 在 2024/5/7 下午5:42, Xi Ruoyao 写道:
> > On Tue, 2024-05-07 at 17:07 +0800, Xi Ruoyao wrote:
> > > Hmm, after this change the default (-march=la64v1.0) is enabling LSX:
> > > 
> > > $ echo "int dummy;" | cc -c -v |& tail -n1
> > > COLLECT_GCC_OPTIONS='-c' '-v' '-mabi=lp64d' '-march=la64v1.0' '-
> > > mfpu=64'
> > > '-msimd=lsx' '-mcmodel=normal' '-mtune=generic'
> > > 
> > > Is this expected or there's something wrong?
> > Note that
> > https://github.com/loongson/la-toolchain-conventions?tab=readme-ov-file#configuring-the-target-isa
> > says:
> > 
> > LoongArch V1.1 features:
> > 
> > Enable or disable features introduced by LoongArch V1.1. The LSX / LASX
> > part of the LoongArch v1.1 update should only be enabled with lsx / lasx
> > itself enabled.
> > 
> > So to me -march=la64v1.0 should not imply -mlsx.
> 
> 
> The link 
> https://github.com/loongson/la-toolchain-conventions?tab=readme-ov-file#target-presets
>  
> has a detailed description of -march.
> -march=la64v1.0 will open lsx by default.

Hmm, I think it's worthy noted in
https://gcc.gnu.org/gcc-14/changes.html then.  I.e, for the

"It is now recommended to use -march=la64v1.0 as the only compiler
option to describe the target ISA when building binaries for
distribution."

paragraph, add something like:

"GCC now defaults to -march=la64v1.0 for loongarch64-* targets unless 
configured with a different --with-arch= option.  The -march=la64v1.0
option also implies -mlsx."

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [pushed] [PATCH v4 1/2] LoongArch: Define ISA versions

2024-05-07 Thread Xi Ruoyao

On Tue, 2024-05-07 at 17:07 +0800, Xi Ruoyao wrote:
> Hmm, after this change the default (-march=la64v1.0) is enabling LSX:
> 
> $ echo "int dummy;" | cc -c -v |& tail -n1
> COLLECT_GCC_OPTIONS='-c' '-v' '-mabi=lp64d' '-march=la64v1.0' '-
> mfpu=64'
> '-msimd=lsx' '-mcmodel=normal' '-mtune=generic'
> 
> Is this expected or there's something wrong?

Note that
https://github.com/loongson/la-toolchain-conventions?tab=readme-ov-file#configuring-the-target-isa
says:

LoongArch V1.1 features:

Enable or disable features introduced by LoongArch V1.1. The LSX / LASX
part of the LoongArch v1.1 update should only be enabled with lsx / lasx
itself enabled.

So to me -march=la64v1.0 should not imply -mlsx.

> On Tue, 2024-04-23 at 11:31 +0800, Lulu Cheng wrote:
> > Pushed to r14-10083.
> > 
> > 在 2024/4/23 上午10:42, Yang Yujie 写道:
> > > These ISA versions are defined as -march= parameters and
> > > are recommended for building binaries for distribution.
> > > 
> > > Detailed description of these definitions can be found at
> > > https://github.com/loongson/la-toolchain-conventions, which
> > > the LoongArch GCC port aims to conform to.
> > > 
> > > gcc/ChangeLog:
> > > 
> > >   * config.gcc: Make la64v1.0 the default ISA preset of the
> > > lp64d ABI.
> > >   * config/loongarch/genopts/loongarch-strings: Define
> > > la64v1.0, la64v1.1.
> > >   * config/loongarch/genopts/loongarch.opt.in: Likewise.
> > >   * config/loongarch/loongarch-c.cc
> > > (LARCH_CPP_SET_PROCESSOR): Likewise.
> > >   (loongarch_cpu_cpp_builtins): Likewise.
> > >   * config/loongarch/loongarch-cpu.cc (get_native_prid):
> > > Likewise.
> > >   (fill_native_cpu_config): Likewise.
> > >   * config/loongarch/loongarch-def.cc (array_tune):
> > > Likewise.
> > >   * config/loongarch/loongarch-def.h: Likewise.
> > >   * config/loongarch/loongarch-driver.cc
> > > (driver_set_m_parm):
> > > Likewise.
> > >   (driver_get_normalized_m_opts): Likewise.
> > >   * config/loongarch/loongarch-opts.cc
> > > (default_tune_for_arch): Likewise.
> > >   (TUNE_FOR_ARCH): Likewise.
> > >   (arch_str): Likewise.
> > >   (loongarch_target_option_override): Likewise.
> > >   * config/loongarch/loongarch-opts.h (TARGET_uARCH_LA464):
> > > Likewise.
> > >   (TARGET_uARCH_LA664): Likewise.
> > >   * config/loongarch/loongarch-str.h (STR_CPU_ABI_DEFAULT):
> > > Likewise.
> > >   (STR_ARCH_ABI_DEFAULT): Likewise.
> > >   (STR_TUNE_GENERIC): Likewise.
> > >   (STR_ARCH_LA64V1_0): Likewise.
> > >   (STR_ARCH_LA64V1_1): Likewise.
> > >   * config/loongarch/loongarch.cc
> > > (loongarch_cpu_sched_reassociation_width): Likewise.
> > >   (loongarch_asm_code_end): Likewise.
> > >   * config/loongarch/loongarch.opt: Likewise.
> > >   * doc/invoke.texi: Likewise.
> > > ---
> > >   gcc/config.gcc    | 34 
> > >   .../loongarch/genopts/loongarch-strings   |  5 +-
> > >   gcc/config/loongarch/genopts/loongarch.opt.in | 43 --
> > >   gcc/config/loongarch/loongarch-c.cc   | 37 +++--
> > >   gcc/config/loongarch/loongarch-cpu.cc | 35 
> > >   gcc/config/loongarch/loongarch-def.cc | 83
> > > +--
> > > 
> > >   gcc/config/loongarch/loongarch-def.h  | 37 ++---
> > >   gcc/config/loongarch/loongarch-driver.cc  |  8 +-
> > >   gcc/config/loongarch/loongarch-opts.cc    | 66 +++--
> > > --
> > >   gcc/config/loongarch/loongarch-opts.h |  4 +-
> > >   gcc/config/loongarch/loongarch-str.h  |  5 +-
> > >   gcc/config/loongarch/loongarch.cc | 11 +--
> > >   gcc/config/loongarch/loongarch.opt    | 43 --
> > >   gcc/doc/invoke.texi   | 57 -
> > >   14 files changed, 300 insertions(+), 168 deletions(-)
> > > 
> > > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > > index 5df3c52f8e9..929695c25ab 100644
> > > --- a/gcc/config.gcc
> > > +++ b/gcc/config.gcc
> > > @@ -5072,7 +5072,7 @@ case "${target}" in
> > >   
> > >   # Perform initial sanity checks on --with-*
> > > options.
> > >   case ${with_arch} in
> > > - "" | abi-default | loongarch64 | la[46]64) ;; #
> > > OK,
> > > append here.
>

Re: [pushed] [PATCH v4 1/2] LoongArch: Define ISA versions

2024-05-07 Thread Xi Ruoyao

END_STRING (loongarch_cpu_strings[target->cpu_arch]);
> > +    APPEND_STRING (loongarch_arch_strings[target->cpu_arch]);
> >   
> >     APPEND1 ('\0')
> >     return XOBFINISH (_obstack, const char *);
> > @@ -956,7 +986,7 @@ loongarch_target_option_override (struct
> > loongarch_target *target,
> >     /* Other arch-specific overrides.  */
> >     switch (target->cpu_arch)
> >   {
> > -  case CPU_LA664:
> > +  case ARCH_LA664:
> >     /* Enable -mrecipe=all for LA664 by default.  */
> >     if (!opts_set->x_recip_mask)
> >       {
> > diff --git a/gcc/config/loongarch/loongarch-opts.h
> > b/gcc/config/loongarch/loongarch-opts.h
> > index 9844b27ed27..f80482357ac 100644
> > --- a/gcc/config/loongarch/loongarch-opts.h
> > +++ b/gcc/config/loongarch/loongarch-opts.h
> > @@ -127,8 +127,8 @@ struct loongarch_flags {
> >     (la_target.isa.evolution & OPTION_MASK_ISA_LD_SEQ_SA)
> >   
> >   /* TARGET_ macros for use in *.md template conditionals */
> > -#define TARGET_uARCH_LA464   (la_target.cpu_tune == CPU_LA464)
> > -#define TARGET_uARCH_LA664   (la_target.cpu_tune == CPU_LA664)
> > +#define TARGET_uARCH_LA464   (la_target.cpu_tune ==
> > TUNE_LA464)
> > +#define TARGET_uARCH_LA664   (la_target.cpu_tune ==
> > TUNE_LA664)
> >   
> >   /* Note: optimize_size may vary across functions,
> >  while -m[no]-memcpy imposes a global constraint.  */
> > diff --git a/gcc/config/loongarch/loongarch-str.h
> > b/gcc/config/loongarch/loongarch-str.h
> > index 20da2b169ed..47f761babb2 100644
> > --- a/gcc/config/loongarch/loongarch-str.h
> > +++ b/gcc/config/loongarch/loongarch-str.h
> > @@ -27,10 +27,13 @@ along with GCC; see the file COPYING3.  If not
> > see
> >   #define OPTSTR_TUNE "tune"
> >   
> >   #define STR_CPU_NATIVE "native"
> > -#define STR_CPU_ABI_DEFAULT "abi-default"
> > +#define STR_ARCH_ABI_DEFAULT "abi-default"
> > +#define STR_TUNE_GENERIC "generic"
> >   #define STR_CPU_LOONGARCH64 "loongarch64"
> >   #define STR_CPU_LA464 "la464"
> >   #define STR_CPU_LA664 "la664"
> > +#define STR_ARCH_LA64V1_0 "la64v1.0"
> > +#define STR_ARCH_LA64V1_1 "la64v1.1"
> >   
> >   #define STR_ISA_BASE_LA64 "la64"
> >   
> > diff --git a/gcc/config/loongarch/loongarch.cc
> > b/gcc/config/loongarch/loongarch.cc
> > index 6b92e7034c5..e7835ae34ae 100644
> > --- a/gcc/config/loongarch/loongarch.cc
> > +++ b/gcc/config/loongarch/loongarch.cc
> > @@ -9609,9 +9609,10 @@ loongarch_cpu_sched_reassociation_width
> > (struct loongarch_target *target,
> >   
> >     switch (target->cpu_tune)
> >   {
> > -    case CPU_LOONGARCH64:
> > -    case CPU_LA464:
> > -    case CPU_LA664:
> > +    case TUNE_GENERIC:
> > +    case TUNE_LOONGARCH64:
> > +    case TUNE_LA464:
> > +    case TUNE_LA664:
> >     /* Vector part.  */
> >     if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P
> > (mode))
> >     {
> > @@ -10980,9

[PATCH 0/2] Fix two test failures with --enable-default-pie [PR70150]

2024-05-05 Thread Xi Ruoyao

In GCC 14.1-rc1, there are two new (comparing to GCC 13) failures if
the build is configured --enable-default-pie.  Let's fix them.

Tested on x86_64-linux-gnu.  Ok for trunk and releases/gcc-14?

Xi Ruoyao (2):
  i386: testsuite: Add -no-pie for pr113689-1.c [PR70150]
  i386: testsuite: Adapt fentryname3.c for r14-811 change [PR70150]

 gcc/testsuite/gcc.target/i386/fentryname3.c | 3 +--
 gcc/testsuite/gcc.target/i386/pr113689-1.c  | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

-- 
2.45.0

[PATCH 2/2] i386: testsuite: Adapt fentryname3.c for r14-811 change [PR70150]

2024-05-05 Thread Xi Ruoyao

After r14-811 "call *nop@GOTPCREL(%rip)" is only generated with
-mno-direct-extern-access even if --enable-default-pie.  So the r13-1614
change to this file is not valid anymore.

gcc/testsuite/ChangeLog:

PR testsuite/70150
* gcc.target/i386/fentryname3.c (dg-final): Revert r13-1614
change.
---
 gcc/testsuite/gcc.target/i386/fentryname3.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/fentryname3.c 
b/gcc/testsuite/gcc.target/i386/fentryname3.c
index c14a4ebb0cf..bd7c997c178 100644
--- a/gcc/testsuite/gcc.target/i386/fentryname3.c
+++ b/gcc/testsuite/gcc.target/i386/fentryname3.c
@@ -3,8 +3,7 @@
 /* { dg-require-profiling "-pg" } */
 /* { dg-options "-pg -mfentry"  } */
 /* { dg-final { scan-assembler "section.*__entry_loc" } } */
-/* { dg-final { scan-assembler "0x0f, 0x1f, 0x44, 0x00, 0x00" { target nonpic 
} } } */
-/* { dg-final { scan-assembler "call\t\\*nop@GOTPCREL" { target { ! nonpic } } 
} } */
+/* { dg-final { scan-assembler "0x0f, 0x1f, 0x44, 0x00, 0x00" } } */
 /* { dg-final { scan-assembler-not "__fentry__" } } */
 
 __attribute__((fentry_name("nop"), fentry_section("__entry_loc")))
-- 
2.45.0

[PATCH 1/2] i386: testsuite: Add -no-pie for pr113689-1.c [PR70150]

2024-05-05 Thread Xi Ruoyao

For a --enable-default-pie build, using -fno-pic (for compiler) but
not -no-pie (for linker) triggers some linker warnings counted as
excess errors:

/usr/bin/ld: /tmp/cc8MgxiR.o: warning: relocation in read-only
section `.text.startup'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE

gcc/testsuite/ChangeLog:

PR testsuite/70150
* gcc.target/i386/pr113689-1.c (dg-options): Add -no-pie.
---
 gcc/testsuite/gcc.target/i386/pr113689-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr113689-1.c 
b/gcc/testsuite/gcc.target/i386/pr113689-1.c
index 9b8474ed933..0424db2dfdc 100644
--- a/gcc/testsuite/gcc.target/i386/pr113689-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr113689-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { lp64 && fpic } } } */
-/* { dg-options "-O2 -fno-pic -fprofile -mcmodel=large" } */
+/* { dg-options "-O2 -fno-pic -no-pie -fprofile -mcmodel=large" } */
 /* { dg-skip-if "PR90698" { *-*-darwin* } } */
 /* { dg-skip-if "PR113909" { *-*-solaris2* } } */
 
-- 
2.45.0

Pushed: [PATCH] LoongArch: Add constraints for bit string operation define_insn_and_split's [PR114861]

2024-04-26 Thread Xi Ruoyao

On Sat, 2024-04-27 at 11:04 +0800, Lulu Cheng wrote:
> LGTM!
> 
> Thanks.

Pushed r15-11 and r14-10142.

> 在 2024/4/26 下午9:52, Xi Ruoyao 写道:
> > Without the constrants, the compiler attempts to use a stack slot as the
> > target, causing an ICE building the kernel with -Os:
> > 
> >  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c:3144:1:
> >  error: could not split insn
> >  (insn:TI 1764 67 1745
> >    (set (mem/c:DI (reg/f:DI 3 $r3) [707 %sfp+-80 S8 A64])
> >     (and:DI (reg/v:DI 28 $r28 [orig:422 raster_config ] [422])
> >     (const_int -50331649 [0xfcff])))
> >    "drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c":1386:21 111
> >    {*bstrins_di_for_mask}
> >    (nil))
> > 
> > Add these constrants to fix the issue.
> > 
> > gcc/ChangeLog:
> > 
> > PR target/114861
> > * config/loongarch/loongarch.md (bstrins__for_mask): Add
> > constraints for operands.
> > (bstrins__for_ior_mask): Likewise.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR target/114861
> > * gcc.target/loongarch/pr114861.c: New test.
> > ---
> >   gcc/config/loongarch/loongarch.md | 16 
> >   gcc/testsuite/gcc.target/loongarch/pr114861.c | 39 +++
> >   2 files changed, 47 insertions(+), 8 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/loongarch/pr114861.c
> > 
> > diff --git a/gcc/config/loongarch/loongarch.md 
> > b/gcc/config/loongarch/loongarch.md
> > index a316c8fb820..5c80c169cbf 100644
> > --- a/gcc/config/loongarch/loongarch.md
> > +++ b/gcc/config/loongarch/loongarch.md
> > @@ -1543,9 +1543,9 @@ (define_insn "and3_extended"
> >  (set_attr "mode" "")])
> >   
> >   (define_insn_and_split "*bstrins__for_mask"
> > -  [(set (match_operand:GPR 0 "register_operand")
> > -   (and:GPR (match_operand:GPR 1 "register_operand")
> > -(match_operand:GPR 2 "ins_zero_bitmask_operand")))]
> > +  [(set (match_operand:GPR 0 "register_operand" "=r")
> > +   (and:GPR (match_operand:GPR 1 "register_operand" "r")
> > +(match_operand:GPR 2 "ins_zero_bitmask_operand" "i")))]
> >     ""
> >     "#"
> >     ""
> > @@ -1563,11 +1563,11 @@ (define_insn_and_split "*bstrins__for_mask"
> >     })
> >   
> >   (define_insn_and_split "*bstrins__for_ior_mask"
> > -  [(set (match_operand:GPR 0 "register_operand")
> > -   (ior:GPR (and:GPR (match_operand:GPR 1 "register_operand")
> > -  (match_operand:GPR 2 "const_int_operand"))
> > -(and:GPR (match_operand:GPR 3 "register_operand")
> > -     (match_operand:GPR 4 "const_int_operand"]
> > +  [(set (match_operand:GPR 0 "register_operand" "=r")
> > +   (ior:GPR (and:GPR (match_operand:GPR 1 "register_operand" "r")
> > +     (match_operand:GPR 2 "const_int_operand" "i"))
> > +(and:GPR (match_operand:GPR 3 "register_operand" "r")
> > +     (match_operand:GPR 4 "const_int_operand" "i"]
> >     "loongarch_pre_reload_split ()
> >  && loongarch_use_bstrins_for_ior_with_mask (mode, operands)"
> >     "#"
> > diff --git a/gcc/testsuite/gcc.target/loongarch/pr114861.c 
> > b/gcc/testsuite/gcc.target/loongarch/pr114861.c
> > new file mode 100644
> > index 000..e6507c406b9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/loongarch/pr114861.c
> > @@ -0,0 +1,39 @@
> > +/* PR114861: ICE building the kernel with -Os
> > +   Reduced from linux/fs/ntfs3/attrib.c at revision c942a0cd3603.  */
> > +/* { dg-do compile } */
> > +/* { dg-options "-Os -march=loongarch64 -msoft-float -mabi=lp64s" } */
> > +
> > +long evcn, attr_collapse_range_vbo, attr_collapse_range_bytes;
> > +unsigned short flags;
> > +int attr_collapse_range_ni_0_0;
> > +int *attr_collapse_range_mi;
> > +unsigned attr_collapse_range_svcn, attr_collapse_range_vcn1;
> > +void ni_insert_nonresident (unsigned, unsigned short, int **);
> > +int mi_pack_runs (int);
> > +int
> > +attr_collapse_range (void)
> > +{
> > +  _Bool __trans_tmp_1;
> > +  int run = attr_collapse_range_ni_0_0;
> > +  unsigned evcn1, vcn, end;
> > +  short a_flags = flags;
> > +  __trans_tmp_1 = flags & (32768 | 1);
> > +  if (__trans_tmp_1)
> > +    return 2;
> > +  vcn = attr_collapse_range_vbo;
> > +  end = attr_collapse_range_bytes;
> > +  evcn1 = evcn;
> > +  for (;;)
> > +    if (attr_collapse_range_svcn >= end)
> > +  {
> > +    unsigned eat, next_svcn = mi_pack_runs (42);
> > +    attr_collapse_range_vcn1 = (vcn ? vcn : attr_collapse_range_svcn);
> > +    eat = (0 < end) - attr_collapse_range_vcn1;
> > +    mi_pack_runs (run - eat);
> > +    if (next_svcn + eat)
> > +  ni_insert_nonresident (evcn1 - eat - next_svcn, a_flags,
> > + _collapse_range_mi);
> > +  }
> > +    else
> > +  return 42;
> > +}
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[gcc r14-10142] LoongArch: Add constraints for bit string operation define_insn_and_split's [PR114861]

2024-04-26 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:3e04b6f6ba568e6183e8aa8223d4156234503843

commit r14-10142-g3e04b6f6ba568e6183e8aa8223d4156234503843
Author: Xi Ruoyao 
Date:   Fri Apr 26 15:59:11 2024 +0800

LoongArch: Add constraints for bit string operation define_insn_and_split's 
[PR114861]

Without the constrants, the compiler attempts to use a stack slot as the
target, causing an ICE building the kernel with -Os:

drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c:3144:1:
error: could not split insn
(insn:TI 1764 67 1745
  (set (mem/c:DI (reg/f:DI 3 $r3) [707 %sfp+-80 S8 A64])
   (and:DI (reg/v:DI 28 $r28 [orig:422 raster_config ] [422])
   (const_int -50331649 [0xfcff])))
  "drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c":1386:21 111
  {*bstrins_di_for_mask}
  (nil))

Add these constrants to fix the issue.

gcc/ChangeLog:

PR target/114861
* config/loongarch/loongarch.md (bstrins__for_mask): Add
constraints for operands.
(bstrins__for_ior_mask): Likewise.

gcc/testsuite/ChangeLog:

PR target/114861
* gcc.target/loongarch/pr114861.c: New test.

(cherry picked from commit 140124ad54eef88ca87909f63aedc8aaeacefc65)

Diff:
---
 gcc/config/loongarch/loongarch.md | 16 +--
 gcc/testsuite/gcc.target/loongarch/pr114861.c | 39 +++
 2 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index a316c8fb820..5c80c169cbf 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1543,9 +1543,9 @@
(set_attr "mode" "")])
 
 (define_insn_and_split "*bstrins__for_mask"
-  [(set (match_operand:GPR 0 "register_operand")
-   (and:GPR (match_operand:GPR 1 "register_operand")
-(match_operand:GPR 2 "ins_zero_bitmask_operand")))]
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (and:GPR (match_operand:GPR 1 "register_operand" "r")
+(match_operand:GPR 2 "ins_zero_bitmask_operand" "i")))]
   ""
   "#"
   ""
@@ -1563,11 +1563,11 @@
   })
 
 (define_insn_and_split "*bstrins__for_ior_mask"
-  [(set (match_operand:GPR 0 "register_operand")
-   (ior:GPR (and:GPR (match_operand:GPR 1 "register_operand")
-  (match_operand:GPR 2 "const_int_operand"))
-(and:GPR (match_operand:GPR 3 "register_operand")
- (match_operand:GPR 4 "const_int_operand"]
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (ior:GPR (and:GPR (match_operand:GPR 1 "register_operand" "r")
+ (match_operand:GPR 2 "const_int_operand" "i"))
+(and:GPR (match_operand:GPR 3 "register_operand" "r")
+ (match_operand:GPR 4 "const_int_operand" "i"]
   "loongarch_pre_reload_split ()
&& loongarch_use_bstrins_for_ior_with_mask (mode, operands)"
   "#"
diff --git a/gcc/testsuite/gcc.target/loongarch/pr114861.c 
b/gcc/testsuite/gcc.target/loongarch/pr114861.c
new file mode 100644
index 000..e6507c406b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr114861.c
@@ -0,0 +1,39 @@
+/* PR114861: ICE building the kernel with -Os
+   Reduced from linux/fs/ntfs3/attrib.c at revision c942a0cd3603.  */
+/* { dg-do compile } */
+/* { dg-options "-Os -march=loongarch64 -msoft-float -mabi=lp64s" } */
+
+long evcn, attr_collapse_range_vbo, attr_collapse_range_bytes;
+unsigned short flags;
+int attr_collapse_range_ni_0_0;
+int *attr_collapse_range_mi;
+unsigned attr_collapse_range_svcn, attr_collapse_range_vcn1;
+void ni_insert_nonresident (unsigned, unsigned short, int **);
+int mi_pack_runs (int);
+int
+attr_collapse_range (void)
+{
+  _Bool __trans_tmp_1;
+  int run = attr_collapse_range_ni_0_0;
+  unsigned evcn1, vcn, end;
+  short a_flags = flags;
+  __trans_tmp_1 = flags & (32768 | 1);
+  if (__trans_tmp_1)
+return 2;
+  vcn = attr_collapse_range_vbo;
+  end = attr_collapse_range_bytes;
+  evcn1 = evcn;
+  for (;;)
+if (attr_collapse_range_svcn >= end)
+  {
+unsigned eat, next_svcn = mi_pack_runs (42);
+attr_collapse_range_vcn1 = (vcn ? vcn : attr_collapse_range_svcn);
+eat = (0 < end) - attr_collapse_range_vcn1;
+mi_pack_runs (run - eat);
+if (next_svcn + eat)
+  ni_insert_nonresident (evcn1 - eat - next_svcn, a_flags,
+ _collapse_range_mi);
+  }
+else
+  return 42;
+}

[gcc r15-11] LoongArch: Add constraints for bit string operation define_insn_and_split's [PR114861]

2024-04-26 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:140124ad54eef88ca87909f63aedc8aaeacefc65

commit r15-11-g140124ad54eef88ca87909f63aedc8aaeacefc65
Author: Xi Ruoyao 
Date:   Fri Apr 26 15:59:11 2024 +0800

LoongArch: Add constraints for bit string operation define_insn_and_split's 
[PR114861]

Without the constrants, the compiler attempts to use a stack slot as the
target, causing an ICE building the kernel with -Os:

drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c:3144:1:
error: could not split insn
(insn:TI 1764 67 1745
  (set (mem/c:DI (reg/f:DI 3 $r3) [707 %sfp+-80 S8 A64])
   (and:DI (reg/v:DI 28 $r28 [orig:422 raster_config ] [422])
   (const_int -50331649 [0xfcff])))
  "drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c":1386:21 111
  {*bstrins_di_for_mask}
  (nil))

Add these constrants to fix the issue.

gcc/ChangeLog:

PR target/114861
* config/loongarch/loongarch.md (bstrins__for_mask): Add
constraints for operands.
(bstrins__for_ior_mask): Likewise.

gcc/testsuite/ChangeLog:

PR target/114861
* gcc.target/loongarch/pr114861.c: New test.

Diff:
---
 gcc/config/loongarch/loongarch.md | 16 +--
 gcc/testsuite/gcc.target/loongarch/pr114861.c | 39 +++
 2 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index a316c8fb820..5c80c169cbf 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1543,9 +1543,9 @@
(set_attr "mode" "")])
 
 (define_insn_and_split "*bstrins__for_mask"
-  [(set (match_operand:GPR 0 "register_operand")
-   (and:GPR (match_operand:GPR 1 "register_operand")
-(match_operand:GPR 2 "ins_zero_bitmask_operand")))]
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (and:GPR (match_operand:GPR 1 "register_operand" "r")
+(match_operand:GPR 2 "ins_zero_bitmask_operand" "i")))]
   ""
   "#"
   ""
@@ -1563,11 +1563,11 @@
   })
 
 (define_insn_and_split "*bstrins__for_ior_mask"
-  [(set (match_operand:GPR 0 "register_operand")
-   (ior:GPR (and:GPR (match_operand:GPR 1 "register_operand")
-  (match_operand:GPR 2 "const_int_operand"))
-(and:GPR (match_operand:GPR 3 "register_operand")
- (match_operand:GPR 4 "const_int_operand"]
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (ior:GPR (and:GPR (match_operand:GPR 1 "register_operand" "r")
+ (match_operand:GPR 2 "const_int_operand" "i"))
+(and:GPR (match_operand:GPR 3 "register_operand" "r")
+ (match_operand:GPR 4 "const_int_operand" "i"]
   "loongarch_pre_reload_split ()
&& loongarch_use_bstrins_for_ior_with_mask (mode, operands)"
   "#"
diff --git a/gcc/testsuite/gcc.target/loongarch/pr114861.c 
b/gcc/testsuite/gcc.target/loongarch/pr114861.c
new file mode 100644
index 000..e6507c406b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr114861.c
@@ -0,0 +1,39 @@
+/* PR114861: ICE building the kernel with -Os
+   Reduced from linux/fs/ntfs3/attrib.c at revision c942a0cd3603.  */
+/* { dg-do compile } */
+/* { dg-options "-Os -march=loongarch64 -msoft-float -mabi=lp64s" } */
+
+long evcn, attr_collapse_range_vbo, attr_collapse_range_bytes;
+unsigned short flags;
+int attr_collapse_range_ni_0_0;
+int *attr_collapse_range_mi;
+unsigned attr_collapse_range_svcn, attr_collapse_range_vcn1;
+void ni_insert_nonresident (unsigned, unsigned short, int **);
+int mi_pack_runs (int);
+int
+attr_collapse_range (void)
+{
+  _Bool __trans_tmp_1;
+  int run = attr_collapse_range_ni_0_0;
+  unsigned evcn1, vcn, end;
+  short a_flags = flags;
+  __trans_tmp_1 = flags & (32768 | 1);
+  if (__trans_tmp_1)
+return 2;
+  vcn = attr_collapse_range_vbo;
+  end = attr_collapse_range_bytes;
+  evcn1 = evcn;
+  for (;;)
+if (attr_collapse_range_svcn >= end)
+  {
+unsigned eat, next_svcn = mi_pack_runs (42);
+attr_collapse_range_vcn1 = (vcn ? vcn : attr_collapse_range_svcn);
+eat = (0 < end) - attr_collapse_range_vcn1;
+mi_pack_runs (run - eat);
+if (next_svcn + eat)
+  ni_insert_nonresident (evcn1 - eat - next_svcn, a_flags,
+ _collapse_range_mi);
+  }
+else
+  return 42;
+}

[PATCH] LoongArch: Add constraints for bit string operation define_insn_and_split's [PR114861]

2024-04-26 Thread Xi Ruoyao

Without the constrants, the compiler attempts to use a stack slot as the
target, causing an ICE building the kernel with -Os:

drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c:3144:1:
error: could not split insn
(insn:TI 1764 67 1745
  (set (mem/c:DI (reg/f:DI 3 $r3) [707 %sfp+-80 S8 A64])
   (and:DI (reg/v:DI 28 $r28 [orig:422 raster_config ] [422])
   (const_int -50331649 [0xfcff])))
  "drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c":1386:21 111
  {*bstrins_di_for_mask}
  (nil))

Add these constrants to fix the issue.

gcc/ChangeLog:

PR target/114861
* config/loongarch/loongarch.md (bstrins__for_mask): Add
constraints for operands.
(bstrins__for_ior_mask): Likewise.

gcc/testsuite/ChangeLog:

PR target/114861
* gcc.target/loongarch/pr114861.c: New test.
---
 gcc/config/loongarch/loongarch.md | 16 
 gcc/testsuite/gcc.target/loongarch/pr114861.c | 39 +++
 2 files changed, 47 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr114861.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index a316c8fb820..5c80c169cbf 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1543,9 +1543,9 @@ (define_insn "and3_extended"
(set_attr "mode" "")])
 
 (define_insn_and_split "*bstrins__for_mask"
-  [(set (match_operand:GPR 0 "register_operand")
-   (and:GPR (match_operand:GPR 1 "register_operand")
-(match_operand:GPR 2 "ins_zero_bitmask_operand")))]
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (and:GPR (match_operand:GPR 1 "register_operand" "r")
+(match_operand:GPR 2 "ins_zero_bitmask_operand" "i")))]
   ""
   "#"
   ""
@@ -1563,11 +1563,11 @@ (define_insn_and_split "*bstrins__for_mask"
   })
 
 (define_insn_and_split "*bstrins__for_ior_mask"
-  [(set (match_operand:GPR 0 "register_operand")
-   (ior:GPR (and:GPR (match_operand:GPR 1 "register_operand")
-  (match_operand:GPR 2 "const_int_operand"))
-(and:GPR (match_operand:GPR 3 "register_operand")
- (match_operand:GPR 4 "const_int_operand"]
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (ior:GPR (and:GPR (match_operand:GPR 1 "register_operand" "r")
+ (match_operand:GPR 2 "const_int_operand" "i"))
+(and:GPR (match_operand:GPR 3 "register_operand" "r")
+ (match_operand:GPR 4 "const_int_operand" "i"]
   "loongarch_pre_reload_split ()
&& loongarch_use_bstrins_for_ior_with_mask (mode, operands)"
   "#"
diff --git a/gcc/testsuite/gcc.target/loongarch/pr114861.c 
b/gcc/testsuite/gcc.target/loongarch/pr114861.c
new file mode 100644
index 000..e6507c406b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr114861.c
@@ -0,0 +1,39 @@
+/* PR114861: ICE building the kernel with -Os
+   Reduced from linux/fs/ntfs3/attrib.c at revision c942a0cd3603.  */
+/* { dg-do compile } */
+/* { dg-options "-Os -march=loongarch64 -msoft-float -mabi=lp64s" } */
+
+long evcn, attr_collapse_range_vbo, attr_collapse_range_bytes;
+unsigned short flags;
+int attr_collapse_range_ni_0_0;
+int *attr_collapse_range_mi;
+unsigned attr_collapse_range_svcn, attr_collapse_range_vcn1;
+void ni_insert_nonresident (unsigned, unsigned short, int **);
+int mi_pack_runs (int);
+int
+attr_collapse_range (void)
+{
+  _Bool __trans_tmp_1;
+  int run = attr_collapse_range_ni_0_0;
+  unsigned evcn1, vcn, end;
+  short a_flags = flags;
+  __trans_tmp_1 = flags & (32768 | 1);
+  if (__trans_tmp_1)
+return 2;
+  vcn = attr_collapse_range_vbo;
+  end = attr_collapse_range_bytes;
+  evcn1 = evcn;
+  for (;;)
+if (attr_collapse_range_svcn >= end)
+  {
+unsigned eat, next_svcn = mi_pack_runs (42);
+attr_collapse_range_vcn1 = (vcn ? vcn : attr_collapse_range_svcn);
+eat = (0 < end) - attr_collapse_range_vcn1;
+mi_pack_runs (run - eat);
+if (next_svcn + eat)
+  ni_insert_nonresident (evcn1 - eat - next_svcn, a_flags,
+ _collapse_range_mi);
+  }
+else
+  return 42;
+}
-- 
2.44.0

Re: [PATCH v2 1/2] LoongArch: Define ISA versions

2024-04-22 Thread Xi Ruoyao

On Sat, 2024-04-20 at 18:47 +0800, Yang Yujie wrote:
> +@item la664
> +LoongArch LA664-based processor with LSX, LASX and all LoongArch v1.1
> features.

I still prefer "v1.1 instructions" instead of "v1.1 features" because
LA664 (at least all launched LA664 CPUs) does not support HPTW, which
**is** a v1.1 feature.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH 1/2] LoongArch: Define ISA versions

2024-04-20 Thread Xi Ruoyao

On Sat, 2024-04-20 at 11:26 +0800, Lulu Cheng wrote:

> 
> > One LoongArch v1.1 feature "Hardware Page Table Walker" is not
> > implemented by LA664.  Maybe "all LoongArch v1.1 **unprivileged**
> > features"?
> > 
> The description of -march is "+Generate instructions for the machine type 
> @var{arch-type}.",
> 
>  so is there no need to write it like this here?

Then maybe just say "all LoongArch v1.1 instructions" instead of
"features" here as well?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH 1/2] LoongArch: Define ISA versions

2024-04-19 Thread Xi Ruoyao

On Fri, 2024-04-19 at 19:04 +0800, Yang Yujie wrote:
>  @table @samp
>  @item native
> -This selects the CPU to generate code for at compilation time by determining
> -the processor type of the compiling machine.  Using @option{-march=native}
> -enables all instruction subsets supported by the local machine (hence
> -the result might not run on different machines).  Using 
> @option{-mtune=native}
> -produces code optimized for the local machine under the constraints
> -of the selected instruction set.
> +Local processor type detected by the native compiler.
>  @item loongarch64
> -A generic CPU with 64-bit extensions.
> +Generic LoongArch 64-bit processor.
>  @item la464
> -LoongArch LA464 CPU with LBT, LSX, LASX, LVZ.
> +LoongArch LA464-based processor with LSX, LASX.
> +@item la664
> +LoongArch LA664-based processor with LSX, LASX and all LoongArch v1.1 
> features.

One LoongArch v1.1 feature "Hardware Page Table Walker" is not
implemented by LA664.  Maybe "all LoongArch v1.1 **unprivileged**
features"?

> +@item la64v1.0
> +LoongArch64 ISA version 1.0.
> +@item la64v1.1
> +LoongArch64 ISA version 1.1.

IMO it's better to use a wording like LA664, i.e. "a CPU implementing
all LoongArch v1.1 unprivileged features" (emphasising "all", as the
v1.1 manual allows to only implement a subset of v1.1 features).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH 1/2] LoongArch: Define ISA versions

2024-04-19 Thread Xi Ruoyao

On Fri, 2024-04-19 at 19:04 +0800, Yang Yujie wrote:
> These ISA versions are defined as -march= parameters and
> are recommended for building binaries for distribution.
> 
> Detailed description of these definitions can be found at
> https://github.com/loongson/la-toolchain-conventions, which
> the LoongArch GCC port aims to conform to.

The links seems broken.  Do you mean la-softdev-convention? 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2] LoongArch: Enable switchable target

2024-04-08 Thread Xi Ruoyao

On Mon, 2024-04-08 at 16:46 +0800, Yang Yujie wrote:
> v1 -> v2:
> Remove spaces from changelog.

I've rebuilt the base system with a GCC including this patch.  LTO+PGO
bootstrap fine, regtested fine, and no issues observed.

I do usually include the optimization flags into LDFLAGS when I do LTO,
so I don't really rely on this patch though.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Enable switchable target

2024-04-07 Thread Xi Ruoyao

On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote:
>   * config/loongarch/loongarch-builtins.cc
> (loongarch_init_builtins):
>     Initialize all builtin functions at startup.

git gcc-verify complains that tab should be used instead of space for
this line.

>   (loongarch_expand_builtin): Turn assertion of builtin
> availability
>     into a test.

and this line.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] ICF: Make ICF and SRA agree on padding

2024-04-07 Thread Xi Ruoyao

On Thu, 2024-04-04 at 23:19 +0200, Martin Jambor wrote:
> +/* Given two types in an assignment, return true either if any one cannot be
> +   totally scalarized or if they have padding (i.e. not copied bits)  */
> +
> +bool
> +sra_total_scalarization_would_copy_same_data_p (tree t1, tree t2)
> +{
> +  sra_padding_collecting p1;
> +  if (!check_ts_and_push_padding_to_vec (t1, ))
> +    return true;
> +
> +  sra_padding_collecting p2;
> +  if (!check_ts_and_push_padding_to_vec (t2, ))
> +    return true;
> +
> +  unsigned l = p1.m_padding.length ();
> +  if (l != p2.m_padding.length ())
> +    return false;
> +  for (unsigned i = 0; i < l; i++)
> +    if (p1.m_padding[i].first != p2.m_padding[i].first
> + || p1.m_padding[i].second != p2.m_padding[i].second)
> +  return false;
> +
> +  return true;
> +}
> +

Better remove this trailing empty line from tree-sra.cc.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] ICF: Make ICF and SRA agree on padding

2024-04-07 Thread Xi Ruoyao

On Thu, 2024-04-04 at 23:19 +0200, Martin Jambor wrote:
> The patch has been approved by Honza in Bugzilla. (I hope.  He did write
> it looked reasonable.)  Together with the patch for PR 113907, it has
> passed bootstrap, LTO bootstrap and LTO profiledbootstrap and testing on
> x86_64-linux and bootstrap and LTO bootstrap on ppc64le-linux.  It also
> passed normal bootstrap on aarch64-linux but there many testcases failed
> because the compiler timed out.  The machine is old and slow and might
> have been oversubscribed so my plan is to try again on gcc185 from
> cfarm.  If that goes well, I intend to commit the patch and then start
> working on backports.

I've tried these two patches out on my own 24-core AArch64 machine. 
Bootstrapped (but no LTO or PGO) and regtested fine.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Enable switchable target

2024-04-07 Thread Xi Ruoyao

On Sun, 2024-04-07 at 16:23 +0800, Yang Yujie wrote:
> On Sun, Apr 07, 2024 at 04:23:53PM +0800, Xi Ruoyao wrote:
> > On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote:
> > > This patch fixes the back-end context switching in cases where functions
> > > should be built with their own target contexts instead of the
> > > global one, such as LTO linking and functions with target attributes 
> > > (TBD).
> > > 
> > >   PR target/113233
> > 
> > Oops, so this PR isn't fixed with r14-7134 "LoongArch: Implement option
> > save/restore"?  Should I reopen it?
> > 
> > -- 
> > Xi Ruoyao 
> > School of Aerospace Science and Technology, Xidian University
> 
> Yes, the issue was not fixed with that patch. This one should do.

So reopened the PR.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Enable switchable target

2024-04-07 Thread Xi Ruoyao

On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote:
> This patch fixes the back-end context switching in cases where functions
> should be built with their own target contexts instead of the
> global one, such as LTO linking and functions with target attributes (TBD).
> 
>   PR target/113233

Oops, so this PR isn't fixed with r14-7134 "LoongArch: Implement option
save/restore"?  Should I reopen it?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1] LoongArch: Set default alignment for functions jumps and loops [PR112919].

2024-04-06 Thread Xi Ruoyao

On Tue, 2024-04-02 at 15:03 +0800, Lulu Cheng wrote:
> +/* Alignment for functions loops and jumps for best performance.  For new
> +   uarchs the value should be measured via benchmarking.  See the 
> documentation
> +   for -falign-functions -falign-loops and -falign-jumps in invoke.texi for 
> the
   ^ ^

Better have two commas here.

Otherwise it should be OK.

> +   format.  */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v5] LoongArch: Add support for TLS descriptors

2024-04-01 Thread Xi Ruoyao

Is this patch targeting GCC 14 or 15?  If 14 I guess we'd commit now...

Generally we don't add features in stage 4, but if we keep trad as the
default I think it'd be OK.  And RISC-V guys plan to push their TLS desc
implementation this week too.

On Tue, 2024-03-19 at 09:54 +0800, mengqinggang wrote:
> Add support for TLS descriptors on normal code model and extreme code model.
> 
> Normal code model instruction sequence:
>   -mno-explicit-relocs:
>     la.tls.desc   $r4, s
>     add.d $r12, $r4, $r2
>   -mexplicit-relocs:
>     pcalau12i $r4,%desc_pc_hi20(s)
>     addi.d$r4,$r4,%desc_pc_lo12(s)
>     ld.d  $r1,$r4,%desc_ld(s)
>     jirl  $r1,$r1,%desc_call(s)
>     add.d $r12, $r4, $r2
> 
> Extreme code model instruction sequence:
>   -mno-explicit-relocs:
>     la.tls.desc   $r4, $r12, s
>     add.d $r12, $r4, $r2
>   -mexplicit-relocs:
>     pcalau12i $r4,%desc_pc_hi20(s)
>     addi.d$r12,$r0,%desc_pc_lo12(s)
>     lu32i.d   $r12,%desc64_pc_lo20(s)
>     lu52i.d   $r12,$r12,%desc64_pc_hi12(s)
>     add.d $r4,$r4,$r12
>     ld.d  $r1,$r4,%desc_ld(s)
>     jirl  $r1,$r1,%desc_call(s)
>     add.d $r12, $r4, $r2
> 
> The default is still traditional TLS model, but can be configured with
> --with-tls={trad,desc}. The default can change to TLS descriptors once
> libc and LLVM support this.
> 
> gcc/ChangeLog:
> 
>   * config.gcc: Add --with-tls option to change TLS flavor.
>   * config/loongarch/genopts/loongarch.opt.in: Add -mtls-dialect to
>   configure TLS flavor.
>   * config/loongarch/loongarch-def.h (struct loongarch_target): Add
>   tls_dialect.
>   * config/loongarch/loongarch-driver.cc (la_driver_init): Add tls
>   flavor.
>   * config/loongarch/loongarch-opts.cc (loongarch_init_target): Add
>   tls_dialect.
>   (loongarch_config_target): Ditto.
>   (loongarch_update_gcc_opt_status): Ditto.
>   * config/loongarch/loongarch-opts.h (loongarch_init_target):Ditto.
>   (TARGET_TLS_DESC): New define.
>   * config/loongarch/loongarch.cc (loongarch_symbol_insns): Add TLS DESC
>   instructions sequence length.
>   (loongarch_legitimize_tls_address): New TLS DESC instruction sequence.
>   (loongarch_option_override_internal): Add la_opt_tls_dialect.
>   (loongarch_option_restore): Add la_target.tls_dialect.
>   * config/loongarch/loongarch.md (@got_load_tls_desc): Normal
>   code model for TLS DESC.
>   (got_load_tls_desc_off64): Extreme code model for TLS DESC.
>   * config/loongarch/loongarch.opt: Regenerated.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/loongarch/cmodel-extreme-1.c: Add -mtls-dialect=trad.
>   * gcc.target/loongarch/cmodel-extreme-2.c: Ditto.
>   * gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: Ditto.
>   * gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c:
>   Ditto.
>   * gcc.target/loongarch/func-call-medium-1.c: Ditto.
>   * gcc.target/loongarch/func-call-medium-2.c: Ditto.
>   * gcc.target/loongarch/func-call-medium-3.c: Ditto.
>   * gcc.target/loongarch/func-call-medium-4.c: Ditto.
>   * gcc.target/loongarch/tls-extreme-macro.c: Ditto.
>   * gcc.target/loongarch/tls-gd-noplt.c: Ditto.
>   * gcc.target/loongarch/explicit-relocs-auto-extreme-tls-desc.c: New 
> test.
>   * gcc.target/loongarch/explicit-relocs-auto-tls-desc.c: New test.
>   * gcc.target/loongarch/explicit-relocs-extreme-tls-desc.c: New test.
>   * gcc.target/loongarch/explicit-relocs-tls-desc.c: New test.
> 
> Co-authored-by: Lulu Cheng 
> Co-authored-by: Xi Ruoyao 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Increase division costs

2024-03-31 Thread Xi Ruoyao

On Mon, 2024-04-01 at 10:22 +0800, chenglulu wrote:
> 
> 在 2024/4/1 上午9:29, Xi Ruoyao 写道:
> > On Fri, 2024-03-29 at 09:23 +0800, chenglulu wrote:
> > 
> > > I tested spec2006. In the floating-point program, the test items with 
> > > large
> > > 
> > > fluctuations are removed, and the rest is basically unchanged.
> > > 
> > > The fixed-point 464.h264ref (10,10) was 6.7% higher than (5,5) and 
> > > (10,22).
> > So IIUC (10,10) is better than (5,5), (10,22), and the originally
> > proposed (14,22)?  Then should I make a change to make all 4 costs (SF,
> > DF, SI, DI) 10?
> 
> I think this may require the analysis of the spec's test case. I took a 
> look at the test results again,
> 
> where the scores of SPEC INT 462.libquantum fluctuated greatly, but the 
> combination of (10,22)
> 
> showed an overall upward trend compared to the scores of the other two
> combinations.
> 
> I don't know if (10,22) this combination happens to have the kind of 
> test cases in the changelog.
> 
> So can we change it together in GCC15?

Ok.  Abandoning this patch then.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Increase division costs

2024-03-31 Thread Xi Ruoyao

On Fri, 2024-03-29 at 09:23 +0800, chenglulu wrote:

> I tested spec2006. In the floating-point program, the test items with large
> 
> fluctuations are removed, and the rest is basically unchanged.
> 
> The fixed-point 464.h264ref (10,10) was 6.7% higher than (5,5) and (10,22).

So IIUC (10,10) is better than (5,5), (10,22), and the originally
proposed (14,22)?  Then should I make a change to make all 4 costs (SF,
DF, SI, DI) 10?

I'd still want DI % 17 to be reduced as reciprocal sequence (but
not SI % 17) since DI % (smaller const) is quite important for
some workloads like competitive programming.  However "adapting with
different modulos" is not possible w/o refactoring generic code so it
must be deferred to at least GCC 15.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[gcc r13-8527] mips: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-29 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:1ab646f678a9d47b338424ed69e9ae5d774b06ae

commit r13-8527-g1ab646f678a9d47b338424ed69e9ae5d774b06ae
Author: Xi Ruoyao 
Date:   Wed Mar 20 15:09:21 2024 +0800

mips: Fix C23 (...) functions returning large aggregates [PR114175]

We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
arguments and there is nothing to advance, but that is not the case
for (...) functions returning by hidden reference which have one such
artificial argument.  This is causing gcc.dg/c23-stdarg-{6,8,9}.c to
fail.

Fix the issue by checking if arg.type is NULL, as r14-9503 explains.

gcc/ChangeLog:

PR target/114175
* config/mips/mips.cc (mips_setup_incoming_varargs): Only skip
mips_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
functions if arg.type is NULL.

(cherry picked from commit 6fc84f680d098f82c1c43435fdb206099f0df4df)

Diff:
---
 gcc/config/mips/mips.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index ca822758b41..8e3dc313cb3 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -6684,7 +6684,13 @@ mips_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+
+  /* For a C23 variadic function w/o any named argument, and w/o an
+ artifical argument for large return value, skip advancing args.
+ There is such an artifical argument iff. arg.type is non-NULL
+ (PR 114175).  */
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 mips_function_arg_advance (pack_cumulative_args (_cum), arg);
 
   /* Found out how many registers we need to save.  */

[gcc r13-8526] LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-29 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:b73a6a20113ca607cf3073c751698b12aa167881

commit r13-8526-gb73a6a20113ca607cf3073c751698b12aa167881
Author: Xi Ruoyao 
Date:   Mon Mar 18 17:18:34 2024 +0800

LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]

We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
arguments and there is nothing to advance, but that is not the case
for (...) functions returning by hidden reference which have one such
artificial argument.  This is causing gcc.dg/c23-stdarg-6.c and
gcc.dg/c23-stdarg-8.c to fail.

Fix the issue by checking if arg.type is NULL, as r14-9503 explains.

gcc/ChangeLog:

PR target/114175
* config/loongarch/loongarch.cc
(loongarch_setup_incoming_varargs): Only skip
loongarch_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
functions if arg.type is NULL.

(cherry picked from commit c1fd4589c2bf9fd8409d51b94df219cb75107762)

Diff:
---
 gcc/config/loongarch/loongarch.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index e78b81cd8fc..d2c26407447 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -760,7 +760,13 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+
+  /* For a C23 variadic function w/o any named argument, and w/o an
+ artifical argument for large return value, skip advancing args.
+ There is such an artifical argument iff. arg.type is non-NULL
+ (PR 114175).  */
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 loongarch_function_arg_advance (pack_cumulative_args (_cum), arg);
 
   /* Found out how many registers we need to save.  */

[gcc r14-9728] mips: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-29 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:6fc84f680d098f82c1c43435fdb206099f0df4df

commit r14-9728-g6fc84f680d098f82c1c43435fdb206099f0df4df
Author: Xi Ruoyao 
Date:   Wed Mar 20 15:09:21 2024 +0800

mips: Fix C23 (...) functions returning large aggregates [PR114175]

We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
arguments and there is nothing to advance, but that is not the case
for (...) functions returning by hidden reference which have one such
artificial argument.  This is causing gcc.dg/c23-stdarg-{6,8,9}.c to
fail.

Fix the issue by checking if arg.type is NULL, as r14-9503 explains.

gcc/ChangeLog:

PR target/114175
* config/mips/mips.cc (mips_setup_incoming_varargs): Only skip
mips_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
functions if arg.type is NULL.

Diff:
---
 gcc/config/mips/mips.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 68e2ae8d8fa..ce764a5cb35 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -6834,7 +6834,13 @@ mips_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+
+  /* For a C23 variadic function w/o any named argument, and w/o an
+ artifical argument for large return value, skip advancing args.
+ There is such an artifical argument iff. arg.type is non-NULL
+ (PR 114175).  */
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 mips_function_arg_advance (pack_cumulative_args (_cum), arg);
 
   /* Found out how many registers we need to save.  */

[gcc r14-9717] testsuite: Add a test case for negating FP vectors containing zeros

2024-03-28 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:26a723625bb63e42b8c78bbce52ba0286eefda1c

commit r14-9717-g26a723625bb63e42b8c78bbce52ba0286eefda1c
Author: Xi Ruoyao 
Date:   Tue Feb 6 17:49:50 2024 +0800

testsuite: Add a test case for negating FP vectors containing zeros

Recently I've fixed two wrong FP vector negate implementation which
caused wrong sign bits in zeros in targets (r14-8786 and r14-8801).  To
prevent a similar issue from happening again, add a test case.

Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
(with MSA), LoongArch (with LSX and LASX).

gcc/testsuite:

* gcc.dg/vect/vect-neg-zero.c: New test.

Diff:
---
 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 38 +++
 1 file changed, 38 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c 
b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
new file mode 100644
index 000..21fa00cfa15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
@@ -0,0 +1,38 @@
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-fno-associative-math -fsigned-zeros" } */
+
+double x[4] = {-0.0, 0.0, -0.0, 0.0};
+float y[8] = {-0.0, 0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0};
+
+static __attribute__ ((always_inline)) inline void
+test (int factor)
+{
+  double a[4];
+  float b[8];
+
+  asm ("" ::: "memory");
+
+  for (int i = 0; i < 2 * factor; i++)
+a[i] = -x[i];
+
+  for (int i = 0; i < 4 * factor; i++)
+b[i] = -y[i];
+
+#pragma GCC novector
+  for (int i = 0; i < 2 * factor; i++)
+if (__builtin_signbit (a[i]) == __builtin_signbit (x[i]))
+  __builtin_abort ();
+
+#pragma GCC novector
+  for (int i = 0; i < 4 * factor; i++)
+if (__builtin_signbit (b[i]) == __builtin_signbit (y[i]))
+  __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test (1);
+  test (2);
+  return 0;
+}

Ping: [PATCH] mips: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-28 Thread Xi Ruoyao

Ping.

On Wed, 2024-03-20 at 15:10 +0800, Xi Ruoyao wrote:
> We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
> arguments and there is nothing to advance, but that is not the case
> for (...) functions returning by hidden reference which have one such
> artificial argument.  This is causing gcc.dg/c23-stdarg-{6,8,9}.c to
> fail.
> 
> Fix the issue by checking if arg.type is NULL, as r14-9503 explains.
> 
> gcc/ChangeLog:
> 
>   PR target/114175
>   * config/mips/mips.cc (mips_setup_incoming_varargs): Only skip
>   mips_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
>   functions if arg.type is NULL.
> ---
> 
> Bootstrapped and regtested on mips64el-linux-gnuabi64.  Ok for trunk?
> 
>  gcc/config/mips/mips.cc | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
> index 68e2ae8d8fa..ce764a5cb35 100644
> --- a/gcc/config/mips/mips.cc
> +++ b/gcc/config/mips/mips.cc
> @@ -6834,7 +6834,13 @@ mips_setup_incoming_varargs (cumulative_args_t cum,
>   argument.  Advance a local copy of CUM past the last "real" named
>   argument, to find out how many registers are left over.  */
>    local_cum = *get_cumulative_args (cum);
> -  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
> +
> +  /* For a C23 variadic function w/o any named argument, and w/o an
> + artifical argument for large return value, skip advancing args.
> + There is such an artifical argument iff. arg.type is non-NULL
> + (PR 114175).  */
> +  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
> +  || arg.type != NULL_TREE)
>  mips_function_arg_advance (pack_cumulative_args (_cum), arg);
>  
>    /* Found out how many registers we need to save.  */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread Xi Ruoyao

On Wed, 2024-03-27 at 18:39 +0800, Xi Ruoyao wrote:
> On Wed, 2024-03-27 at 10:38 +0800, chenglulu wrote:
> > 
> > 在 2024/3/26 下午5:48, Xi Ruoyao 写道:
> > > The latency of LA464 and LA664 division instructions depends on the
> > > input.  When I updated the costs in r14-6642, I unintentionally set the
> > > division costs to the best-case latency (when the first operand is 0).
> > > Per a recent discussion [1] we should use "something sensible" instead
> > > of it.
> > > 
> > > Use the average of the minimum and maximum latency observed instead.
> > > This enables multiplication to reciprocal sequence reduction and speeds
> > > up the following test case for about 30%:
> > > 
> > >  int
> > >  main (void)
> > >  {
> > >    unsigned long stat = 0xdeadbeef;
> > >    for (int i = 0; i < 1; i++)
> > >  stat = (stat * stat + stat * 114514 + 1919810) % 17;
> > >    asm(""::"r"(stat));
> > >  }
> > > 
> > > [1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html
> > 
> > The test case div-const-reduction.c is modified to assemble the instruction
> > sequence as follows:
> > lu12i.w $r12,97440>>12  # 0x3b9ac000
> > ori $r12,$r12,2567
> > mod.w   $r13,$r13,$r12
> > 
> > This sequence of instructions takes 5 clock cycles.

It actually may take 5 to 8 cycles depending on the input.  And
multiplication is fully pipelined while division is not, so the
reciprocal sequence should still produce a better throughput.

> Hmm indeed, it seems a waste to do this reduction for int / 17.
> I'll try to make a better heuristic as Richard suggests...

Oops, it seems impossible (w/o refactoring the generic code).  See my
reply to Richi :(.

Can you also try benchmarking with the costs of SI and DI division
increased to (10, 10) instead of (14, 22) - allowing more CSE but not
reciprocal sequence reduction, and (10, 22) - only allowing reduction
for DI but not SI?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread Xi Ruoyao

On Wed, 2024-03-27 at 08:54 +0100, Richard Biener wrote:
> On Tue, Mar 26, 2024 at 10:52 AM Xi Ruoyao  wrote:
> > 
> > The latency of LA464 and LA664 division instructions depends on the
> > input.  When I updated the costs in r14-6642, I unintentionally set the
> > division costs to the best-case latency (when the first operand is 0).
> > Per a recent discussion [1] we should use "something sensible" instead
> > of it.
> > 
> > Use the average of the minimum and maximum latency observed instead.
> > This enables multiplication to reciprocal sequence reduction and speeds
> > up the following test case for about 30%:
> > 
> >     int
> >     main (void)
> >     {
> >   unsigned long stat = 0xdeadbeef;
> >   for (int i = 0; i < 1; i++)
> >     stat = (stat * stat + stat * 114514 + 1919810) % 17;
> >   asm(""::"r"(stat));
> >     }
> 
> I think you should be able to see a constant divisor and thus could do
> better than return the same latency for everything.  For non-constant
> divisors using the best-case latency shouldn't be a problem.

Hmm, it seems not really possible as at now.  expand_divmod does
something like:

  max_cost = (unsignedp
  ? udiv_cost (speed, compute_mode)
  : sdiv_cost (speed, compute_mode));

which is reading the pre-calculated costs from a table.  Thus we don't
really know the denominator and cannot estimate the cost based on it :(.

CSE really invokes the cost hook with the actual (mod (a, (const_int
17)) RTX but it's less important.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread Xi Ruoyao

On Wed, 2024-03-27 at 10:38 +0800, chenglulu wrote:
> 
> 在 2024/3/26 下午5:48, Xi Ruoyao 写道:
> > The latency of LA464 and LA664 division instructions depends on the
> > input.  When I updated the costs in r14-6642, I unintentionally set the
> > division costs to the best-case latency (when the first operand is 0).
> > Per a recent discussion [1] we should use "something sensible" instead
> > of it.
> > 
> > Use the average of the minimum and maximum latency observed instead.
> > This enables multiplication to reciprocal sequence reduction and speeds
> > up the following test case for about 30%:
> > 
> >  int
> >  main (void)
> >  {
> >    unsigned long stat = 0xdeadbeef;
> >    for (int i = 0; i < 1; i++)
> >  stat = (stat * stat + stat * 114514 + 1919810) % 17;
> >    asm(""::"r"(stat));
> >  }
> > 
> > [1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html
> 
> The test case div-const-reduction.c is modified to assemble the instruction
> sequence as follows:
>   lu12i.w $r12,97440>>12  # 0x3b9ac000
>   ori $r12,$r12,2567
>   mod.w   $r13,$r13,$r12
> 
> This sequence of instructions takes 5 clock cycles.

Hmm indeed, it seems a waste to do this reduction for int / 17.
I'll try to make a better heuristic as Richard suggests...


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2] MIPS: Add MIN/MAX.fmt instructions support for MIPS R6

2024-03-26 Thread Xi Ruoyao

On Tue, 2024-03-26 at 11:15 +0800, YunQiang Su wrote:

/* snip */

> With -ffinite-math-only -fno-signed-zeros, it does work with
>     x >= y ? x : y
> while without `-ffinite-math-only -fno-signed-zeros`, it cannot.
> @Xi Ruoyao Is it expected by IEEE?

When y is (quiet) NaN and x is not, fmax(x, y) should produce x but x >=
y ? x : y should produce y.  Thus -ffinite-math-only is needed.

When x is +0.0 and y is -0.0, x >= y ? x : y should produce +0.0 but
fmax(x, y) may produce +0.0 or -0.0 (IEEE allows both and I don't see a
more strict requirement in MIPS 6.06 manual either).  Thus -fno-signed-
zeros is needed.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] LoongArch: Increase division costs

2024-03-26 Thread Xi Ruoyao

The latency of LA464 and LA664 division instructions depends on the
input.  When I updated the costs in r14-6642, I unintentionally set the
division costs to the best-case latency (when the first operand is 0).
Per a recent discussion [1] we should use "something sensible" instead
of it.

Use the average of the minimum and maximum latency observed instead.
This enables multiplication to reciprocal sequence reduction and speeds
up the following test case for about 30%:

int
main (void)
{
  unsigned long stat = 0xdeadbeef;
  for (int i = 0; i < 1; i++)
stat = (stat * stat + stat * 114514 + 1919810) % 17;
  asm(""::"r"(stat));
}

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html

gcc/ChangeLog:

* config/loongarch/loongarch-def.cc
(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Increase
default division cost to the average of the best case and worst
case senarios observed.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/div-const-reduction.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch-def.cc| 8 
 gcc/testsuite/gcc.target/loongarch/div-const-reduction.c | 9 +
 2 files changed, 13 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-const-reduction.c

diff --git a/gcc/config/loongarch/loongarch-def.cc 
b/gcc/config/loongarch/loongarch-def.cc
index e8c129ce643..93e72a520d5 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -95,12 +95,12 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
   : fp_add (COSTS_N_INSNS (5)),
 fp_mult_sf (COSTS_N_INSNS (5)),
 fp_mult_df (COSTS_N_INSNS (5)),
-fp_div_sf (COSTS_N_INSNS (8)),
-fp_div_df (COSTS_N_INSNS (8)),
+fp_div_sf (COSTS_N_INSNS (12)),
+fp_div_df (COSTS_N_INSNS (15)),
 int_mult_si (COSTS_N_INSNS (4)),
 int_mult_di (COSTS_N_INSNS (4)),
-int_div_si (COSTS_N_INSNS (5)),
-int_div_di (COSTS_N_INSNS (5)),
+int_div_si (COSTS_N_INSNS (14)),
+int_div_di (COSTS_N_INSNS (22)),
 movcf2gr (COSTS_N_INSNS (7)),
 movgr2cf (COSTS_N_INSNS (15)),
 branch_cost (6),
diff --git a/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c 
b/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c
new file mode 100644
index 000..0ee86410dd7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=la464" } */
+/* { dg-final { scan-assembler-not "div\.\[dw\]" } } */
+
+int
+test (int a)
+{
+  return a % 17;
+}
-- 
2.44.0

TARGET_RTX_COSTS and pipeline latency vs. variable-latency instructions (was Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.)

2024-03-25 Thread Xi Ruoyao

On Mon, 2024-03-18 at 20:54 -0600, Jeff Law wrote:
> > +/* Costs to use when optimizing for xiangshan nanhu.  */
> > +static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* fp_add */
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* fp_mul */
> > +  {COSTS_N_INSNS (10), COSTS_N_INSNS (20)},/* fp_div */
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* int_mul */
> > +  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},  /* int_div */
> > +  6,   /* issue_rate */
> > +  3,   /* branch_cost */
> > +  3,   /* memory_cost */
> > +  3,   /* fmv_cost */
> > +  true,/* 
> > slow_unaligned_access */
> > +  false,   /* use_divmod_expansion */
> > +  RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH,  /* fusible_ops */
> > +  NULL,/* vector cost */

> Is your integer division really that fast?  The table above essentially 
> says that your cpu can do integer division in 6 cycles.

Hmm, I just seen I've coded some even smaller value for LoongArch CPUs
so forgive me for "hijacking" this thread...

The problem seems integer division may spend different number of cycles
for different inputs: on LoongArch LA664 I've observed 5 cycles for some
inputs and 39 cycles for other inputs.

So should we use the minimal value, the maximum value, or something in-
between for TARGET_RTX_COSTS and pipeline descriptions?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] MIPS: Add MIN/MAX.fmt instructions support for MIPS R6

2024-03-21 Thread Xi Ruoyao

On Thu, 2024-03-21 at 10:14 +0800, Jie Mei wrote:
> diff --git a/gcc/testsuite/gcc.target/mips/mips-minmax.c 
> b/gcc/testsuite/gcc.target/mips/mips-minmax.c
> new file mode 100644
> index 000..2d234ac4b1d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/mips/mips-minmax.c
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mhard-float -ffinite-math-only -march=mips32r6" } */

You may want to add fmin3 and fmax3 in addition to
smin3 and smax3 so it will work without -ffinite-math-only.

‘fminM3’, ‘fmaxM3’
 IEEE-conformant minimum and maximum operations.  If one operand is
 a quiet ‘NaN’, then the other operand is returned.  If both
 operands are quiet ‘NaN’, then a quiet ‘NaN’ is returned.  In the
 case when gcc supports signaling ‘NaN’ (-fsignaling-nans) an
 invalid floating point exception is raised and a quiet ‘NaN’ is
 returned.

And the MIPS 6.06 manual says:

Numbers are preferred to NaNs: if one input is a NaN, but not both, the
value of the numeric input is returned. If both are NaNs, the NaN in fs
is returned.

for MAX.fmt and MIN.fmt, so they matches fmin3 and fmax3.

> +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Pushed: [PATCH] LoongArch: Fix a typo [PR 114407]

2024-03-20 Thread Xi Ruoyao

gcc/ChangeLog:

PR target/114407
* config/loongarch/loongarch-opts.cc (loongarch_config_target):
Fix typo in diagnostic message, enabing -> enabling.
---

Pushed r14-9582 as obvious.

 gcc/config/loongarch/loongarch-opts.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch-opts.cc 
b/gcc/config/loongarch/loongarch-opts.cc
index 7eeac43ed2f..627f9148adf 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -362,7 +362,7 @@ config_target_isa:
  gcc_assert (constrained.simd);
 
  inform (UNKNOWN_LOCATION,
- "enabing %qs promotes %<%s%s%> to %<%s%s%>",
+ "enabling %qs promotes %<%s%s%> to %<%s%s%>",
  loongarch_isa_ext_strings[t.isa.simd],
  OPTSTR_ISA_EXT_FPU, loongarch_isa_ext_strings[t.isa.fpu],
  OPTSTR_ISA_EXT_FPU, loongarch_isa_ext_strings[ISA_EXT_FPU64]);
-- 
2.44.0

[gcc r14-9582] LoongArch: Fix a typo [PR 114407]

2024-03-20 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:806621dc14f893476dd6f2a9ae5cf773f3b0951b

commit r14-9582-g806621dc14f893476dd6f2a9ae5cf773f3b0951b
Author: Xi Ruoyao 
Date:   Thu Mar 21 04:01:17 2024 +0800

LoongArch: Fix a typo [PR 114407]

gcc/ChangeLog:

PR target/114407
* config/loongarch/loongarch-opts.cc (loongarch_config_target):
Fix typo in diagnostic message, enabing -> enabling.

Diff:
---
 gcc/config/loongarch/loongarch-opts.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch-opts.cc 
b/gcc/config/loongarch/loongarch-opts.cc
index 7eeac43ed2f..627f9148adf 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -362,7 +362,7 @@ config_target_isa:
  gcc_assert (constrained.simd);
 
  inform (UNKNOWN_LOCATION,
- "enabing %qs promotes %<%s%s%> to %<%s%s%>",
+ "enabling %qs promotes %<%s%s%> to %<%s%s%>",
  loongarch_isa_ext_strings[t.isa.simd],
  OPTSTR_ISA_EXT_FPU, loongarch_isa_ext_strings[t.isa.fpu],
  OPTSTR_ISA_EXT_FPU, loongarch_isa_ext_strings[ISA_EXT_FPU64]);

[PATCH] mips: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-20 Thread Xi Ruoyao

We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
arguments and there is nothing to advance, but that is not the case
for (...) functions returning by hidden reference which have one such
artificial argument.  This is causing gcc.dg/c23-stdarg-{6,8,9}.c to
fail.

Fix the issue by checking if arg.type is NULL, as r14-9503 explains.

gcc/ChangeLog:

PR target/114175
* config/mips/mips.cc (mips_setup_incoming_varargs): Only skip
mips_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
functions if arg.type is NULL.
---

Bootstrapped and regtested on mips64el-linux-gnuabi64.  Ok for trunk?

 gcc/config/mips/mips.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 68e2ae8d8fa..ce764a5cb35 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -6834,7 +6834,13 @@ mips_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+
+  /* For a C23 variadic function w/o any named argument, and w/o an
+ artifical argument for large return value, skip advancing args.
+ There is such an artifical argument iff. arg.type is non-NULL
+ (PR 114175).  */
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 mips_function_arg_advance (pack_cumulative_args (_cum), arg);
 
   /* Found out how many registers we need to save.  */
-- 
2.44.0

Pushed: [PATCH v2] LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-19 Thread Xi Ruoyao

On Tue, 2024-03-19 at 11:19 +0800, chenglulu wrote:
> 
> 在 2024/3/18 下午5:34, Xi Ruoyao 写道:
> > We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
> > arguments and there is nothing to advance, but that is not the case
> > for (...) functions returning by hidden reference which have one
> > such
> > artificial argument.  This is causing gcc.dg/c23-stdarg-6.c and
> > gcc.dg/c23-stdarg-8.c to fail.
> > 
> > Fix the issue by checking if arg.type is NULL, as r14-9503 explains.
> > 
> > gcc/ChangeLog:
> > 
> > PR target/114175
> > * config/loongarch/loongarch.cc
> > (loongarch_setup_incoming_varargs): Only skip
> > loongarch_function_arg_advance for
> > TYPE_NO_NAMED_ARGS_STDARG_P
> > functions if arg.type is NULL.
> > ---
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> > 
> >   gcc/config/loongarch/loongarch.cc | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/gcc/config/loongarch/loongarch.cc
> > b/gcc/config/loongarch/loongarch.cc
> > index 70e31bb831c..57de8ef7d20 100644
> > --- a/gcc/config/loongarch/loongarch.cc
> > +++ b/gcc/config/loongarch/loongarch.cc
> > @@ -767,7 +767,8 @@ loongarch_setup_incoming_varargs
> > (cumulative_args_t cum,
> >    argument.  Advance a local copy of CUM past the last "real"
> > named
> >    argument, to find out how many registers are left over.  */
> >     local_cum = *get_cumulative_args (cum);
> I think it's important to add annotation information here:
>  /* where there is no hidden return argument passed, arg.type
> 
>   is always NULL.  */
> 
> Others LTGM.
> 
> Thanks!

Pushed v2 with a comment added as attached.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
From c1fd4589c2bf9fd8409d51b94df219cb75107762 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Mon, 18 Mar 2024 17:18:34 +0800
Subject: [PATCH v2] LoongArch: Fix C23 (...) functions returning large
 aggregates [PR114175]

We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
arguments and there is nothing to advance, but that is not the case
for (...) functions returning by hidden reference which have one such
artificial argument.  This is causing gcc.dg/c23-stdarg-6.c and
gcc.dg/c23-stdarg-8.c to fail.

Fix the issue by checking if arg.type is NULL, as r14-9503 explains.

gcc/ChangeLog:

	PR target/114175
	* config/loongarch/loongarch.cc
	(loongarch_setup_incoming_varargs): Only skip
	loongarch_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
	functions if arg.type is NULL.
---
 gcc/config/loongarch/loongarch.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 70e31bb831c..5344f2a6987 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -767,7 +767,13 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+
+  /* For a C23 variadic function w/o any named argument, and w/o an
+ artifical argument for large return value, skip advancing args.
+ There is such an artifical argument iff. arg.type is non-NULL
+ (PR 114175).  */
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 loongarch_function_arg_advance (pack_cumulative_args (_cum), arg);
 
   /* Found out how many registers we need to save.  */
-- 
2.44.0

[gcc r14-9538] LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-19 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:c1fd4589c2bf9fd8409d51b94df219cb75107762

commit r14-9538-gc1fd4589c2bf9fd8409d51b94df219cb75107762
Author: Xi Ruoyao 
Date:   Mon Mar 18 17:18:34 2024 +0800

LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]

We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
arguments and there is nothing to advance, but that is not the case
for (...) functions returning by hidden reference which have one such
artificial argument.  This is causing gcc.dg/c23-stdarg-6.c and
gcc.dg/c23-stdarg-8.c to fail.

Fix the issue by checking if arg.type is NULL, as r14-9503 explains.

gcc/ChangeLog:

PR target/114175
* config/loongarch/loongarch.cc
(loongarch_setup_incoming_varargs): Only skip
loongarch_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
functions if arg.type is NULL.

Diff:
---
 gcc/config/loongarch/loongarch.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 70e31bb831c..5344f2a6987 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -767,7 +767,13 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+
+  /* For a C23 variadic function w/o any named argument, and w/o an
+ artifical argument for large return value, skip advancing args.
+ There is such an artifical argument iff. arg.type is non-NULL
+ (PR 114175).  */
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 loongarch_function_arg_advance (pack_cumulative_args (_cum), arg);
 
   /* Found out how many registers we need to save.  */

[PATCH] LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-18 Thread Xi Ruoyao

We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
arguments and there is nothing to advance, but that is not the case
for (...) functions returning by hidden reference which have one such
artificial argument.  This is causing gcc.dg/c23-stdarg-6.c and
gcc.dg/c23-stdarg-8.c to fail.

Fix the issue by checking if arg.type is NULL, as r14-9503 explains.

gcc/ChangeLog:

PR target/114175
* config/loongarch/loongarch.cc
(loongarch_setup_incoming_varargs): Only skip
loongarch_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
functions if arg.type is NULL.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 70e31bb831c..57de8ef7d20 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -767,7 +767,8 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 loongarch_function_arg_advance (pack_cumulative_args (_cum), arg);
 
   /* Found out how many registers we need to save.  */
-- 
2.44.0

[gcc r14-9474] LoongArch: Remove unused and incorrect "sge_" define_insn

2024-03-14 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:f98b85b1ef74b7c5c0852b3d063262bce63df14e

commit r14-9474-gf98b85b1ef74b7c5c0852b3d063262bce63df14e
Author: Xi Ruoyao 
Date:   Wed Mar 13 20:44:38 2024 +0800

LoongArch: Remove unused and incorrect "sge_" 
define_insn

If this insn is really used, we'll have something like

slti $r4,$r0,$r5

in the code.  The assembler will reject it because slti wants 2
register operands and 1 immediate operand.  But we've not got any bug
report for this, indicating this define_insn is unused at all.

Note that do_store_flag (in expr.cc) is already converting x >= 1 to
x > 0 unconditionally, so this define_insn is indeed unused and we can
just remove it.

gcc/ChangeLog:

* config/loongarch/loongarch.md (any_ge): Remove.
(sge_): Remove.

Diff:
---
 gcc/config/loongarch/loongarch.md | 10 --
 1 file changed, 10 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 525e1e82183..18fd9c1e7d5 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -517,7 +517,6 @@
 ;; These code iterators allow the signed and unsigned scc operations to use
 ;; the same template.
 (define_code_iterator any_gt [gt gtu])
-(define_code_iterator any_ge [ge geu])
 (define_code_iterator any_lt [lt ltu])
 (define_code_iterator any_le [le leu])
 
@@ -3355,15 +3354,6 @@
   [(set_attr "type" "slt")
(set_attr "mode" "")])
 
-(define_insn "*sge_"
-  [(set (match_operand:GPR 0 "register_operand" "=r")
-   (any_ge:GPR (match_operand:X 1 "register_operand" "r")
-(const_int 1)))]
-  ""
-  "slti\t%0,%.,%1"
-  [(set_attr "type" "slt")
-   (set_attr "mode" "")])
-
 (define_insn "*slt_"
   [(set (match_operand:GPR 0 "register_operand" "=r")
(any_lt:GPR (match_operand:X 1 "register_operand" "r")

[PATCH] LoongArch: Remove unused and incorrect "sge_" define_insn

2024-03-13 Thread Xi Ruoyao

If this insn is really used, we'll have something like

slti $r4,$r0,$r5

in the code.  The assembler will reject it because slti wants 2
register operands and 1 immediate operand.  But we've not got any bug
report for this, indicating this define_insn is unused at all.

Note that do_store_flag (in expr.cc) is already converting x >= 1 to
x > 0 unconditionally, so this define_insn is indeed unused and we can
just remove it.

gcc/ChangeLog:

* config/loongarch/loongarch.md (any_ge): Remove.
(sge_): Remove.
---

Not fully tested but should be obvious.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 10 --
 1 file changed, 10 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 525e1e82183..18fd9c1e7d5 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -517,7 +517,6 @@ (define_code_iterator equality_op [eq ne])
 ;; These code iterators allow the signed and unsigned scc operations to use
 ;; the same template.
 (define_code_iterator any_gt [gt gtu])
-(define_code_iterator any_ge [ge geu])
 (define_code_iterator any_lt [lt ltu])
 (define_code_iterator any_le [le leu])
 
@@ -3355,15 +3354,6 @@ (define_insn "*sgt_"
   [(set_attr "type" "slt")
(set_attr "mode" "")])
 
-(define_insn "*sge_"
-  [(set (match_operand:GPR 0 "register_operand" "=r")
-   (any_ge:GPR (match_operand:X 1 "register_operand" "r")
-(const_int 1)))]
-  ""
-  "slti\t%0,%.,%1"
-  [(set_attr "type" "slt")
-   (set_attr "mode" "")])
-
 (define_insn "*slt_"
   [(set (match_operand:GPR 0 "register_operand" "=r")
(any_lt:GPR (match_operand:X 1 "register_operand" "r")
-- 
2.44.0

Re: [PATCH v1] LoongArch: Remove masking process for operand 3 of xvpermi.q.

2024-03-13 Thread Xi Ruoyao

On Tue, 2024-03-12 at 09:56 +0800, Chenghui Pan wrote:
> The behavior of non-zero unused bits in xvpermi.q instruction's
> third operand is undefined on LoongArch, according to our
> discussion (https://github.com/llvm/llvm-project/pull/83540),
> we think that keeping original insn operand as unmodified
> state is better solution.
> 
> This patch partially reverts 7b158e036a95b1ab40793dd53bed7dbd770ffdaf.
> 
> gcc/ChangeLog:
> 
>   * config/loongarch/lasx.md: Remove masking of operand 3.

Add (lasx_xvpermi_q_) before ":".

> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c:
>     Reposition operand 3's value into instruction's defined accept range.
^^

Remove these two white spaces.

Should be OK with these ChangeLog style issues fixed.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] testsuite: Fix vfprintf-chk-1.c with -fhardened

2024-03-13 Thread Xi Ruoyao

On Tue, 2024-03-12 at 17:19 +0100, Jakub Jelinek wrote:
> On Thu, Feb 15, 2024 at 10:53:08PM +, Sam James wrote:
> > With _FORTIFY_SOURCE >= 2 (enabled by -fhardened), vfprintf-chk-1.c's
> > __vfprintf_chk ends up calling __vprintf_chk rather than vprintf.

Do we really want to support adding random CFLAGS running the test
suite?  AFAIK adding random CFLAGS will just cause test failures here or
there.  We are adjusting the test suite for -fPIE -pie and -fstack-
protector-strong but it's because they can be implicitly enabled with --
enable-default-* options, and we don't have --enable-default-hardened as
at now.

If we need to bootstrap a hardened GCC and test it, pass -fhardened as
how "info gccinstall" suggests:

make BOOT_CFLAGS="-O2 -g -fhardened"

instead of

env C{,XX}FLAGS="-O2 -g -fhardened" /path/to/gcc/configure ...

which will taint the test suite with -fhardened.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-13 Thread Xi Ruoyao

On Wed, 2024-03-13 at 10:24 +0800, Xi Ruoyao wrote:
>    return TARGET_EXPLICIT_RELOCS
> -    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> -  \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\
> -  \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\
> -  \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\
> -  \tadd.d\t$r4,$r4,%2\n\
> -  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> -  \tjirl\t$r1,$r1,%%desc_call(%1)"
> -    : "la.tls.desc\t%0,%2,%1";
> +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%0)\n\t"
> +  "addi.d\t%1,$r0,%%desc_pc_lo12(%0)\n\t"
> +  "lu32i.d\t%1,%%desc64_pc_lo20(%0)\n\t"
> +  "lu52i.d\t%1,%2,%%desc64_pc_hi12(%0)\n\t"

Oops, the "%2" in the above line should be "%1".

> +  "add.d\t$r4,$r4,%1\n\t"
> +  "ld.d\t$r1,$r4,%%desc_ld(%0)\n\t"
> +  "jirl\t$r1,$r1,%%desc_call(%0)"
> +    : "la.tls.desc\t$r4,%1,%0";

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-13 Thread Xi Ruoyao

On Wed, 2024-03-13 at 11:06 +0800, mengqinggang wrote:
> 
> 在 2024/3/13 上午6:15, Xi Ruoyao 写道:
> > On Tue, 2024-03-12 at 17:20 +0800, mengqinggang wrote:
> > > +(define_insn "@got_load_tls_desc"
> > > +  [(set (match_operand:P 0 "register_operand" "=r")
> > > + (unspec:P
> > > +     [(match_operand:P 1 "symbolic_operand" "")]
> > > +     UNSPEC_TLS_DESC))
> > > +    (clobber (reg:SI FCC0_REGNUM))
> > > +    (clobber (reg:SI FCC1_REGNUM))
> > > +    (clobber (reg:SI FCC2_REGNUM))
> > > +    (clobber (reg:SI FCC3_REGNUM))
> > > +    (clobber (reg:SI FCC4_REGNUM))
> > > +    (clobber (reg:SI FCC5_REGNUM))
> > > +    (clobber (reg:SI FCC6_REGNUM))
> > > +    (clobber (reg:SI FCC7_REGNUM))
> > > +    (clobber (reg:SI RETURN_ADDR_REGNUM))]
> > > +  "TARGET_TLS_DESC"
> > > +{
> > > +  return TARGET_EXPLICIT_RELOCS
> > > +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> > > +  \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\
> > > +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> > > +  \tjirl\t$r1,$r1,%%desc_call(%1)"
> > Use something like
> > 
> >  ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\t"
> >    "addi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\t"
> >    "ld.d\t$r1,$r4,%%desc_ld(%1)\n\t"
> >    "jirl\t$r1,$r1,%%desc_call(%1)"
> >  : "la.tls.desc\t%0,%1";
> > 
> > to prevent additional white spaces in the output asm before tabs.
> > 
> > > +    : "la.tls.desc\t%0,%1";
> > > +}
> > > +  [(set_attr "got" "load")
> > > +   (set_attr "mode" "")
> > > +   (set_attr "length" "16")])
> > > +
> > > +(define_insn "got_load_tls_desc_off64"
> > > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > > + (unspec:DI
> > > +     [(match_operand:DI 1 "symbolic_operand" "")]
> > > +     UNSPEC_TLS_DESC_OFF64))
> > > +    (clobber (reg:SI FCC0_REGNUM))
> > > +    (clobber (reg:SI FCC1_REGNUM))
> > > +    (clobber (reg:SI FCC2_REGNUM))
> > > +    (clobber (reg:SI FCC3_REGNUM))
> > > +    (clobber (reg:SI FCC4_REGNUM))
> > > +    (clobber (reg:SI FCC5_REGNUM))
> > > +    (clobber (reg:SI FCC6_REGNUM))
> > > +    (clobber (reg:SI FCC7_REGNUM))
> > > +    (clobber (reg:SI RETURN_ADDR_REGNUM))
> > > +    (clobber (match_operand:DI 2 "register_operand" "="))]
> > > +  "TARGET_TLS_DESC && TARGET_CMODEL_EXTREME"
> > > +{
> > > +  return TARGET_EXPLICIT_RELOCS
> > > +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> > > +  \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\
> > > +  \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\
> > > +  \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\
> > > +  \tadd.d\t$r4,$r4,%2\n\
> > > +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> > > +  \tjirl\t$r1,$r1,%%desc_call(%1)"
> > > +    : "la.tls.desc\t%0,%2,%1";
> > Likewise.
> > 
> > > +}
> > > +  [(set_attr "got" "load")
> > > +   (set_attr "length" "28")])
> > Otherwise OK.
> > 
> > It's better to allow splitting these two instructions but we can do it
> > in another patch.  And IMO it's better to enable TLS desc by default if
> > supported by both the assembler and the libc, but we'll have to defer it
> > until Glibc 2.40 release.
> 
> 
> Do we need to wait until LLVM also supports TLS DESC  before setting it 
> as default?

Hmm, maybe...  I remember when we added R_LARCH_ALIGN lld was being
broken for a while.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-12 Thread Xi Ruoyao

On Wed, 2024-03-13 at 06:56 +0800, Xi Ruoyao wrote:
> On Wed, 2024-03-13 at 06:15 +0800, Xi Ruoyao wrote:
> > > +(define_insn "@got_load_tls_desc"
> > > +  [(set (match_operand:P 0 "register_operand" "=r")
> 
> Hmm, and it looks like we should use (reg:P 4) instead of match_operand
> here, because the instruction does not work for a different register:
> with TARGET_EXPLICIT_RELOCS we are hard coding r4, and without
> TARGET_EXPLICIT_RELOCS the TLS desc function still only puts the return
> value in r4.

Suggested changes:

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 303666bf6d5..8f4d3f36c26 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2954,10 +2954,10 @@ loongarch_legitimize_tls_address (rtx loc)
  tp = gen_rtx_REG (Pmode, THREAD_POINTER_REGNUM);
 
  if (TARGET_CMODEL_EXTREME)
-   emit_insn (gen_got_load_tls_desc_off64 (a0, loc,
+   emit_insn (gen_got_load_tls_desc_off64 (loc,
gen_reg_rtx (DImode)));
  else
-   emit_insn (gen_got_load_tls_desc (Pmode, a0, loc));
+   emit_insn (gen_got_load_tls_desc (Pmode, loc));
 
  emit_insn (gen_add3_insn (dest, a0, tp));
}
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 0a1a6a24f61..8e8f1012344 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2772,9 +2772,9 @@ (define_insn "store_word"
 ;; Thread-Local Storage
 
 (define_insn "@got_load_tls_desc"
-  [(set (match_operand:P 0 "register_operand" "=r")
+  [(set (reg:P 4)
(unspec:P
-   [(match_operand:P 1 "symbolic_operand" "")]
+   [(match_operand:P 0 "symbolic_operand" "")]
UNSPEC_TLS_DESC))
 (clobber (reg:SI FCC0_REGNUM))
 (clobber (reg:SI FCC1_REGNUM))
@@ -2788,20 +2788,20 @@ (define_insn "@got_load_tls_desc"
   "TARGET_TLS_DESC"
 {
   return TARGET_EXPLICIT_RELOCS
-? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
-  \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\
-  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
-  \tjirl\t$r1,$r1,%%desc_call(%1)"
-: "la.tls.desc\t%0,%1";
+? "pcalau12i\t$r4,%%desc_pc_hi20(%0)\n\t"
+  "addi.d\t$r4,$r4,%%desc_pc_lo12(%0)\n\t"
+  "ld.d\t$r1,$r4,%%desc_ld(%0)\n\t"
+  "jirl\t$r1,$r1,%%desc_call(%0)"
+: "la.tls.desc\t$r4,%0";
 }
   [(set_attr "got" "load")
(set_attr "mode" "")
(set_attr "length" "16")])
 
 (define_insn "got_load_tls_desc_off64"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (reg:DI 4)
(unspec:DI
-   [(match_operand:DI 1 "symbolic_operand" "")]
+   [(match_operand:DI 0 "symbolic_operand" "")]
UNSPEC_TLS_DESC_OFF64))
 (clobber (reg:SI FCC0_REGNUM))
 (clobber (reg:SI FCC1_REGNUM))
@@ -2812,18 +2812,18 @@ (define_insn "got_load_tls_desc_off64"
 (clobber (reg:SI FCC6_REGNUM))
 (clobber (reg:SI FCC7_REGNUM))
 (clobber (reg:SI RETURN_ADDR_REGNUM))
-(clobber (match_operand:DI 2 "register_operand" "="))]
+(clobber (match_operand:DI 1 "register_operand" "="))]
   "TARGET_TLS_DESC && TARGET_CMODEL_EXTREME"
 {
   return TARGET_EXPLICIT_RELOCS
-? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
-  \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\
-  \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\
-  \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\
-  \tadd.d\t$r4,$r4,%2\n\
-  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
-  \tjirl\t$r1,$r1,%%desc_call(%1)"
-: "la.tls.desc\t%0,%2,%1";
+? "pcalau12i\t$r4,%%desc_pc_hi20(%0)\n\t"
+  "addi.d\t%1,$r0,%%desc_pc_lo12(%0)\n\t"
+  "lu32i.d\t%1,%%desc64_pc_lo20(%0)\n\t"
+  "lu52i.d\t%1,%2,%%desc64_pc_hi12(%0)\n\t"
+  "add.d\t$r4,$r4,%1\n\t"
+  "ld.d\t$r1,$r4,%%desc_ld(%0)\n\t"
+  "jirl\t$r1,$r1,%%desc_call(%0)"
+: "la.tls.desc\t$r4,%1,%0";
 }
   [(set_attr "got" "load")
(set_attr "length" "28")])

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-12 Thread Xi Ruoyao

On Wed, 2024-03-13 at 06:15 +0800, Xi Ruoyao wrote:
> > +(define_insn "@got_load_tls_desc"
> > +  [(set (match_operand:P 0 "register_operand" "=r")

Hmm, and it looks like we should use (reg:P 4) instead of match_operand
here, because the instruction does not work for a different register:
with TARGET_EXPLICIT_RELOCS we are hard coding r4, and without
TARGET_EXPLICIT_RELOCS the TLS desc function still only puts the return
value in r4.

> > +   (unspec:P
> > +       [(match_operand:P 1 "symbolic_operand" "")]
> > +       UNSPEC_TLS_DESC))
> > +    (clobber (reg:SI FCC0_REGNUM))
> > +    (clobber (reg:SI FCC1_REGNUM))
> > +    (clobber (reg:SI FCC2_REGNUM))
> > +    (clobber (reg:SI FCC3_REGNUM))
> > +    (clobber (reg:SI FCC4_REGNUM))
> > +    (clobber (reg:SI FCC5_REGNUM))
> > +    (clobber (reg:SI FCC6_REGNUM))
> > +    (clobber (reg:SI FCC7_REGNUM))
> > +    (clobber (reg:SI RETURN_ADDR_REGNUM))]
> > +  "TARGET_TLS_DESC"
> > +{
> > +  return TARGET_EXPLICIT_RELOCS
> > +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> > +  \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\
> > +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> > +  \tjirl\t$r1,$r1,%%desc_call(%1)"
> 
> Use something like
> 
>     ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\t"
>   "addi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\t"
>   "ld.d\t$r1,$r4,%%desc_ld(%1)\n\t"
>   "jirl\t$r1,$r1,%%desc_call(%1)"
>     : "la.tls.desc\t%0,%1";
> 
> to prevent additional white spaces in the output asm before tabs.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-12 Thread Xi Ruoyao

On Tue, 2024-03-12 at 17:20 +0800, mengqinggang wrote:
> +(define_insn "@got_load_tls_desc"
> +  [(set (match_operand:P 0 "register_operand" "=r")
> + (unspec:P
> +     [(match_operand:P 1 "symbolic_operand" "")]
> +     UNSPEC_TLS_DESC))
> +    (clobber (reg:SI FCC0_REGNUM))
> +    (clobber (reg:SI FCC1_REGNUM))
> +    (clobber (reg:SI FCC2_REGNUM))
> +    (clobber (reg:SI FCC3_REGNUM))
> +    (clobber (reg:SI FCC4_REGNUM))
> +    (clobber (reg:SI FCC5_REGNUM))
> +    (clobber (reg:SI FCC6_REGNUM))
> +    (clobber (reg:SI FCC7_REGNUM))
> +    (clobber (reg:SI RETURN_ADDR_REGNUM))]
> +  "TARGET_TLS_DESC"
> +{
> +  return TARGET_EXPLICIT_RELOCS
> +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> +  \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\
> +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> +  \tjirl\t$r1,$r1,%%desc_call(%1)"

Use something like

? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\t"
  "addi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\t"
  "ld.d\t$r1,$r4,%%desc_ld(%1)\n\t"
  "jirl\t$r1,$r1,%%desc_call(%1)"
: "la.tls.desc\t%0,%1";

to prevent additional white spaces in the output asm before tabs.

> +    : "la.tls.desc\t%0,%1";
> +}
> +  [(set_attr "got" "load")
> +   (set_attr "mode" "")
> +   (set_attr "length" "16")])
> +
> +(define_insn "got_load_tls_desc_off64"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (unspec:DI
> +     [(match_operand:DI 1 "symbolic_operand" "")]
> +     UNSPEC_TLS_DESC_OFF64))
> +    (clobber (reg:SI FCC0_REGNUM))
> +    (clobber (reg:SI FCC1_REGNUM))
> +    (clobber (reg:SI FCC2_REGNUM))
> +    (clobber (reg:SI FCC3_REGNUM))
> +    (clobber (reg:SI FCC4_REGNUM))
> +    (clobber (reg:SI FCC5_REGNUM))
> +    (clobber (reg:SI FCC6_REGNUM))
> +    (clobber (reg:SI FCC7_REGNUM))
> +    (clobber (reg:SI RETURN_ADDR_REGNUM))
> +    (clobber (match_operand:DI 2 "register_operand" "="))]
> +  "TARGET_TLS_DESC && TARGET_CMODEL_EXTREME"
> +{
> +  return TARGET_EXPLICIT_RELOCS
> +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> +  \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\
> +  \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\
> +  \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\
> +  \tadd.d\t$r4,$r4,%2\n\
> +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> +  \tjirl\t$r1,$r1,%%desc_call(%1)"
> +    : "la.tls.desc\t%0,%2,%1";

Likewise.

> +}
> +  [(set_attr "got" "load")
> +   (set_attr "length" "28")])

Otherwise OK.

It's better to allow splitting these two instructions but we can do it
in another patch.  And IMO it's better to enable TLS desc by default if
supported by both the assembler and the libc, but we'll have to defer it
until Glibc 2.40 release.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[gcc r14-9411] LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker re

2024-03-09 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:42cd49aa48c7ca99e8d5e91ce582d41fdb75f3fc

commit r14-9411-g42cd49aa48c7ca99e8d5e91ce582d41fdb75f3fc
Author: Xi Ruoyao 
Date:   Fri Jan 26 18:28:32 2024 +0800

LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to 
allow the IE to LE linker relaxation

In Binutils we need to make IE to LE relaxation only allowed when there
is an R_LARCH_RELAX after R_LARCH_TLE_IE_PC_{HI20,LO12} so an invalid
"partial" relaxation won't happen with the extreme code model.  So if we
are emitting %ie_pc_{hi20,lo12} in a non-extreme code model, emit an
R_LARCH_RELAX to allow the relaxation.  The IE to LE relaxation does not
require the pcalau12i and the ld instruction to be adjacent, so we don't
need to limit ourselves to use the macro.

For the distro maintainers backporting changes: this change depends on
r14-8721, without r14-8721 R_LARCH_RELAX can be emitted mistakenly in
the extreme code model.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Support 'Q' for R_LARCH_RELAX for TLS IE.
(loongarch_output_move): Use 'Q' to print R_LARCH_RELAX for TLS
IE.
* config/loongarch/loongarch.md (ld_from_got): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/tls-ie-relax.c: New test.
* gcc.target/loongarch/tls-ie-norelax.c: New test.
* gcc.target/loongarch/tls-ie-extreme.c: New test.

Diff:
---
 gcc/config/loongarch/loongarch.cc   | 15 ++-
 gcc/config/loongarch/loongarch.md   |  2 +-
 gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c |  5 +
 gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c |  5 +
 gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c   | 11 +++
 5 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 0428b6e65d5..70e31bb831c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4981,7 +4981,7 @@ loongarch_output_move (rtx dest, rtx src)
  if (type == SYMBOL_TLS_LE)
return "lu12i.w\t%0,%h1";
  else
-   return "pcalau12i\t%0,%h1";
+   return "%Q1pcalau12i\t%0,%h1";
}
 
   if (src_code == CONST_INT)
@@ -6145,6 +6145,7 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool 
hi64_part,
'L'  Print the low-part relocation associated with OP.
'm' Print one less than CONST_INT OP in decimal.
'N' Print the inverse of the integer branch condition for comparison OP.
+   'Q'  Print R_LARCH_RELAX for TLS IE.
'r'  Print address 12-31bit relocation associated with OP.
'R'  Print address 32-51bit relocation associated with OP.
'T' Print 'f' for (eq:CC ...), 't' for (ne:CC ...),
@@ -6282,6 +6283,18 @@ loongarch_print_operand (FILE *file, rtx op, int letter)
letter);
   break;
 
+case 'Q':
+  if (!TARGET_LINKER_RELAXATION)
+   break;
+
+  if (code == HIGH)
+   op = XEXP (op, 0);
+
+  if (loongarch_classify_symbolic_expression (op) == SYMBOL_TLS_IE)
+   fprintf (file, ".reloc\t.,R_LARCH_RELAX\n\t");
+
+  break;
+
 case 'r':
   loongarch_print_operand_reloc (file, op, false /* hi64_part */,
 true /* lo_reloc */);
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index f3b5c641fce..525e1e82183 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2620,7 +2620,7 @@
(match_operand:P 2 "symbolic_operand")))]
UNSPEC_LOAD_FROM_GOT))]
   ""
-  "ld.\t%0,%1,%L2"
+  "%Q2ld.\t%0,%1,%L2"
   [(set_attr "type" "move")]
 )
 
diff --git a/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c 
b/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c
new file mode 100644
index 000..00c545a3e8c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d -mcmodel=extreme 
-mexplicit-relocs=auto -mrelax" } */
+/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } } */
+
+#include "tls-ie-relax.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c 
b/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c
new file mode 100644
index 000..dd6bf3634a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcmodel=normal -mexplicit-relocs -mno-relax" } */
+/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } } */
+
+#i

Re: [PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-07 Thread Xi Ruoyao

On Thu, 2024-03-07 at 21:07 +0800, chenglulu wrote:
> 
> 在 2024/3/7 下午8:52, Xi Ruoyao 写道:
> > It should be better to extend the expected value before the ll/sc loop
> > (like what LLVM does), instead of repeating the extending in each
> > iteration.  Something like:
> 
> I wanted to do this at first, but it didn't work out.
> 
> But then I thought about it, and there are two benefits to putting it in 
> the middle of ll/sc:
> 
> 1. If there is an operation that uses the $r4 register after this atomic 
> operation, another
> 
> register is required to store $r4.
> 
> 2. ll.w requires long cycles, so putting an addi.w command after ll.w 
> won't make a difference.
> 
> So based on the above, I didn't try again, but directly made a 
> modification like a patch.

Ah, the explanation makes sense to me.  Ok with the original patch then.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-07 Thread Xi Ruoyao

On Thu, 2024-03-07 at 09:12 +0800, Lulu Cheng wrote:

> +  output_asm_insn ("1:", operands);
> +  output_asm_insn ("ll.\t%0,%1", operands);
> +
> +  /* Like the test case atomic-cas-int.C, in loongarch64, O1 and higher, the
> + return value of the val_without_const_folding will not be truncated and
> + will be passed directly to the function compare_exchange_strong.
> + However, the instruction 'bne' does not distinguish between 32-bit and
> + 64-bit operations.  so if the upper 32 bits of the register are not
> + extended by the 32nd bit symbol, then the comparison may not be valid
> + here.  This will affect the result of the operation.  */
> +
> +  if (TARGET_64BIT && REG_P (operands[2])
> +  && GET_MODE (operands[2]) == SImode)
> +    {
> +  output_asm_insn ("addi.w\t%5,%2,0", operands);
> +  output_asm_insn ("bne\t%0,%5,2f", operands);

It should be better to extend the expected value before the ll/sc loop
(like what LLVM does), instead of repeating the extending in each
iteration.  Something like:

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 8f35a5b48d2..c21781947fd 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -234,11 +234,11 @@ (define_insn "atomic_exchange_short"
   "amswap%A3.\t%0,%z2,%1"
   [(set (attr "length") (const_int 4))])
 
-(define_insn "atomic_cas_value_strong"
+(define_insn "atomic_cas_value_strong"
   [(set (match_operand:GPR 0 "register_operand" "=")
(match_operand:GPR 1 "memory_operand" "+ZC"))
(set (match_dup 1)
-   (unspec_volatile:GPR [(match_operand:GPR 2 "reg_or_0_operand" "rJ")
+   (unspec_volatile:GPR [(match_operand:X 2 "reg_or_0_operand" "rJ")
  (match_operand:GPR 3 "reg_or_0_operand" "rJ")
  (match_operand:SI 4 "const_int_operand")]  ;; 
mod_s
 UNSPEC_COMPARE_AND_SWAP))
@@ -246,10 +246,10 @@ (define_insn "atomic_cas_value_strong"
   ""
 {
   return "1:\\n\\t"
-"ll.\\t%0,%1\\n\\t"
+"ll.\\t%0,%1\\n\\t"
 "bne\\t%0,%z2,2f\\n\\t"
 "or%i3\\t%5,$zero,%3\\n\\t"
-"sc.\\t%5,%1\\n\\t"
+"sc.\\t%5,%1\\n\\t"
 "beqz\\t%5,1b\\n\\t"
 "b\\t3f\\n\\t"
 "2:\\n\\t"
@@ -301,9 +301,23 @@ (define_expand "atomic_compare_and_swap"
 operands[3], 
operands[4],
 operands[6]));
   else
-emit_insn (gen_atomic_cas_value_strong (operands[1], operands[2],
- operands[3], operands[4],
- operands[6]));
+{
+  rtx (*cas)(rtx, rtx, rtx, rtx, rtx) =
+   TARGET_64BIT ? gen_atomic_cas_value_strongdi
+: gen_atomic_cas_value_strongsi;
+  rtx expect = operands[3];
+
+  if (mode == SImode
+ && TARGET_64BIT
+ && operands[3] != const0_rtx)
+   {
+ expect = gen_reg_rtx (DImode);
+ emit_insn (gen_extendsidi2 (expect, operands[3]));
+   }
+
+  emit_insn (cas (operands[1], operands[2], expect, operands[4],
+ operands[6]));
+}
 
   rtx compare = operands[1];
   if (operands[3] != const0_rtx)

It produces:

slli.w  $r4,$r4,0
1:
ll.w$r14,$r3,0
bne $r14,$r4,2f
or  $r15,$zero,$r12
sc.w$r15,$r3,0
beqz$r15,1b
b   3f
2:
dbar0b10100
3:

for the test case and the compiled test case runs successfully.  I've
not done a full bootstrap yet though.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker relaxation

2024-03-06 Thread Xi Ruoyao

On Thu, 2024-03-07 at 10:43 +0800, mengqinggang wrote:
> Hi,
> 
> Whether to add an option to control the generation of R_LARCH_RELAX,
> similar to as -mrelax/-mno-relax.

There are already -mrelax and -mno-relax, they can be checked in the
compiler code with TARGET_LINKER_RELAXATION.

/* snip */

> > +    case 'Q':
> > +  if (!TARGET_LINKER_RELAXATION)
> > +break;

So with -mno-relax we'll break early here, then no R_LARCH_RELAX will be
printed.

> > +  if (code == HIGH)
> > +op = XEXP (op, 0);
> > +
> > +  if (loongarch_classify_symbolic_expression (op) == SYMBOL_TLS_IE)
> > +fprintf (file, ".reloc\t.,R_LARCH_RELAX\n\t");
> > +
> > +  break;

The tls-ie-norelax.c test case also checks for -mno-relax:

> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mcmodel=normal -mexplicit-relocs -mno-relax" } */
> > +/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } 
> > } */

i.e. -mno-relax is used compiling this test case, and the compiled
assembly code should not contain R_LARCH_RELAX.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[gcc r14-9334] LoongArch: testsuite: Rewrite {x, }vfcmp-{d, f}.c to avoid named registers

2024-03-06 Thread Xi Ruoyao via Gcc-cvs

https://gcc.gnu.org/g:7719b9be2daa55edf336d721839300e62a7abbdc

commit r14-9334-g7719b9be2daa55edf336d721839300e62a7abbdc
Author: Xi Ruoyao 
Date:   Tue Mar 5 20:46:57 2024 +0800

LoongArch: testsuite: Rewrite {x,}vfcmp-{d,f}.c to avoid named registers

Loops on named vector register are not vectorized (see comment 11 of
PR113622), so the these test cases have been failing for a while.
Rewrite them using check-function-bodies to remove hard coding register
names.  A barrier is needed to always load the first operand before the
second operand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vfcmp-f.c: Rewrite to avoid named
registers.
* gcc.target/loongarch/vfcmp-d.c: Likewise.
* gcc.target/loongarch/xvfcmp-f.c: Likewise.
* gcc.target/loongarch/xvfcmp-d.c: Likewise.

Diff:
---
 gcc/testsuite/gcc.target/loongarch/vfcmp-d.c  | 202 +--
 gcc/testsuite/gcc.target/loongarch/vfcmp-f.c  | 347 --
 gcc/testsuite/gcc.target/loongarch/xvfcmp-d.c | 202 +--
 gcc/testsuite/gcc.target/loongarch/xvfcmp-f.c | 204 +--
 4 files changed, 816 insertions(+), 139 deletions(-)

diff --git a/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c 
b/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
index 8b870ef38a0..87e4ed19e96 100644
--- a/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
+++ b/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
@@ -1,28 +1,188 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mlsx -ffixed-f0 -ffixed-f1 -ffixed-f2 
-fno-vect-cost-model" } */
+/* { dg-options "-O2 -mlsx -fno-vect-cost-model" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #define F double
 #define I long long
 
 #include "vfcmp-f.c"
 
-/* { dg-final { scan-assembler 
"compare_quiet_equal:.*\tvfcmp\\.ceq\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_equal:.*\tvfcmp\\.cune\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_not_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater:.*\tvfcmp\\.slt\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater_equal:.*\tvfcmp\\.sle\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less:.*\tvfcmp\\.slt\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less_equal:.*\tvfcmp\\.sle\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_not_greater:.*\tvfcmp\\.sule\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_not_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less_unordered:.*\tvfcmp\\.sult\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_not_less:.*\tvfcmp\\.sule\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_not_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater_unordered:.*\tvfcmp\\.sult\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less:.*\tvfcmp\\.clt\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less_equal:.*\tvfcmp\\.cle\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater:.*\tvfcmp\\.clt\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater_equal:.*\tvfcmp\\.cle\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_less:.*\tvfcmp\\.cule\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_not_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater_unordered:.*\tvfcmp\\.cult\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_greater:.*\tvfcmp\\.cule\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_not_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less_unordered:.*\tvfcmp\\.cult\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_unordered:.*\tvfcmp\\.cun\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_ordered:.*\tvfcmp\\.cor\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_ordered\n"
 } } */
+/*
+** compare_quiet_equal:
+** vld (\$v

[PATCH] LoongArch: testsuite: Rewrite {x, }vfcmp-{d, f}.c to avoid named registers

2024-03-05 Thread Xi Ruoyao

Loops on named vector register are not vectorized (see comment 11 of
PR113622), so the these test cases have been failing for a while.
Rewrite them using check-function-bodies to remove hard coding register
names.  A barrier is needed to always load the first operand before the
second operand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vfcmp-f.c: Rewrite to avoid named
registers.
* gcc.target/loongarch/vfcmp-d.c: Likewise.
* gcc.target/loongarch/xvfcmp-f.c: Likewise.
* gcc.target/loongarch/xvfcmp-d.c: Likewise.
---

Tested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/testsuite/gcc.target/loongarch/vfcmp-d.c  | 202 --
 gcc/testsuite/gcc.target/loongarch/vfcmp-f.c  | 347 ++
 gcc/testsuite/gcc.target/loongarch/xvfcmp-d.c | 202 --
 gcc/testsuite/gcc.target/loongarch/xvfcmp-f.c | 204 --
 4 files changed, 816 insertions(+), 139 deletions(-)

diff --git a/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c 
b/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
index 8b870ef38a0..87e4ed19e96 100644
--- a/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
+++ b/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
@@ -1,28 +1,188 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mlsx -ffixed-f0 -ffixed-f1 -ffixed-f2 
-fno-vect-cost-model" } */
+/* { dg-options "-O2 -mlsx -fno-vect-cost-model" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #define F double
 #define I long long
 
 #include "vfcmp-f.c"
 
-/* { dg-final { scan-assembler 
"compare_quiet_equal:.*\tvfcmp\\.ceq\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_equal:.*\tvfcmp\\.cune\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_not_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater:.*\tvfcmp\\.slt\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater_equal:.*\tvfcmp\\.sle\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less:.*\tvfcmp\\.slt\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less_equal:.*\tvfcmp\\.sle\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_not_greater:.*\tvfcmp\\.sule\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_not_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less_unordered:.*\tvfcmp\\.sult\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_not_less:.*\tvfcmp\\.sule\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_not_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater_unordered:.*\tvfcmp\\.sult\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less:.*\tvfcmp\\.clt\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less_equal:.*\tvfcmp\\.cle\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater:.*\tvfcmp\\.clt\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater_equal:.*\tvfcmp\\.cle\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_less:.*\tvfcmp\\.cule\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_not_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater_unordered:.*\tvfcmp\\.cult\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_greater:.*\tvfcmp\\.cule\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_not_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less_unordered:.*\tvfcmp\\.cult\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_unordered:.*\tvfcmp\\.cun\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_ordered:.*\tvfcmp\\.cor\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_ordered\n"
 } } */
+/*
+** compare_quiet_equal:
+** vld (\$vr[0-9]+),\$r4,0
+** vld (\$vr[0-9]+),\$r5,0
+** vfcmp.ceq.d (\$vr[0-9]+),(\1,\2|\2,\1)
+** vst \3,\$r6,0
+** jr  \$r1
+*/
+
+/*
+** compare_quiet_not_equal:
+** vld (\$vr[0-9]+),\$r4,0
+** vld (\$vr[0-9]+),\$r5,0
+** vfcmp.cune.d(\$vr[0-9]+),(\1,\2|\2,\1)
+** vst \3,\$r6,0
+** jr  \$r1
+*/
+
+/*
+** compare_signaling_greater:
+** vld (\$vr[0-9]+),\$r4,0
+** vld (\$vr[0-9]+),\$r5,0
+** vfcmp.slt.d (\$vr[0-9]+),\2,\1
+** vst \3,\$r6,0
+**

[PATCH v2] LoongArch: Allow s9 as a register alias

2024-03-05 Thread Xi Ruoyao

The psABI allows using s9 as an alias of r22.

gcc/ChangeLog:

* config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add
s9 as an alias of r22.
---

v1 -> v2: Add a test case.

Ok for trunk?

 gcc/config/loongarch/loongarch.h   | 1 +
 gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c | 3 +++
 2 files changed, 4 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 8b453ab3140..bf2351f0968 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -931,6 +931,7 @@ typedef struct {
   { "t8",  20 + GP_REG_FIRST },\
   { "x",   21 + GP_REG_FIRST },\
   { "fp",  22 + GP_REG_FIRST },\
+  { "s9",  22 + GP_REG_FIRST },\
   { "s0",  23 + GP_REG_FIRST },\
   { "s1",  24 + GP_REG_FIRST },\
   { "s2",  25 + GP_REG_FIRST },\
diff --git a/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c 
b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
new file mode 100644
index 000..d2e3b80f83c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
@@ -0,0 +1,3 @@
+/* { dg-do compile } */
+register long s9 asm("s9"); /* { dg-note "conflicts with 's9'" } */
+register long fp asm("fp"); /* { dg-warning "register of 'fp' used for 
multiple global register variables" } */
-- 
2.44.0

[PATCH v3] testsuite: Add a test case for negating FP vectors containing zeros

2024-03-05 Thread Xi Ruoyao

Recently I've fixed two wrong FP vector negate implementation which
caused wrong sign bits in zeros in targets (r14-8786 and r14-8801).  To
prevent a similar issue from happening again, add a test case.

Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
(with MSA), LoongArch (with LSX and LASX).

gcc/testsuite:

* gcc.dg/vect/vect-neg-zero.c: New test.
---

- v1 -> v2: Remove { dg-do run } which may cause SIGILL.
- v2 -> v3: Add -fno-associative-math to fix an excessive warning on
  arm.

Ok for trunk?

 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 38 +++
 1 file changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c 
b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
new file mode 100644
index 000..21fa00cfa15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
@@ -0,0 +1,38 @@
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-fno-associative-math -fsigned-zeros" } */
+
+double x[4] = {-0.0, 0.0, -0.0, 0.0};
+float y[8] = {-0.0, 0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0};
+
+static __attribute__ ((always_inline)) inline void
+test (int factor)
+{
+  double a[4];
+  float b[8];
+
+  asm ("" ::: "memory");
+
+  for (int i = 0; i < 2 * factor; i++)
+a[i] = -x[i];
+
+  for (int i = 0; i < 4 * factor; i++)
+b[i] = -y[i];
+
+#pragma GCC novector
+  for (int i = 0; i < 2 * factor; i++)
+if (__builtin_signbit (a[i]) == __builtin_signbit (x[i]))
+  __builtin_abort ();
+
+#pragma GCC novector
+  for (int i = 0; i < 4 * factor; i++)
+if (__builtin_signbit (b[i]) == __builtin_signbit (y[i]))
+  __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test (1);
+  test (2);
+  return 0;
+}
-- 
2.44.0

Re: [PATCH v2] LoongArch: Fix inconsistent description in *sge_

2024-03-05 Thread Xi Ruoyao

On Tue, 2024-03-05 at 16:05 +0800, Guo Jie wrote:
> The constraint of op[1] is inconsistent with the output template.
> 
> gcc/ChangeLog:
> 
>   * config/loongarch/loongarch.md
>   (define_insn "*sge_"): Fix inconsistency
>   error.
> 
> ---
> Update in v2:
>     Remove useless support for op[1] is const_imm12_operand.
> 
> ---
>  gcc/config/loongarch/loongarch.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/loongarch/loongarch.md 
> b/gcc/config/loongarch/loongarch.md
> index f3b5c641fce..e35a001e0ed 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -3360,7 +3360,7 @@ (define_insn "*sge_"
>   (any_ge:GPR (match_operand:X 1 "register_operand" "r")
>    (const_int 1)))]
>    ""
> -  "slti\t%0,%.,%1"
> +  "slt\t%0,%.,%1"
>    [(set_attr "type" "slt")
>     (set_attr "mode" "")])

Hmm, this define_insn seems never really used or it would generate
something like "sltui $r4,$r0,$r4" and trigger an assembler failure. 
The generic path seems already converting "x >= 1" to "x > 0".

So it seems we should just remove this define_insn?


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Fix inconsistent description in *sge_

2024-03-04 Thread Xi Ruoyao

On Mon, 2024-03-04 at 11:03 +0800, Guo Jie wrote:
> The constraint of op[1] is inconsistent with the output template.
> 
> gcc/ChangeLog:
> 
>   * config/loongarch/loongarch.md
>   (define_insn "*sge_"): Fix inconsistency
>   error.
>
> ---
>  gcc/config/loongarch/loongarch.md | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/loongarch/loongarch.md
> b/gcc/config/loongarch/loongarch.md
> index f3b5c641fce..2d25374bdc9 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -3357,10 +3357,10 @@ (define_insn "*sgt_"
>  
>  (define_insn "*sge_"
>    [(set (match_operand:GPR 0 "register_operand" "=r")
> - (any_ge:GPR (match_operand:X 1 "register_operand" "r")
> + (any_ge:GPR (match_operand:X 1 "arith_operand" "rI")
>    (const_int 1)))]

No, arith_operand is just register_operand or const_imm12_operand, but
comparing a const_imm12_operand with (const_int 1) should be folded into
a constant (even at -O0, AFAIK).  So allowing const_imm12_operand here
makes no benefit.

>    ""
> -  "slti\t%0,%.,%1"
> +  "slt%i1\t%0,%.,%1"
>    [(set_attr "type" "slt")
>     (set_attr "mode" "")])
>  

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2] testsuite: Add a test case for negating FP vectors containing zeros

2024-02-29 Thread Xi Ruoyao

On Thu, 2024-02-29 at 15:09 +0800, Xi Ruoyao wrote:
> Recently I've fixed two wrong FP vector negate implementation which
> caused wrong sign bits in zeros in targets (r14-8786 and r14-8801).  To
> prevent a similar issue from happening again, add a test case.
> 
> Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
> (with MSA), LoongArch (with LSX and LASX).
> 
> gcc/testsuite:
> 
>   * gcc.dg/vect/vect-neg-zero.c: New test.
> ---
> 
> v1->v2: Remove { dg-do run } which was likely triggering a SIGILL on
> Linaro ARM CI.

Oops, still failing ARM CI.  Not sure why...

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] LoongArch: Allow s9 as a register alias

2024-02-28 Thread Xi Ruoyao

The psABI allows using s9 as an alias of r22.

gcc/ChangeLog:

* config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add
s9 as an alias of r22.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 8b453ab3140..bf2351f0968 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -931,6 +931,7 @@ typedef struct {
   { "t8",  20 + GP_REG_FIRST },\
   { "x",   21 + GP_REG_FIRST },\
   { "fp",  22 + GP_REG_FIRST },\
+  { "s9",  22 + GP_REG_FIRST },\
   { "s0",  23 + GP_REG_FIRST },\
   { "s1",  24 + GP_REG_FIRST },\
   { "s2",  25 + GP_REG_FIRST },\
-- 
2.44.0

[PATCH] LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker relaxation

2024-02-28 Thread Xi Ruoyao

In Binutils we need to make IE to LE relaxation only allowed when there
is an R_LARCH_RELAX after R_LARCH_TLE_IE_PC_{HI20,LO12} so an invalid
"partial" relaxation won't happen with the extreme code model.  So if we
are emitting %ie_pc_{hi20,lo12} in a non-extreme code model, emit an
R_LARCH_RELAX to allow the relaxation.  The IE to LE relaxation does not
require the pcalau12i and the ld instruction to be adjacent, so we don't
need to limit ourselves to use the macro.

For the distro maintainers backporting changes: this change depends on
r14-8721, without r14-8721 R_LARCH_RELAX can be emitted mistakenly in
the extreme code model.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Support 'Q' for R_LARCH_RELAX for TLS IE.
(loongarch_output_move): Use 'Q' to print R_LARCH_RELAX for TLS
IE.
* config/loongarch/loongarch.md (ld_from_got): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/tls-ie-relax.c: New test.
* gcc.target/loongarch/tls-ie-norelax.c: New test.
* gcc.target/loongarch/tls-ie-extreme.c: New test.
---

Bootstrapped & regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.cc | 15 ++-
 gcc/config/loongarch/loongarch.md |  2 +-
 .../gcc.target/loongarch/tls-ie-extreme.c |  5 +
 .../gcc.target/loongarch/tls-ie-norelax.c |  5 +
 gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c | 11 +++
 5 files changed, 36 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 0428b6e65d5..70e31bb831c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4981,7 +4981,7 @@ loongarch_output_move (rtx dest, rtx src)
  if (type == SYMBOL_TLS_LE)
return "lu12i.w\t%0,%h1";
  else
-   return "pcalau12i\t%0,%h1";
+   return "%Q1pcalau12i\t%0,%h1";
}
 
   if (src_code == CONST_INT)
@@ -6145,6 +6145,7 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool 
hi64_part,
'L'  Print the low-part relocation associated with OP.
'm' Print one less than CONST_INT OP in decimal.
'N' Print the inverse of the integer branch condition for comparison OP.
+   'Q'  Print R_LARCH_RELAX for TLS IE.
'r'  Print address 12-31bit relocation associated with OP.
'R'  Print address 32-51bit relocation associated with OP.
'T' Print 'f' for (eq:CC ...), 't' for (ne:CC ...),
@@ -6282,6 +6283,18 @@ loongarch_print_operand (FILE *file, rtx op, int letter)
letter);
   break;
 
+case 'Q':
+  if (!TARGET_LINKER_RELAXATION)
+   break;
+
+  if (code == HIGH)
+   op = XEXP (op, 0);
+
+  if (loongarch_classify_symbolic_expression (op) == SYMBOL_TLS_IE)
+   fprintf (file, ".reloc\t.,R_LARCH_RELAX\n\t");
+
+  break;
+
 case 'r':
   loongarch_print_operand_reloc (file, op, false /* hi64_part */,
 true /* lo_reloc */);
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index f3b5c641fce..525e1e82183 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2620,7 +2620,7 @@ (define_insn "@ld_from_got"
(match_operand:P 2 "symbolic_operand")))]
UNSPEC_LOAD_FROM_GOT))]
   ""
-  "ld.\t%0,%1,%L2"
+  "%Q2ld.\t%0,%1,%L2"
   [(set_attr "type" "move")]
 )
 
diff --git a/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c 
b/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c
new file mode 100644
index 000..00c545a3e8c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d -mcmodel=extreme 
-mexplicit-relocs=auto -mrelax" } */
+/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } } */
+
+#include "tls-ie-relax.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c 
b/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c
new file mode 100644
index 000..dd6bf3634a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcmodel=normal -mexplicit-relocs -mno-relax" } */
+/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } } */
+
+#include "tls-ie-relax.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c 
b/gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c
new file mode 100644
index 000..e9f7569b1da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c
@@

[PATCH v2] testsuite: Add a test case for negating FP vectors containing zeros

2024-02-28 Thread Xi Ruoyao

Recently I've fixed two wrong FP vector negate implementation which
caused wrong sign bits in zeros in targets (r14-8786 and r14-8801).  To
prevent a similar issue from happening again, add a test case.

Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
(with MSA), LoongArch (with LSX and LASX).

gcc/testsuite:

* gcc.dg/vect/vect-neg-zero.c: New test.
---

v1->v2: Remove { dg-do run } which was likely triggering a SIGILL on
Linaro ARM CI.

Ok for trunk?

 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 38 +++
 1 file changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c 
b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
new file mode 100644
index 000..6af4a02c517
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
@@ -0,0 +1,38 @@
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-fsigned-zeros" } */
+
+double x[4] = {-0.0, 0.0, -0.0, 0.0};
+float y[8] = {-0.0, 0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0};
+
+static __attribute__ ((always_inline)) inline void
+test (int factor)
+{
+  double a[4];
+  float b[8];
+
+  asm ("" ::: "memory");
+
+  for (int i = 0; i < 2 * factor; i++)
+a[i] = -x[i];
+
+  for (int i = 0; i < 4 * factor; i++)
+b[i] = -y[i];
+
+#pragma GCC novector
+  for (int i = 0; i < 2 * factor; i++)
+if (__builtin_signbit (a[i]) == __builtin_signbit (x[i]))
+  __builtin_abort ();
+
+#pragma GCC novector
+  for (int i = 0; i < 4 * factor; i++)
+if (__builtin_signbit (b[i]) == __builtin_signbit (y[i]))
+  __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test (1);
+  test (2);
+  return 0;
+}
-- 
2.44.0

[PATCH v2] testsuite: Make pr104992.c irrelated to target vector feature [PR113418]

2024-02-28 Thread Xi Ruoyao

The vect_int_mod target selector is evaluated with the options in
DEFAULT_VECTCFLAGS in effect, but these options are not automatically
passed to tests out of the vect directories.  So this test fails on
targets where integer vector modulo operation is supported but requiring
an option to enable, for example LoongArch.

In this test case, the only expected optimization not happened in
original is in corge because it needs forward propogation.  So we can
scan the forwprop2 dump (where the vector operation is not expanded to
scalars yet) instead of optimized, then we don't need to consider
vect_int_mod or not.

gcc/testsuite/ChangeLog:

PR testsuite/113418
* gcc.dg/pr104992.c (dg-options): Use -fdump-tree-forwprop2
instead of -fdump-tree-optimized.
(dg-final): Scan forwprop2 dump instead of optimized, and remove
the use of vect_int_mod.
* lib/target-supports.exp (check_effective_target_vect_int_mod):
Remove because it's not used anymore.
---

v1->v2: Remove check_effective_target_vect_int_mod as it's now unused.

This fixes the test failure on loongarch64-linux-gnu.  Also tested on
x86_64-linux-gnu.  Ok for trunk?

 gcc/testsuite/gcc.dg/pr104992.c   |  5 ++---
 gcc/testsuite/lib/target-supports.exp | 13 -
 2 files changed, 2 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr104992.c b/gcc/testsuite/gcc.dg/pr104992.c
index 82f8c75559c..6fd513d34b2 100644
--- a/gcc/testsuite/gcc.dg/pr104992.c
+++ b/gcc/testsuite/gcc.dg/pr104992.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/104992 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
+/* { dg-options "-O2 -Wno-psabi -fdump-tree-forwprop2" } */
 
 #define vector __attribute__((vector_size(4*sizeof(int
 
@@ -54,5 +54,4 @@ __attribute__((noipa)) unsigned waldo (unsigned x, unsigned 
y, unsigned z) {
 return x / y * z == x;
 }
 
-/* { dg-final { scan-tree-dump-times " % " 9 "optimized" { target { ! 
vect_int_mod } } } } */
-/* { dg-final { scan-tree-dump-times " % " 6 "optimized" { target vect_int_mod 
} } } */
+/* { dg-final { scan-tree-dump-times " % " 6 "forwprop2" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 4138cc9a662..ae33c4f1e3a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9064,19 +9064,6 @@ proc check_effective_target_vect_long_mult { } {
 return $answer
 }
 
-# Return 1 if the target supports vector int modulus, 0 otherwise.
-
-proc check_effective_target_vect_int_mod { } {
-return [check_cached_effective_target_indexed vect_int_mod {
-  expr { ([istarget powerpc*-*-*]
- && [check_effective_target_has_arch_pwr10])
- || [istarget amdgcn-*-*]
- || ([istarget loongarch*-*-*]
-&& [check_effective_target_loongarch_sx])
- || ([istarget riscv*-*-*]
-&& [check_effective_target_riscv_v]) }}]
-}
-
 # Return 1 if the target supports vector even/odd elements extraction, 0 
otherwise.
 
 proc check_effective_target_vect_extract_even_odd { } {
-- 
2.44.0

Re: [PATCH v2] LoongArch: Add support for TLS descriptors

2024-02-28 Thread Xi Ruoyao

On Thu, 2024-02-29 at 14:08 +0800, Xi Ruoyao wrote:
> > +  "TARGET_TLS_DESC"
> > +  "la.tls.desc\t%0,%1"
> 
> With -mexplicit-relocs=always we should emit %desc_pc_lo12 etc. instead
> of la.tls.desc.  As we don't want to add too many code we can just hard
> code the 4 instructions here instead of splitting this insn, just
> something like
> 
> { return TARGET_EXPLICIT_RELOCS_ALWAS ? ".." : "la.tls.desc\t%0,%1"; }

And if -mcmodel=extreme we should use a 3-operand la.tls.desc.  Or if we
don't want to support this we can just error out if -mcmodel=extreme -
mtls-dialect=desc.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2] LoongArch: Add support for TLS descriptors

2024-02-28 Thread Xi Ruoyao

On Thu, 2024-02-29 at 09:42 +0800, mengqinggang wrote:
> Generate la.tls.desc macro instruction for TLS descriptors model.
> 
> la.tls.desc expand to
>   pcalau12i $a0, %desc_pc_hi20(a)
>   ld.d  $a1, $a0, %desc_ld_pc_lo12(a)
>   addi.d    $a0, $a0, %desc_add_pc_lo12(a)
>   jirl  $ra, $a1, %desc_call(a)
> 
> The default is TLS descriptors, but can be configure with
> -mtls-dialect={desc,trad}.

Please keep trad as the default for now.  Glibc-2.40 will be released
after GCC 14.1 but we don't want to end up in a situation where the
default configuration of the latest GCC release creating something not
working with latest Glibc release.

And there's also musl libc we need to take into account.

Or you can write some autoconf test for if the assembler supports
tlsdesc and check TARGET_GLIBC_MAJOR & TARGET_GLIBC_MINOR for Glibc
version to decide if enable desc by default.  If you want this but don't
have time to implement you can leave trad the default and I'll take care
of this.

/* snip */

> +(define_insn "@got_load_tls_desc"
> +  [(set (match_operand:P 0 "register_operand" "=r")
> + (unspec:P
> +     [(match_operand:P 1 "symbolic_operand" "")]
> +     UNSPEC_TLS_DESC))
> +    (clobber (reg:SI FCC0_REGNUM))
> +    (clobber (reg:SI FCC1_REGNUM))
> +    (clobber (reg:SI FCC2_REGNUM))
> +    (clobber (reg:SI FCC3_REGNUM))
> +    (clobber (reg:SI FCC4_REGNUM))
> +    (clobber (reg:SI FCC5_REGNUM))
> +    (clobber (reg:SI FCC6_REGNUM))
> +    (clobber (reg:SI FCC7_REGNUM))
> +    (clobber (reg:SI A1_REGNUM))
> +    (clobber (reg:SI RETURN_ADDR_REGNUM))]

Ok, the clobber list is correct.

> +  "TARGET_TLS_DESC"
> +  "la.tls.desc\t%0,%1"

With -mexplicit-relocs=always we should emit %desc_pc_lo12 etc. instead
of la.tls.desc.  As we don't want to add too many code we can just hard
code the 4 instructions here instead of splitting this insn, just
something like

{ return TARGET_EXPLICIT_RELOCS_ALWAS ? ".." : "la.tls.desc\t%0,%1"; }

> +  [(set_attr "got" "load")
> +   (set_attr "mode" "")])

We need (set_attr "length" "16") in this list as this actually expands
into 16 bytes.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH 2/2] LoongArch: Remove unneeded sign extension after crc/crcc instructions

2024-02-25 Thread Xi Ruoyao

The specification of crc/crcc instructions is clear that the output is
sign-extended to GRLEN.  Add a define_insn to tell the compiler this
fact and allow it to remove the unneeded sign extension on crc/crcc
output.  As crc/crcc instructions are usually used in a tight loop,
this should produce a significant performance gain.

gcc/ChangeLog:

* config/loongarch/loongarch.md
(loongarch__w__w_extended): New define_insn.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/crc-sext.c: New test;
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 11 +++
 gcc/testsuite/gcc.target/loongarch/crc-sext.c | 13 +
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/crc-sext.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 4ded1b3a117..525e1e82183 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -4264,6 +4264,17 @@ (define_insn "loongarch__w__w"
   [(set_attr "type" "unknown")
(set_attr "mode" "")])
 
+(define_insn "loongarch__w__w_extended"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI
+ (unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
+ (match_operand:SI 2 "register_operand" "r")]
+CRC)))]
+  "TARGET_64BIT"
+  ".w..w\t%0,%1,%2"
+  [(set_attr "type" "unknown")
+   (set_attr "mode" "")])
+
 ;; With normal or medium code models, if the only use of a pc-relative
 ;; address is for loading or storing a value, then relying on linker
 ;; relaxation is not better than emitting the machine instruction directly.
diff --git a/gcc/testsuite/gcc.target/loongarch/crc-sext.c 
b/gcc/testsuite/gcc.target/loongarch/crc-sext.c
new file mode 100644
index 000..9ade5a8e4ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/crc-sext.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+**my_crc:
+** crc.w.d.w   \$r4,\$r4,\$r5
+** jr  \$r1
+*/
+int my_crc(long long dword, int crc)
+{
+   return __builtin_loongarch_crc_w_d_w(dword, crc);
+}
-- 
2.44.0

[PATCH 1/2] LoongArch: NFC: Deduplicate crc instruction defines

2024-02-25 Thread Xi Ruoyao

Introduce an iterator for UNSPEC_CRC and UNSPEC_CRCC to make the next
change easier.

gcc/ChangeLog:

* config/loongarch/loongarch.md (CRC): New define_int_iterator.
(crc): New define_int_attr.
(loongarch_crc_w__w, loongarch_crcc_w__w): Unify
into ...
(loongarch__w__w): ... here.
---
 gcc/config/loongarch/loongarch.md | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 2ce7a151880..4ded1b3a117 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -4251,24 +4251,16 @@ (define_peephole2
 
 
 (define_mode_iterator QHSD [QI HI SI DI])
+(define_int_iterator CRC [UNSPEC_CRC UNSPEC_CRCC])
+(define_int_attr crc [(UNSPEC_CRC "crc") (UNSPEC_CRCC "crcc")])
 
-(define_insn "loongarch_crc_w__w"
+(define_insn "loongarch__w__w"
   [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
   (match_operand:SI 2 "register_operand" "r")]
-UNSPEC_CRC))]
+CRC))]
   ""
-  "crc.w..w\t%0,%1,%2"
-  [(set_attr "type" "unknown")
-   (set_attr "mode" "")])
-
-(define_insn "loongarch_crcc_w__w"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
-  (match_operand:SI 2 "register_operand" "r")]
-UNSPEC_CRCC))]
-  ""
-  "crcc.w..w\t%0,%1,%2"
+  ".w..w\t%0,%1,%2"
   [(set_attr "type" "unknown")
(set_attr "mode" "")])
 
-- 
2.44.0

Pushed: [GCC 13 PATCH] LoongArch: Don't default to -mno-explicit-relocs if -mno-relax

2024-02-23 Thread Xi Ruoyao

On Thu, 2024-02-22 at 19:09 +0800, chenglulu wrote:
> 
> 在 2024/2/22 下午6:20, Xi Ruoyao 写道:
> > To improve Binutils compatibility we've had to backported relaxation
> > support.  But if a user just updates to GCC 13.3 and sticks with
> > Binutils 2.41, there is no reason to use -mno-explicit-relocs as the
> > default because we are turning off relaxation for Binutils 2.41 (it
> > lacks conditional branch relaxation support) anyway.
> > 
> > So like GCC 14, make the default of -m[no-]explicit-relocs depend on
> > -m[no-]relax instead of HAVE_AS_MRELAX_OPTION.  Also update the doc
> > to
> > reflect the behavior change.
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/genopts/loongarch.opt.in
> > (TARGET_EXPLICIT_RELOCS): Init to M_OPTION_NOT_SEEN.
> > * config/loongarch/loongarch.opt: Regenerate.
> > * config/loongarch/loongarch.cc
> > (loongarch_option_override_internal): Set the default of
> > TARGET_EXPLICIT_RELOCS to HAVE_AS_EXPLICIT_RELOCS
> > && !loongarch_mrelax.
> > * doc/invoke.texi (-m[no-]explicit-relocs): Update for
> > LoongArch.
> > ---
> > 
> > Ok for releases/gcc-13?
> 
> LGTM!

Pushed r13-8357.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Pushed: [PATCH] LoongArch: Don't falsely claim gold supported in toplevel configure

2024-02-23 Thread Xi Ruoyao

On Fri, 2024-02-23 at 11:37 +0800, chenglulu wrote:
> 
> 在 2024/2/23 上午11:27, Xi Ruoyao 写道:
> > On Fri, 2024-02-23 at 11:16 +0800, chenglulu wrote:
> > > 在 2024/2/22 下午5:17, Xi Ruoyao 写道:
> > > > The gold linker has never been ported to LoongArch (and it seems
> > > > unlikely to be ported in the future as the new architectures are
> > > > focusing on lld and/or mold for fast linkers).
> > > > 
> > > > ChangeLog:
> > > > 
> > > >     * configure.ac (ENABLE_GOLD): Remove loongarch*-*-* from target
> > > >     list.
> > > >     * configure: Regenerate.
> > > > ---
> > > > 
> > > > Ok for GCC trunk (to get synced into Binutils later)?
> > > I have no problem. But I have a question. Is this modification simply
> > > because we don’t
> > > 
> > > support it or is there an error somewhere?
> > If a user specify --enable-gold building Binutils, with loongarch in
> > this list the building system will attempt to build gold and fail.  If
> > removing loongarch from the list the building system will ignore --
> > enable-gold.
> > 
> Okay, I understand.

Pushed r14-9149 and the Binutils maintainer will pick it up before the
next Binutils release (AFAIK).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Don't falsely claim gold supported in toplevel configure

2024-02-22 Thread Xi Ruoyao

On Fri, 2024-02-23 at 11:16 +0800, chenglulu wrote:
> 
> 在 2024/2/22 下午5:17, Xi Ruoyao 写道:
> > The gold linker has never been ported to LoongArch (and it seems
> > unlikely to be ported in the future as the new architectures are
> > focusing on lld and/or mold for fast linkers).
> > 
> > ChangeLog:
> > 
> >     * configure.ac (ENABLE_GOLD): Remove loongarch*-*-* from target
> >     list.
> >     * configure: Regenerate.
> > ---
> > 
> > Ok for GCC trunk (to get synced into Binutils later)?
> 
> I have no problem. But I have a question. Is this modification simply 
> because we don’t
> 
> support it or is there an error somewhere?

If a user specify --enable-gold building Binutils, with loongarch in
this list the building system will attempt to build gold and fail.  If
removing loongarch from the list the building system will ignore --
enable-gold.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[GCC 13 PATCH] LoongArch: Don't default to -mno-explicit-relocs if -mno-relax

2024-02-22 Thread Xi Ruoyao

To improve Binutils compatibility we've had to backported relaxation
support.  But if a user just updates to GCC 13.3 and sticks with
Binutils 2.41, there is no reason to use -mno-explicit-relocs as the
default because we are turning off relaxation for Binutils 2.41 (it
lacks conditional branch relaxation support) anyway.

So like GCC 14, make the default of -m[no-]explicit-relocs depend on
-m[no-]relax instead of HAVE_AS_MRELAX_OPTION.  Also update the doc to
reflect the behavior change.

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in
(TARGET_EXPLICIT_RELOCS): Init to M_OPTION_NOT_SEEN.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Set the default of
TARGET_EXPLICIT_RELOCS to HAVE_AS_EXPLICIT_RELOCS
&& !loongarch_mrelax.
* doc/invoke.texi (-m[no-]explicit-relocs): Update for
LoongArch.
---

Ok for releases/gcc-13?

 gcc/config/loongarch/genopts/loongarch.opt.in |  2 +-
 gcc/config/loongarch/loongarch.cc |  4 
 gcc/config/loongarch/loongarch.opt|  2 +-
 gcc/doc/invoke.texi   | 11 +--
 4 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index da6fedd153e..76acd35d39c 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -155,7 +155,7 @@ Target Joined RejectNegative UInteger 
Var(loongarch_max_inline_memcpy_size) Init
 -mmax-inline-memcpy-size=SIZE  Set the max size of memcpy to inline, default 
is 1024.
 
 mexplicit-relocs
-Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS & 
!HAVE_AS_MRELAX_OPTION)
+Target Var(TARGET_EXPLICIT_RELOCS) Init(M_OPTION_NOT_SEEN)
 Use %reloc() assembly operators.
 
 ; The code model option names for -mcmodel.
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 768e2427285..e78b81cd8fc 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6222,6 +6222,10 @@ loongarch_option_override_internal (struct gcc_options 
*opts)
gcc_unreachable ();
 }
 
+  if (TARGET_EXPLICIT_RELOCS == M_OPTION_NOT_SEEN)
+TARGET_EXPLICIT_RELOCS = (HAVE_AS_EXPLICIT_RELOCS
+ && !loongarch_mrelax);
+
   /* Validate the guard size.  */
   int guard_size = param_stack_clash_protection_guard_size;
 
diff --git a/gcc/config/loongarch/loongarch.opt 
b/gcc/config/loongarch/loongarch.opt
index 59b1e06d3f2..e61fbaed2c1 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -162,7 +162,7 @@ Target Joined RejectNegative UInteger 
Var(loongarch_max_inline_memcpy_size) Init
 -mmax-inline-memcpy-size=SIZE  Set the max size of memcpy to inline, default 
is 1024.
 
 mexplicit-relocs
-Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS & 
!HAVE_AS_MRELAX_OPTION)
+Target Var(TARGET_EXPLICIT_RELOCS) Init(M_OPTION_NOT_SEEN)
 Use %reloc() assembly operators.
 
 ; The code model option names for -mcmodel.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 99657fb44d8..792ce283bb9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -25830,12 +25830,11 @@ The default code model is @code{normal}.
 @itemx -mno-explicit-relocs
 Use or do not use assembler relocation operators when dealing with symbolic
 addresses.  The alternative is to use assembler macros instead, which may
-limit optimization.  The default value for the option is determined during
-GCC build-time by detecting corresponding assembler support:
-@code{-mexplicit-relocs} if said support is present,
-@code{-mno-explicit-relocs} otherwise.  This option is mostly useful for
-debugging, or interoperation with assemblers different from the build-time
-one.
+limit instruction scheduling but allow linker relaxation.  The default
+value for the option is determined with the assembler capability detected
+during GCC build-time and the setting of @code{-mrelax}:
+@code{-mexplicit-relocs} if the assembler supports relocation operators
+but @code{-mrelax} is not enabled, @code{-mno-explicit-relocs} otherwise.
 
 @opindex mdirect-extern-access
 @item -mdirect-extern-access
-- 
2.43.2

[PATCH] LoongArch: Don't falsely claim gold supported in toplevel configure

2024-02-22 Thread Xi Ruoyao

The gold linker has never been ported to LoongArch (and it seems
unlikely to be ported in the future as the new architectures are
focusing on lld and/or mold for fast linkers).

ChangeLog:

* configure.ac (ENABLE_GOLD): Remove loongarch*-*-* from target
list.
* configure: Regenerate.
---

Ok for GCC trunk (to get synced into Binutils later)?

 configure| 2 +-
 configure.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 874966fb9f0..02b435c1163 100755
--- a/configure
+++ b/configure
@@ -3092,7 +3092,7 @@ case "${ENABLE_GOLD}" in
   # Check for target supported by gold.
   case "${target}" in
 i?86-*-* | x86_64-*-* | sparc*-*-* | powerpc*-*-* | arm*-*-* \
-| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-* | loongarch*-*-*)
+| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-*)
  configdirs="$configdirs gold"
  if test x${ENABLE_GOLD} = xdefault; then
default_ld=gold
diff --git a/configure.ac b/configure.ac
index 4f34004a072..1a19c07a27b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -364,7 +364,7 @@ case "${ENABLE_GOLD}" in
   # Check for target supported by gold.
   case "${target}" in
 i?86-*-* | x86_64-*-* | sparc*-*-* | powerpc*-*-* | arm*-*-* \
-| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-* | loongarch*-*-*)
+| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-*)
  configdirs="$configdirs gold"
  if test x${ENABLE_GOLD} = xdefault; then
default_ld=gold
-- 
2.43.2

Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-20 Thread Xi Ruoyao

On Tue, 2024-02-20 at 19:50 +0800, chenglulu wrote:
> 
> 在 2024/2/20 下午7:31, Xi Ruoyao 写道:
> > On Tue, 2024-02-20 at 19:25 +0800, Xi Ruoyao wrote:
> > > On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:
> > > 
> > > > So I think that without worrying about performance and ensuring that
> > > > there is no problem
> > > > 
> > > > with binutils, I think we can make the following modifications:
> > > > 
> > > >     -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
> > > >     -   used for padding.  */
> > > >     +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding 
> > > > by
> > > >     +   default.  */
> > > >  #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
> > > >     -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
> > > >     +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))
> > > > 
> > > > What do you think of it?
> > > Unfortunately it will cause warnings with GAS 2.41 or earlier like
> > > 
> > > t1.s:1: Warning: expected fill pattern missing
> > > t1.s:5: Warning: expected fill pattern missing
> > > 
> > > And AFAIK these things may cause many test failures due to "excessive
> > > errors" if running the GCC test suite with these earlier GAS versions.
> > > Maybe we'll have to add some autoconf-based probing for the linker
> > > anyway?
> > Or just silence the warning passing "--no-warn" to the assembler but I'm
> > highly unsure if this is really a good idea :(.
> > 
> I am not opposed to adding detection code, but I looked at this problem 
> today
> 
> and I think this change is the smallest change. I asked Meng Qinggang and he
> 
> said that the warning of GAS 2.41 can be removed.

Yes, but we cannot change a released binutils-2.41 tarball and Binutils
folks don't make point releases like GCC.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-20 Thread Xi Ruoyao

On Tue, 2024-02-20 at 19:25 +0800, Xi Ruoyao wrote:
> On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:
> 
> > So I think that without worrying about performance and ensuring that
> > there is no problem
> > 
> > with binutils, I think we can make the following modifications:
> > 
> >    -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
> >    -   used for padding.  */
> >    +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
> >    +   default.  */
> >     #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
> >    -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
> >    +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))
> > 
> > What do you think of it?
> 
> Unfortunately it will cause warnings with GAS 2.41 or earlier like
> 
> t1.s:1: Warning: expected fill pattern missing
> t1.s:5: Warning: expected fill pattern missing
> 
> And AFAIK these things may cause many test failures due to "excessive
> errors" if running the GCC test suite with these earlier GAS versions.
> Maybe we'll have to add some autoconf-based probing for the linker
> anyway?

Or just silence the warning passing "--no-warn" to the assembler but I'm
highly unsure if this is really a good idea :(.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-20 Thread Xi Ruoyao

On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:

> So I think that without worrying about performance and ensuring that 
> there is no problem
> 
> with binutils, I think we can make the following modifications:
> 
>    -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
>    -   used for padding.  */
>    +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
>    +   default.  */
>     #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
>    -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
>    +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))
> 
> What do you think of it?

Unfortunately it will cause warnings with GAS 2.41 or earlier like

t1.s:1: Warning: expected fill pattern missing
t1.s:5: Warning: expected fill pattern missing

And AFAIK these things may cause many test failures due to "excessive
errors" if running the GCC test suite with these earlier GAS versions. 
Maybe we'll have to add some autoconf-based probing for the linker
anyway?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-09 Thread Xi Ruoyao

On Fri, 2024-02-09 at 00:02 +0800, chenglulu wrote:
> 
> 在 2024/2/7 上午12:23, Xi Ruoyao 写道:
> > Hi Lulu,
> > 
> > I'm proposing to backport r14-4674 "LoongArch: Delete macro definition
> > ASM_OUTPUT_ALIGN_WITH_NOP." to releases/gcc-12 and releases/gcc-13.  The
> > reasons:
> > 
> > 1. Strictly speaking, the old ASM_OUTPUT_ALIGN_WITH_NOP macro may cause
> > a correctness issue.  For example, a developer may use -falign-
> > functions=16 and then use the low 4 bits of a function pointer to encode
> > some metainfo.  Then ASM_OUTPUT_ALIGN_WITH_NOP causes the functions not
> > really aligned to a 16 bytes boundary, causing some breakage.
> > 
> > 2. With Binutils-2.42,  ASM_OUTPUT_ALIGN_WITH_NOP can cause illegal
> > opcodes.  For example:
> > 
> > .globl _start
> > _start:
> > .balign 32
> > nop
> > nop
> > nop
> > addi.d $a0, $r0, 1
> > .balign 16,54525952,4
> > addi.d $a0, $a0, 1
> > 
> > is assembled and linked to:
> > 
> > 0220 <_start>:
> >   220:  0340    nop
> >   224:  0340    nop
> >   228:  0340    nop
> >   22c:  02c00404    li.d$a0, 1
> >   230:      .word   0x   # <== OOPS!
> >   234:  02c00484    addi.d  $a0, $a0, 1
> > 
> > Arguably this is a bug in GAS (it should at least error out for the
> > unsupported case where .balign 16,54525952,4 appears with -mrelax; I'd
> > prefer it to support the 3-operand .align directive even -mrelax for
> > reasons I've given in [1]).  But we can at least work it around by
> > removing ASM_OUTPUT_ALIGN_WITH_NOP to allow using GCC 13.3 with Binutils
> > 2.42.
> > 
> > 3. Without ASM_OUTPUT_ALIGN_WITH_NOP, GCC just outputs something like
> > ".align 5" which works as expected since Binutils-2.38.
> > 
> > 4. GCC < 14 does not have a default setting of -falign-*, so changing
> > this won't affect anyone who do not specify -falign-* explicitly.
> > 
> > [1]:https://github.com/loongson-community/discussions/issues/41#issuecomment-1925872603
> > 
> > Is it OK to backport r14-4674 into releases/gcc-12 and releases/gcc-13
> > then?
> > 
> Ok, I agree with you.
> 
> Thanks!

Oops, with Binutils-2.41 GAS will fail to assemble some conditional
branches if we do this :(.

Not sure what to do (maybe backporting both this and a simplified
version of PR112330 fix?)  Let's reconsider after the holiday...

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] testsuite: Add a test case for negating FP vectors containing zeros

2024-02-06 Thread Xi Ruoyao

On Tue, 2024-02-06 at 17:55 +0800, Xi Ruoyao wrote:
> Recently I've fixed two wrong FP vector negate implementation which
> caused wrong sign bits in zeros in targets (r14-8786 and r14-8801).  To
> prevent a similar issue from happening again, add a test case.
> 
> Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
> (with MSA), LoongArch (with LSX and LASX).
> 
> gcc/testsuite:
> 
>   * gcc.dg/vect/vect-neg-zero.c: New test.
> ---
> 
> Ok for trunk?
> 
>  gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 39 +++
>  1 file changed, 39 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c 
> b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
> new file mode 100644
> index 000..adb032f5c6a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run } */

This patch fails on Linaro CI for ARM.  I guess I need to remove this {
dg-do run } line and let the test framework to decide run or compile.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-06 Thread Xi Ruoyao

Hi Lulu,

I'm proposing to backport r14-4674 "LoongArch: Delete macro definition
ASM_OUTPUT_ALIGN_WITH_NOP." to releases/gcc-12 and releases/gcc-13.  The
reasons:

1. Strictly speaking, the old ASM_OUTPUT_ALIGN_WITH_NOP macro may cause
a correctness issue.  For example, a developer may use -falign-
functions=16 and then use the low 4 bits of a function pointer to encode
some metainfo.  Then ASM_OUTPUT_ALIGN_WITH_NOP causes the functions not
really aligned to a 16 bytes boundary, causing some breakage.

2. With Binutils-2.42,  ASM_OUTPUT_ALIGN_WITH_NOP can cause illegal
opcodes.  For example:

.globl _start
_start:
.balign 32
nop
nop
nop
addi.d $a0, $r0, 1
.balign 16,54525952,4
addi.d $a0, $a0, 1

is assembled and linked to:

0220 <_start>:
 220:   0340nop
 224:   0340nop
 228:   0340nop
 22c:   02c00404li.d$a0, 1
 230:   .word   0x   # <== OOPS!
 234:   02c00484addi.d  $a0, $a0, 1

Arguably this is a bug in GAS (it should at least error out for the
unsupported case where .balign 16,54525952,4 appears with -mrelax; I'd
prefer it to support the 3-operand .align directive even -mrelax for
reasons I've given in [1]).  But we can at least work it around by
removing ASM_OUTPUT_ALIGN_WITH_NOP to allow using GCC 13.3 with Binutils
2.42.

3. Without ASM_OUTPUT_ALIGN_WITH_NOP, GCC just outputs something like
".align 5" which works as expected since Binutils-2.38.

4. GCC < 14 does not have a default setting of -falign-*, so changing
this won't affect anyone who do not specify -falign-* explicitly.

[1]:https://github.com/loongson-community/discussions/issues/41#issuecomment-1925872603

Is it OK to backport r14-4674 into releases/gcc-12 and releases/gcc-13
then?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 958 matches

Mail list logo