[PATCH] LoongArch: Add prefetch instruction

2022-09-25 Thread Xi Ruoyao via Gcc-patches
The test pr106397.c fails on LoongArch because we haven't defined a
prefetch instruction.  We could silence the test for LoongArch, but it's
not too difficult to add the prefetch instruction, so add it now.
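
For reference, a minimal user-level sketch of what the new pattern handles
(illustrative only; with the insn added below, the read prefetch should
expand to "preld 0,..." and the write prefetch to "preld 8,..."):

  void
  touch (char *p)
  {
    __builtin_prefetch (p, 0, 3);       /* read prefetch  -> preld 0  */
    __builtin_prefetch (p + 64, 1, 3);  /* write prefetch -> preld 8  */
  }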

-- >8 --

gcc/ChangeLog:

* config/loongarch/constraints.md (ZD): New address constraint.
* config/loongarch/loongarch.md (prefetch): New insn.
---
 gcc/config/loongarch/constraints.md |  6 ++
 gcc/config/loongarch/loongarch.md   | 14 ++
 2 files changed, 20 insertions(+)

diff --git a/gcc/config/loongarch/constraints.md 
b/gcc/config/loongarch/constraints.md
index 43cb7b5f0f5..93da5970958 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -190,3 +190,9 @@ (define_memory_constraint "ZB"
   The offset is zero"
   (and (match_code "mem")
(match_test "REG_P (XEXP (op, 0))")))
+
+(define_address_constraint "ZD"
+  "An address operand whose address is formed by a base register and offset
+   that is suitable for use in instructions with the same addressing mode
+   as @code{preld}."
+   (match_test "loongarch_12bit_offset_address_p (op, mode)"))
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 214b14bddd3..84c1bd1c0d6 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2137,6 +2137,20 @@ (define_insn "loongarch_dbar"
   ""
   "dbar\t%0")
 
+(define_insn "prefetch"
+  [(prefetch (match_operand 0 "address_operand" "ZD")
+(match_operand 1 "const_uimm5_operand" "i")
+(match_operand 2 "const_int_operand" "n"))]
+  ""
+{
+  switch (INTVAL (operands[1]))
+  {
+case 0: return "preld\t0,%a0";
+case 1: return "preld\t8,%a0";
+default: gcc_unreachable ();
+  }
+})
+
 
 
 ;; Privileged state instruction
-- 
2.37.0



Re: [PATCH v2 0/9] fortran: clobber fixes [PR41453]

2022-09-25 Thread Mikael Morin

Le 23/09/2022 à 09:54, Mikael Morin a écrit :

Le 22/09/2022 à 22:42, Harald Anlauf via Fortran a écrit :

I was wondering if you could add a test for the change in patch 7
addressing the clobber generation for an associate-name, e.g. by
adding to testcase intent_optimize_7.f90 near the end:

   associate (av => ct)
 av = 111222333
 call foo(av)
   end associate
   if (ct /= 42) stop 3

plus the adjustments in the patterns.

Indeed, I didn't add a test because there was one already, but the
existing test doesn't have the check for clobber generation and store removal.
I prefer to create a new test though, so that the patch and the test
come together, and the test for patch 8 is not encumbered with unrelated
stuff.


By the way, the same could be said about patch 6.
I will create a test for that one as well.


Patches pushed:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=77bbf69d2981dafc2ef3e59bfbefb645d88bab9d

Changes from v2:
 - patches 6 and 7: A test for each has been added.
 - patches 8 and 9: The tests have been renumbered.
 - patches 6 and 7: The PR number used in the subject line has been
 changed, from the different regression PRs to the one optimization PR.
 - patches 5 and 8: The commit message has been modified: the commit the patch
 partly reverts is mentioned, and the associated PR number as well.
 - patch 7: The regression PR number this refers to has been changed.



Re: [PATCH] Avoid depending on destructor order

2022-09-25 Thread Jeff Law via Gcc-patches



On 9/25/22 00:29, Iain Sandoe wrote:



On 23 Sep 2022, at 15:30, David Edelsohn via Gcc-patches wrote:

On Fri, Sep 23, 2022 at 10:12 AM Thomas Neumann  wrote:


+static const bool in_shutdown = false;

I'll let Jason or others decide if this is the right solution.  It seems
that in_shutdown also could be declared outside the #ifdef and
initialized as "false".

sure, either is fine. Moving it outside the #ifdef wastes one byte in
the executable (while the compiler can eliminate the const), but it does
not really matter.

I have verified that the patch below fixes builds for both fast-path and
non-fast-path builds. But if you prefer I will move the in_shutdown
definition instead.

Best

Thomas

PS: in_shutdown is an int here instead of a bool because non-fast-path
builds do not include stdbool. Not a good reason, of course, but I
wanted to keep the patch minimal and it makes no difference in practice.


 When using the atomic fast path deregistering can fail during
 program shutdown if the lookup structures are already destroyed.
 The assert in __deregister_frame_info_bases takes that into
 account.  The non-fast-path case, however, is not aware of
 program shutdown, which caused a compiler error on such platforms.
 We fix that by introducing a constant for in_shutdown in
 non-fast-path builds.

 libgcc/ChangeLog:
 * unwind-dw2-fde.c: Introduce a constant for in_shutdown
 for the non-fast-path case.

diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index d237179f4ea..0bcd5061d76 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -67,6 +67,8 @@ static void
  init_object (struct object *ob);

  #else
+/* Without fast path frame deregistration must always succeed.  */
+static const int in_shutdown = 0;

  /* The unseen_objects list contains objects that have been registered
 but not yet categorized in any way.  The seen_objects list has had


Thanks for the patch.  I'll let you and Jason decide which style solution
is preferred.

This also breaks bootstrap on Darwin at least, so an early solution would be
welcome (the fix here allows bootstrap to continue, testing on-going).
thanks,


I'm using it in the automated tester as well -- without it, all the *-elf
targets would fail to build libgcc.




jeff




Re: [PATCH] mips: Add appropriate linker flags when compiling with -static-pie

2022-09-25 Thread linted via Gcc-patches
Hello,
I'm just checking to see if anyone has had a chance to look at this.

Thank you

On Wed, Sep 14, 2022 at 2:09 PM linted  wrote:

> Hello,
>
> This patch fixes missing flags when compiling with -static-pie on mips. I
> made these modifications based on the previously submitted static pie patch
> for arm as well as the working code for aarch64.
>
> I tested with a host of mips-elf and checked with mips-sim. This patch was
> also tested and used with uclibc-ng to generate static pie elfs.
>
> This is my first patch for gcc, so please let me know if there is anything
> I missed.
>
>
>
> Signed-off-by: linted 
> ---
>  gcc/config/mips/gnu-user.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/mips/gnu-user.h b/gcc/config/mips/gnu-user.h
> index 6aad7192e69..b1c665b7f37 100644
> --- a/gcc/config/mips/gnu-user.h
> +++ b/gcc/config/mips/gnu-user.h
> @@ -56,11 +56,12 @@ along with GCC; see the file COPYING3.  If not see
>  #define GNU_USER_TARGET_LINK_SPEC "\
>%{G*} %{EB} %{EL} %{mips*} %{shared} \
>%{!shared: \
> -%{!static: \
> +%{!static:%{!static-pie: \
>%{rdynamic:-export-dynamic} \
>%{mabi=n32: -dynamic-linker " GNU_USER_DYNAMIC_LINKERN32 "} \
>%{mabi=64: -dynamic-linker " GNU_USER_DYNAMIC_LINKER64 "} \
> -  %{mabi=32: -dynamic-linker " GNU_USER_DYNAMIC_LINKER32 "}} \
> +  %{mabi=32: -dynamic-linker " GNU_USER_DYNAMIC_LINKER32 "}}} \
> +%{static-pie:-Bstatic -pie --no-dynamic-linker -z text} \
>  %{static}} \
>%{mabi=n32:-m" GNU_USER_LINK_EMULATIONN32 "} \
>%{mabi=64:-m" GNU_USER_LINK_EMULATION64 "} \
> --
> 2.34.1
>
>
>


[RFA] Minor improvement to coremark, avoid unconditional jump to return

2022-09-25 Thread Jeff Law

This is a minor improvement for the core_list_find routine in coremark.


Basically for riscv, and likely other targets, we can end up with an 
unconditional jump to a return statement.    This is a result of 
compensation code created by bb-reorder, and no jump optimization pass 
runs after bb-reorder to clean this stuff up.
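
As a rough sketch (not the actual coremark source or the new testcase), the
kind of code involved is a simple list search; after bb-reorder the path to
the final return can become an unconditional jump to a block that contains
nothing but a return:

  struct node { struct node *next; int val; };

  /* Illustrative only: the loop exit may end up jumping to a bare return.  */
  struct node *
  find (struct node *p, int v)
  {
    while (p && p->val != v)
      p = p->next;
    return p;
  }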


This patch utilizes preexisting code to identify suitable branch targets 
as well as preexisting code to emit a suitable return, so it's pretty 
simple.  Note that when we arrange to do this optimization, the original 
return block may become unreachable. So we conditionally call 
delete_unreachable_blocks to fix that up.


This triggers ~160 times during an x86_64 bootstrap.  Naturally it 
bootstraps and regression tests on x86_64.


I've also bootstrapped this on riscv64, regression testing with qemu 
shows some regressions, but AFAICT they're actually qemu bugs with 
signal handling/delivery -- qemu user mode emulation is not consistently 
calling user defined signal handlers.  Given the same binary, sometimes 
they'll get called and the test passes, other times the handler isn't 
called and the test (of course) fails. I'll probably spend some time to 
try and chase this down for the sake of making testing easier.



OK for the trunk?


Jeff




commit f9a9119fa47f94348305a883fd88c23647fb1b07
Author: Jeff Law 
Date:   Sun Sep 25 12:23:59 2022 -0400

gcc/
* cfgcleanup.cc (bb_is_just_return): No longer static.
* cfgcleanup.h (bb_is_just_return): Add prototype.
* cfgrtl.cc (fixup_reorder_chain): Do not create an
unconditional jump to a return block.  Conditionally
remove unreachable blocks.

gcc/testsuite/gcc.target/riscv/

* ret-1.c: New test.

diff --git a/gcc/cfgcleanup.cc b/gcc/cfgcleanup.cc
index a8b0139bb4d..a363e0b4da3 100644
--- a/gcc/cfgcleanup.cc
+++ b/gcc/cfgcleanup.cc
@@ -2599,7 +2599,7 @@ trivially_empty_bb_p (basic_block bb)
return value.  Fill in *RET and *USE with the return and use insns
if any found, otherwise NULL.  All CLOBBERs are ignored.  */
 
-static bool
+bool
 bb_is_just_return (basic_block bb, rtx_insn **ret, rtx_insn **use)
 {
   *ret = *use = NULL;
diff --git a/gcc/cfgcleanup.h b/gcc/cfgcleanup.h
index a6d882f98a4..f1021ca835f 100644
--- a/gcc/cfgcleanup.h
+++ b/gcc/cfgcleanup.h
@@ -30,5 +30,6 @@ extern int flow_find_head_matching_sequence (basic_block, 
basic_block,
 extern bool delete_unreachable_blocks (void);
 extern void delete_dead_jumptables (void);
 extern bool cleanup_cfg (int);
+extern bool bb_is_just_return (basic_block, rtx_insn **, rtx_insn **);
 
 #endif /* GCC_CFGCLEANUP_H */
diff --git a/gcc/cfgrtl.cc b/gcc/cfgrtl.cc
index a05c338a4c8..90cd6ee56a7 100644
--- a/gcc/cfgrtl.cc
+++ b/gcc/cfgrtl.cc
@@ -3901,6 +3901,7 @@ fixup_reorder_chain (void)
   /* Now add jumps and labels as needed to match the blocks new
  outgoing edges.  */
 
+  bool remove_unreachable_blocks = false;
   for (bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb; bb ; bb = (basic_block)
bb->aux)
 {
@@ -4043,10 +4044,30 @@ fixup_reorder_chain (void)
continue;
}
 
+  /* If E_FALL->dest is just a return block, then we can emit a
+return rather than a jump to the return block.  */
+  rtx_insn *ret, *use;
+  basic_block dest;
+  if (bb_is_just_return (e_fall->dest, &ret, &use)
+ && (PATTERN (ret) == simple_return_rtx || PATTERN (ret) == ret_rtx))
+   {
+ ret_label = PATTERN (ret);
+ dest = EXIT_BLOCK_PTR_FOR_FN (cfun);
+
+ /* E_FALL->dest might become unreachable as a result of
+replacing the jump with a return.  So arrange to remove
+unreachable blocks.  */
+ remove_unreachable_blocks = true;
+   }
+  else
+   {
+ dest = e_fall->dest;
+   }
+
   /* We got here if we need to add a new jump insn. 
 Note force_nonfallthru can delete E_FALL and thus we have to
 save E_FALL->src prior to the call to force_nonfallthru.  */
-  nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
+  nb = force_nonfallthru_and_redirect (e_fall, dest, ret_label);
   if (nb)
{
  nb->aux = bb->aux;
@@ -4134,6 +4155,12 @@ fixup_reorder_chain (void)
  ei_next (&ei2);
}
   }
+
+  /* Replacing a jump with a return may have exposed an unreachable
+ block.  Conditionally remove them if such transformations were
+ made.  */
+  if (remove_unreachable_blocks)
+delete_unreachable_blocks ();
 }
 
 /* Perform sanity checks on the insn chain.
diff --git a/gcc/testsuite/gcc.target/riscv/ret-1.c 
b/gcc/testsuite/gcc.target/riscv/ret-1.c
new file mode 100644
index 000..28133aa4226
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/ret-1.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -dp" } */
+/* This was extracted from coremark.  */
+
+
+typedef 

Proxy ping [PATCH] Fortran: Fix ICE and wrong code for assumed-rank arrays [PR100029, PR100040]

2022-09-25 Thread Harald Anlauf via Gcc-patches
Dear all,

the patch for these PRs was submitted for review by Jose here:

  https://gcc.gnu.org/pipermail/fortran/2021-April/055924.html

but unfortunately was never reviewed.

I verified that the rebased patch still works on mainline and
x86_64-pc-linux-gnu, and I think that it is fine.  It is also
very simple and clear, but I repost it here to give others a
chance to provide comments.

The commit message needed a small correction to make it acceptable
to "git gcc-verify", but besides some whitespace-like changes and
clarifications this is Jose's patch.

OK for mainline?

Thanks,
Harald

From b3279399bbdd04f48eab82dcc3f2b2aba5a9b0a3 Mon Sep 17 00:00:00 2001
From: José Rui Faustino de Sousa
Date: Sun, 25 Sep 2022 22:48:55 +0200
Subject: [PATCH] Fortran: Fix ICE and wrong code for assumed-rank arrays
 [PR100029, PR100040]

gcc/fortran/ChangeLog:

	PR fortran/100040
	PR fortran/100029
	* trans-expr.cc (gfc_conv_class_to_class): Add code to have
	assumed-rank arrays recognized as full arrays and fix the type
	of the array assignment.
	(gfc_conv_procedure_call): Change order of code blocks such that
	the free of ALLOCATABLE dummy arguments with INTENT(OUT) occurs
	first.

gcc/testsuite/ChangeLog:

	PR fortran/100029
	* gfortran.dg/PR100029.f90: New test.

	PR fortran/100040
	* gfortran.dg/PR100040.f90: New test.
---
 gcc/fortran/trans-expr.cc  | 48 +++---
 gcc/testsuite/gfortran.dg/PR100029.f90 | 22 
 gcc/testsuite/gfortran.dg/PR100040.f90 | 36 +++
 3 files changed, 85 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/PR100029.f90
 create mode 100644 gcc/testsuite/gfortran.dg/PR100040.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 4f3ae82d39c..1551a2e4df4 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -1178,8 +1178,10 @@ gfc_conv_class_to_class (gfc_se *parmse, gfc_expr *e, gfc_typespec class_ts,
 return;

   /* Test for FULL_ARRAY.  */
-  if (e->rank == 0 && gfc_expr_attr (e).codimension
-  && gfc_expr_attr (e).dimension)
+  if (e->rank == 0
+  && ((gfc_expr_attr (e).codimension && gfc_expr_attr (e).dimension)
+	  || (class_ts.u.derived->components->as
+	  && class_ts.u.derived->components->as->type == AS_ASSUMED_RANK)))
 full_array = true;
   else
 gfc_is_class_array_ref (e, &full_array);
@@ -1227,8 +1229,12 @@ gfc_conv_class_to_class (gfc_se *parmse, gfc_expr *e, gfc_typespec class_ts,
 	  && e->rank != class_ts.u.derived->components->as->rank)
 	{
 	  if (e->rank == 0)
-	gfc_add_modify (&parmse->post, gfc_class_data_get (parmse->expr),
-			gfc_conv_descriptor_data_get (ctree));
+	{
+	  tmp = gfc_class_data_get (parmse->expr);
+	  gfc_add_modify (&parmse->post, tmp,
+			  fold_convert (TREE_TYPE (tmp),
+	 gfc_conv_descriptor_data_get (ctree)));
+	}
 	  else
 	class_array_data_assign (&parmse->post, parmse->expr, ctree, true);
 	}
@@ -6560,23 +6566,6 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 		base_object = build_fold_indirect_ref_loc (input_location,
 			   parmse.expr);

-		  /* A class array element needs converting back to be a
-		 class object, if the formal argument is a class object.  */
-		  if (fsym && fsym->ts.type == BT_CLASS
-			&& e->ts.type == BT_CLASS
-			&& ((CLASS_DATA (fsym)->as
-			 && CLASS_DATA (fsym)->as->type == AS_ASSUMED_RANK)
-			|| CLASS_DATA (e)->attr.dimension))
-		gfc_conv_class_to_class (&parmse, e, fsym->ts, false,
- fsym->attr.intent != INTENT_IN
- && (CLASS_DATA (fsym)->attr.class_pointer
-	 || CLASS_DATA (fsym)->attr.allocatable),
- fsym->attr.optional
- && e->expr_type == EXPR_VARIABLE
- && e->symtree->n.sym->attr.optional,
- CLASS_DATA (fsym)->attr.class_pointer
- || CLASS_DATA (fsym)->attr.allocatable);
-
 		  /* If an ALLOCATABLE dummy argument has INTENT(OUT) and is
 		 allocated on entry, it must be deallocated.  */
 		  if (fsym && fsym->attr.intent == INTENT_OUT
@@ -6637,6 +6626,23 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 		  gfc_add_expr_to_block (&se->pre, tmp);
 		}

+		  /* A class array element needs converting back to be a
+		 class object, if the formal argument is a class object.  */
+		  if (fsym && fsym->ts.type == BT_CLASS
+			&& e->ts.type == BT_CLASS
+			&& ((CLASS_DATA (fsym)->as
+			 && CLASS_DATA (fsym)->as->type == AS_ASSUMED_RANK)
+			|| CLASS_DATA (e)->attr.dimension))
+		gfc_conv_class_to_class (&parmse, e, fsym->ts, false,
+ fsym->attr.intent != INTENT_IN
+ && (CLASS_DATA (fsym)->attr.class_pointer
+	 || CLASS_DATA (fsym)->attr.allocatable),
+ fsym->attr.optional
+ && e->expr_type == EXPR_VARIABLE
+ && e->symtree->n.sym->attr.optional,
+ CLASS_DATA (fsym)->attr.class_pointer
+ || CLASS_DATA (fsym)->attr.allo

[PATCH] LoongArch: Libvtv add LoongArch support.

2022-09-25 Thread Lulu Cheng
Co-Authored-By: qijingwen 

include/ChangeLog:

* vtv-change-permission.h (defined):
(VTV_PAGE_SIZE): 16k pages under loongarch64.

libvtv/ChangeLog:

* configure.tgt: Add loongarch support.
---
 include/vtv-change-permission.h | 2 ++
 libvtv/configure.tgt| 3 +++
 2 files changed, 5 insertions(+)

diff --git a/include/vtv-change-permission.h b/include/vtv-change-permission.h
index 70bdad92bca..47bcdb8057a 100644
--- a/include/vtv-change-permission.h
+++ b/include/vtv-change-permission.h
@@ -48,6 +48,8 @@ extern void __VLTChangePermission (int);
 #else 
 #if defined(__sun__) && defined(__svr4__) && defined(__sparc__)
 #define VTV_PAGE_SIZE 8192
+#elif defined(__loongarch__)
+#define VTV_PAGE_SIZE 16384
 #else
 #define VTV_PAGE_SIZE 4096
 #endif
diff --git a/libvtv/configure.tgt b/libvtv/configure.tgt
index aa2a3f675b8..6cdd1e97ab1 100644
--- a/libvtv/configure.tgt
+++ b/libvtv/configure.tgt
@@ -50,6 +50,9 @@ case "${target}" in
;;
   x86_64-*-darwin[1]* | i?86-*-darwin[1]*)
;;
+  loongarch*-*-linux*)
+   VTV_SUPPORTED=yes
+   ;;
   *)
;;
 esac
-- 
2.31.1



[PATCH] LoongArch: Libitm add LoongArch support.

2022-09-25 Thread Lulu Cheng
Co-Authored-By: Yang Yujie 

libitm/ChangeLog:

* configure.tgt: Add loongarch support.
* config/loongarch/asm.h: New file.
* config/loongarch/sjlj.S: New file.
* config/loongarch/target.h: New file.
---
 libitm/config/loongarch/asm.h|  54 +
 libitm/config/loongarch/sjlj.S   | 127 +++
 libitm/config/loongarch/target.h |  50 
 libitm/configure.tgt |   2 +
 4 files changed, 233 insertions(+)
 create mode 100644 libitm/config/loongarch/asm.h
 create mode 100644 libitm/config/loongarch/sjlj.S
 create mode 100644 libitm/config/loongarch/target.h

diff --git a/libitm/config/loongarch/asm.h b/libitm/config/loongarch/asm.h
new file mode 100644
index 000..a8e3304bb19
--- /dev/null
+++ b/libitm/config/loongarch/asm.h
@@ -0,0 +1,54 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Loongson Co. Ltd.
+
+   This file is part of the GNU Transactional Memory Library (libitm).
+
+   Libitm is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef _LA_ASM_H
+#define _LA_ASM_H
+
+#if defined(__loongarch_lp64)
+#  define GPR_L ld.d
+#  define GPR_S st.d
+#  define SZ_GPR 8
+#  define ADDSP(si)   addi.d  $sp, $sp, si
+#elif defined(__loongarch64_ilp32)
+#  define GPR_L ld.w
+#  define GPR_S st.w
+#  define SZ_GPR 4
+#  define ADDSP(si)   addi.w  $sp, $sp, si
+#else
+#  error Unsupported GPR size (must be 64-bit or 32-bit).
+#endif
+
+#if defined(__loongarch_double_float)
+#  define FPR_L fld.d
+#  define FPR_S fst.d
+#  define SZ_FPR 8
+#elif defined(__loongarch_single_float)
+#  define FPR_L fld.s
+#  define FPR_S fst.s
+#  define SZ_FPR 4
+#else
+#  define SZ_FPR 0
+#endif
+
+#endif  /* _LA_ASM_H */
diff --git a/libitm/config/loongarch/sjlj.S b/libitm/config/loongarch/sjlj.S
new file mode 100644
index 000..a5f9fadde34
--- /dev/null
+++ b/libitm/config/loongarch/sjlj.S
@@ -0,0 +1,127 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Loongson Co. Ltd.
+
+   This file is part of the GNU Transactional Memory Library (libitm).
+
+   Libitm is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more 
details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#include "asmcfi.h"
+#include "asm.h"
+
+   .text
+   .align  2
+   .global _ITM_beginTransaction
+   .type   _ITM_beginTransaction, @function
+
+_ITM_beginTransaction:
+cfi_startproc
+move   $r5, $sp
+ADDSP(-(12*SZ_GPR+8*SZ_FPR))
+cfi_adjust_cfa_offset(12*SZ_GPR+8*SZ_FPR)
+
+/* Frame Pointer */
+GPR_S  $fp, $sp, 0*SZ_GPR
+cfi_rel_offset(22, 0)
+
+/* Return Address */
+GPR_S  $r1, $sp, 1*SZ_GPR
+cfi_rel_offset(1, SZ_GPR)
+
+/* Caller's $sp */
+GPR_S  $r5, $sp, 2*SZ_GPR
+
+/* Callee-saved scratch GPRs (r23-r31) */
+GPR_S  $s0, $sp, 3*SZ_GPR
+GPR_S  $s1, $sp, 4*SZ_GPR
+GPR_S  $s2, $sp, 5*SZ_GPR
+GPR_S  $s3, $sp, 6*SZ_GPR
+GPR_S  $s4, $sp, 7*SZ_GPR
+GPR_S  $s5, $sp, 8*SZ_GPR
+GPR_S  $s6, $sp, 9*SZ_GPR
+GPR_S  $s7, $sp, 10*SZ_GPR
+GPR_S  $s8, $sp, 11*SZ_GPR
+
+#if !defined(__loongarch_soft_float)
+/* Callee-saved scratch FPRs (f24-f31) 

[PATCH] [x86] Support 2-instruction vector shuffle for V4SI/V4SF in ix86_expand_vec_perm_const_1.

2022-09-25 Thread liuhongt via Gcc-patches
>Missing space before (
Changed.
>> +  /* shufps.  */
>> +  ok = expand_vselect_vconcat(tmp, d->op0, d->op1,
>> +   perm1, d->nelt, false);
>
>Ditto.
Changed.
>
>> +  /* When lone_idx is not 0, it must from second op(count == 1).  */
>> +  gcc_assert ((lone_idx == 0 && count == 3)
>> +   || (lone_idx != 0 && count == 1));
>
>Perhaps write it more simply as
>  gcc_assert (count == (lone_idx ? 1 : 3));
>?
Changed.
>
>> +  /* shufps.  */
>> +  ok = expand_vselect_vconcat(tmp, d->op0, d->op1,
>> +   perm1, d->nelt, false);
>
>Missing space before (
>
Changed.
>> +  gcc_assert (ok);
>> +
>> +  /* Refine lone and pair index to original order.  */
>> +  perm1[shift] = lone_idx << 1;
>> +  perm1[shift + 1] = pair_idx << 1;
>> +
>> +  /* Select the remaining 2 elements in another vector.  */
>> +  for (i = 2 - shift; i < 4 - shift; ++i)
>> + perm1[i] = (lone_idx == 1) ? (d->perm[i] + 4) : d->perm[i];
>
>All the ()s in the above line aren't needed.
>
Changed.
>> +  /* shufps.  */
>> +  ok = expand_vselect_vconcat(d->target, tmp, d->op1,
>> +   perm1, d->nelt, false);
>
>Again, missing space
>
>Otherwise LGTM
Thanks, here's the updated patch I'm going to check in.
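
For context, here is a sketch (not one of the added testcases) of the kind of
cross-vector permutation the new routine targets, in this instance the
"3 from one op and 1 from another" case:

  typedef int   v4si __attribute__ ((vector_size (16)));
  typedef float v4sf __attribute__ ((vector_size (16)));

  /* Three elements from a and one from b; the indices are illustrative.  */
  v4sf
  perm (v4sf a, v4sf b)
  {
    return __builtin_shuffle (a, b, (v4si) { 0, 1, 2, 5 });
  }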

2022-09-23  Hongtao Liu  
Liwei Xu  

gcc/ChangeLog:

PR target/53346
* config/i386/i386-expand.cc (expand_vec_perm_shufps_shufps):
New function.
(ix86_expand_vec_perm_const_1): Insert
expand_vec_perm_shufps_shufps at the end of 2-instruction
expand sequence.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr53346-1.c: New test.
* gcc.target/i386/pr53346-2.c: New test.
* gcc.target/i386/pr53346-3.c: New test.
* gcc.target/i386/pr53346-4.c: New test.
---
 gcc/config/i386/i386-expand.cc| 116 ++
 gcc/testsuite/gcc.target/i386/pr53346-1.c |  70 +
 gcc/testsuite/gcc.target/i386/pr53346-2.c |  59 +++
 gcc/testsuite/gcc.target/i386/pr53346-3.c |  69 +
 gcc/testsuite/gcc.target/i386/pr53346-4.c |  59 +++
 5 files changed, 373 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr53346-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr53346-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr53346-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr53346-4.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 5334363e235..6baff6d0e61 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -19604,6 +19604,119 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
   return false;
 }
 
+/* A subroutine of ix86_expand_vec_perm_const_1. Try to implement D
+   in terms of a pair of shufps+ shufps/pshufd instructions.  */
+static bool
+expand_vec_perm_shufps_shufps (struct expand_vec_perm_d *d)
+{
+  unsigned char perm1[4];
+  machine_mode vmode = d->vmode;
+  bool ok;
+  unsigned i, j, k, count = 0;
+
+  if (d->one_operand_p
+  || (vmode != V4SImode && vmode != V4SFmode))
+return false;
+
+  if (d->testing_p)
+return true;
+
+  for (i = 0; i < 4; ++i)
+count += d->perm[i] > 3 ? 1 : 0;
+
+  gcc_assert (count & 3);
+
+  rtx tmp = gen_reg_rtx (vmode);
+  /* 2 from op0 and 2 from op1.  */
+  if (count == 2)
+{
+  unsigned char perm2[4];
+  for (i = 0, j = 0, k = 2; i < 4; ++i)
+   if (d->perm[i] & 4)
+ {
+   perm1[k++] = d->perm[i];
+   perm2[i] = k - 1;
+ }
+   else
+ {
+   perm1[j++] = d->perm[i];
+   perm2[i] = j - 1;
+ }
+
+  /* shufps.  */
+  ok = expand_vselect_vconcat (tmp, d->op0, d->op1,
+ perm1, d->nelt, false);
+  gcc_assert (ok);
+  if (vmode == V4SImode && TARGET_SSE2)
+  /* pshufd.  */
+   ok = expand_vselect (d->target, tmp,
+perm2, d->nelt, false);
+  else
+   {
+ /* shufps.  */
+ perm2[2] += 4;
+ perm2[3] += 4;
+ ok = expand_vselect_vconcat (d->target, tmp, tmp,
+  perm2, d->nelt, false);
+   }
+  gcc_assert (ok);
+}
+  /* 3 from one op and 1 from another.  */
+  else
+{
+  unsigned pair_idx = 8, lone_idx = 8, shift;
+
+  /* Find the lone index.  */
+  for (i = 0; i < 4; ++i)
+   if ((d->perm[i] > 3 && count == 1)
+   || (d->perm[i] < 4 && count == 3))
+ lone_idx = i;
+
+  /* When lone_idx is not 0, it must from second op(count == 1).  */
+  gcc_assert (count == (lone_idx ? 1 : 3));
+
+  /* Find the pair index that sits in the same half as the lone index.  */
+  shift = lone_idx & 2;
+  pair_idx = 1 - lone_idx + 2 * shift;
+
+  /* First permutate lone index and pair index into the same vector as
+[ lone, lone, pair, pair ].  */
+

[PATCH v7, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-09-25 Thread HAO CHEN GUI via Gcc-patches
Hi,
  This patch implements the f[min/max] optabs with xs[min/max]dp on rs6000.
Tests show that the outputs of xs[min/max]dp are consistent with the C99
fmin/fmax semantics.

  This patch also binds __builtin_vsx_xs[min/max]dp to fmin/max instead
of smin/max when fast-math is not set.  When fast-math is set, xs[min/max]dp
are folded to MIN/MAX_EXPR in gimple, and finally expanded to smin/max.
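
  As a usage sketch (the function name is just an example), without
-ffast-math a call like the following is expected to go through the new
fmin optab and emit xsmindp:

  #include <math.h>

  double
  clamp_lo (double a, double b)
  {
    return fmin (a, b);   /* expected: xsmindp when fast-math is not set  */
  }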

  Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2022-09-26 Haochen Gui 

gcc/
PR target/103605
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Gimple
fold RS6000_BIF_XSMINDP and RS6000_BIF_XSMAXDP when fast-math is set.
* config/rs6000/rs6000.md (FMINMAX): New int iterator.
(minmax_op): New int attribute.
(UNSPEC_FMAX, UNSPEC_FMIN): New unspecs.
(f<minmax_op><mode>3): New pattern by UNSPEC_FMAX and UNSPEC_FMIN.
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xsmaxdp): Set
pattern to fmaxdf3.
(__builtin_vsx_xsmindp): Set pattern to fmindf3.

gcc/testsuite/
PR target/103605
* gcc.target/powerpc/pr103605.h: New.
* gcc.target/powerpc/pr103605-1.c: New.
* gcc.target/powerpc/pr103605-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index e925ba9fad9..944ae9fe55c 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -1588,6 +1588,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   gimple_set_location (g, gimple_location (stmt));
   gsi_replace (gsi, g, true);
   return true;
+/* fold into MIN_EXPR when fast-math is set.  */
+case RS6000_BIF_XSMINDP:
 /* flavors of vec_min.  */
 case RS6000_BIF_XVMINDP:
 case RS6000_BIF_XVMINSP:
@@ -1614,6 +1616,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   gimple_set_location (g, gimple_location (stmt));
   gsi_replace (gsi, g, true);
   return true;
+/* fold into MAX_EXPR when fast-math is set.  */
+case RS6000_BIF_XSMAXDP:
 /* flavors of vec_max.  */
 case RS6000_BIF_XVMAXDP:
 case RS6000_BIF_XVMAXSP:
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index f4a9f24bcc5..8b735493b40 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1613,10 +1613,10 @@
 XSCVSPDP vsx_xscvspdp {}

   const double __builtin_vsx_xsmaxdp (double, double);
-XSMAXDP smaxdf3 {}
+XSMAXDP fmaxdf3 {}

   const double __builtin_vsx_xsmindp (double, double);
-XSMINDP smindf3 {}
+XSMINDP fmindf3 {}

   const double __builtin_vsx_xsrdpi (double);
 XSRDPI vsx_xsrdpi {}
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bf85baa5370..ae0dd98f0f9 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -158,6 +158,8 @@ (define_c_enum "unspec"
UNSPEC_HASHCHK
UNSPEC_XXSPLTIDP_CONST
UNSPEC_XXSPLTIW_CONST
+   UNSPEC_FMAX
+   UNSPEC_FMIN
   ])

 ;;
@@ -5341,6 +5343,22 @@ (define_insn_and_split "*s3_fpr"
   DONE;
 })

+
+(define_int_iterator FMINMAX [UNSPEC_FMAX UNSPEC_FMIN])
+
+(define_int_attr  minmax_op [(UNSPEC_FMAX "max")
+(UNSPEC_FMIN "min")])
+
+(define_insn "f<minmax_op><mode>3"
+  [(set (match_operand:SFDF 0 "vsx_register_operand" "=wa")
+   (unspec:SFDF [(match_operand:SFDF 1 "vsx_register_operand" "wa")
+ (match_operand:SFDF 2 "vsx_register_operand" "wa")]
+FMINMAX))]
+  "TARGET_VSX && !flag_finite_math_only"
+  "xsdp %x0,%x1,%x2"
+  [(set_attr "type" "fp")]
+)
+
 (define_expand "movcc"
[(set (match_operand:GPR 0 "gpc_reg_operand")
 (if_then_else:GPR (match_operand 1 "comparison_operator")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103605-1.c 
b/gcc/testsuite/gcc.target/powerpc/pr103605-1.c
new file mode 100644
index 000..923deec6a1e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103605-1.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+/* { dg-final { scan-assembler-times {\mxsmaxdp\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mxsmindp\M} 3 } } */
+
+#include "pr103605.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103605-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr103605-2.c
new file mode 100644
index 000..f50fe9468f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103605-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx -ffast-math" } */
+/* { dg-final { scan-assembler-times {\mxsmaxcdp\M} 3 { target has_arch_pwr9 } 
} } */
+/* { dg-final { scan-assembler-times {\mxsmincdp\M} 3 { target has_arch_pwr9 } 
} } */
+/* { dg-final { scan-assembler-times {\mxsmaxdp\M} 3 { target { ! 
has_arch_pwr9 } } } } */
+/

RE: [PATCH 1/2]middle-end Fold BIT_FIELD_REF and Shifts into BIT_FIELD_REFs alone

2022-09-25 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Andrew Pinski 
> Sent: Saturday, September 24, 2022 8:57 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de
> Subject: Re: [PATCH 1/2]middle-end Fold BIT_FIELD_REF and Shifts into
> BIT_FIELD_REFs alone
> 
> On Fri, Sep 23, 2022 at 4:43 AM Tamar Christina via Gcc-patches wrote:
> >
> > Hi All,
> >
> > This adds a match.pd rule that can fold right shifts and
> > bit_field_refs of integers into just a bit_field_ref by adjusting the
> > offset and the size of the extract and adds an extend to the previous size.
> >
> > Concretely turns:
> >
> > #include 
> >
> > unsigned int foor (uint32x4_t x)
> > {
> > return x[1] >> 16;
> > }
> >
> > which used to generate:
> >
> >   _1 = BIT_FIELD_REF <x_2(D), 32, 32>;
> >   _3 = _1 >> 16;
> >
> > into
> >
> >   _4 = BIT_FIELD_REF <x_1(D), 16, 48>;
> >   _2 = (unsigned int) _4;
> >
> > I currently limit the rewrite to only doing it if the resulting
> > extract is in a mode the target supports. i.e. it won't rewrite it to
> > extract say 13-bits because I worry that for targets that won't have a
> > bitfield extract instruction this may be a de-optimization.
> 
> It is only a de-optimization for the following case:
> * vector extraction
> 
> All other cases should be handled correctly in the middle-end when
> expanding to RTL because they need to be handled for bit-fields anyways.
> Plus SIGN_EXTRACT and ZERO_EXTRACT would be used in the integer case
> for the RTL.
> Getting SIGN_EXTRACT/ZERO_EXTRACT early on in the RTL is better than
> waiting until combine really.
> 

Fair enough, I've dropped the constraint.

> 
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues.
> >
> > Testcase are added in patch 2/2.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Add bitfield and shift folding.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/match.pd b/gcc/match.pd index
> >
> 1d407414bee278c64c00d425d9f025c1c58d853d..b225d36dc758f1581502c8d03
> 761
> > 544bfd499c01 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -7245,6 +7245,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >&& ANY_INTEGRAL_TYPE_P (type) && ANY_INTEGRAL_TYPE_P
> (TREE_TYPE(@0)))
> >(IFN_REDUC_PLUS_WIDEN @0)))
> >
> > +/* Canonicalize BIT_FIELD_REFS and shifts to BIT_FIELD_REFS.  */ (for
> > +shift (rshift)
> > + op (plus)
> > + (simplify
> > +  (shift (BIT_FIELD_REF @0 @1 @2) integer_pow2p@3)
> > +  (if (INTEGRAL_TYPE_P (type))
> > +   (with { /* Can't use wide-int here as the precision differs between
> > + @1 and @3.  */
> > +  unsigned HOST_WIDE_INT size = tree_to_uhwi (@1);
> > +  unsigned HOST_WIDE_INT shiftc = tree_to_uhwi (@3);
> > +  unsigned HOST_WIDE_INT newsize = size - shiftc;
> > +  tree nsize = wide_int_to_tree (bitsizetype, newsize);
> > +  tree ntype
> > += build_nonstandard_integer_type (newsize, 1); }
> 
> Maybe use `build_nonstandard_integer_type (newsize, /* unsignedp = */
> true);` or better yet `build_nonstandard_integer_type (newsize,
> UNSIGNED);`

Ah, will do,
Tamar.

> 
> I had started to convert some of the unsignedp into enum signop but I never
> finished or submitted the patch.
> 
> Thanks,
> Andrew Pinski
> 
> 
> > +(if (ntype)
> > + (convert:type (BIT_FIELD_REF:ntype @0 { nsize; } (op @2
> > + @3
> > +
> >  (simplify
> >   (BIT_FIELD_REF (BIT_FIELD_REF @0 @1 @2) @3 @4)
> >   (BIT_FIELD_REF @0 @3 { const_binop (PLUS_EXPR, bitsizetype, @2, @4);
> > }))
> >
> >
> >
> >
> > --


[PATCH] testsuite: Fix up avx256-unaligned-store-3.c test.

2022-09-25 Thread Hu, Lin1 via Gcc-patches
Hi all,

This patch aims to fix a problem that avx256-unaligned-store-3.c test reports 
two unexpected fails under "-march=cascadelake".

Regtested on x86_64-pc-linux-gnu. Ok for trunk?

BRs,
Lin

gcc/testsuite/ChangeLog:

PR target/94962
* gcc.target/i386/avx256-unaligned-store-3.c: Add -mno-avx512f
---
 gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c 
b/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c
index f909099bcb1..67635fb9e66 100644
--- a/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c
+++ b/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-store -mtune=generic 
-fno-common" } */
+/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-store -mtune=generic 
-fno-common -mno-avx512f" } */
 
 #define N 1024
 
-- 
2.18.2



RE: [PATCH] testsuite: Fix up avx256-unaligned-store-3.c test.

2022-09-25 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Hu, Lin1 
> Sent: Monday, September 26, 2022 1:20 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] testsuite: Fix up avx256-unaligned-store-3.c test.
> 
> Hi all,
> 
> This patch aims to fix a problem that avx256-unaligned-store-3.c test reports
> two unexpected fails under "-march=cascadelake".
> 
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
Ok.
> 
> BRs,
> Lin
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/94962
>   * gcc.target/i386/avx256-unaligned-store-3.c: Add -mno-avx512f
> ---
>  gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c
> b/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c
> index f909099bcb1..67635fb9e66 100644
> --- a/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c
> +++ b/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-store -
> mtune=generic -fno-common" } */
> +/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-store -
> mtune=generic -fno-common -mno-avx512f" } */
> 
>  #define N 1024
> 
> --
> 2.18.2



Re: [PATCH] LoongArch: Libvtv add LoongArch support.

2022-09-25 Thread Xi Ruoyao via Gcc-patches
On Mon, 2022-09-26 at 10:00 +0800, Lulu Cheng wrote:
> Co-Authored-By: qijingwen 
> 
> include/ChangeLog:
> 
> * vtv-change-permission.h (defined):
> (VTV_PAGE_SIZE): 16k pages under loongarch64.

We have 4KB, 16KB, and 64KB page configurations, so is it possible to
support all of them without too much overhead?  If not, supporting only
16KB is OK as it's the default.
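
(Purely as an illustration of the "support all of them" idea, not something
this patch does: the page size could in principle be obtained at run time
instead of being hard-coded, e.g.)

  #include <unistd.h>

  /* Hypothetical helper: query the actual page size (4K/16K/64K) at run
     time instead of relying on a compile-time VTV_PAGE_SIZE.  */
  static long
  vtv_page_size (void)
  {
    return sysconf (_SC_PAGESIZE);
  }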

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] rs6000: Fix condition of define_expand vec_shr_<mode> [PR100645]

2022-09-25 Thread Kewen.Lin via Gcc-patches
Hi Segher,

Thanks for the comments!

on 2022/9/23 05:39, Segher Boessenkool wrote:
> Hi!
> 
> Heh, I first thought I had mistyped the PR #, but it is this one after
> all :-)
> 
> On Thu, Sep 22, 2022 at 09:41:34AM +0800, Kewen.Lin wrote:
>> PR100645 exposes one latent bug in define_expand vec_shr_<mode>
>> that the current condition TARGET_ALTIVEC is too loose.  The
>> mode iterator VEC_L contains a few modes, they are not always
>> supported as vector mode, VECTOR_UNIT_ALTIVEC_OR_VSX_P should
>> be used like some other VEC_L usages.
> 
>> --- a/gcc/config/rs6000/vector.md
>> +++ b/gcc/config/rs6000/vector.md
>> @@ -1475,7 +1475,7 @@ (define_expand "vec_shr_<mode>"
>>[(match_operand:VEC_L 0 "vlogical_operand")
>> (match_operand:VEC_L 1 "vlogical_operand")
>> (match_operand:QI 2 "reg_or_short_operand")]
>> -  "TARGET_ALTIVEC"
>> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
> 
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr100645.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-require-effective-target powerpc_altivec_ok } */
>> +/* { dg-options "-mdejagnu-cpu=power6 -maltivec" } */
> 
> This is a strange choice: we normally do not enable VMX on p6.  Just use
> p7 instead?  There is no need for altivec_ok in any case, the -mcpu=
> guarantees it is satisfied.

Unfortunately a single power7 doesn't work for this case, since it (VSX) makes
rs6000_vector_mem[TImode] no longer VECTOR_NONE; we need one extra -mno-vsx
to reproduce this.

As you mentioned above, power6 doesn't enable altivec by default.  I noticed
that altivec_ok excludes some environments like AIX 5.3, and it also ensures
it's fine to have an explicit -maltivec there, so I added it for robustness.

> 
>> +/* It's to verify no ICE here.  */
> 
> "This used to ICE."?

Updated.

> 
> Please commit this now, looks good.  Thanks!
> 

Committed in r13-2844.  Thanks!

BR,
Kewen


Re: [PATCH] rs6000: Fix the condition with frame_pointer_needed_indeed [PR96072]

2022-09-25 Thread Kewen.Lin via Gcc-patches
Hi Segher,

Thanks for the comments!

on 2022/9/23 06:13, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Sep 22, 2022 at 09:41:42AM +0800, Kewen.Lin wrote:
>>  * config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): Update the
>>  condition for adding REG_CFA_DEF_CFA reg note with
>>  frame_pointer_needed_indeed.
> 
>> --- a/gcc/config/rs6000/rs6000-logue.cc
>> +++ b/gcc/config/rs6000/rs6000-logue.cc
>> @@ -4956,7 +4956,7 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)
>>   a REG_CFA_DEF_CFA note, but that's OK;  A duplicate is
>>   discarded by dwarf2cfi.cc/dwarf2out.cc, and in any case would
>>   be harmless if emitted.  */
>> -  if (frame_pointer_needed)
>> +  if (frame_pointer_needed_indeed)
>>  {
>>insn = get_last_insn ();
> 
> I thought about adding an assert here, but the very next insn gives a
> clear enough message anyway, zo it would be just noise :-)
> 
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96072.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr96072.c
>> new file mode 100644
>> index 000..23d1cc74ffd
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96072.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-options "-O1" } */
>> +
>> +/* Verify there is no ICE on 32 bit environment.  */
> 
> /* This used to ICE with the SYSV ABI (PR96072).  */

Updated.

> 
> Please use -O2 if that works here.
> 

Updated too.

> Okay for trunk.  Thank you!
> 

Committed in r13-2846.  Since it's a regression causing an ICE, I think we want
to backport this?  Is it ok to backport after burn-in time?

Thanks again!

BR,
Kewen


Re: [PATCH v6, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-09-25 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2022/9/22 22:05, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Sep 22, 2022 at 10:28:23AM +0800, Kewen.Lin wrote:
>> on 2022/9/22 05:56, Segher Boessenkool wrote:
>>> On Fri, Jun 24, 2022 at 10:02:19AM +0800, HAO CHEN GUI wrote:
>>> In the other direction I am worried that the unspecs will degrade
>>> performance (relative to smin/smax) when -ffast-math *is* active (and
>>> this new builtin code and pattern doesn't blow up).
>>
>> For fmin/fmax it would be fine, since they are transformed to {MAX,MIN}
>> EXPR in middle end, and yes, it can degrade for the bifs, although IMHO
>> the previous expansion to smin/smax contradicts with the bif names (users
>> expect to map them to xs{min,max}dp than others).
> 
> But builtins *never* say to generate any particular instruction.  They
> say to generate code that implements certain functionality.  For many
> builtins this does of course boil down to specific instructions, but
> even then it could be optimised away completely or replace with
> something more specific if things can be folded or such.

Ah, your explanation refreshed my mind, thanks!  Previously I thought that
bifs with a specific mnemonic as part of their names should be used to generate
those specific instructions (to save users the effort of using inline asm), and
that if we want them to represent the generic functionality (not bound to a
specific instruction), we can use some generic names instead.  Given your
explanation, binding at fast-math isn't needed, so I think Haochen's patch v7
with gimple folding can avoid the concern about degradation at fast-math
(still smax/smin), nice. :)

BR,
Kewen


Re: [PATCH 1/2] cselib: Keep track of further subvalue relations

2022-09-25 Thread Stefan Schulze Frielinghaus via Gcc-patches
Ping.

On Wed, Sep 07, 2022 at 04:20:25PM +0200, Stefan Schulze Frielinghaus wrote:
> Whenever a new cselib value is created check whether a smaller value
> exists which is contained in the bigger one.  If so add a subreg
> relation to locs of the smaller one.
> 
> gcc/ChangeLog:
> 
>   * cselib.cc (new_cselib_val): Keep track of further subvalue
>   relations.
> ---
>  gcc/cselib.cc | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/gcc/cselib.cc b/gcc/cselib.cc
> index 6a5609786fa..9b582e5d3d6 100644
> --- a/gcc/cselib.cc
> +++ b/gcc/cselib.cc
> @@ -1569,6 +1569,26 @@ new_cselib_val (unsigned int hash, machine_mode mode, 
> rtx x)
>e->locs = 0;
>e->next_containing_mem = 0;
>  
> +  scalar_int_mode int_mode;
> +  if (REG_P (x) && is_int_mode (mode, &int_mode)
> +  && REG_VALUES (REGNO (x)) != NULL
> +  && (!cselib_current_insn || !DEBUG_INSN_P (cselib_current_insn)))
> +{
> +  rtx copy = shallow_copy_rtx (x);
> +  scalar_int_mode narrow_mode_iter;
> +  FOR_EACH_MODE_UNTIL (narrow_mode_iter, int_mode)
> + {
> +   PUT_MODE_RAW (copy, narrow_mode_iter);
> +   cselib_val *v = cselib_lookup (copy, narrow_mode_iter, 0, VOIDmode);
> +   if (v)
> + {
> +   rtx sub = lowpart_subreg (narrow_mode_iter, e->val_rtx, int_mode);
> +   if (sub)
> + new_elt_loc_list (v, sub);
> + }
> + }
> +}
> +
>if (dump_file && (dump_flags & TDF_CSELIB))
>  {
>fprintf (dump_file, "cselib value %u:%u ", e->uid, hash);
> -- 
> 2.37.2
> 


Re: [PATCH 2/2] var-tracking: Add entry values up to max register mode

2022-09-25 Thread Stefan Schulze Frielinghaus via Gcc-patches
Ping.

On Wed, Sep 07, 2022 at 04:20:26PM +0200, Stefan Schulze Frielinghaus wrote:
> For parameters of integer type which do not consume a whole register
> (modulo sign/zero extension), this patch adds entry values up to the
> maximal register mode.
> 
> gcc/ChangeLog:
> 
>   * var-tracking.cc (vt_add_function_parameter): Add entry values
>   up to maximal register mode.
> ---
>  gcc/var-tracking.cc | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/gcc/var-tracking.cc b/gcc/var-tracking.cc
> index 235981d100f..9c40ec4fb8b 100644
> --- a/gcc/var-tracking.cc
> +++ b/gcc/var-tracking.cc
> @@ -9906,6 +9906,23 @@ vt_add_function_parameter (tree parm)
>VAR_INIT_STATUS_INITIALIZED, NULL, INSERT);
>   }
>   }
> +
> +   if (GET_MODE_CLASS (mode) == MODE_INT)
> + {
> +   machine_mode wider_mode_iter;
> +   FOR_EACH_WIDER_MODE (wider_mode_iter, mode)
> + {
> +   if (!HWI_COMPUTABLE_MODE_P (wider_mode_iter))
> + break;
> +   rtx wider_reg
> + = gen_rtx_REG (wider_mode_iter, REGNO (incoming));
> +   cselib_val *wider_val
> + = cselib_lookup_from_insn (wider_reg, wider_mode_iter, 1,
> +VOIDmode, get_insns ());
> +   preserve_value (wider_val);
> +   record_entry_value (wider_val, wider_reg);
> + }
> + }
>   }
>  }
>else if (GET_CODE (incoming) == PARALLEL && !dv_onepart_p (dv))
> -- 
> 2.37.2
> 


Re: [PATCH V3] rs6000: cannot_force_const_mem for HIGH code rtx[PR106460]

2022-09-25 Thread Kewen.Lin via Gcc-patches
Hi Jeff,

on 2022/9/7 15:08, Jiufu Guo via Gcc-patches wrote:
> Hi,
> 
> As described in PR106460, an rtx 'high:DI (symbol_ref:DI ("var_48")' is tried
> to be stored into the constant pool, and an ICE occurs.  But actually, this
> rtx represents a partial address and cannot be put into a .rodata section.
> 
> This patch updates rs6000_cannot_force_const_mem to return true for rtx(s)
> with HIGH code, because these rtx(s) indicate part of an address and are not
> OK for the constant pool.
> 
> Below are some examples:
> (high:DI (const:DI (plus:DI (symbol_ref:DI ("xx") (const_int 12 [0xc])
> (high:DI (symbol_ref:DI ("var_1")..)))
> 
> This patch updates the previous patch, drafts a test case which ICEs
> without the fix, and associates it with one PR.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597712.html
> This patch also updates the message of the previous patch V2.
> 
> I would like to ask for help reviewing this patch one more time.
> 
> Bootstrap and regtest pass on ppc64 and ppc64le.
> Is this ok for trunk?
> 
> BR,
> Jeff(Jiufu)
> 
>   PR target/106460
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (rs6000_cannot_force_const_mem): Return true
>   for HIGH code rtx.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr106460.c: New test.
> ---
>  gcc/config/rs6000/rs6000.cc |  7 +--
>  gcc/testsuite/gcc.target/powerpc/pr106460.c | 11 +++
>  2 files changed, 16 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106460.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 2f3146e56f8..04e3a393147 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -9643,8 +9643,11 @@ rs6000_init_stack_protect_guard (void)
>  static bool
>  rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>  {
> -  if (GET_CODE (x) == HIGH
> -  && GET_CODE (XEXP (x, 0)) == UNSPEC)
> +  /* If GET_CODE (x) is HIGH, the 'X' represets the high part of a 
> symbol_ref.
> + It indicates partial address,  which can not be put into a constant 
> pool.
> + e.g.  (high:DI (unspec:DI [(symbol_ref/u:DI ("*.LC0")..)
> + (high:DI (symbol_ref:DI ("var")..)).  */

Nit: Maybe it's good to align these two "(high:DI ... ?

> +  if (GET_CODE (x) == HIGH)
>  return true;
>  
>/* A TLS symbol in the TOC cannot contain a sum.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106460.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106460.c
> new file mode 100644
> index 000..dfaffcb6e28
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106460.c
> @@ -0,0 +1,11 @@

Need a power10_ok effective target here.

/* { dg-require-effective-target power10_ok } */

> +/* { dg-options "-O1 -mdejagnu-cpu=power10" } */

Nit: As Segher noted in a review of one of my patches, O2 is preferred over O1
if it still works for this issue.  The point is to avoid some related
optimization (routines or passes) being disabled at O1 one day, which would
make this test ineffective.

BR,
Kewen




[PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]

2022-09-25 Thread Liwei Xu via Gcc-patches
This patch implements the optimization in PR 54346, which merges

c = VEC_PERM_EXPR <a, b, VCST0>;
d = VEC_PERM_EXPR <c, c, VCST1>;
to
d = VEC_PERM_EXPR <a, b, NEW_VCST>;

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
tree-ssa/forwprop-19.c fails to pass, but I'm not sure whether it
is ok to remove it.

gcc/ChangeLog:

PR target/54346
* match.pd: Merge the index of VCST then generates the new vec_perm.

gcc/testsuite/ChangeLog:

PR target/54346
* gcc.dg/pr54346.c: New test.

Co-authored-by: liuhongt 
---
 gcc/match.pd   | 41 ++
 gcc/testsuite/gcc.dg/pr54346.c | 13 +++
 2 files changed, 54 insertions(+)
 create mode 100755 gcc/testsuite/gcc.dg/pr54346.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 345bcb701a5..9219b0a10e1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8086,6 +8086,47 @@ and,
   (minus (mult (vec_perm @1 @1 @3) @2) @4)))
 
 
+/* (PR54346) Merge 
+   c = VEC_PERM_EXPR <a, b, VCST0>;
+   d = VEC_PERM_EXPR <c, c, VCST1>;
+   to
+   d = VEC_PERM_EXPR <a, b, NEW_VCST>; */
+   
+(simplify
+ (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
+ (with
+  {
+if(!TYPE_VECTOR_SUBPARTS (type).is_constant())
+  return NULL_TREE;
+
+tree op0;
+machine_mode result_mode = TYPE_MODE (type);
+machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
+int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant();
+vec_perm_builder builder0;
+vec_perm_builder builder1;
+vec_perm_builder builder2 (nelts, nelts, 1);
+
+if (!tree_to_vec_perm_builder (&builder0, @3) 
+|| !tree_to_vec_perm_builder (&builder1, @4))
+  return NULL_TREE;
+
+vec_perm_indices sel0 (builder0, 2, nelts);
+vec_perm_indices sel1 (builder1, 1, nelts);
+   
+for (int i = 0; i < nelts; i++)
+  builder2.quick_push (sel0[sel1[i].to_constant()]);
+
+vec_perm_indices sel2 (builder2, 2, nelts);
+
+if (!can_vec_perm_const_p (result_mode, op_mode, sel2, false))
+  return NULL_TREE;
+
+op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
+  }
+  (vec_perm @1 @2 { op0; })))
+
+
 /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
constant which when multiplied by a power of 2 contains a unique value
diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
new file mode 100755
index 000..d87dc3a79a5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr54346.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-dse1" } */
+
+typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
+
+void fun (veci a, veci b, veci *i)
+{
+  veci c = __builtin_shuffle (a, b, __extension__ (veci) {1, 4, 2, 7});
+  *i = __builtin_shuffle (c, __extension__ (veci) { 7, 2, 1, 5 });
+}
+
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }" "dse1" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "dse1" } } */
\ No newline at end of file
-- 
2.18.2



[PATCH] LoongArch: Pass cache information to optimizer

2022-09-25 Thread Xi Ruoyao via Gcc-patches
Currently our cache information from -mtune is not really used; pass it
to the optimizer so it actually takes effect.

gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Set the corresponding
params for L1D cache line size, L1D cache size, and L2D cache
size.
---
 gcc/config/loongarch/loongarch.cc | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 98c0e26cdb9..81594cf5b98 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "context.h"
 #include "builtins.h"
 #include "rtl-iter.h"
+#include "opts.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -6096,6 +6097,16 @@ loongarch_option_override_internal (struct gcc_options 
*opts)
   if (loongarch_branch_cost == 0)
 loongarch_branch_cost = loongarch_cost->branch_cost;
 
+  const loongarch_cache &tune_cache =
+loongarch_cpu_cache[la_target.cpu_tune];
+
+  SET_OPTION_IF_UNSET (opts, &global_options_set, param_l1_cache_line_size,
+  tune_cache.l1d_line_size);
+  SET_OPTION_IF_UNSET (opts, &global_options_set, param_l1_cache_size,
+  tune_cache.l1d_size);
+  SET_OPTION_IF_UNSET (opts, &global_options_set, param_l2_cache_size,
+  tune_cache.l2d_size);
+
   if (TARGET_DIRECT_EXTERN_ACCESS && flag_shlib)
 error ("%qs cannot be used for compiling a shared library",
   "-mdirect-extern-access");
-- 
2.37.0