[PATCH] tree-optimization/107867 - failed abnormal cleanup in forwprop

2022-11-27 Thread Richard Biener via Gcc-patches
The following makes sure to perform abnormal cleanup when forwprop
propagates into a call.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/107867
* tree-ssa-forwprop.cc (pass_forwprop::execute): Handle
abnormal cleanup after substitution.

* g++.dg/pr107867.C: New testcase.
---
 gcc/testsuite/g++.dg/pr107867.C | 19 +++
 gcc/tree-ssa-forwprop.cc| 10 ++
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/pr107867.C

diff --git a/gcc/testsuite/g++.dg/pr107867.C b/gcc/testsuite/g++.dg/pr107867.C
new file mode 100644
index 000..16c7499cf7b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr107867.C
@@ -0,0 +1,19 @@
+// { dg-do compile }
+// { dg-options "-Os -fno-tree-ccp -Wuninitialized" }
+
+void printf(...);
+void __sigsetjmp_cancel() __attribute__((__returns_twice__));
+int z, main_ret;
+void func(void *) {}
+
+int
+main()
+{
+  int x;
+  void (*__cancel_routine)(void *)(func);
+  __sigsetjmp_cancel();
+  __cancel_routine(0);
+  if (main_ret)
+x = z;
+  printf(x);
+}
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 429f77f199c..160e49e097e 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -3367,6 +3367,7 @@ pass_forwprop::execute (function *fun)
   auto_vec to_fixup;
   auto_vec to_remove;
   to_purge = BITMAP_ALLOC (NULL);
+  bitmap need_ab_cleanup = BITMAP_ALLOC (NULL);
   for (int i = 0; i < postorder_num; ++i)
 {
   gimple_stmt_iterator gsi;
@@ -3682,6 +3683,9 @@ pass_forwprop::execute (function *fun)
  /* Mark stmt as potentially needing revisiting.  */
  gimple_set_plf (stmt, GF_PLF_1, false);
 
+ bool can_make_abnormal_goto = (is_gimple_call (stmt)
+&& stmt_can_make_abnormal_goto (stmt));
+
  /* Substitute from our lattice.  We need to do so only once.  */
  bool substituted_p = false;
  use_operand_p usep;
@@ -3700,6 +3704,10 @@ pass_forwprop::execute (function *fun)
  && is_gimple_assign (stmt)
  && gimple_assign_rhs_code (stmt) == ADDR_EXPR)
recompute_tree_invariant_for_addr_expr (gimple_assign_rhs1 (stmt));
+ if (substituted_p
+ && can_make_abnormal_goto
+ && !stmt_can_make_abnormal_goto (stmt))
+   bitmap_set_bit (need_ab_cleanup, bb->index);
 
  bool changed;
  do
@@ -3901,7 +3909,9 @@ pass_forwprop::execute (function *fun)
 }
 
   cfg_changed |= gimple_purge_all_dead_eh_edges (to_purge);
+  cfg_changed |= gimple_purge_all_dead_abnormal_call_edges (need_ab_cleanup);
   BITMAP_FREE (to_purge);
+  BITMAP_FREE (need_ab_cleanup);
 
   if (cfg_changed)
 todoflags |= TODO_cleanup_cfg;
-- 
2.35.3


Re: [PATCH V2] rs6000: Support to build constants by li/lis+oris/xoris

2022-11-27 Thread Jiufu Guo via Gcc-patches
Jiufu Guo via Gcc-patches  writes:

> Hi Segher!
>
> Thanks a lot for your comments!
>
> Segher Boessenkool  writes:
>
>> Hi guys,
>>
>> On Fri, Nov 25, 2022 at 04:11:49PM +0800, Kewen.Lin wrote:
>>> on 2022/10/26 19:40, Jiufu Guo wrote:
>>> for "li/lis + oris/xoris", I interpreted it into four combinations:
>>> 
>>>li + oris, lis + oris, li + xoris, lis + xoris.
>>> 
>>> not sure just me interpreting like that, but the actual combinations
>>> which this patch adopts are:
>>> 
>>>li + oris, li + xoris, lis + xoris.
>>> 
>>> It's a bit off, but not a big deal, up to you to reword it or not.  :)
>>
>> The first two are obvious, but the last one is almost never a good idea,
>> there usually are better ways to do the same.  I cannot even think of
>> any case where this is best?  A lis;rl* is always prefered (it can
>> optimise better, be combined with other insns).
> I understant your point here.  The first two: 'li' for lowest 16bits,
> 'oris/xoris' for next 16bits.
>
> While for 'lis + xoris', it may not obvious, because both 'lis' and
> 'xoris' operates on 17-31bits.
> 'lis + xoris' is for case "32(1) || 1(0) || 15(x) || 16(0)". xoris is
> used to clean bit31.  This case seems hard to be supported by 'rlxx'.
>
> I hit to find this case when I analyze what kind of constants can be
> build by two instructions. Checked the posssible combinations:
> "addi/addis" + "neg/ori/../xoris/rldX/rlwX/../sradi/extswsli"(those
> instructions which accept one register and one immediate).
>
> I also drafted the patch to use "li/lis+rlxx" to build constant.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601276.html
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601277.html
>
>>
>>> > +  HOST_WIDE_INT orig_c = c;
>>
>> If you ever feel you need a variable to hold an "orig" value, that is a
>> good hint that you should restructure the code a bit, perhaps even
>> factor it.  That often is overdue (like here), not caused by you, but
>> you could help solve it ;-)
>>
>> (This is what made this patch hard to review, btw).
> You are right.  Thanks for point out this!
>>
>>> >   gen_rtx_IOR (DImode, copy_rtx (temp),
>>> >GEN_INT (ud1)));
>>> >  }
>>> > +  else if ((ud4 == 0x && ud3 == 0x)
>>> > +&& ((ud1 & 0x8000) || (ud1 == 0 && !(ud2 & 0x8000
>>> > +{
>>> > +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
>>> > +
>>> > +  HOST_WIDE_INT imm = (ud1 & 0x8000) ? ((ud1 ^ 0x8000) - 0x8000)
>>> > +  : ((ud2 << 16) - 0x8000);
>>
>> We really should have some "hwi::sign_extend (ud1, 16)" helper function,
>> heh.  Maybe there already is?  Ah, "sext_hwi".  Fixing that up
>> everywhere in this function is preapproved.
> Great, thanks! 
>>
>>> > +  else
>>> > + {
>>> > +   emit_move_insn (temp,
>>> > +   GEN_INT (((ud2 << 16) ^ 0x8000) - 0x8000));
>>> > +   if (ud1 != 0)
>>> > + emit_move_insn (temp, gen_rtx_IOR (DImode, temp, GEN_INT (ud1)));
>>> > +   emit_move_insn (dest,
>>> > +   gen_rtx_ZERO_EXTEND (DImode,
>>> > +gen_lowpart (SImode, temp)));
>>> > + }
>>
>> Why this?  Please just write it in DImode, do not go via SImode?
> Thanks for catch this. Yes, gen_lowpart with DImode would be ok.
Oh, Sorry. DImode can not be used here.  The genreated pattern with
DImode can not be recognized.  Using SImode is to match 'rlwxx'.

BR,
Jeff (Jiufu)
>>
>>> > --- /dev/null
>>> > +++ b/gcc/testsuite/gcc.target/powerpc/pr106708.h
>>> > @@ -0,0 +1,9 @@
>>> > +/* Test constants which can be built by li/lis + oris/xoris */
>>> > +void  __attribute__ ((__noinline__, __noclone__)) foo (long long *arg)
>>> > +{
>>> > +  *arg++ = 0x98765432ULL;
>>> > +  *arg++ = 0x7cdeab55ULL;
>>> > +  *arg++ = 0x6543ULL;
>>> > +}
>>
>> Use noipa please (it is shorter, simpler, and covers more :-) )
> Thanks!
>>
>> Could you comment what exact instructions are expected?
>> li;xoris and li;xoris and lis;xoris I guess?  It helps if you just tell
>> the reader here.
> Sure, thanks!
>>
>> The li;oris and li;xoris parts look good.
>>
>
> BR,
> Jeff (Jiufu)
>>
>> Segher


[Patch] gcn: Fix __builtin_gcn_first_call_this_thread_p

2022-11-27 Thread Tobias Burnus

It turned out that cprop cleverly propagated the unspec_volatile
to the preceding (pseudo)register, permitting to remove the
'set (s0) (pseudoregister)' at -O2.  Unfortunately, it does
matter whether the assignment is done to 's2' (previously: pseudoregister)
or to s1. – Just having a hard register is not enough ...

Solution: Use USE (alias gen_rtx_USE) instead.

Additionally, I removed the s0 modification (that should lead to the unchanged 
result)
by adding 'gcn_operand_part (DImode, reg, 1)' and then working with SImode. 
Result:

  if (__builtin_gcn_first_call_this_thread_p())
x = 42;

becomes now (with -O2) the following; the builtin code is up to to (and 
including)
'.L2', the rest is the 'if' and 'x=42':

s_lshr_b32  s2, s1, 16
s_cmpk_lg_u32   s2, 12345
s_mov_b32   s12, scc
s_mov_b32   vcc_lo, scc
s_mov_b32   vcc_hi, 0
s_cbranch_vccz  .L2
s_and_b32   s2, s1, 65535   (= 0x)
s_or_b32s1, s2, 809041920 (= 0x3039 = (12345 << 16))
.L2:
s_getpc_b64 s[2:3]
s_add_u32   s2, s2, x@rel32@lo+4
s_addc_u32  s3, s3, x@rel32@hi+4
s_mov_b32   vcc_lo, s12
s_mov_b32   vcc_hi, 0
s_cbranch_vccz  .L3
s_mov_b32   s12, 42
v_writelane_b32 v0, s12, 0
s_mov_b64   exec, 1
global_store_dword  v1, v0, s[2:3]
.L3:


OK for mainline?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
gcn: Fix __builtin_gcn_first_call_this_thread_p

Contrary naive expectation, unspec_volatile (via prologue_use) did not
prevent the cprop pass (at -O2) to remove the access to the s[0:1]
(PRIVATE_SEGMENT_BUFFER_ARG) register as the volatile got just put on
the preceeding pseudoregister.  Solution: Use gen_rtx_USE instead.
Additionally, this patch removes (gen_)prologue_use_di as it is then no
longer used.

Finally, as we already do bit manipulation, instead of using the full
64bit side - and then just keeping the value of 's0', just move directly
to use only s1 of s[0:1] and do the bit manipulations there, generating
more readable assembly code and better matching the '#else' branch.

gcc/ChangeLog:

	* config/gcn/gcn.cc (gcn_expand_builtin_1): Work on s1 instead
	of s[0:1] and use USE to prevent removal of setting that register.
	* config/gcn/gcn.md (prologue_use_di): Remove.

 gcc/config/gcn/gcn.cc | 16 
 gcc/config/gcn/gcn.md | 13 -
 2 files changed, 8 insertions(+), 21 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 6fb261318c4..c74fa007a21 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4556,8 +4556,9 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
 	rtx not_first = gen_label_rtx ();
 	rtx reg = gen_rtx_REG (DImode,
 			cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG]);
-	rtx cmp = force_reg (DImode,
- gen_rtx_LSHIFTRT (DImode, reg, GEN_INT (48)));
+	reg = gcn_operand_part (DImode, reg, 1);
+	rtx cmp = force_reg (SImode,
+ gen_rtx_LSHIFTRT (SImode, reg, GEN_INT (16)));
 	emit_insn (gen_cstoresi4 (result, gen_rtx_NE (BImode, cmp,
 			  GEN_INT(12345)),
   cmp, GEN_INT(12345)));
@@ -4565,12 +4566,11 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
 			  const0_rtx),
    result));
 	emit_move_insn (reg,
-	  force_reg (DImode,
-		gen_rtx_IOR (DImode,
-			 gen_rtx_AND (DImode, reg,
-	  GEN_INT (0xL)),
-			 GEN_INT (12345L << 48;
-	emit_insn (gen_prologue_use (reg));
+	  force_reg (SImode,
+		gen_rtx_IOR (SImode,
+			 gen_rtx_AND (SImode, reg, GEN_INT (0x)),
+			 GEN_INT (12345L << 16;
+	emit_insn (gen_rtx_USE (VOIDmode, reg));
 	emit_label (not_first);
 	  }
 	return result;
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index a8b9c28d115..92e9892c4f7 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -697,19 +697,6 @@
   ""
   [(set_attr "length" "0")])
 
-(define_insn_and_split "prologue_use_di"
-  [(unspec_volatile [(match_operand:DI 0 "register_operand")] UNSPECV_PROLOGUE_USE)]
-  ""
-  "#"
-  "reload_completed"
-  [(unspec_volatile [(match_dup 0)] UNSPECV_PROLOGUE_USE)
-   (unspec_volatile [(match_dup 1)] UNSPECV_PROLOGUE_USE)]
-  {
-operands[1] = gcn_operand_part (DImode, operands[0], 1);
-operands[0] = gcn_operand_part (DImode, operands[0], 0);
-  }
-  [(set_attr "length" "0")])
-
 (define_expand "prologue"
   [(const_int 0)]
   ""


Re: [PATCH 0/2] Support HWASAN with Intel LAM

2022-11-27 Thread Uros Bizjak via Gcc-patches
On Mon, Nov 28, 2022 at 4:35 AM Hongtao Liu  wrote:
>
> On Fri, Nov 11, 2022 at 9:26 AM liuhongt  wrote:
> >
> >   2 years ago, ARM folks support HWASAN[1] in GCC[2], and introduced several
> > target hooks(Many thanks to their work) so other backends can do similar
> > things if they have similar feature.
> >   Intel LAM(linear Address Masking)[3 Charpter 14] supports similar feature 
> > with
> > the upper bits of pointers can be used as metadata, LAM support two modes:
> >   LAM_U48:bits 48-62 can be used as metadata
> >   LAM_U57:bits 57-62 can be used as metedata.
> >
> > These 2 patches mainly support those target hooks, but HWASAN is not really
> > enabled until the final decision for the LAM kernel interface which may take
> > quite a long time. We have verified our patches with a "fake" interface 
> > locally[4], and
> > decided to push the backend patches to the GCC13 to make other HWASAN 
> > developper's work
> > easy.
> >
> > [1] https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
> > [2] https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557857.html
> > [3] 
> > https://www.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> > [4] https://gitlab.com/x86-gcc/gcc/-/tree/users/intel/lam/master
> >
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
> I'll install 2 patches if there's no objections in next 7 days.

FYI, I have no objection.

Uros.


Re: [PATCH] Change the behavior of predicate check failure on cbranchcc4 operand0 in prepare_cmp_insn

2022-11-27 Thread Kewen.Lin via Gcc-patches
Add more experts in CC.

on 2022/11/23 10:54, HAO CHEN GUI wrote:
> Hi,
>   I want to enable "have_cbranchcc4" on rs6000. But not all combinations of
> comparison codes and sub CC modes are benefited to generate cbranchcc4 insns
> on rs6000. There is an predicate for operand0 of cbranchcc4 to bypass
> some combinations. It gets assertion failure in prepare_cmp_insn. I think
> we shouldn't suppose that all comparison codes and sub CC modes are supported
> and throw an assertion failure in prepare_cmp_insn. It might check the
> predicate and go to fail if the predicate can't be satisfied. This patch
> changes the behavior of those codes.
> 
>   Bootstrapped and tested on powerpc64-linux BE/LE and x86 with no 
> regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> 
> ChangeLog
> 2022-11-23  Haochen Gui 
> 
> gcc/
>   * optabs.cc (prepare_cmp_insn): Go to fail other than assert it when
>   predicate check of "cbranchcc4" operand[0] fails.
> 
> patch.diff
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 165f8d1fa22..3ec8f6b17ba 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -4484,8 +4484,9 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code 
> comparison, rtx size,
>  {
>enum insn_code icode = optab_handler (cbranch_optab, CCmode);
>test = gen_rtx_fmt_ee (comparison, VOIDmode, x, y);
> -  gcc_assert (icode != CODE_FOR_nothing
> -  && insn_operand_matches (icode, 0, test));
> +  gcc_assert (icode != CODE_FOR_nothing);
> +  if (!insn_operand_matches (icode, 0, test))
> + goto fail;


IMHO, this change looks to accord with the other code in prepare_cmp_insn, which
allows the preparation to fail with NULL_RTX ptest.  Its caller can make its own
decision (ICE due to unexpected, or try other ways) when ptest is null.

If this direction is sensible, maybe we can make it goto fail too if the icode 
==
CODE_FOR_nothing, since we already try to relax the restriction.

BR,
Kewen

>*ptest = test;
>return;
>  }




[PATCH][_GLIBCXX_INLINE_VERSION] Adapt to_chars/from_chars symbols

2022-11-27 Thread François Dumont via Gcc-patches

This patch is fixing those tests:

20_util/to_chars/float128_c++23.cc
std/format/formatter/requirements.cc
std/format/functions/format.cc
std/format/functions/format_to_n.cc
std/format/functions/size.cc
std/format/functions/vformat_to.cc
std/format/string.cc

Note that symbols used in  for __ibm128 and __iee128 are untested.

I even wonder if the normal mode ones are because I cannot find the 
symbols used in gnu.ver.



    libstdc++: [_GLIBCXX_INLINE_VERSION] Add to_chars/from_chars 
symbols export


    libstdc++-v3/ChangeLog

    * include/std/format [_GLIBCXX_INLINE_VERSION](to_chars): 
Adapt __asm symbol

    specifications.
    * config/abi/pre/gnu-versioned-namespace.ver: Add 
to_chars/from_chars symbols

    export.

Ok to commit ?

François
diff --git a/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver b/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
index 06ccaa80a58..7fc81514808 100644
--- a/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
+++ b/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
@@ -142,6 +142,12 @@ GLIBCXX_8.0 {
 _ZN14__gnu_parallel9_Settings3getEv;
 _ZN14__gnu_parallel9_Settings3setERS0_;
 
+# to_chars/from_chars _Float128
+_ZNSt3__88to_charsEPcS0_DF128_;
+_ZNSt3__88to_charsEPcS0_DF128_NS_12chars_formatE;
+_ZNSt3__88to_charsEPcS0_DF128_NS_12chars_formatEi;
+_ZNSt3__810from_charsEPKcS1_RDF128_NS_12chars_formatE;
+
   local:
 *;
 };
diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 23ffbdabed8..a05f3981622 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -1248,27 +1248,51 @@ namespace __format
   // Make them available as std::__format::to_chars.
   to_chars_result
   to_chars(char*, char*, __ibm128) noexcept
+#  if !_GLIBCXX_INLINE_VERSION
 __asm("_ZSt8to_charsPcS_e");
+#  else
+__asm("_ZNSt3__88to_charsEPcS0_e");
+#  endif
 
   to_chars_result
   to_chars(char*, char*, __ibm128, chars_format) noexcept
+#  if !_GLIBCXX_INLINE_VERSION
 __asm("_ZSt8to_charsPcS_eSt12chars_format");
+#  else
+__asm("_ZNSt3__88to_charsEPcS0_eNS_12chars_formatE");
+#  endif
 
   to_chars_result
   to_chars(char*, char*, __ibm128, chars_format, int) noexcept
+#  if !_GLIBCXX_INLINE_VERSION
 __asm("_ZSt8to_charsPcS_eSt12chars_formati");
+#  else
+__asm("_ZNSt3__88to_charsEPcS0_eNS_12chars_formatEi");
+#  endif
 #elif __cplusplus == 202002L
   to_chars_result
   to_chars(char*, char*, __ieee128) noexcept
+#  if !_GLIBCXX_INLINE_VERSION
 __asm("_ZSt8to_charsPcS_u9__ieee128");
+#  else
+__asm("_ZNSt3__88to_charsEPcS0_DF128_");
+#  endif
 
   to_chars_result
   to_chars(char*, char*, __ieee128, chars_format) noexcept
+#  if !_GLIBCXX_INLINE_VERSION
 __asm("_ZSt8to_charsPcS_u9__ieee128St12chars_format");
+#  else
+__asm("_ZNSt3__88to_charsEPcS0_u9__ieee128NS_12chars_formatE");
+#  endif
 
   to_chars_result
   to_chars(char*, char*, __ieee128, chars_format, int) noexcept
+#  if !_GLIBCXX_INLINE_VERSION
 __asm("_ZSt8to_charsPcS_u9__ieee128St12chars_formati");
+#  else
+__asm("_ZNSt3__88to_charsEPcS0_u9__ieee128NS_12chars_formatEi");
+#  endif
 #endif
 
 #elif defined _GLIBCXX_LDOUBLE_IS_IEEE_BINARY128
@@ -1288,15 +1312,27 @@ namespace __format
   // Make them available as std::__format::to_chars.
   to_chars_result
   to_chars(char*, char*, _Float128) noexcept
+#  if !_GLIBCXX_INLINE_VERSION
 __asm("_ZSt8to_charsPcS_DF128_");
+#  else
+__asm("_ZNSt3__88to_charsEPcS0_DF128_");
+#  endif
 
   to_chars_result
   to_chars(char*, char*, _Float128, chars_format) noexcept
+#  if !_GLIBCXX_INLINE_VERSION
 __asm("_ZSt8to_charsPcS_DF128_St12chars_format");
+#  else
+__asm("_ZNSt3__88to_charsEPcS0_DF128_NS_12chars_formatE");
+#  endif
 
   to_chars_result
   to_chars(char*, char*, _Float128, chars_format, int) noexcept
+#  if !_GLIBCXX_INLINE_VERSION
 __asm("_ZSt8to_charsPcS_DF128_St12chars_formati");
+#  else
+__asm("_ZNSt3__88to_charsEPcS0_DF128_NS_12chars_formatEi");
+#  endif
 # endif
 #endif
 


[PATCH][_GLIBCXX_INLINE_VERSION] Adapt dg error messages

2022-11-27 Thread François Dumont via Gcc-patches

libstdc++: [_GLIBCXX_INLINE_VERSION] Adapt dg error messages

libstdc++-v3/ChangeLog

    * testsuite/20_util/bind/ref_neg.cc: Adapt dg-prune-output message.
    * testsuite/20_util/function/cons/70692.cc: Adapt dg-error message.

Ok to commit ?

François
diff --git a/libstdc++-v3/testsuite/20_util/bind/ref_neg.cc b/libstdc++-v3/testsuite/20_util/bind/ref_neg.cc
index a78935775c2..830b30eae6c 100644
--- a/libstdc++-v3/testsuite/20_util/bind/ref_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/bind/ref_neg.cc
@@ -50,7 +50,7 @@ void test02()
 
 // Ignore the reasons for deduction/substitution failure in the headers.
 // Arrange for the match to work on installed trees as well as build trees.
-// { dg-prune-output "no type named 'type' in 'struct std::result_of" }
+// { dg-prune-output "no type named 'type' in 'struct std::(__8::)?result_of" }
 
 int main()
 {
diff --git a/libstdc++-v3/testsuite/20_util/function/cons/70692.cc b/libstdc++-v3/testsuite/20_util/function/cons/70692.cc
index f9e8fe31570..b15208a2531 100644
--- a/libstdc++-v3/testsuite/20_util/function/cons/70692.cc
+++ b/libstdc++-v3/testsuite/20_util/function/cons/70692.cc
@@ -11,4 +11,4 @@ int main()
   std::function ff(f);  // { dg-error "no matching function" }
   std::function f2(f);  // { dg-error "no matching function" }
 }
-// { dg-error "std::enable_if" "" { target *-*-* } 0 }
+// { dg-error "std::(__8::)?enable_if" "" { target *-*-* } 0 }


Ping [PATCH] Change the behavior of predicate check failure on cbranchcc4 operand0 in prepare_cmp_insn

2022-11-27 Thread HAO CHEN GUI via Gcc-patches
Hi,
   Gentle ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607083.html
Thanks
Gui Haochen

在 2022/11/23 10:54, HAO CHEN GUI 写道:
> Hi,
>   I want to enable "have_cbranchcc4" on rs6000. But not all combinations of
> comparison codes and sub CC modes are benefited to generate cbranchcc4 insns
> on rs6000. There is an predicate for operand0 of cbranchcc4 to bypass
> some combinations. It gets assertion failure in prepare_cmp_insn. I think
> we shouldn't suppose that all comparison codes and sub CC modes are supported
> and throw an assertion failure in prepare_cmp_insn. It might check the
> predicate and go to fail if the predicate can't be satisfied. This patch
> changes the behavior of those codes.
> 
>   Bootstrapped and tested on powerpc64-linux BE/LE and x86 with no 
> regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> 
> ChangeLog
> 2022-11-23  Haochen Gui 
> 
> gcc/
>   * optabs.cc (prepare_cmp_insn): Go to fail other than assert it when
>   predicate check of "cbranchcc4" operand[0] fails.
> 
> patch.diff
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 165f8d1fa22..3ec8f6b17ba 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -4484,8 +4484,9 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code 
> comparison, rtx size,
>  {
>enum insn_code icode = optab_handler (cbranch_optab, CCmode);
>test = gen_rtx_fmt_ee (comparison, VOIDmode, x, y);
> -  gcc_assert (icode != CODE_FOR_nothing
> -  && insn_operand_matches (icode, 0, test));
> +  gcc_assert (icode != CODE_FOR_nothing);
> +  if (!insn_operand_matches (icode, 0, test))
> + goto fail;
>*ptest = test;
>return;
>  }


[PATCH 1/1] RISC-V: fix stack access before allocation.

2022-11-27 Thread Fei Gao
In current riscv stack frame allocation, 2 steps are used. The first step 
allocates memories at least for callee saved GPRs and FPRs, and the second step 
allocates the rest if stack size is greater than signed 12-bit range. But it's 
observed in some cases, like gcc.target/riscv/stack_frame.c in my patch, callee 
saved FPRs fail to be included in the first step allocation, so we get 
generated instructions like this:

li  t0,-16384
addisp,sp,-48
addit0,t0,752
...
fsw fs4,-4(sp) #issue here of accessing before allocation
...
add sp,sp,t0

"fswfs4,-4(sp)" has issue here of accessing stack before allocation. 
Although "add  sp,sp,t0" reserves later the memory for fs4, it exposes a risk 
when an interrupt comes in between "fsw  fs4,-4(sp)" and "addsp,sp,t0", 
resulting in a corruption in the stack storing fs4 after interrupt context 
saving and a failure to get the correct value of fs4 later.

This patch fixes issue above, adapts testcases identified in regression tests, 
and add a new testcase for the change.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_first_stack_step):

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr93304.c: Adapt testcase for the change, constrain 
match to assembly instructions only.
* gcc.target/riscv/rvv/base/spill-11.c: Adapt testcase for the change.
* gcc.target/riscv/stack_frame.c: New test.
---
 gcc/config/riscv/riscv.cc |  2 +-
 gcc/testsuite/gcc.target/riscv/pr93304.c  |  2 +-
 .../gcc.target/riscv/rvv/base/spill-11.c  |  3 +--
 gcc/testsuite/gcc.target/riscv/stack_frame.c  | 26 +++
 4 files changed, 29 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack_frame.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7ec4ce97e6c..8d94765b4ba 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4903,7 +4903,7 @@ riscv_first_stack_step (struct riscv_frame_info *frame)
 return frame_total_constant_size;
 
   HOST_WIDE_INT min_first_step =
-RISCV_STACK_ALIGN ((frame->total_size - 
frame->fp_sp_offset).to_constant());
+RISCV_STACK_ALIGN ((frame->total_size - 
frame->frame_pointer_offset).to_constant());
   HOST_WIDE_INT max_first_step = IMM_REACH / 2 - PREFERRED_STACK_BOUNDARY / 8;
   HOST_WIDE_INT min_second_step = frame_total_constant_size - max_first_step;
   gcc_assert (min_first_step <= max_first_step);
diff --git a/gcc/testsuite/gcc.target/riscv/pr93304.c 
b/gcc/testsuite/gcc.target/riscv/pr93304.c
index ce2dc4d6921..76975ff81c5 100644
--- a/gcc/testsuite/gcc.target/riscv/pr93304.c
+++ b/gcc/testsuite/gcc.target/riscv/pr93304.c
@@ -16,4 +16,4 @@ foo (void)
regradless of the REG_ALLOC_ORDER.
In theory, t2 should not used in such small program if regrename
not executed incorrectly, because t0-a2 should be enough.  */
-/* { dg-final { scan-assembler-not "t2" } } */
+/* { dg-final { scan-assembler-not {\t[a-zA-Z0-9]+\t.*t2} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/spill-11.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/spill-11.c
index c2f68b86d90..f5223491665 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/spill-11.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/spill-11.c
@@ -21,9 +21,8 @@ void fn3 (char*);
 ** slli\tt1,t0,1
 ** add\tsp,sp,t1
 ** li\tt0,8192
-** addi\tt0,t0,-208
+** addi\tt0,t0,-192
 ** add\tsp,sp,t0
-** addi\tsp,sp,16
 ** tail\t__riscv_restore_2
 */
 int stack_save_restore_2 (float a1, float a2, float a3, float a4,
diff --git a/gcc/testsuite/gcc.target/riscv/stack_frame.c 
b/gcc/testsuite/gcc.target/riscv/stack_frame.c
new file mode 100644
index 000..485a52d5133
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/stack_frame.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32imafc -mabi=ilp32f" } */
+char my_getchar();
+float getf();
+
+float foo()
+{
+  char volatile array[3120];
+  float volatile farray[3120];
+  float sum = 0;
+  float f1 = getf();
+  float f2 = getf();
+  float f3 = getf();
+  float f4 = getf();
+
+  for (int i = 0; i < 3120; i++)
+  {
+array[i] = my_getchar();
+farray[i] = my_getchar() * 1.2;
+sum += array[i] + farray[i] + f1 + f2 + f3 + f4;
+  }
+
+  return sum;
+}
+
+/* { dg-final { scan-assembler-not {,-[0-9]+\(sp\)} } } */
-- 
2.17.1



[PATCH 0/1] RISC-V: fix stack access before allocation.

2022-11-27 Thread Fei Gao
In current riscv stack frame allocation, 2 steps are used. The first step 
allocates memories at least for callee saved GPRs and FPRs, and the second step 
allocates the rest if stack size is greater than signed 12-bit range. But it's 
observed in some cases, like gcc.target/riscv/stack_frame.c in my patch, callee 
saved FPRs fail to be included in the first step allocation, so we get 
generated instructions like this:

li  t0,-16384
addisp,sp,-48
addit0,t0,752
...
fsw fs4,-4(sp) #issue here of accessing before allocation
...
add sp,sp,t0

"fswfs4,-4(sp)" has issue here of accessing stack before allocation. 
Although "add  sp,sp,t0" reserves later the memory for fs4, it exposes a risk 
when an interrupt comes in between "fsw  fs4,-4(sp)" and "addsp,sp,t0", 
resulting in a corruption in the stack storing fs4 after interrupt context 
saving and a failure to get the correct value of fs4 later.

This patch fixes issue above, adapts testcases identified in regression tests, 
and add a new testcase for the change.

Fei Gao (1):
  RISC-V: fix stack access before allocation.

 gcc/config/riscv/riscv.cc |  2 +-
 gcc/testsuite/gcc.target/riscv/pr93304.c  |  2 +-
 .../gcc.target/riscv/rvv/base/spill-11.c  |  3 +--
 gcc/testsuite/gcc.target/riscv/stack_frame.c  | 26 +++
 4 files changed, 29 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack_frame.c

-- 
2.17.1



[PATCH 1/1] RISC-V: fix stack access before allocation.

2022-11-27 Thread Fei Gao
In current riscv stack frame allocation, 2 steps are used. The first step 
allocates memories at least for callee saved GPRs and FPRs, and the second step 
allocates the rest if stack size is greater than signed 12-bit range. But it's 
observed in some cases, like gcc.target/riscv/stack_frame.c in my patch, callee 
saved FPRs fail to be included in the first step allocation, so we get 
generated instructions like this:

li  t0,-16384
addisp,sp,-48
addit0,t0,752
...
fsw fs4,-4(sp) #issue here of accessing before allocation
...
add sp,sp,t0

"fswfs4,-4(sp)" has issue here of accessing stack before allocation. 
Although "add  sp,sp,t0" reserves later the memory for fs4, it exposes a risk 
when an interrupt comes in between "fsw  fs4,-4(sp)" and "addsp,sp,t0", 
resulting in a corruption in the stack storing fs4 after interrupt context 
saving and a failure to get the correct value of fs4 later.

This patch fixes issue above, adapts testcases identified in regression tests, 
and add a new testcase for the change.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_first_stack_step):

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr93304.c: Adapt testcase for the change, constrain 
match to assembly instructions only.
* gcc.target/riscv/rvv/base/spill-11.c: Adapt testcase for the change.
* gcc.target/riscv/stack_frame.c: New test.
---
 gcc/config/riscv/riscv.cc |  2 +-
 gcc/testsuite/gcc.target/riscv/pr93304.c  |  2 +-
 .../gcc.target/riscv/rvv/base/spill-11.c  |  3 +--
 gcc/testsuite/gcc.target/riscv/stack_frame.c  | 26 +++
 4 files changed, 29 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack_frame.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7ec4ce97e6c..8d94765b4ba 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4903,7 +4903,7 @@ riscv_first_stack_step (struct riscv_frame_info *frame)
 return frame_total_constant_size;
 
   HOST_WIDE_INT min_first_step =
-RISCV_STACK_ALIGN ((frame->total_size - 
frame->fp_sp_offset).to_constant());
+RISCV_STACK_ALIGN ((frame->total_size - 
frame->frame_pointer_offset).to_constant());
   HOST_WIDE_INT max_first_step = IMM_REACH / 2 - PREFERRED_STACK_BOUNDARY / 8;
   HOST_WIDE_INT min_second_step = frame_total_constant_size - max_first_step;
   gcc_assert (min_first_step <= max_first_step);
diff --git a/gcc/testsuite/gcc.target/riscv/pr93304.c 
b/gcc/testsuite/gcc.target/riscv/pr93304.c
index ce2dc4d6921..76975ff81c5 100644
--- a/gcc/testsuite/gcc.target/riscv/pr93304.c
+++ b/gcc/testsuite/gcc.target/riscv/pr93304.c
@@ -16,4 +16,4 @@ foo (void)
regradless of the REG_ALLOC_ORDER.
In theory, t2 should not used in such small program if regrename
not executed incorrectly, because t0-a2 should be enough.  */
-/* { dg-final { scan-assembler-not "t2" } } */
+/* { dg-final { scan-assembler-not {\t[a-zA-Z0-9]+\t.*t2} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/spill-11.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/spill-11.c
index c2f68b86d90..f5223491665 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/spill-11.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/spill-11.c
@@ -21,9 +21,8 @@ void fn3 (char*);
 ** slli\tt1,t0,1
 ** add\tsp,sp,t1
 ** li\tt0,8192
-** addi\tt0,t0,-208
+** addi\tt0,t0,-192
 ** add\tsp,sp,t0
-** addi\tsp,sp,16
 ** tail\t__riscv_restore_2
 */
 int stack_save_restore_2 (float a1, float a2, float a3, float a4,
diff --git a/gcc/testsuite/gcc.target/riscv/stack_frame.c 
b/gcc/testsuite/gcc.target/riscv/stack_frame.c
new file mode 100644
index 000..485a52d5133
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/stack_frame.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32imafc -mabi=ilp32f" } */
+char my_getchar();
+float getf();
+
+float foo()
+{
+  char volatile array[3120];
+  float volatile farray[3120];
+  float sum = 0;
+  float f1 = getf();
+  float f2 = getf();
+  float f3 = getf();
+  float f4 = getf();
+
+  for (int i = 0; i < 3120; i++)
+  {
+array[i] = my_getchar();
+farray[i] = my_getchar() * 1.2;
+sum += array[i] + farray[i] + f1 + f2 + f3 + f4;
+  }
+
+  return sum;
+}
+
+/* { dg-final { scan-assembler-not {,-[0-9]+\(sp\)} } } */
-- 
2.17.1



[PATCH 0/1] RISC-V: fix stack access before allocation.

2022-11-27 Thread Fei Gao
In current riscv stack frame allocation, 2 steps are used. The first step 
allocates memories at least for callee saved GPRs and FPRs, and the second step 
allocates the rest if stack size is greater than signed 12-bit range. But it's 
observed in some cases, like gcc.target/riscv/stack_frame.c in my patch, callee 
saved FPRs fail to be included in the first step allocation, so we get 
generated instructions like this:

li  t0,-16384
addisp,sp,-48
addit0,t0,752
...
fsw fs4,-4(sp) #issue here of accessing before allocation
...
add sp,sp,t0

"fswfs4,-4(sp)" has issue here of accessing stack before allocation. 
Although "add  sp,sp,t0" reserves later the memory for fs4, it exposes a risk 
when an interrupt comes in between "fsw  fs4,-4(sp)" and "addsp,sp,t0", 
resulting in a corruption in the stack storing fs4 after interrupt context 
saving and a failure to get the correct value of fs4 later.

This patch fixes issue above, adapts testcases identified in regression tests, 
and add a new testcase for the change.

Fei Gao (1):
  RISC-V: fix stack access before allocation.

 gcc/config/riscv/riscv.cc |  2 +-
 gcc/testsuite/gcc.target/riscv/pr93304.c  |  2 +-
 .../gcc.target/riscv/rvv/base/spill-11.c  |  3 +--
 gcc/testsuite/gcc.target/riscv/stack_frame.c  | 26 +++
 4 files changed, 29 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack_frame.c

-- 
2.17.1



Re: [PATCH V2] rs6000: Support to build constants by li/lis+oris/xoris

2022-11-27 Thread Jiufu Guo via Gcc-patches
Hi Segher!

Thanks a lot for your comments!

Segher Boessenkool  writes:

> Hi guys,
>
> On Fri, Nov 25, 2022 at 04:11:49PM +0800, Kewen.Lin wrote:
>> on 2022/10/26 19:40, Jiufu Guo wrote:
>> for "li/lis + oris/xoris", I interpreted it into four combinations:
>> 
>>li + oris, lis + oris, li + xoris, lis + xoris.
>> 
>> not sure just me interpreting like that, but the actual combinations
>> which this patch adopts are:
>> 
>>li + oris, li + xoris, lis + xoris.
>> 
>> It's a bit off, but not a big deal, up to you to reword it or not.  :)
>
> The first two are obvious, but the last one is almost never a good idea,
> there usually are better ways to do the same.  I cannot even think of
> any case where this is best?  A lis;rl* is always prefered (it can
> optimise better, be combined with other insns).
I understant your point here.  The first two: 'li' for lowest 16bits,
'oris/xoris' for next 16bits.

While for 'lis + xoris', it may not obvious, because both 'lis' and
'xoris' operates on 17-31bits.
'lis + xoris' is for case "32(1) || 1(0) || 15(x) || 16(0)". xoris is
used to clean bit31.  This case seems hard to be supported by 'rlxx'.

I hit to find this case when I analyze what kind of constants can be
build by two instructions. Checked the posssible combinations:
"addi/addis" + "neg/ori/../xoris/rldX/rlwX/../sradi/extswsli"(those
instructions which accept one register and one immediate).

I also drafted the patch to use "li/lis+rlxx" to build constant.
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601276.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601277.html

>
>> > +  HOST_WIDE_INT orig_c = c;
>
> If you ever feel you need a variable to hold an "orig" value, that is a
> good hint that you should restructure the code a bit, perhaps even
> factor it.  That often is overdue (like here), not caused by you, but
> you could help solve it ;-)
>
> (This is what made this patch hard to review, btw).
You are right.  Thanks for point out this!
>
>> >gen_rtx_IOR (DImode, copy_rtx (temp),
>> > GEN_INT (ud1)));
>> >  }
>> > +  else if ((ud4 == 0x && ud3 == 0x)
>> > + && ((ud1 & 0x8000) || (ud1 == 0 && !(ud2 & 0x8000
>> > +{
>> > +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
>> > +
>> > +  HOST_WIDE_INT imm = (ud1 & 0x8000) ? ((ud1 ^ 0x8000) - 0x8000)
>> > +   : ((ud2 << 16) - 0x8000);
>
> We really should have some "hwi::sign_extend (ud1, 16)" helper function,
> heh.  Maybe there already is?  Ah, "sext_hwi".  Fixing that up
> everywhere in this function is preapproved.
Great, thanks! 
>
>> > +  else
>> > +  {
>> > +emit_move_insn (temp,
>> > +GEN_INT (((ud2 << 16) ^ 0x8000) - 0x8000));
>> > +if (ud1 != 0)
>> > +  emit_move_insn (temp, gen_rtx_IOR (DImode, temp, GEN_INT (ud1)));
>> > +emit_move_insn (dest,
>> > +gen_rtx_ZERO_EXTEND (DImode,
>> > + gen_lowpart (SImode, temp)));
>> > +  }
>
> Why this?  Please just write it in DImode, do not go via SImode?
Thanks for catch this. Yes, gen_lowpart with DImode would be ok.
>
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/powerpc/pr106708.h
>> > @@ -0,0 +1,9 @@
>> > +/* Test constants which can be built by li/lis + oris/xoris */
>> > +void  __attribute__ ((__noinline__, __noclone__)) foo (long long *arg)
>> > +{
>> > +  *arg++ = 0x98765432ULL;
>> > +  *arg++ = 0x7cdeab55ULL;
>> > +  *arg++ = 0x6543ULL;
>> > +}
>
> Use noipa please (it is shorter, simpler, and covers more :-) )
Thanks!
>
> Could you comment what exact instructions are expected?
> li;xoris and li;xoris and lis;xoris I guess?  It helps if you just tell
> the reader here.
Sure, thanks!
>
> The li;oris and li;xoris parts look good.
>

BR,
Jeff (Jiufu)
>
> Segher


Re: [PATCH 0/2] Support HWASAN with Intel LAM

2022-11-27 Thread Hongtao Liu via Gcc-patches
On Fri, Nov 11, 2022 at 9:26 AM liuhongt  wrote:
>
>   2 years ago, ARM folks support HWASAN[1] in GCC[2], and introduced several
> target hooks(Many thanks to their work) so other backends can do similar
> things if they have similar feature.
>   Intel LAM(linear Address Masking)[3 Charpter 14] supports similar feature 
> with
> the upper bits of pointers can be used as metadata, LAM support two modes:
>   LAM_U48:bits 48-62 can be used as metadata
>   LAM_U57:bits 57-62 can be used as metedata.
>
> These 2 patches mainly support those target hooks, but HWASAN is not really
> enabled until the final decision for the LAM kernel interface which may take
> quite a long time. We have verified our patches with a "fake" interface 
> locally[4], and
> decided to push the backend patches to the GCC13 to make other HWASAN 
> developper's work
> easy.
>
> [1] https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557857.html
> [3] 
> https://www.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> [4] https://gitlab.com/x86-gcc/gcc/-/tree/users/intel/lam/master
>
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
I'll install 2 patches if there's no objections in next 7 days.
>
> liuhongt (2):
>   Implement hwasan target_hook.
>   Enable hwasan for x86-64.
>
>  gcc/config/i386/i386-expand.cc  |  12 
>  gcc/config/i386/i386-options.cc |   3 +
>  gcc/config/i386/i386-opts.h |   6 ++
>  gcc/config/i386/i386-protos.h   |   2 +
>  gcc/config/i386/i386.cc | 123 
>  gcc/config/i386/i386.opt|  16 +
>  libsanitizer/configure.tgt  |   1 +
>  7 files changed, 163 insertions(+)
>
> --
> 2.18.1
>


-- 
BR,
Hongtao


Re: [PATCH] vect: Fold LEN_{LOAD, STORE} if it's for the whole vector [PR107412]

2022-11-27 Thread Kewen.Lin via Gcc-patches
Hi Richard,

on 2022/11/24 17:24, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> Hi,
>>
>> As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD,
>> STORE} into normal vector load/store if the given length is known
>> to be equal to the length of the whole vector.  It would help to
>> improve overall cycles as normally the latency of vector access
>> with length in bytes is bigger than normal vector access, and it
>> also saves the preparation for length if constant length can not
>> be encoded into instruction (such as on power).
>>
>> Bootstrapped and regtested on x86_64-redhat-linux,
>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>>  PR tree-optimization/107412
>>
>> gcc/ChangeLog:
>>
>>  * gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ...
>>  (gimple_fold_partial_load_store_mem_ref): ... this, add one parameter
>>  mask_p indicating it's for mask or length, and add some handlings for
>>  IFN LEN_{LOAD,STORE}.
>>  (gimple_fold_mask_load): Rename to ...
>>  (gimple_fold_partial_load): ... this, add one parameter mask_p.
>>  (gimple_fold_mask_store): Rename to ...
>>  (gimple_fold_partial_store): ... this, add one parameter mask_p.
>>  (gimple_fold_call): Add the handlings for IFN LEN_{LOAD,STORE},
>>  and adjust calls on gimple_fold_mask_load_store_mem_ref to
>>  gimple_fold_partial_load_store_mem_ref.
> 
> Sorry to reply to late (still catching up on email), but:
> 
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/powerpc/pr107412.c: New test.
>>  * gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for
>>  folded LEN_LOAD.
>> ---
>>  gcc/gimple-fold.cc| 57 ++-
>>  .../gcc.target/powerpc/p9-vec-length-epil-8.c |  2 +-
>>  gcc/testsuite/gcc.target/powerpc/pr107412.c   | 19 +++
>>  3 files changed, 64 insertions(+), 14 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c
>>
>> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
>> index a1704784bc9..e3a087defa6 100644
>> --- a/gcc/gimple-fold.cc
>> +++ b/gcc/gimple-fold.cc
>> @@ -5370,19 +5370,39 @@ arith_overflowed_p (enum tree_code code, const_tree 
>> type,
>>return wi::min_precision (wres, sign) > TYPE_PRECISION (type);
>>  }
>>
>> -/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF
>> +/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a MEM_REF
>> for the memory it references, otherwise return null.  VECTYPE is the
>> -   type of the memory vector.  */
>> +   type of the memory vector.  MASK_P indicates it's for MASK if true,
>> +   otherwise it's for LEN.  */
>>
>>  static tree
>> -gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
>> +gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool 
>> mask_p)
>>  {
>>tree ptr = gimple_call_arg (call, 0);
>>tree alias_align = gimple_call_arg (call, 1);
>> -  tree mask = gimple_call_arg (call, 2);
>> -  if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask))
>> +  if (!tree_fits_uhwi_p (alias_align))
>>  return NULL_TREE;
>>
>> +  if (mask_p)
>> +{
>> +  tree mask = gimple_call_arg (call, 2);
>> +  if (!integer_all_onesp (mask))
>> +return NULL_TREE;
>> +} else {
> 
> Minor nit: }, else, and { should be on separate lines.  But the thing
> I actually wanted to say was...

Thanks for catching, I must have forgotten to reformat these lines.

> 
>> +  tree basic_len = gimple_call_arg (call, 2);
>> +  if (!tree_fits_uhwi_p (basic_len))
>> +return NULL_TREE;
>> +  unsigned int nargs = gimple_call_num_args (call);
>> +  tree bias = gimple_call_arg (call, nargs - 1);
>> +  gcc_assert (tree_fits_uhwi_p (bias));
>> +  tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
>> +  unsigned int len = tree_to_uhwi (biased_len);
>> +  unsigned int vect_len
>> += GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
>> +  if (vect_len != len)
>> +return NULL_TREE;
> 
> Using "unsigned int" truncates the value.  I realise that's probably
> safe in this context, since large values have undefined behaviour.
> But it still seems better to use an untruncated type, so that it
> looks less like an oversight.  (I think this is one case where "auto"
> can help, since it gets the type right automatically.)
> 
> It would also be better to avoid the to_constant, since we haven't
> proven is_constant.  How about:
> 
>   tree basic_len = gimple_call_arg (call, 2);
>   if (!poly_int_tree_p (basic_len))
>   return NULL_TREE;
>   unsigned int nargs = gimple_call_num_args (call);
>   tree bias = gimple_call_arg (call, nargs - 1);
>   gcc_assert (TREE_CODE (bias) == INTEGER_CST);
>   if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias),
>   GET_MODE_SIZE (TYPE_MODE (vectype
> 

Re: [pushed][PATCH v4] LoongArch: Optimize immediate load.

2022-11-27 Thread Lulu Cheng

Pushed r13-4315.

在 2022/11/23 上午12:44, Xi Ruoyao 写道:

On Tue, 2022-11-22 at 22:03 +0800, Xi Ruoyao via Gcc-patches wrote:

While I still can't fully understand the immediate load issue and how
this patch fix it, I've tested this patch (alongside the prefetch
instruction patch) with bootstrap-ubsan.  And the compiled result of
imm-load1.c seems OK.

And it's doing correct thing for Glibc "improved generic string
functions" patch, producing some really tight loop now.


On Thu, 2022-11-17 at 17:59 +0800, Lulu Cheng wrote:

v1 -> v2:
1. Change the code format.
2. Fix bugs in the code.

v2 -> v3:
Modifying a code implementation of an undefined behavior.

v3 -> v4:
Move the part of the immediate number decomposition from expand pass
to split
pass.

Both regression tests and spec2006 passed.

The problem mentioned in the link does not move the four immediate
load
instructions out of the loop. It has been optimized. Now, as in the
test case,
four immediate load instructions are generated outside the loop.
(
https://sourceware.org/pipermail/libc-alpha/2022-September/142202.html
)


Because loop2_invariant pass will extract the instructions that do
not
change
in the loop out of the loop, some instructions will not meet the
extraction
conditions if the machine performs immediate decomposition while
expand pass,
so the immediate decomposition will be transferred to the split
process.

gcc/ChangeLog:

 * config/loongarch/loongarch.cc (enum
loongarch_load_imm_method):
 Remove the member METHOD_INSV that is not currently used.
 (struct loongarch_integer_op): Define a new member
curr_value,
 that records the value of the number stored in the
destination
 register immediately after the current instruction has run.
 (loongarch_build_integer): Assign a value to the curr_value
member variable.
 (loongarch_move_integer): Adds information for the immediate
load instruction.
 * config/loongarch/loongarch.md (*movdi_32bit): Redefine as
define_insn_and_split.
 (*movdi_64bit): Likewise.
 (*movsi_internal): Likewise.
 (*movhi_internal): Likewise.
 * config/loongarch/predicates.md: Return true as long as it
is
CONST_INT, ensure
 that the immediate number is not optimized by decomposition
during expand
 optimization loop.

gcc/testsuite/ChangeLog:

 * gcc.target/loongarch/imm-load.c: New test.
 * gcc.target/loongarch/imm-load1.c: New test.
---
  gcc/config/loongarch/loongarch.cc | 62 ++--
--
-
  gcc/config/loongarch/loongarch.md | 44 +++--
  gcc/config/loongarch/predicates.md    |  2 +-
  gcc/testsuite/gcc.target/loongarch/imm-load.c | 10 +++
  .../gcc.target/loongarch/imm-load1.c  | 26 
  5 files changed, 110 insertions(+), 34 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/imm-load.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/imm-load1.c

diff --git a/gcc/config/loongarch/loongarch.cc
b/gcc/config/loongarch/loongarch.cc
index 8ee32c90573..9e0d6c7c3ea 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -139,22 +139,21 @@ struct loongarch_address_info
  
     METHOD_LU52I:

   Load 52-63 bit of the immediate number.
-
-   METHOD_INSV:
- immediate like 0xfff0fxxx
-   */
+*/
  enum loongarch_load_imm_method
  {
    METHOD_NORMAL,
    METHOD_LU32I,
-  METHOD_LU52I,
-  METHOD_INSV
+  METHOD_LU52I
  };
  
  struct loongarch_integer_op

  {
    enum rtx_code code;
    HOST_WIDE_INT value;
+  /* Represent the result of the immediate count of the load
instruction at
+ each step.  */
+  HOST_WIDE_INT curr_value;
    enum loongarch_load_imm_method method;
  };
  
@@ -1475,24 +1474,27 @@ loongarch_build_integer (struct

loongarch_integer_op *codes,
  {
    /* The value of the lower 32 bit be loaded with one
instruction.
  lu12i.w.  */
-  codes[0].code = UNKNOWN;
-  codes[0].method = METHOD_NORMAL;
-  codes[0].value = low_part;
+  codes[cost].code = UNKNOWN;
+  codes[cost].method = METHOD_NORMAL;
+  codes[cost].value = low_part;
+  codes[cost].curr_value = low_part;
    cost++;
  }
    else
  {
    /* lu12i.w + ior.  */
-  codes[0].code = UNKNOWN;
-  codes[0].method = METHOD_NORMAL;
-  codes[0].value = low_part & ~(IMM_REACH - 1);
+  codes[cost].code = UNKNOWN;
+  codes[cost].method = METHOD_NORMAL;
+  codes[cost].value = low_part & ~(IMM_REACH - 1);
+  codes[cost].curr_value = codes[cost].value;
    cost++;
    HOST_WIDE_INT iorv = low_part & (IMM_REACH - 1);
    if (iorv != 0)
 {
- codes[1].code = IOR;
- codes[1].method = METHOD_NORMAL;
- codes[1].value = iorv;
+ codes[cost].code = IOR;
+ codes[cost].method = METHOD_NORMAL;
+ 

Re: [PATCH]rs6000: Load high and low part of 64bit constant independently

2022-11-27 Thread Jiufu Guo via Gcc-patches


Hi Kewen/Segher,

Thanks a lot for your review!

I updated the patch accordingly as below for message/code/testcase:

For a complicate 64bit constant, blow is one instruction-sequence to
build:
lis 9,0x800a
ori 9,9,0xabcd
sldi 9,9,32
oris 9,9,0xc167
ori 9,9,0xfa16

while we can also use below sequence to build:
lis 9,0xc167
lis 10,0x800a
ori 9,9,0xfa16
ori 10,10,0xabcd
rldimi 9,10,32,0
This sequence is using 2 registers to build high and low part firstly,
and then merge them.

In parallel aspect, this sequence would be faster. (Ofcause, using 1 more
register with potential register pressure).

The instruction sequence with two registers for parallel version can be
generated only if can_create_pseudo_p.  Otherwise, the one register
sequence is generated.

Bootstrap and regtest pass on ppc64le.
Is this ok for trunk?

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Generate
more parallel code if can_create_pseudo_p.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/parall_5insn_const.c: New test.

---
 gcc/config/rs6000/rs6000.cc   | 45 +++
 .../gcc.target/powerpc/parall_5insn_const.c   | 27 +++
 2 files changed, 53 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index eb7ad5e954f..877b314a57a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10337,26 +10337,33 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 }
   else
 {
-  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
-
-  emit_move_insn (copy_rtx (temp),
- GEN_INT (((ud4 << 16) ^ 0x8000) - 0x8000));
-  if (ud3 != 0)
-   emit_move_insn (copy_rtx (temp),
-   gen_rtx_IOR (DImode, copy_rtx (temp),
-GEN_INT (ud3)));
+  if (can_create_pseudo_p ())
+   {
+ /* lis H,U4; ori H,U3; lis L,U2; ori L,U1; rldimi L,H,32,0.  */
+ rtx high = gen_reg_rtx (DImode);
+ rtx low = gen_reg_rtx (DImode);
+ HOST_WIDE_INT num = (ud2 << 16) | ud1;
+ rs6000_emit_set_long_const (low, (num ^ 0x8000) - 0x8000);
+ num = (ud4 << 16) | ud3;
+ rs6000_emit_set_long_const (high, (num ^ 0x8000) - 0x8000);
+ emit_insn (gen_rotldi3_insert_3 (dest, high, GEN_INT (32), low,
+  GEN_INT (0x)));
+   }
+  else
+   {
+ /* lis A,U4; ori A,U3; rotl A,32; oris A,U2; ori A,U1.  */
+ emit_move_insn (dest,
+ GEN_INT (((ud4 << 16) ^ 0x8000) - 0x8000));
+ if (ud3 != 0)
+   emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud3)));
 
-  emit_move_insn (ud2 != 0 || ud1 != 0 ? copy_rtx (temp) : dest,
- gen_rtx_ASHIFT (DImode, copy_rtx (temp),
- GEN_INT (32)));
-  if (ud2 != 0)
-   emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest,
-   gen_rtx_IOR (DImode, copy_rtx (temp),
-GEN_INT (ud2 << 16)));
-  if (ud1 != 0)
-   emit_move_insn (dest,
-   gen_rtx_IOR (DImode, copy_rtx (temp),
-GEN_INT (ud1)));
+ emit_move_insn (dest, gen_rtx_ASHIFT (DImode, dest, GEN_INT (32)));
+ if (ud2 != 0)
+   emit_move_insn (dest,
+   gen_rtx_IOR (DImode, dest, GEN_INT (ud2 << 16)));
+ if (ud1 != 0)
+   emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud1)));
+   }
 }
 }
 
diff --git a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c 
b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
new file mode 100644
index 000..e3a9a7264cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mno-prefixed -save-temps" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+/* { dg-final { scan-assembler-times {\mlis\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mori\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */
+
+void __attribute__ ((noinline)) foo (unsigned long long *a)
+{
+  /* 2 lis + 2 ori + 1 rldimi for each constant.  */
+  *a++ = 0x800aabcdc167fa16ULL;
+  *a++ = 0x7543a876867f616ULL;
+}
+
+long long A[] = {0x800aabcdc167fa16ULL, 0x7543a876867f616ULL};
+int
+main ()
+{
+  long long res[2];
+
+  foo (res);
+  if (__builtin_memcmp (res, A, sizeof (res)) != 0)
+__builtin_abort ();
+
+  return 0;
+}
-- 
2.17.1

BR,
Jeff (Jiufu)


"Kewen.Lin"  writes:

> Hi Segher,
>
> on 2022/11/25 23:46, Segher Boessenkool wrote:
>> Hi!
>> 
>> On Fri, Nov 25, 2022 at 09:21:21PM +0800, Jiufu 

[PATCH] RISC-V: Support the ins "rol" with immediate operand

2022-11-27 Thread Feng Wang
From: wangfeng 

There is no Immediate operand of ins "rol" accroding to the B-ext,
so the immediate operand should be loaded into register at first.
But we can convert it to the ins "rori" or "roriw", and then one
immediate load ins can be reduced.

Please refer to the following use cases:
unsigned long foo2(unsigned long rs1)
{
return (rs1 << 10) | (rs1 >> 54);
}

The complier result is:
li  a1,10
rol a0,a0,a1

This patch will generate one ins
rori a0,a0,54

gcc/ChangeLog:

* config/riscv/bitmanip.md: Add immediate_operand support in rotl RTL 
pattern

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-rol-ror-04.c: New test.
* gcc.target/riscv/zbb-rol-ror-05.c: New test.
---
 gcc/config/riscv/bitmanip.md  | 36 +++
 .../gcc.target/riscv/zbb-rol-ror-04.c | 24 +
 .../gcc.target/riscv/zbb-rol-ror-05.c | 15 
 3 files changed, 69 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-rol-ror-04.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-rol-ror-05.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index d17133d58c1..cddfa2a4b19 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -300,25 +300,49 @@
 (define_insn "rotlsi3"
   [(set (match_operand:SI 0 "register_operand" "=r")
(rotate:SI (match_operand:SI 1 "register_operand" "r")
-  (match_operand:QI 2 "register_operand" "r")))]
+  (match_operand:QI 2 "arith_operand" "rI")))]
   "TARGET_ZBB"
-  "rol%~\t%0,%1,%2"
+  {
+if (immediate_operand(operands[2], QImode))
+{
+  operands[2] = GEN_INT(GET_MODE_BITSIZE (SImode) - INTVAL(operands[2]));
+   return "rori\t%0,%1,%2";
+}
+else
+  return "rol\t%0,%1,%2";
+  }
   [(set_attr "type" "bitmanip")])
 
 (define_insn "rotldi3"
   [(set (match_operand:DI 0 "register_operand" "=r")
(rotate:DI (match_operand:DI 1 "register_operand" "r")
-  (match_operand:QI 2 "register_operand" "r")))]
+  (match_operand:QI 2 "arith_operand" "rI")))]
   "TARGET_64BIT && TARGET_ZBB"
-  "rol\t%0,%1,%2"
+  {
+if (immediate_operand(operands[2], QImode))
+{
+  operands[2] = GEN_INT(GET_MODE_BITSIZE (DImode) - INTVAL(operands[2]));
+   return "rori\t%0,%1,%2";
+}
+else
+  return "rol\t%0,%1,%2";
+  }
   [(set_attr "type" "bitmanip")])
 
 (define_insn "rotlsi3_sext"
   [(set (match_operand:DI 0 "register_operand" "=r")
(sign_extend:DI (rotate:SI (match_operand:SI 1 "register_operand" "r")
-  (match_operand:QI 2 "register_operand" 
"r"]
+  (match_operand:QI 2 "arith_operand" "rI"]
   "TARGET_64BIT && TARGET_ZBB"
-  "rolw\t%0,%1,%2"
+  {
+if (immediate_operand(operands[2], QImode))
+{
+  operands[2] = GEN_INT(GET_MODE_BITSIZE (SImode) - INTVAL(operands[2]));
+   return "roriw\t%0,%1,%2";
+}
+else
+  return "rolw\t%0,%1,%2";
+  }
   [(set_attr "type" "bitmanip")])
 
 ;; orc.b (or-combine) is added as an unspec for the benefit of the support
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-04.c 
b/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-04.c
new file mode 100644
index 000..23883cc3a5e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-04.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbb -mabi=lp64d -O2" } */
+/* { dg-skip-if "" { *-*-* } { "-g" } } */
+
+unsigned long foo1 (unsigned long rs1)
+{ return (rs1 >> (34)) | (rs1 << 30); }
+
+unsigned long foo2(unsigned long rs1)
+{
+return (rs1 << 10) | (rs1 >> 54);
+}
+
+unsigned int foo3(unsigned int rs1)
+{
+return (rs1 >> 20) | (rs1 << 12);
+}
+
+unsigned int foo4(unsigned int rs1)
+{
+return (rs1 << 10) | (rs1 >> 22);
+}
+
+/* { dg-final { scan-assembler-times "rori\t" 2 } } */
+/* { dg-final { scan-assembler-times "roriw" 2 } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-05.c 
b/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-05.c
new file mode 100644
index 000..3e300a30b9b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-05.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zbb -mabi=ilp32" } */
+/* { dg-skip-if "" { rv64-*-* } { "-g" } } */
+
+unsigned int foo1(unsigned int rs1)
+{
+return (rs1 >> 20) | (rs1 << 12);
+}
+
+unsigned int foo2(unsigned int rs1)
+{
+return (rs1 << 10) | (rs1 >> 22);
+}
+
+/* { dg-final { scan-assembler-times "rori" 2 } } */
\ No newline at end of file
-- 
2.17.1



Re: [PATCH]rs6000: Load high and low part of 64bit constant independently

2022-11-27 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2022/11/25 23:46, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Nov 25, 2022 at 09:21:21PM +0800, Jiufu Guo wrote:
>> "Kewen.Lin"  writes:
>>> on 2022/9/15 16:30, Jiufu Guo wrote:
 For a complicate 64bit constant, blow is one instruction-sequence to
 build:
lis 9,0x800a
ori 9,9,0xabcd
sldi 9,9,32
oris 9,9,0xc167
ori 9,9,0xfa16

 while we can also use below sequence to build:
lis 9,0xc167
lis 10,0x800a
ori 9,9,0xfa16
ori 10,10,0xabcd
rldimi 9,10,32,0
 This sequence is using 2 registers to build high and low part firstly,
 and then merge them.
 In parallel aspect, this sequence would be faster. (Ofcause, using 1 more
 register with potential register pressure).
> 
> And crucially this patch only uses two registers if can_create_pseudo_p.
> Please mention that.
> 
* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update 64bit
constant build.
> 
> If you don't give details of what this does, just say "Update." please.
> But update to what?
> 
> "Generate more parallel code if can_create_pseudo_p." maybe?
> 
 +rtx H = gen_reg_rtx (DImode);
 +rtx L = gen_reg_rtx (DImode);
> 
> Please don't use all-uppercase variable names, those are for macros.  In
> fact, don't use uppercase in variable (and function etc.) names at all,
> unless there is a really good reason to.
> 
> Just call it "high" and "low", or "hi" and "lo", or something?
> 
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
 @@ -0,0 +1,27 @@
 +/* { dg-do run } */
 +/* { dg-options "-O2 -mdejagnu-cpu=power7  -save-temps" } */
>>>
>>> Why do we need power7 here?
>> power8/9 are also ok for this case.  Actually, O just want to
>> avoid to use new p10 instruction, like "pli", and then selected
>> an old arch option.
> 
> Why does it need _at least_ p7, is the question (as I understand it).
> 

Yeah, that's what I was intended to ask, since those insns to be scanned
don't actually require Power7 or later.

> To prohibit pli etc. you can do -mno-prefixed (which works on all older
> CPUs just as well), or skip the test if prefixed insns are enabled, or
> scan for the then generated code as well.  The first option is by far
> the simplest.

Yeah, using -mno-prefixed is perfect here, nice!

BR,
Kewen


[PATCH] Fortran: ICE with elemental and dummy argument with VALUE attribute [PR107819]

2022-11-27 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

in dependency checking of arguments of elemental prodecures
we should treat dummy arguments with the value attribute as
implicitly having intent(in).  This is simple and obvious.

The PR by Gerhard provides a series of testcases that are
either valid (like the one in the attached patch), or
arguably non-conforming.  The issue is related to the
standard prescribing a temporary (in standardese language)
for the argument with the value attribute, while the
elemental attribute prescribes an application order.

Playing with other compiler brands, there seemed to be an
obvious discrepancy between NAG and Intel on the one side
and Intel on the other.  Steve Lionel attributed this to
non-conformance for the discussed case (see link in PR).

I therefore decided to only use a conforming testcase
for the testsuite, as this is sufficient to check for
the fix for the ICE.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From c6f9d47f2e631a7ace71466ba6ec6f323d791b1b Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sun, 27 Nov 2022 21:10:18 +0100
Subject: [PATCH] Fortran: ICE with elemental and dummy argument with VALUE
 attribute [PR107819]

gcc/fortran/ChangeLog:

	PR fortran/107819
	* trans-stmt.cc (gfc_conv_elemental_dependencies): In checking for
	elemental dependencies, treat dummy argument with VALUE attribute
	as implicitly having intent(in).

gcc/testsuite/ChangeLog:

	PR fortran/107819
	* gfortran.dg/elemental_dependency_7.f90: New test.
---
 gcc/fortran/trans-stmt.cc |  1 +
 .../gfortran.dg/elemental_dependency_7.f90| 28 +++
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/elemental_dependency_7.f90

diff --git a/gcc/fortran/trans-stmt.cc b/gcc/fortran/trans-stmt.cc
index fd6d294147e..b288f1f9050 100644
--- a/gcc/fortran/trans-stmt.cc
+++ b/gcc/fortran/trans-stmt.cc
@@ -264,6 +264,7 @@ gfc_conv_elemental_dependencies (gfc_se * se, gfc_se * loopse,
   if (e->expr_type == EXPR_VARIABLE
 	&& e->rank && fsym
 	&& fsym->attr.intent != INTENT_IN
+	&& !fsym->attr.value
 	&& gfc_check_fncall_dependency (e, fsym->attr.intent,
 	sym, arg0, check_variable))
 	{
diff --git a/gcc/testsuite/gfortran.dg/elemental_dependency_7.f90 b/gcc/testsuite/gfortran.dg/elemental_dependency_7.f90
new file mode 100644
index 000..ad45ea5271b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/elemental_dependency_7.f90
@@ -0,0 +1,28 @@
+! { dg-do run }
+! PR fortran/107819 - ICE in gfc_check_argument_var_dependency
+! Contributed by G.Steinmetz
+!
+! Note: the testcase is considered non-conforming for m>1 due to aliasing
+
+program p
+  implicit none
+  integer, parameter :: m = 1
+  integer :: i
+  integer :: a(m) = [(-i,i=1,m)]
+  integer :: n(m) = [(i,i=m,1,-1)]
+  integer :: b(m)
+  b = a
+  call s (a(n), a) ! { dg-warning "might interfere with actual argument" }
+
+  ! Compare to separate application of subroutine in element order
+  do i = 1, size (b)
+ call s (b(n(i)), b(i))
+  end do
+  if (any (a /= b)) stop 1
+contains
+  elemental subroutine s (x, y)
+integer, value   :: x
+integer, intent(out) :: y
+y = x
+  end
+end
--
2.35.3



Re: [PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare

2022-11-27 Thread Segher Boessenkool
Hi!

Whoops I missed following up to this.

On Mon, Nov 21, 2022 at 10:01:14AM +0800, Kewen.Lin wrote:
> on 2022/11/18 23:10, Segher Boessenkool wrote:
> > ge is nasty for float, it means something different with and without
> > -ffast-math (with fast-math ge means not lt, le means not gt; both can
> > be done with a simple single condition, no cror needed.  (Compare to ne
> > which is the same with and without -ffast-math, that is because it has a
> > "not" in its definition!)
> 
> It's true for scalar float comparison, but the context here is for vector
> comparison, the result of comparison is still vector (of boolean), and we
> have the corresponding vector comparison instruction for ge, so I think it
> should be fine here.

It is fine if all contexts it is used in allow ge insns, sure.  But you
have to make sure that is true; ge still is nasty, it truly means
something different with fastmath (which applies to vector float just\
the same as it does to scalar float).

> > Thanks for the explanation!
> > 
> > Can you do this in multiple steps, which will make it much easier to
> > review, and to spot the problem if some unexpected problem shows up?
> 
> Sure, I'll try my best to separate it into some steps and show how it
> evolves gradually.

If you can make the bulk of the series not actually change code
generation, just rearrange and massage the compiler code, that is much
easier to review (and it also helps to spot the problems in if there are
regressions, as a bonus).

Cheers,


Segher


Re: [Patch] OpenMP/Fortran: Permit end-clause on directive

2022-11-27 Thread Tobias Burnus

Updated patch – taking the comments below into account – and the remark
by Harald, second by Jakub. Namely:

I have now split the pre-existing nowait-2.f90 into nowait-2.f90 (with
only valid usage) and nowait-4.f90 (with the dg-error tests). In the
previous version of the patch, nowait-4.f90 was a variant of
nowait-2.f90 that used 'nowait' on the directive line. - And Harald
suggested to split the latter, which I now did – into nowait-{5,6}.f90.

Cf. Harald's email at
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600539.html and
two emails by Jakub ("Otherwise LGTM"), first at
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601304.html +
the next email in the thread.

I intent to commit the attached patch tomorrow, unless there are further
comments.

Thanks for the reviews (and I know that the follow up is very belated)!

Tobias


On 08.09.22 17:21, Jakub Jelinek via Fortran wrote:

On Fri, Aug 26, 2022 at 08:21:26PM +0200, Tobias Burnus wrote:

I did run into some issues related to this; those turned out to be
unrelated, but I end ended up implementing this feature.

Side remark: 'omp parallel workshare' seems to actually permit 'nowait'
now, but I guess that's an unintended change due to the
syntax-representation change. Hence, it is now tracked as Spec Issue
3338 and I do not permit it.

OK for mainline?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
OpenMP/Fortran: Permit end-clause on directive

gcc/fortran/ChangeLog:

 * openmp.cc (OMP_DO_CLAUSES, OMP_SCOPE_CLAUSES,
 OMP_SECTIONS_CLAUSES, OMP_SINGLE_CLAUSES): Add 'nowait'.

This doesn't describe what the patch actually does, Add 'nowait'.
is only true for the first 3, for OMP_SINGLE_CLAUSES IMHO you
want a separate
  (OMP_SINGLE_CLAUSES): Add 'nowait' and 'copyprivate'.
entry.


@@ -3855,7 +3857,7 @@ cleanup:
 | OMP_CLAUSE_ORDER | OMP_CLAUSE_ALLOCATE)
  #define OMP_SINGLE_CLAUSES \
(omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE \
-   | OMP_CLAUSE_ALLOCATE)
+   | OMP_CLAUSE_ALLOCATE | OMP_CLAUSE_NOWAIT | OMP_CLAUSE_COPYPRIVATE)
  #define OMP_ORDERED_CLAUSES \
(omp_mask (OMP_CLAUSE_THREADS) | OMP_CLAUSE_SIMD)
  #define OMP_DECLARE_TARGET_CLAUSES \
@@ -5909,13 +5915,11 @@ gfc_match_omp_teams_distribute_simd (void)
  match
  gfc_match_omp_workshare (void)
  {
-  if (gfc_match_omp_eos () != MATCH_YES)
-{
-  gfc_error ("Unexpected junk after $OMP WORKSHARE statement at %C");
-  return MATCH_ERROR;
-}
+  gfc_omp_clauses *c;
+  if (gfc_match_omp_clauses (, omp_mask (OMP_CLAUSE_NOWAIT)) != MATCH_YES)
+return MATCH_ERROR;
new_st.op = EXEC_OMP_WORKSHARE;
-  new_st.ext.omp_clauses = gfc_get_omp_clauses ();
+  new_st.ext.omp_clauses = c;
return MATCH_YES;
  }

I think it would be better to introduce OMP_WORKSHARE_CLAUSES and use
it in both gfc_match_omp_workshare and just use
   return match_omp (EXEC_OMP_WORKSHARE, OMP_WORKSHARE_CLAUSES);
?


@@ -6954,6 +6952,9 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
*omp_clauses,
   }
 break;
   case OMP_LIST_COPYPRIVATE:
+if (omp_clauses->nowait)
+  gfc_error ("NOWAIT clause must not be be used with COPYPRIVATE "

s/be be/be/

+ "clause at %L", >where);
 for (; n != NULL; n = n->next)
   {
 if (n->sym->as && n->sym->as->type == AS_ASSUMED_SIZE)
@@ -5284,7 +5285,13 @@ parse_omp_do (gfc_statement omp_st)
if (st == omp_end_st)
  {
if (new_st.op == EXEC_OMP_END_NOWAIT)
-cp->ext.omp_clauses->nowait |= new_st.ext.omp_bool;
+{
+  if (cp->ext.omp_clauses->nowait && new_st.ext.omp_bool)
+gfc_error_now ("Duplicated NOWAIT clause on %s and %s at %C",
+   gfc_ascii_statement (omp_st),
+   gfc_ascii_statement (omp_end_st));
+  cp->ext.omp_clauses->nowait |= new_st.ext.omp_bool;
+}
else
 gcc_assert (new_st.op == EXEC_NOP);
gfc_clear_new_st ();

Not sure if the standard is clear enough that unique clauses can't be
repeated on both directive and corresponding end directive.  But let's
assume that is the case.


--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/copyprivate-2.f90
@@ -0,0 +1,69 @@
+  FUNCTION t()
+INTEGER :: a, b, t
+a = 0
+t = b
+b = 0
+!$OMP PARALLEL REDUCTION(+:b)
+  !$OMP SINGLE COPYPRIVATE (b) NOWAIT  ! { dg-error "NOWAIT clause must not be 
be used with COPYPRIVATE clause" }

Here too (several times).


+!$OMP ATOMIC WRITE
+b = 6
+  !$OMP END SINGLE
+!$OMP END PARALLEL
+t = t + b
+  END FUNCTION
+
+  FUNCTION t2()
+INTEGER :: a, b, t2
+a = 0
+t2 = b
+b = 0
+!$OMP PARALLEL REDUCTION(+:b)
+  !$OMP SINGLE NOWAIT COPYPRIVATE (b)  ! { dg-error 

Re: [PATCH] RISC-V: Add support for AIA ISA extensions (Ssaia and Smaia)

2022-11-27 Thread Christoph Müllner
On Fri, Nov 18, 2022 at 10:08 AM Christoph Müllner
 wrote:
>
>
>
> On Fri, Nov 18, 2022 at 6:09 AM Palmer Dabbelt  wrote:
>>
>> On Thu, 17 Nov 2022 18:12:23 PST (-0800), christoph.muell...@vrull.eu wrote:
>> > From: Christoph Müllner 
>> >
>> > This patch adds support for the two AIA ISA extensions Ssaia and Smaia.
>> > They are not relelvant for the compiler, but the assembler might want
>> > to validate the CSRs. Therefore, all this patch does is recognize the
>> > extension name, emit a feature macro (incl. a test).
>>
>> This is pretty far in the weeds, but the AIA PDF says
>>
>> extension Smaia encompasses all added CSRs and all modifications to
>> interrupt response behavior that the AIA specifies for a hart, over
>> all privilege levels
>>
>> but only a subset of AIA has been frozen.  I think that's fine, assuming
>> we're decoupling ourselves from the ISA strings (and thus extension
>> names).  We just need to document it somewhere -- presumably invoke, but
>> that doesn't document anything else yet so we don't really have a
>> pattern to match.
>
>
> Thanks for highlighting this!
> We could model this such that Smaia implies Ssaia.
> Since the tool's interpretation of these extensions is "availability of 
> extension's CSRs",
> this should work.
> But it is mostly irrelevant for GCC, as Binutils does the CSR checking, and 
> we need
> to model it there.
>
> I see what you mean with the "subset of AIA has been frozen".
> I would expect that the draft chapters ("Duo-PLIC" and "IOMMU Support") will
> introduce new CSRs in the future. They might get included in separate 
> extensions,
> be available only if another extension is enabled (like the hypervisor CSRs), 
> or
> they will be put into the existing Smaia and Ssaia extensions.
> The last case is problematic, as it would change the behavior of the CSR 
> checker.
> We could therefore document that the CSR checker strictly follows the latest
> specs and that changing behavior is possible for that reason.
> Not perfect, but reasonable and a method to permanently solve the recurring
> CSR discussions.

Palmer, since you did not respond since 9 days,
I tried to guess what you want to have documented and made a change in
invoke.texi:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607326.html

The Binutils patch landed already:
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=ac8df5a1921904b3928429e696ad8b40c612f829

>
>
>
>
>>
>>
>> > Signed-off-by: Christoph Müllner 
>> > ---
>> >  gcc/common/config/riscv/riscv-common.cc |  2 ++
>> >  gcc/testsuite/gcc.target/riscv/smaia.c  | 13 +
>> >  gcc/testsuite/gcc.target/riscv/ssaia.c  | 13 +
>> >  3 files changed, 28 insertions(+)
>> >  create mode 100644 gcc/testsuite/gcc.target/riscv/smaia.c
>> >  create mode 100644 gcc/testsuite/gcc.target/riscv/ssaia.c
>> >
>> > diff --git a/gcc/common/config/riscv/riscv-common.cc 
>> > b/gcc/common/config/riscv/riscv-common.cc
>> > index 4b7f777c103..674eded07b7 100644
>> > --- a/gcc/common/config/riscv/riscv-common.cc
>> > +++ b/gcc/common/config/riscv/riscv-common.cc
>> > @@ -219,6 +219,8 @@ static const struct riscv_ext_version 
>> > riscv_ext_version_table[] =
>> >
>> >{"zmmul", ISA_SPEC_CLASS_NONE, 1, 0},
>> >
>> > +  {"smaia", ISA_SPEC_CLASS_NONE, 1, 0},
>> > +  {"ssaia", ISA_SPEC_CLASS_NONE, 1, 0},
>> >{"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
>> >{"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
>> >
>> > diff --git a/gcc/testsuite/gcc.target/riscv/smaia.c 
>> > b/gcc/testsuite/gcc.target/riscv/smaia.c
>> > new file mode 100644
>> > index 000..9ca80236245
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/riscv/smaia.c
>> > @@ -0,0 +1,13 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-march=rv64gc_smaia" { target { rv64 } } } */
>> > +/* { dg-options "-march=rv32gc_smaia" { target { rv32 } } } */
>> > +
>> > +#ifndef __riscv_smaia
>> > +#error Feature macro not defined
>> > +#endif
>> > +
>> > +int
>> > +foo (int a)
>> > +{
>> > +  return a;
>> > +}
>> > diff --git a/gcc/testsuite/gcc.target/riscv/ssaia.c 
>> > b/gcc/testsuite/gcc.target/riscv/ssaia.c
>> > new file mode 100644
>> > index 000..b20e0eb10f5
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/riscv/ssaia.c
>> > @@ -0,0 +1,13 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-march=rv64gc_ssaia" { target { rv64 } } } */
>> > +/* { dg-options "-march=rv32gc_ssaia" { target { rv32 } } } */
>> > +
>> > +#ifndef __riscv_ssaia
>> > +#error Feature macro not defined
>> > +#endif
>> > +
>> > +int
>> > +foo (int a)
>> > +{
>> > +  return a;
>> > +}


[PATCH v2] RISC-V: Add support for AIA ISA extensions (Ssaia and Smaia)

2022-11-27 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds support for the two AIA ISA extensions Ssaia and Smaia.
They are not relelvant for the compiler, but the assembler might want
to validate the CSRs. Therefore, all this patch does is recognize the
extension name, emit a feature macro (incl. a test).

Changes for v2:
- Imply "ssaia" with "smaia"
- Adding comment to invoke.texi as requested by Palmer

Signed-off-by: Christoph Müllner 
---
 gcc/common/config/riscv/riscv-common.cc |  4 
 gcc/doc/invoke.texi |  4 
 gcc/testsuite/gcc.target/riscv/smaia.c  | 18 ++
 gcc/testsuite/gcc.target/riscv/ssaia.c  | 13 +
 4 files changed, 39 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/smaia.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/ssaia.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 4b7f777c103..a1e7d9c3787 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -108,6 +108,8 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zhinx", "zhinxmin"},
   {"zhinxmin", "zfinx"},
 
+  {"smaia", "ssaia"},
+
   {NULL, NULL}
 };
 
@@ -219,6 +221,8 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
 
   {"zmmul", ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"smaia", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"ssaia", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 202388b3fb8..a24c6fe2499 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -28662,6 +28662,10 @@ If both @option{-march} and @option{-mcpu=} are not 
specified, the default for
 this argument is system dependent, users who want a specific architecture
 extensions should specify one explicitly.
 
+Note, that AIA support (@samp{Smaia} and @samp{Ssaia}) is based on an AIA
+specification, which is frozen but contains draft chapters ("Duo-PLIC" and
+"IOMMU Support").
+
 @item -mcpu=@var{processor-string}
 @opindex mcpu
 Use architecture of and optimize the output for the given processor, specified
diff --git a/gcc/testsuite/gcc.target/riscv/smaia.c 
b/gcc/testsuite/gcc.target/riscv/smaia.c
new file mode 100644
index 000..497b4133c22
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/smaia.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_smaia" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_smaia" { target { rv32 } } } */
+
+#ifndef __riscv_smaia
+#error Feature macro not defined
+#endif
+
+// Smaia implies Ssaia
+#ifndef __riscv_ssaia
+#error Feature macro not defined
+#endif
+
+int
+foo (int a)
+{
+  return a;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/ssaia.c 
b/gcc/testsuite/gcc.target/riscv/ssaia.c
new file mode 100644
index 000..b20e0eb10f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/ssaia.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_ssaia" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_ssaia" { target { rv32 } } } */
+
+#ifndef __riscv_ssaia
+#error Feature macro not defined
+#endif
+
+int
+foo (int a)
+{
+  return a;
+}
-- 
2.38.1



Re: [PATCH v2 16/19] modula2 front end: bootstrap and documentation tools

2022-11-27 Thread Gaius Mulley via Gcc-patches
David Malcolm  writes:

>> Once modula-2 is in master I'd like to revisit rst in devel/modula-2
>> along with analyzer patches and m2 generics.  If successful then
>> submit
>> patches in early stage 1.
>
> Am I right in thinking the analyzer stuff would be an updated version
> of the work you posted here:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/567953.html ?

Sure, yes.

regards,
Gaius


Re: [PATCH]rs6000: Load high and low part of 64bit constant independently

2022-11-27 Thread Jiufu Guo via Gcc-patches


Hi Segher!

Thanks for your helpful comments!

Segher Boessenkool  writes:

> Hi!
>
> On Fri, Nov 25, 2022 at 09:21:21PM +0800, Jiufu Guo wrote:
>> "Kewen.Lin"  writes:
>> > on 2022/9/15 16:30, Jiufu Guo wrote:
>> >> For a complicate 64bit constant, blow is one instruction-sequence to
>> >> build:
>> >>   lis 9,0x800a
>> >>   ori 9,9,0xabcd
>> >>   sldi 9,9,32
>> >>   oris 9,9,0xc167
>> >>   ori 9,9,0xfa16
>> >> 
>> >> while we can also use below sequence to build:
>> >>   lis 9,0xc167
>> >>   lis 10,0x800a
>> >>   ori 9,9,0xfa16
>> >>   ori 10,10,0xabcd
>> >>   rldimi 9,10,32,0
>> >> This sequence is using 2 registers to build high and low part firstly,
>> >> and then merge them.
>> >> In parallel aspect, this sequence would be faster. (Ofcause, using 1 more
>> >> register with potential register pressure).
>
> And crucially this patch only uses two registers if can_create_pseudo_p.
> Please mention that.
OK.
>
>> >>   * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update 64bit
>> >>   constant build.
>
> If you don't give details of what this does, just say "Update." please.
> But update to what?
>
> "Generate more parallel code if can_create_pseudo_p." maybe?
Thanks.
>
>> >> +   rtx H = gen_reg_rtx (DImode);
>> >> +   rtx L = gen_reg_rtx (DImode);
>
> Please don't use all-uppercase variable names, those are for macros.  In
> fact, don't use uppercase in variable (and function etc.) names at all,
> unless there is a really good reason to.
>
> Just call it "high" and "low", or "hi" and "lo", or something?
Thanks.
>
>> >> --- /dev/null
>> >> +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
>> >> @@ -0,0 +1,27 @@
>> >> +/* { dg-do run } */
>> >> +/* { dg-options "-O2 -mdejagnu-cpu=power7  -save-temps" } */
>> >
>> > Why do we need power7 here?
>> power8/9 are also ok for this case.  Actually, O just want to
>> avoid to use new p10 instruction, like "pli", and then selected
>> an old arch option.
>
> Why does it need _at least_ p7, is the question (as I understand it).
>
> To prohibit pli etc. you can do -mno-prefixed (which works on all older
> CPUs just as well), or skip the test if prefixed insns are enabled, or
> scan for the then generated code as well.  The first option is by far
> the simplest.
Thanks for your suggestion! -mno-prefixed is great, it meets the
intention here.

>
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
>> @@ -0,1 +1,27 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -mdejagnu-cpu=power8  -save-temps" } */
>> +/* { dg-require-effective-target has_arch_ppc64 } */
>
> p8 here makes no sense, we also want good and correct code generated for
> older CPUs, so we should not preevent testing on those.  For that reason
> you shouldn't use -mcpu= when not needed.  Like here :-)
Yeap, thanks!

BR,
Jeff (Jiufu)


[RFA] src-release.sh: Fix gdb source tarball build failure due to libsframe

2022-11-27 Thread Joel Brobecker via Gcc-patches
Hello,

This script was recently changed as follow:

| commit e619dddb3a45780ae66d762756882a3b896b617d
| Date:   Tue Nov 15 15:07:13 2022 -0800
| Subject: src-release.sh: Add libsframe
|
| Add libsframe to the list of top level directories that will be included
| in a release.

Since then, the gdb source tarball has been failing with the error
below during the "make configure-host configure-target" phase:

| make[3]: *** No rule to make target '../libsframe/libsframe.la',
| needed by 'libbfd.la'.  Stop.
| make[3]: Leaving directory '/tmp/gdb-public/bfd'

This patch fixes the issue by adding libsframe to the list of
GDB_SUPPORT_DIRS, similar to what was done for BINUTILS.

ChangeLog:

* src-release.sh (GDB_SUPPORT_DIRS): Add libsframe.

Ok to apply to master?

NB: Once approved, I will take care of applying the patch to both
binutils-gdb and gcc (hence the Cc: of gcc-patches).

Thank you,
---
 src-release.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src-release.sh b/src-release.sh
index 050a8eb958d..ec28f8691c7 100755
--- a/src-release.sh
+++ b/src-release.sh
@@ -322,7 +322,7 @@ gas_release()
 tar_compress $package $tool "$GAS_SUPPORT_DIRS" "$compressors"
 }
 
-GDB_SUPPORT_DIRS="bfd include libiberty libctf opcodes readline sim intl 
libdecnumber cpu zlib contrib gnulib gdbsupport gdbserver libbacktrace"
+GDB_SUPPORT_DIRS="libsframe bfd include libiberty libctf opcodes readline sim 
intl libdecnumber cpu zlib contrib gnulib gdbsupport gdbserver libbacktrace"
 gdb_release()
 {
 compressors=$1
-- 
2.34.1



Re: [PATCH] rtl: add predicates for addition, subtraction & multiplication

2022-11-27 Thread Xi Ruoyao via Gcc-patches
On Sat, 2022-11-26 at 21:16 -0500, Charlie Sale via Gcc-patches wrote:
> This is my first contribution to GCC :) one of the beginner projects
> suggested on the website was to add and use RTL type predicates. I
> added predicates for addition, subtraction and multiplication. I also
> went through and used them in the code.
> 
> I did not add tests because I'm not addding/modifying any behavior.
> All existings tests did pass.
> 
> Like I said, this is my first patch. Please let me know if I did
> anything wrong or if there's anything I can improve for next time.
> 
> Signed-off-by: Charlie Sale 
> ---
>  gcc/ChangeLog    |  43 +++
>  gcc/alias.cc |  30 +++--
>  gcc/auto-inc-dec.cc  |  11 +-
>  gcc/calls.cc |   8 +-
>  gcc/cfgexpand.cc |  16 +--
>  gcc/combine-stack-adj.cc |  39 +++
>  gcc/combine.cc   | 241 +--
>  gcc/compare-elim.cc  |   3 +-
>  gcc/cse.cc   |  66 +--
>  gcc/cselib.cc    |  37 +++---
>  gcc/dce.cc   |   4 +-
>  gcc/dwarf2cfi.cc |   2 +-
>  gcc/dwarf2out.cc |  11 +-
>  gcc/emit-rtl.cc  |   6 +-
>  gcc/explow.cc    |  31 ++---
>  gcc/expr.cc  |  23 ++--
>  gcc/final.cc |  20 ++--
>  gcc/function.cc  |   7 +-
>  gcc/fwprop.cc    |   2 +-
>  gcc/haifa-sched.cc   |  10 +-
>  gcc/ifcvt.cc |  11 +-
>  gcc/ira.cc   |   6 +-
>  gcc/loop-doloop.cc   |  70 ++--
>  gcc/loop-iv.cc   |  21 +---
>  gcc/lra-constraints.cc   |  34 +++---
>  gcc/lra-eliminations.cc  |  25 ++--
>  gcc/lra.cc   |   6 +-
>  gcc/modulo-sched.cc  |   2 +-
>  gcc/postreload.cc    |  25 ++--
>  gcc/reginfo.cc   |  12 +-
>  gcc/reload.cc    | 180 +
>  gcc/reload1.cc   |  85 ++
>  gcc/reorg.cc |  12 +-
>  gcc/rtl.cc   |   3 +-
>  gcc/rtl.h    |  11 ++
>  gcc/rtlanal.cc   |  25 ++--
>  gcc/sched-deps.cc    |   8 +-
>  gcc/simplify-rtx.cc  | 143 +--
>  gcc/var-tracking.cc  |  37 +++---
>  39 files changed, 595 insertions(+), 731 deletions(-)
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog

Do not edit this file.  It's automatically generated from Git commit
log.  Write the ChangeLog entries in the commit message instead.
 
/* snip */

> - return simplify_gen_binary (GET_CODE (x), GET_MODE (x),
> - op0, XEXP (x, 1));
> + return simplify_gen_binary (GET_CODE (x), GET_MODE (x), op0,
> + XEXP (x, 1));

Don't update things irrelevant to your topic stealthy.

/* snip */

> -  && (t = get_reg_known_value (REGNO (XEXP (src, 
> 0
> +  && (t = get_reg_known_value (
> +    REGNO (XEXP (src, 0

Likewise.

/* snip */

> - op0 = expand_debug_expr (TREE_OPERAND (TREE_OPERAND (exp, 0),
> -    0));
> + op0 = expand_debug_expr (TREE_OPERAND (TREE_OPERAND (exp, 0), 
> 0));

Likewise.

/* snip */

> - else if (dest == stack_pointer_rtx
> -  && REG_P (src)
> + else if (dest == stack_pointer_rtx && REG_P (src)

Likewise, and GCC has its own coding style which is not "what looks
better in your opinion".

I'll stop reading the patch here because it's too long and there is
already too much stealth changes.

Try to break the patch into a series of multiple small patches because
it's difficult to review a very large diff.

Do not randomly modify coding style.  Even if you have a good reason to
make style changes, you should submit them as a separate patch (or
separate patch series) because mixing style changes and real code
changes makes it difficult to review the code.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Ping: [PATCH] d: Update __FreeBSD_version values [PR107469]

2022-11-27 Thread Lorenzo Salvadore via Gcc-patches
Hello,

Ping https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605685.html

I would like to remind that Gerald Pfeifer already volunteered to commit this 
patch
when it is approved. However the patch has not been approved yet.

Thanks,

Lorenzo  Salvadore

--- Original Message ---
On Friday, November 11th, 2022 at 12:07 AM, Lorenzo Salvadore 
 wrote:

> Update __FreeBSD_version values for the latest FreeBSD supported
> versions. In particular, add __FreeBSD_version for FreeBSD 14, which is
> necessary to compile libphobos successfully on FreeBSD 14.
> 
> The patch has already been applied successfully in the official FreeBSD
> ports tree for the ports lang/gcc11 and lang/gcc11-devel. Please see the
> following commits:
> 
> https://cgit.freebsd.org/ports/commit/?id=f61fb49b2e76fd4f7a5b7a11510b5109206c19f2
> https://cgit.freebsd.org/ports/commit/?id=57936dba89ea208e5dbc1bd2d7fda3d29a1838b3
> 
> libphobos/ChangeLog:
> 
> 2022-11-10 Lorenzo Salvadore develo...@lorenzosalvadore.it
> 
> 
> PR d/107469.
> * libdruntime/core/sys/freebsd/config.d: Update __FreeBSD_version.
> 
> ---
> libphobos/libdruntime/core/sys/freebsd/config.d | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/libphobos/libdruntime/core/sys/freebsd/config.d 
> b/libphobos/libdruntime/core/sys/freebsd/config.d
> index 5e3129e2422..9d502e52e32 100644
> --- a/libphobos/libdruntime/core/sys/freebsd/config.d
> +++ b/libphobos/libdruntime/core/sys/freebsd/config.d
> @@ -14,8 +14,9 @@ public import core.sys.posix.config;
> // NOTE: When adding newer versions of FreeBSD, verify all current versioned
> // bindings are still compatible with the release.
> 
> - version (FreeBSD_13) enum __FreeBSD_version = 130;
> -else version (FreeBSD_12) enum __FreeBSD_version = 1202000;
> + version (FreeBSD_14) enum __FreeBSD_version = 140;
> +else version (FreeBSD_13) enum __FreeBSD_version = 1301000;
> +else version (FreeBSD_12) enum __FreeBSD_version = 1203000;
> else version (FreeBSD_11) enum __FreeBSD_version = 1104000;
> else version (FreeBSD_10) enum __FreeBSD_version = 1004000;
> else version (FreeBSD_9) enum __FreeBSD_version = 903000;
> --
> 2.38.0