Re: [PATCH] ifcvt.cc: Prevent excessive if-conversion for conditional moves

2023-01-11 Thread Robin Dapp via Gcc-patches
Hi,
 
> On optimizing for speed, default_noce_conversion_profitable_p() allows
> plenty of headroom, so this patch has little impact.
> 
> Also, if the target-specific cost estimate is accurate or allows for
> margins, the impact should be similarly small.
I believe this part of ifcvt does/did not use the costing on purpose.
It will generally convert more sequences than other paths that compare
before and after costs since we just count the number of converted
insns comparing them against the "branch costs".  Similar to rtx costs
they are kind of relative to a single insn but AFAIK it's not used
consistently everywhere.  All the major platforms have low branch costs
nowadays (0 or 1?) thus we won't emit too many conditional moves here.

In general I agree that we should compare costs everywhere and not just
count (the costing should include the branch costs as well) but this would
be a major overhaul.  For your case (assuming xtensa), could you not
tune xtensa_branch_cost?  It is currently 3 allowing up to 4 conditional
moves to be generated.  optimize_function_for_speed_p is already being
passed to the hook so you could make use of that and decrease branch
costs when optimizing for size only.

Regards
 Robin
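
A minimal sketch of the suggestion above, not the actual xtensa hook.  The
names sketch_branch_cost and sketch_max_cmoves are hypothetical, the
size-tuned value of 1 is an arbitrary illustration, and the "branch cost
plus one" limit is only inferred from the figures quoted in this message
(a cost of 3 allowing up to 4 conditional moves):

```c
#include <stdbool.h>

/* Hypothetical stand-in for a target branch-cost hook.  The value 3 is
   the current xtensa cost mentioned above; the value 1 when optimizing
   for size is an assumption chosen only for illustration.  */
static int
sketch_branch_cost (bool speed_p)
{
  return speed_p ? 3 : 1;
}

/* Assumed relation between the branch cost and the number of conditional
   moves that the counting-based if-conversion path would accept.  */
static int
sketch_max_cmoves (bool speed_p)
{
  return sketch_branch_cost (speed_p) + 1;
}
```

With these assumed numbers, code optimized for speed would still accept up
to 4 conditional moves, while code optimized for size would stop at 2.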


Re: [committed] testsuite: Add testcases from PR108292 and PR108308

2023-01-11 Thread NightStrike via Gcc-patches
On Fri, Jan 6, 2023 at 4:56 AM Jakub Jelinek via Gcc-patches
 wrote:
> --- gcc/testsuite/gcc.dg/pr108308.c.jj  2023-01-06 10:43:45.793009294 +0100
> +++ gcc/testsuite/gcc.dg/pr108308.c 2023-01-06 10:43:40.218090375 +0100
> @@ -0,0 +1,39 @@
> +/* PR target/108308 */
> +/* { dg-do run { target { ilp32 || lp64 } } } */

This test passes on Windows, and I don't see anything in the test that
jumps out at me as being affected by storing pointers in longs.  Is
there something I'm missing about why this would be disabled on LLP64?


Re: [committed] testsuite: Add testcases from PR108292 and PR108308

2023-01-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Jan 11, 2023 at 03:58:40AM -0500, NightStrike wrote:
> On Fri, Jan 6, 2023 at 4:56 AM Jakub Jelinek via Gcc-patches
>  wrote:
> > --- gcc/testsuite/gcc.dg/pr108308.c.jj  2023-01-06 10:43:45.793009294 +0100
> > +++ gcc/testsuite/gcc.dg/pr108308.c 2023-01-06 10:43:40.218090375 +0100
> > @@ -0,0 +1,39 @@
> > +/* PR target/108308 */
> > +/* { dg-do run { target { ilp32 || lp64 } } } */
> 
> This test passes on Windows, and I don't see anything in the test that
> jumps out at me as being affected by storing pointers in longs.  Is
> there something I'm missing about why this would be disabled on LLP64?

Maybe the test just needs int32, it didn't look important enough to me.
ilp32 || lp64 covers most of important targets.

Jakub



[PATCH] fortran: Fix up function types for realloc and sincos{,f,l} builtins [PR108349]

2023-01-11 Thread Jakub Jelinek via Gcc-patches
Hi!

As reported in the PR, the FUNCTION_TYPE for __builtin_realloc in the
Fortran FE is wrong since r0-100026-gb64fca63690ad which changed
-  tmp = tree_cons (NULL_TREE, pvoid_type_node, void_list_node);
-  tmp = tree_cons (NULL_TREE, size_type_node, tmp);
-  ftype = build_function_type (pvoid_type_node, tmp);
+  ftype = build_function_type_list (pvoid_type_node,
+size_type_node, pvoid_type_node,
+NULL_TREE);
   gfc_define_builtin ("__builtin_realloc", ftype, BUILT_IN_REALLOC,
  "realloc", false);
The return type is correct, void *, but the first argument should be
void * too and only second one size_t, while the above change changed
realloc to be void *__builtin_realloc (size_t, void *);
I went through all other changes from that commit and found that
__builtin_sincos{,f,l} got broken as well, instead of the former
void __builtin_sincos{,f,l} (ftype, ftype *, ftype *);
where ftype is {double,float,long double} it is now incorrectly
void __builtin_sincos{,f,l} (ftype *, ftype *);
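
For reference, the intended signatures can be written down as plain C
function-pointer types.  This is only an illustrative sketch; the typedef
names and the sketch_grow helper are hypothetical and are not part of the
Fortran FE:

```c
#include <stddef.h>
#include <stdlib.h>

/* Correct shapes as described above: realloc takes the pointer first and
   the size second; sincos takes the value and two result pointers.  */
typedef void *(*sketch_realloc_fn) (void *, size_t);
typedef void (*sketch_sincos_fn) (double, double *, double *);

/* Hypothetical helper: resize an int buffer through a function pointer of
   the correct realloc type.  */
static int *
sketch_grow (sketch_realloc_fn fn, int *buf, size_t new_count)
{
  return (int *) fn (buf, new_count * sizeof (int));
}
```

Passing the C library realloc as the first argument type-checks only
because the parameter order is (void *, size_t).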

The following patch fixes that, plus some formatting issues around
the spots I've changed.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-01-11  Jakub Jelinek  

PR fortran/108349
* f95-lang.cc (gfc_init_builtin_function): Fix up function types
for BUILT_IN_REALLOC and BUILT_IN_SINCOS{F,,L}.  Formatting fixes.

--- gcc/fortran/f95-lang.cc.jj  2022-11-15 22:57:18.247210671 +0100
+++ gcc/fortran/f95-lang.cc 2023-01-10 11:31:43.787266346 +0100
@@ -714,31 +714,34 @@ gfc_init_builtin_functions (void)
 float_type_node, NULL_TREE);
 
   func_cdouble_double = build_function_type_list (double_type_node,
-  complex_double_type_node,
-  NULL_TREE);
+ complex_double_type_node,
+ NULL_TREE);
 
   func_double_cdouble = build_function_type_list (complex_double_type_node,
-  double_type_node, NULL_TREE);
+ double_type_node, NULL_TREE);
 
-  func_clongdouble_longdouble =
-build_function_type_list (long_double_type_node,
-  complex_long_double_type_node, NULL_TREE);
-
-  func_longdouble_clongdouble =
-build_function_type_list (complex_long_double_type_node,
-  long_double_type_node, NULL_TREE);
+  func_clongdouble_longdouble
+= build_function_type_list (long_double_type_node,
+   complex_long_double_type_node, NULL_TREE);
+
+  func_longdouble_clongdouble
+= build_function_type_list (complex_long_double_type_node,
+   long_double_type_node, NULL_TREE);
 
   ptype = build_pointer_type (float_type_node);
-  func_float_floatp_floatp =
-build_function_type_list (void_type_node, ptype, ptype, NULL_TREE);
+  func_float_floatp_floatp
+= build_function_type_list (void_type_node, float_type_node, ptype, ptype,
+   NULL_TREE);
 
   ptype = build_pointer_type (double_type_node);
-  func_double_doublep_doublep =
-build_function_type_list (void_type_node, ptype, ptype, NULL_TREE);
+  func_double_doublep_doublep
+= build_function_type_list (void_type_node, double_type_node, ptype,
+   ptype, NULL_TREE);
 
   ptype = build_pointer_type (long_double_type_node);
-  func_longdouble_longdoublep_longdoublep =
-build_function_type_list (void_type_node, ptype, ptype, NULL_TREE);
+  func_longdouble_longdoublep_longdoublep
+= build_function_type_list (void_type_node, long_double_type_node, ptype,
+   ptype, NULL_TREE);
 
 /* Non-math builtins are defined manually, so they're not included here.  */
 #define OTHER_BUILTIN(ID,NAME,TYPE,CONST)
@@ -992,9 +995,8 @@ gfc_init_builtin_functions (void)
  "calloc", ATTR_NOTHROW_LEAF_MALLOC_LIST);
   DECL_IS_MALLOC (builtin_decl_explicit (BUILT_IN_CALLOC)) = 1;
 
-  ftype = build_function_type_list (pvoid_type_node,
-size_type_node, pvoid_type_node,
-NULL_TREE);
+  ftype = build_function_type_list (pvoid_type_node, pvoid_type_node,
+   size_type_node, NULL_TREE);
   gfc_define_builtin ("__builtin_realloc", ftype, BUILT_IN_REALLOC,
  "realloc", ATTR_NOTHROW_LEAF_LIST);
 

Jakub



Re: [PATCH] fortran: Fix up function types for realloc and sincos{,f,l} builtins [PR108349]

2023-01-11 Thread Tobias Burnus

Hi,

On 11.01.23 10:18, Jakub Jelinek via Gcc-patches wrote:

> As reported in the PR, the FUNCTION_TYPE for __builtin_realloc in the
> Fortran FE is wrong since r0-100026-gb64fca63690ad [...]
> I went through all other changes from that commit and found that
> __builtin_sincos{,f,l} got broken as well, [...]
>
> The following patch fixes that, plus some formatting issues around
> the spots I've changed.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK. Thanks for the patch!

Tobias


Re: [committed] testsuite: Add testcases from PR108292 and PR108308

2023-01-11 Thread NightStrike via Gcc-patches
On Wed, Jan 11, 2023 at 4:07 AM Jakub Jelinek  wrote:
>
> On Wed, Jan 11, 2023 at 03:58:40AM -0500, NightStrike wrote:
> > On Fri, Jan 6, 2023 at 4:56 AM Jakub Jelinek via Gcc-patches
> >  wrote:
> > > > --- gcc/testsuite/gcc.dg/pr108308.c.jj  2023-01-06 10:43:45.793009294 +0100
> > > > +++ gcc/testsuite/gcc.dg/pr108308.c 2023-01-06 10:43:40.218090375 +0100
> > > @@ -0,0 +1,39 @@
> > > +/* PR target/108308 */
> > > +/* { dg-do run { target { ilp32 || lp64 } } } */
> >
> > This test passes on Windows, and I don't see anything in the test that
> > jumps out at me as being affected by storing pointers in longs.  Is
> > there something I'm missing about why this would be disabled on LLP64?
>
> Maybe the test just needs int32, it didn't look important enough to me.
> ilp32 || lp64 covers most of important targets.

Could you change to int32plus, then?

-/* { dg-do run { target { ilp32 || lp64 } } } */
+/* { dg-do run { target { int32plus } } } */

Windows is still a secondary platform, so it'd be nice to keep as many
tests working (and supported) as possible.  I don't know what
qualifies as "important targets", but this is an easy win (pun
intended!)


Re: [x86 PATCH] PR rtl-optimization/107991: peephole2 to tweak register allocation.

2023-01-11 Thread Uros Bizjak via Gcc-patches
On Tue, Jan 10, 2023 at 4:01 PM Roger Sayle  wrote:
>
>
> Hi Richard and Uros,
> I believe I've managed to reduce a minimal test case that exhibits the
> underlying
> problem with reload.   The following snippet when compiled on x86-64 with
> -O2:
>
> void ext(int x);
> void foo(int x, int y) { ext(y - x); }
>
> produces the following 5 instructions prior to reload:
> insn 13: r86:SI=di:SI   // REG_DEAD di:SI
> insn 14: r87:SI=si:SI   // REG_READ si:SI
> insn 7: {r85:SI=r87:SI-r86:SI;clobber flags:CC;}// REG_DEAD r86:SI,
> r87:SI
> insn 8: di:SI=r85:SI// REG_READ r85:SI
> insn 9: call [`ext'] argc:0
>
> Hence there are three pseudos (allocnos) to be register allocated; r85, r86
> & r87.
>
> Currently, reload produces the following assignments/colouring using 3 hard
> regs.
> r85 in di
> r86 in ax
> r87 in si
>
> A better (optimal) register allocation requires only 2 hard regs.
> r85 in di
> r86 in si
> r87 in di
>
> Fortunately, this over-allocation is cleaned up later (during
> cprop_hardreg), but
> as pointed out by Uros, there's little benefit in reducing register pressure
> this
> late (after peephole2).
>
> As far as I understand it, Richard's patch to handle fully-tied destinations
> looks
> very reasonable (and is impressively tested/benchmarked):
> https://gcc.gnu.org/pipermail/gcc-patches/2019-September/530743.html
> but in the prototypical 0:"=r", 1:"0", 2:"r" constraint case, as used in the
> problematic subsi3_1 pattern (of insn 7), I'm trying to figure out why r85
> and r87 don't get allocated to the same register [given the local spilling
> of non-eliminable hard regs in insn 7, temporarily introducing a new pseudo
> r89].
>
> In closing, reload is a complex piece of code that's shared between a large
> number of backends; if Richard's patch is a win "statistically", then it's
> not unreasonable to use a peephole2 to clean-up/catch the corner cases
> on class_likely_spilled_p targets [indeed many of the peephole2s in i386.md
> tidy up register allocation issues], and such a "specialized" fix is more
> suitable
> for stage 3, than a potentially disruptive tweak to reload.  At worst, the
> peephole2 becomes dead if/when the problem is fixed upstream.
>
> Or put another way, if reload worked perfectly, i386.md wouldn't need
> many of the peephole2s that it currently has.  Oh, for such an ideal world.

I have benchmarked the new peephole a bit: during a build of the
Linux kernel and during a whole GCC bootstrap, it didn't trigger
even once. It looks to me that the compiler produces the problematic
sequence only for specially crafted testcases, when argument setup is
involved. These testcases expose a minor annoyance with reload
(which IMO should be fixed in reload and not papered over with a
peephole).

Technically, the pattern is OK, but it really doesn't bring much to
the table. OTOH, the pattern is simple enough that it won't hurt if we
have another specialized pattern in the .md file. I'll leave the
decision to you.

Uros.


Re: [committed] testsuite: Add testcases from PR108292 and PR108308

2023-01-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Jan 11, 2023 at 04:27:11AM -0500, NightStrike wrote:
> On Wed, Jan 11, 2023 at 4:07 AM Jakub Jelinek  wrote:
> >
> > On Wed, Jan 11, 2023 at 03:58:40AM -0500, NightStrike wrote:
> > > On Fri, Jan 6, 2023 at 4:56 AM Jakub Jelinek via Gcc-patches
> > >  wrote:
> > > > > --- gcc/testsuite/gcc.dg/pr108308.c.jj  2023-01-06 10:43:45.793009294 +0100
> > > > > +++ gcc/testsuite/gcc.dg/pr108308.c 2023-01-06 10:43:40.218090375 +0100
> > > > @@ -0,0 +1,39 @@
> > > > +/* PR target/108308 */
> > > > +/* { dg-do run { target { ilp32 || lp64 } } } */
> > >
> > > This test passes on Windows, and I don't see anything in the test that
> > > jumps out at me as being affected by storing pointers in longs.  Is
> > > there something I'm missing about why this would be disabled on LLP64?
> >
> > Maybe the test just needs int32, it didn't look important enough to me.
> > ilp32 || lp64 covers most of important targets.
> 
> Could you change to int32plus, then?

I think int32plus would be wrong, the testcase has some overlarge constant
and I doubt it would work correctly on the hypothetical target with 64-bit
ints where the overlarge constant would fit into int.

Jakub



[PATCH] c++: Avoid some false positive -Wfloat-conversion warnings with extended precision [PR108285]

2023-01-11 Thread Jakub Jelinek via Gcc-patches
Hi!

On the following testcase trunk emits a false positive warning on ia32.
convert_like_internal is there called with type of double and
expr EXCESS_PRECISION_EXPR with float type with long double operand
2.L * (long double) x.
Now, for the code generation we do the right thing, cp_convert
to double from that 2.L * (long double) x, but we call even
cp_convert_and_check with that and that emits the -Wfloat-conversion
warning.  Looking at what the C FE does in this case, it calls
convert_and_check with the EXCESS_PRECISION_EXPR expression rather
than its operand, and essentially uses the operand for code generation
and EXCESS_PRECISION_EXPR itself for warnings.
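
To illustrate why the warning is a false positive, here is a small sketch
in plain C (not front-end code; both function names are hypothetical).  It
assumes a target such as ia32 where float arithmetic may be evaluated in
long double.  The semantic type of 2 * x is still float, and every float
value is exactly representable as double, even though the value used for
code generation is a long double product:

```c
/* Semantic view: the expression has type float, so the conversion to
   double cannot change the value.  This is the view the warning should
   be based on.  */
static double
sketch_semantic_view (float x)
{
  return 2 * x;
}

/* Evaluation view (assumed excess precision): the product is computed in
   long double and then narrowed to double.  Warning on this form is what
   produced the bogus -Wfloat-conversion diagnostic.  */
static double
sketch_evaluated_view (float x)
{
  long double wide = 2.0L * (long double) x;
  return (double) wide;
}
```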

The following patch does that too for the C++ FE.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-01-11  Jakub Jelinek  

PR c++/108285
* cvt.cc (cp_convert_and_check): For EXCESS_PRECISION_EXPR
use its operand except that for warning purposes use the original
EXCESS_PRECISION_EXPR.
* call.cc (convert_like_internal): Only look through
EXCESS_PRECISION_EXPR when calling cp_convert, not when calling
cp_convert_and_check.

* g++.dg/warn/pr108285.C: New test.

--- gcc/cp/cvt.cc.jj  2022-10-14 09:32:32.403797521 +0200
+++ gcc/cp/cvt.cc   2023-01-10 13:53:00.639130717 +0100
@@ -652,8 +652,10 @@ cp_convert (tree type, tree expr, tsubst
 tree
 cp_convert_and_check (tree type, tree expr, tsubst_flags_t complain)
 {
-  tree result;
+  tree result, expr_for_warning = expr;
 
+  if (TREE_CODE (expr) == EXCESS_PRECISION_EXPR)
+expr = TREE_OPERAND (expr, 0);
   if (TREE_TYPE (expr) == type)
 return expr;
   if (expr == error_mark_node)
@@ -663,7 +665,7 @@ cp_convert_and_check (tree type, tree ex
   if ((complain & tf_warning)
   && c_inhibit_evaluation_warnings == 0)
 {
-  tree folded = cp_fully_fold (expr);
+  tree folded = cp_fully_fold (expr_for_warning);
   tree folded_result;
   if (folded == expr)
folded_result = result;
--- gcc/cp/call.cc.jj   2023-01-09 23:41:11.135159084 +0100
+++ gcc/cp/call.cc  2023-01-10 13:50:09.277640628 +0100
@@ -8863,12 +8863,14 @@ convert_like_internal (conversion *convs
 return error_mark_node;
 
   warning_sentinel w (warn_zero_as_null_pointer_constant);
-  if (TREE_CODE (expr) == EXCESS_PRECISION_EXPR)
-expr = TREE_OPERAND (expr, 0);
   if (issue_conversion_warnings)
 expr = cp_convert_and_check (totype, expr, complain);
   else
-expr = cp_convert (totype, expr, complain);
+{
+  if (TREE_CODE (expr) == EXCESS_PRECISION_EXPR)
+   expr = TREE_OPERAND (expr, 0);
+  expr = cp_convert (totype, expr, complain);
+}
 
   return expr;
 }
--- gcc/testsuite/g++.dg/warn/pr108285.C.jj 2023-01-10 16:52:06.115345345 +0100
+++ gcc/testsuite/g++.dg/warn/pr108285.C 2023-01-10 16:39:26.646532929 +0100
@@ -0,0 +1,11 @@
+// PR c++/108285
+// { dg-do compile }
+// { dg-options "-fexcess-precision=standard -Wfloat-conversion" }
+
+void bar (double);
+
+void
+foo (float x)
+{
+  bar (2 * x); // { dg-bogus "conversion from '\[^\n\r]\*' to 'double' may change value" }
+}

Jakub



Re: [PATCH 9/15] arm: Set again stack pointer as CFA reg when popping if necessary

2023-01-11 Thread Andrea Corallo via Gcc-patches
Richard Earnshaw  writes:

> On 09/01/2023 16:48, Richard Earnshaw via Gcc-patches wrote:
>> On 09/01/2023 14:58, Andrea Corallo via Gcc-patches wrote:
>>> Andrea Corallo via Gcc-patches  writes:
>>>
 Richard Earnshaw  writes:

> On 27/09/2022 16:24, Kyrylo Tkachov via Gcc-patches wrote:
>>
>>> -Original Message-
>>> From: Andrea Corallo 
>>> Sent: Tuesday, September 27, 2022 11:06 AM
>>> To: Kyrylo Tkachov 
>>> Cc: Andrea Corallo via Gcc-patches ; Richard
>>> Earnshaw ; nd 
>>> Subject: Re: [PATCH 9/15] arm: Set again stack pointer as CFA
>>> reg when
>>> popping if necessary
>>>
>>> Kyrylo Tkachov  writes:
>>>
 Hi Andrea,

> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Andrea
> Corallo via Gcc-patches
> Sent: Friday, August 12, 2022 4:34 PM
> To: Andrea Corallo via Gcc-patches 
> Cc: Richard Earnshaw ; nd 
> Subject: [PATCH 9/15] arm: Set again stack pointer as CFA reg when
>>> popping
> if necessary
>
> Hi all,
>
> this patch enables 'arm_emit_multi_reg_pop' to set again the stack
> pointer as CFA reg when popping if this is necessary.
>

   From what I can tell from similar functions this is correct,
 but could you
>>> elaborate on why this change is needed for my understanding please?
 Thanks,
 Kyrill
>>>
>>> Hi Kyrill,
>>>
>>> sure, if the frame pointer was set, then it is the current CFA
>>> register.
>>> If we request to adjust the current CFA register offset indicating it
>>> being SP (while it's actually FP) that is indeed not correct and the
>>> incoherence will be detected by an assertion in the dwarf emission
>>> machinery.
>> Thanks,  the patch is ok
>> Kyrill
>>
>>>
>>> Best Regards
>>>
>>>     Andrea
>
> Hmm, wait.  Why would a multi-reg pop be updating the stack pointer?

 Hi Richard,

 not sure I understand, isn't any pop updating SP by definition?
>>>
>>>
>>> Back on this,
>>>
>>> compiling:
>>>
>>> ===
>>> int i;
>>>
>>> void foo (int);
>>>
>>> int bar()
>>> {
>>>    foo (i);
>>>    return 0;
>>> }
>>> ===
>>>
>>> With -march=armv8.1-m.main+fp -mbranch-protection=pac-ret+leaf
>>> -mthumb -O0 -g
>>>
>>> Produces the following asm for bar.
>>>
>>> bar:
>>> @ args = 0, pretend = 0, frame = 0
>>> @ frame_needed = 1, uses_anonymous_args = 0
>>> pac    ip, lr, sp
>>> push    {r3, r7, ip, lr}
>>> add    r7, sp, #0
>>> ldr    r3, .L3
>>> ldr    r3, [r3]
>>> mov    r0, r3
>>> bl    foo
>>> movs    r3, #0
>>> mov    r0, r3
>>> pop    {r3, r7, ip, lr}
>>> aut    ip, lr, sp
>>> bx    lr
>>>
>>> The offending instruction causing the ICE (without this patch) when
>>> emitting dwarf is "pop {r3, r7, ip, lr}".
>>>
>>> The current CFA reg when emitting the multipop is R7 (the frame
>>> pointer).  If it is not the multipop that has the duty to restore SP as
>>> current CFA here, which other instruction should do it?
>>>
>> Digging a bit deeper, I'm now even more confused. 
>> arm_expand_epilogue contains (paraphrasing the code):
>>   if frame_pointer_needed
>>     {
>>   if arm
>>     {}
>>   else
>>     {
>>   if adjust
>>     r7 += adjust
>>   mov sp, r7    // Reset CFA to SP
>>     }
>>      }
>> so there should always be a move of r7 into SP, even if this is
>> strictly redundant.  I don't understand why this doesn't happen for
>> your testcase.  Can you dig a bit deeper?  I wonder if we've
>> (probably incorrectly) assumed that this function doesn't need an
>> epilogue but can use a simple return?  I don't think we should do
>> that when authentication is needed: a simple return should really be
>> one instruction.
>> 
>
> So I strongly suspect the real problem here is that use_return_insn ()
> in arm.cc needs to be updated to return false when using pointer
> authentication.  The specification for this function says that a
> return can be done in one instruction; and clearly when we need
> authentication more than one is needed.
>
> R.

So yes I agree with your analysis.  I'm respinning 10/15 to include your
suggestion and I believe we can just drop this patch.
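
The conclusion of this sub-thread can be modelled with a toy predicate.
This is only a sketch of the reasoning, not the code in arm.cc; the struct
and both function names are hypothetical.  A simple return is acceptable
only when the whole function exit is a single instruction, and pointer
authentication always adds at least one more:

```c
#include <stdbool.h>

/* Hypothetical summary of what a function exit has to do besides the
   return instruction itself.  */
struct sketch_exit
{
  int extra_epilogue_insns;  /* anything else the epilogue must emit */
  bool authenticate_lr;      /* e.g. "aut ip, lr, sp" before "bx lr" */
};

/* Number of instructions needed to leave the function in this model.  */
static int
sketch_exit_insn_count (const struct sketch_exit *e)
{
  return 1 + e->extra_epilogue_insns + (e->authenticate_lr ? 1 : 0);
}

/* Model of the rule discussed above: a simple return may be used only
   if the exit really is one instruction.  */
static bool
sketch_use_return_insn (const struct sketch_exit *e)
{
  return sketch_exit_insn_count (e) == 1;
}
```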

Thanks

  Andrea


[Patch] Resolve bugzilla #108150 and #108192 for mingw

2023-01-11 Thread Jonathan Yong via Gcc-patches

Are the patches and changelogs OK?

From 6edfba9e9a5f8fddc45d137b9f2d07c7f9065eaa Mon Sep 17 00:00:00 2001
From: Jonathan Yong <10wa...@gmail.com>
Date: Sun, 8 Jan 2023 01:28:34 +
Subject: [PATCH 1/2] PR c/108150 - Fix alignment test for Windows targets

gcc/testsuite/ChangeLog:

	PR c/108150
	* gcc.dg/attr-aligned.c: Make errors emitted on Windows
	target same as on Linux.

Signed-off-by: Jonathan Yong <10wa...@gmail.com>
---
 gcc/testsuite/gcc.dg/attr-aligned.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/attr-aligned.c b/gcc/testsuite/gcc.dg/attr-aligned.c
index a2e11c96180..887bdd0f379 100644
--- a/gcc/testsuite/gcc.dg/attr-aligned.c
+++ b/gcc/testsuite/gcc.dg/attr-aligned.c
@@ -22,6 +22,9 @@
 #  define ALIGN_MAX_STATIC  2
 /* Work around a pdp11 ICE (see PR target/87821).  */
 #  define ALIGN_MAX_AUTO    (ALIGN_MAX_HARD >> 14)
+#elif __WIN32__ || __CYGWIN__
+#  define ALIGN_MAX_STATIC  8192
+#  define ALIGN_MAX_AUTO    8192
 #elif __powerpc64__ || __x86_64__
 /* Is this processor- or operating-system specific?  */
 #  define ALIGN_MAX_STATIC  ALIGN_MAX_HARD
-- 
2.39.0

From 1c9781f7af30e600367682fe0e47128ea85552ab Mon Sep 17 00:00:00 2001
From: Jonathan Yong <10wa...@gmail.com>
Date: Wed, 11 Jan 2023 09:51:02 +
Subject: [PATCH 2/2] PR c/108192 - Fix test for mingw

gcc/testsuite/ChangeLog:

	PR c/108192
	* g++.dg/cet-notrack-1.C: Use puts instead of printf,
	so function call is not mangled by __mingw_printf when
	doing assembly symbol inspection.

Signed-off-by: Jonathan Yong <10wa...@gmail.com>
---
 gcc/testsuite/g++.dg/cet-notrack-1.C | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/cet-notrack-1.C b/gcc/testsuite/g++.dg/cet-notrack-1.C
index ee98fd43d58..a19eed0fb82 100644
--- a/gcc/testsuite/g++.dg/cet-notrack-1.C
+++ b/gcc/testsuite/g++.dg/cet-notrack-1.C
@@ -18,8 +18,8 @@ B b;
 A& a = b;
 int (A::*amem) () __attribute__((nocf_check)) = &A::foo; // take address
 if ((a.*amem)() == 73) // use the address
-  printf("pass\n");
+  puts("pass\n");
 else
-  printf("fail\n");
+  puts("fail\n");
 return 0;
 }
-- 
2.39.0
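
As a side note on the second patch, a small sketch of reporting a result
through puts.  The helper name sketch_report is hypothetical.  puts appends
a newline of its own, so the strings in this sketch carry no trailing
"\n"; the patch above keeps the original strings unchanged:

```c
#include <stdio.h>

/* Hypothetical helper: print "pass" or "fail" followed by the newline
   that puts adds itself.  Returns 0 on success, 1 if writing failed.  */
static int
sketch_report (int ok)
{
  return puts (ok ? "pass" : "fail") == EOF ? 1 : 0;
}
```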



[PATCH 10/15 V7] arm: Implement cortex-M return signing address codegen

2023-01-11 Thread Andrea Corallo via Gcc-patches
Richard Earnshaw  writes:

[...]

>
> Otherwise ok with that change.
>
> R.

Minor respin of this patch addressing the suggestion to have
'use_return_insn' return zero when PAC is enabled.

BR

  Andrea

From 0a894f73fc09be865b7a7cb205e871bf82f8abba Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Thu, 20 Jan 2022 15:36:23 +0100
Subject: [PATCH] [PATCH 10/15] arm: Implement cortex-M return signing address
 codegen

Hi all,

this patch enables return address signing and verification based on
Armv8.1-M Pointer Authentication [1].

To sign the return address, we use the PAC R12, LR, SP instruction
upon function entry.  This is signing LR using SP and storing the
result in R12.  R12 will be pushed into the stack.

During function epilogue R12 will be popped and AUT R12, LR, SP will
be used to verify that the content of LR is still valid before return.
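
The sign-then-verify scheme can be sketched in C as follows.  This is a toy
model only: the mixing function below is an arbitrary stand-in and is not
the algorithm used by the hardware, and the key parameter as well as both
function names are assumptions made for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for "PAC R12, LR, SP": derive a code from the return
   address, using the stack pointer as modifier and a secret key.  */
static uint32_t
sketch_pac (uint32_t lr, uint32_t sp, uint32_t key)
{
  uint32_t h = lr ^ (sp * 2654435761u) ^ key;
  h ^= h >> 16;
  h *= 0x45d9f3bu;
  h ^= h >> 16;
  return h;
}

/* Toy stand-in for "AUT R12, LR, SP": recompute the code in the epilogue
   and compare it with the value that was saved on the stack.  */
static bool
sketch_aut (uint32_t code, uint32_t lr, uint32_t sp, uint32_t key)
{
  return code == sketch_pac (lr, sp, key);
}
```

In this model a return address that was modified between prologue and
epilogue no longer matches the saved code, so the check fails.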

Here is an example of a PAC-instrumented function prologue and epilogue:

void foo (void);

int main()
{
  foo ();
  return 0;
}

Compiled with '-march=armv8.1-m.main -mbranch-protection=pac-ret
-mthumb' translates into:

main:
        pac     ip, lr, sp
        push    {r3, r7, ip, lr}
        add     r7, sp, #0
        bl      foo
        movs    r3, #0
        mov     r0, r3
        pop     {r3, r7, ip, lr}
        aut     ip, lr, sp
        bx      lr

The patch also takes care of generating a PACBTI instruction in place
of the sequence BTI+PAC when Branch Target Identification is enabled
contextually.

Ex. the previous example compiled with '-march=armv8.1-m.main
-mbranch-protection=pac-ret+bti -mthumb' translates into:

main:
        pacbti  ip, lr, sp
        push    {r3, r7, ip, lr}
        add     r7, sp, #0
        bl      foo
        movs    r3, #0
        mov     r0, r3
        pop     {r3, r7, ip, lr}
        aut     ip, lr, sp
        bx      lr

Following earlier upstream suggestions, a test for varargs has been
added and '-mtpcs-frame' is treated as incompatible with the return
address signing feature being introduced.

[1] 


gcc/Changelog

2021-11-03  Andrea Corallo  

* config/arm/arm.h (arm_arch8m_main): Declare it.
* config/arm/arm.cc (arm_arch8m_main): Define it.
(arm_option_reconfigure_globals): Set arm_arch8m_main.
(arm_compute_frame_layout, arm_expand_prologue)
(thumb2_expand_return, arm_expand_epilogue)
(arm_conditional_register_usage): Update for pac codegen.
(arm_current_function_pac_enabled_p): New function.
(aarch_bti_enabled): New function.
(use_return_insn): Return zero when pac is enabled.
* config/arm/arm.md (pac_ip_lr_sp, pacbti_ip_lr_sp, aut_ip_lr_sp):
Add new patterns.
* config/arm/unspecs.md (UNSPEC_PAC_NOP)
(VUNSPEC_PACBTI_NOP, VUNSPEC_AUT_NOP): Add unspecs.

gcc/testsuite/Changelog

2021-11-03  Andrea Corallo  

* gcc.target/arm/pac.h : New file.
* gcc.target/arm/pac-1.c : New test case.
* gcc.target/arm/pac-2.c : Likewise.
* gcc.target/arm/pac-3.c : Likewise.
* gcc.target/arm/pac-4.c : Likewise.
* gcc.target/arm/pac-5.c : Likewise.
* gcc.target/arm/pac-6.c : Likewise.
* gcc.target/arm/pac-7.c : Likewise.
* gcc.target/arm/pac-8.c : Likewise.
* gcc.target/arm/pac-9.c : Likewise.
* gcc.target/arm/pac-10.c : Likewise.
* gcc.target/arm/pac-11.c : Likewise.
---
 gcc/config/arm/arm-protos.h   |  1 +
 gcc/config/arm/arm.cc | 79 ---
 gcc/config/arm/arm.h  |  4 ++
 gcc/config/arm/arm.md | 23 
 gcc/config/arm/unspecs.md |  3 +
 gcc/testsuite/gcc.target/arm/pac-1.c  | 11 
 gcc/testsuite/gcc.target/arm/pac-10.c | 10 
 gcc/testsuite/gcc.target/arm/pac-11.c | 10 
 gcc/testsuite/gcc.target/arm/pac-2.c  | 11 
 gcc/testsuite/gcc.target/arm/pac-3.c  | 11 
 gcc/testsuite/gcc.target/arm/pac-4.c  | 10 
 gcc/testsuite/gcc.target/arm/pac-5.c  | 28 ++
 gcc/testsuite/gcc.target/arm/pac-6.c  | 18 ++
 gcc/testsuite/gcc.target/arm/pac-7.c  | 32 +++
 gcc/testsuite/gcc.target/arm/pac-8.c  | 34 
 gcc/testsuite/gcc.target/arm/pac-9.c  | 11 
 gcc/testsuite/gcc.target/arm/pac.h| 17 ++
 17 files changed, 304 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-10.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-11.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-4.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-5.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-6.c
 create mode 100644 gcc/testsuite/gcc.target/

[PATCH] Fix PR tree-optimization/108199

2023-01-11 Thread Eric Botcazou via Gcc-patches
Hi,

this fixes the problematic interaction between bitfields, unions, SSO and SRA.

Tested on x86-64/Linux and SPARC/Solaris, OK for all active branches?


2023-01-11  Eric Botcazou  
Andreas Krebbel  

PR tree-optimization/108199
* tree-sra.cc (sra_modify_expr): Deal with reverse storage order
for bit-field references.


2023-01-11  Eric Botcazou  

* gcc.dg/sso-17.c: New test.

-- 
Eric Botcazou

diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index f0182a45485..ad0c738645d 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -3858,7 +3858,23 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
 	}
 	}
   else
-	*expr = repl;
+	{
+	  /* If we are going to replace a scalar field in a structure with
+	 reverse storage order by a stand-alone scalar, we are going to
+	 effectively byte-swap the scalar and we also need to byte-swap
+	 the portion of it represented by the bit-field.  */
+	  if (bfr && REF_REVERSE_STORAGE_ORDER (bfr))
+	{
+	  REF_REVERSE_STORAGE_ORDER (bfr) = 0;
+	  TREE_OPERAND (bfr, 2)
+		= size_binop (MINUS_EXPR, TYPE_SIZE (TREE_TYPE (repl)),
+			  size_binop (PLUS_EXPR, TREE_OPERAND (bfr, 1),
+		 TREE_OPERAND (bfr, 2)));
+	}
+
+	  *expr = repl;
+	}
+
   sra_stats.exprs++;
 }
   else if (write && access->grp_to_be_debug_replaced)
--  { dg-do run }
--  { dg-options "-gnatws -O" }

with System;

procedure SSO17 is

  type My_Float is new Float range 0.0 .. 359.99;

  type Rec is record
Az : My_Float;
El : My_Float;
  end record;
  for Rec'Bit_Order use System.High_Order_First;
  for Rec'Scalar_Storage_Order use System.High_Order_First;

  R : Rec;

  procedure Is_True (B : Boolean);
  pragma No_Inline (Is_True);

  procedure Is_True (B : Boolean) is
  begin
if not B then
  raise Program_Error;
end if;
  end;

begin
  R := (Az => 1.1, El => 2.2);
  Is_True (R.Az'Valid);
  R := (Az => 3.3, El => 4.4);
  Is_True (R.Az'Valid);
end;
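
For illustration only, the position adjustment made in the tree-sra.cc hunk above can be sketched in plain C. The helper name `mirror_bitpos` and its parameters are hypothetical and not part of GCC; the sketch assumes the bit-field lies entirely inside the container, and it only mirrors the arithmetic "size of the replacement minus (bit-field size plus bit-field position)" from the patch.

```c
#include <assert.h>

/* Hypothetical helper, not GCC code.  CONTAINER_BITS plays the role of
   TYPE_SIZE (TREE_TYPE (repl)), BITSIZE the role of operand 1 of the
   bit-field reference and BITPOS the role of operand 2.  The result is
   the new position once the reverse storage order flag is dropped.  */
static unsigned
mirror_bitpos (unsigned container_bits, unsigned bitsize, unsigned bitpos)
{
  /* Assumption: the field does not extend past the container.  */
  assert (bitsize + bitpos <= container_bits);
  return container_bits - (bitsize + bitpos);
}
```

With a 32-bit container, an 8-bit field at position 0 moves to position 24 and vice versa, while a field covering the whole container stays at position 0.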


Re: [committed] testsuite: Add testcases from PR108292 and PR108308

2023-01-11 Thread NightStrike via Gcc-patches
On Wed, Jan 11, 2023 at 4:43 AM Jakub Jelinek  wrote:
>
> On Wed, Jan 11, 2023 at 04:27:11AM -0500, NightStrike wrote:
> > On Wed, Jan 11, 2023 at 4:07 AM Jakub Jelinek  wrote:
> > >
> > > On Wed, Jan 11, 2023 at 03:58:40AM -0500, NightStrike wrote:
> > > > On Fri, Jan 6, 2023 at 4:56 AM Jakub Jelinek via Gcc-patches
> > > >  wrote:
> > > > > --- gcc/testsuite/gcc.dg/pr108308.c.jj  2023-01-06 10:43:45.793009294 +0100
> > > > > +++ gcc/testsuite/gcc.dg/pr108308.c 2023-01-06 10:43:40.218090375 +0100
> > > > > @@ -0,0 +1,39 @@
> > > > > +/* PR target/108308 */
> > > > > +/* { dg-do run { target { ilp32 || lp64 } } } */
> > > >
> > > > This test passes on Windows, and I don't see anything in the test that
> > > > jumps out at me as being affected by storing pointers in longs.  Is
> > > > there something I'm missing about why this would be disabled on LLP64?
> > >
> > > Maybe the test just needs int32, it didn't look important enough to me.
> > > ilp32 || lp64 covers most of important targets.
> >
> > Could you change to int32plus, then?
>
> I think int32plus would be wrong, the testcase has some overlarge constant
> and I doubt it would work correctly on the hypothetical target with 64-bit
> ints where the overlarge constant would fit into int.

Ok, then:

/* { dg-do run { target { { ilp32 || lp64 } || llp64 } } } */

or even:

/* { dg-do run { target { ! int16 } } } */

Though I'd point out that in your original message, you only cared
about the "important targets".  I don't think nonexistent ones where
sizeof(int) == 8 qualify :)
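
As a rough aid for the data models being discussed, here is a small C sketch that names the model from the widths of int, long and pointers. The function `data_model` is hypothetical; the mapping reflects my understanding of what the ilp32, lp64, llp64 and int16 effective-target keywords mean, so treat it as an assumption rather than a copy of the testsuite logic.

```c
#include <stddef.h>

/* Hypothetical helper: classify a target by type sizes in bytes.
   Assumed meanings: ilp32 = 32-bit int, long and pointers;
   lp64 = 32-bit int, 64-bit long and pointers;
   llp64 = 32-bit int and long, 64-bit pointers;
   int16 = 16-bit int.  */
static const char *
data_model (size_t int_sz, size_t long_sz, size_t ptr_sz)
{
  if (int_sz == 2)
    return "int16";
  if (int_sz == 4 && long_sz == 4 && ptr_sz == 4)
    return "ilp32";
  if (int_sz == 4 && long_sz == 8 && ptr_sz == 8)
    return "lp64";
  if (int_sz == 4 && long_sz == 4 && ptr_sz == 8)
    return "llp64";
  return "other";
}
```

Calling `data_model (sizeof (int), sizeof (long), sizeof (void *))` on 64-bit Windows should give "llp64", which is the case excluded by the `ilp32 || lp64` selector; the hypothetical target with 64-bit int falls under "other".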


Re: [PATCH] libsanitizer/mips: always build with largefile support

2023-01-11 Thread YunQiang Su
Hans-Peter Nilsson wrote on Wed, Jan 11, 2023 at 08:53:
>
> On Fri, 6 Jan 2023, YunQiang Su wrote:
>
> > -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 is always used for mips
> > when build libsanitizer in LLVM. Thus
> >FIRST_32_SECOND_64((_MIPS_SIM == _ABIN32) ? 176 : 160, 216);
> > instead of
> >FIRST_32_SECOND_64((_MIPS_SIM == _ABIN32) ? 160 : 144, 216);
> > in sanitizer_platform_limits_posix.h.
> >
> > To keep sync with LLVM and to make the code simple, we use the
> > largefile options always.
> >
> > libsanitizer/
> >   * configure.ac: set -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
> > always for mips*.
> >   * configure: Regenerate.
>
> Hm, yes, that might be the most pragmatic way to solve the mips
> stat-size issue...  But shouldn't then largefile-options also be
> forced when libsanitizer is *used*?  IOW, mips*-linux
> gcc-options be tweaked to include -D_LARGEFILE_SOURCE
> -D_FILE_OFFSET_BITS=64 conditional on sanitizer-options?
>

Sounds like a good idea...
However, I am worried that some applications may fail to build or
trigger some other problems.

> brgds, H-P
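
To make the concern concrete, a minimal C sketch (assuming a glibc-style libc; the helper names are hypothetical) shows how the two macros change what a translation unit sees. The sizes of off_t and struct stat depend on whether the macros are defined before the first system header, which is why code built with and without them can disagree on layout.

```c
/* These must come before the first system header to take effect.  */
#ifndef _LARGEFILE_SOURCE
#define _LARGEFILE_SOURCE 1
#endif
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Size of off_t in bytes as seen by this translation unit.  */
static size_t
off_t_size (void)
{
  return sizeof (off_t);
}

/* Size of struct stat in bytes as seen by this translation unit.  */
static size_t
stat_size (void)
{
  return sizeof (struct stat);
}

/* Hypothetical reporting helper: print both sizes.  */
static void
report_layout (void)
{
  printf ("sizeof (off_t) = %zu, sizeof (struct stat) = %zu\n",
	  off_t_size (), stat_size ());
}
```

Building this once as shown and once with the two defines removed, on a 32-bit target, is a quick way to see whether an application and the sanitizer runtime would agree on the struct stat layout.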


[Committed] IBM zSystems: Use NAND instruction to implement bit not

2023-01-11 Thread Andreas Krebbel via Gcc-patches
Bootstrapped and regression tested on s390x.

Committed to mainline.

gcc/ChangeLog:

* config/s390/s390.md (*not): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/s390/not.c: New test.
---
 gcc/config/s390/s390.md |  8 
 gcc/testsuite/gcc.target/s390/not.c | 11 +++
 2 files changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/not.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 0e56fbad44d..4828aa08be6 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -8302,6 +8302,14 @@
   "nrk\t%0,%1,%2"
   [(set_attr "op_type" "RRF")])
 
+; Use NAND for bit inversion
+(define_insn "*not"
+  [(set (match_operand:GPR  0 "register_operand" "=d")
+   (not:GPR (match_operand:GPR 1 "register_operand"  "d")))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_Z15"
+  "nnrk\t%0,%1,%1"
+  [(set_attr "op_type" "RRF")])
 
 ;
 ; Block inclusive or (OC) patterns.
diff --git a/gcc/testsuite/gcc.target/s390/not.c b/gcc/testsuite/gcc.target/s390/not.c
new file mode 100644
index 000..dae95f7d8a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/not.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z15 -mzarch" } */
+
+unsigned long
+foo (unsigned long a)
+{
+  return ~a;
+}
+
+/* { dg-final { scan-assembler-times "\tnngrk\t" 1 { target { lp64 } } } } */
+/* { dg-final { scan-assembler-times "\tnnrk\t" 1 { target { ! lp64 } } } } */
-- 
2.39.0
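
The identity the new pattern relies on is that NAND of a value with itself is its bitwise complement. A small C sketch of that identity follows; the function names are hypothetical and this is plain C, not the machine description.

```c
#include <stdint.h>

/* Bitwise NAND of two 64-bit values.  */
static uint64_t
nand64 (uint64_t a, uint64_t b)
{
  return ~(a & b);
}

/* Bit inversion expressed as NAND with both inputs equal,
   since a & a == a and therefore ~(a & a) == ~a.  */
static uint64_t
not_via_nand (uint64_t a)
{
  return nand64 (a, a);
}
```

This is the same operand shape as the `%1,%1` inputs in the insn template above.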



[PATCH] switch expansion: limit JT growth param values

2023-01-11 Thread Martin Liška
Currently, one can request creation of a huge jump table, which
leads to nonsensically huge output. Moreover, use auto_vec rather
than a stack-allocated array.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

PR middle-end/107976

gcc/ChangeLog:

* params.opt: Limit JT params.
* stmt.cc (emit_case_dispatch_table): Use auto_vec.
---
 gcc/params.opt | 4 ++--
 gcc/stmt.cc| 9 -
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/gcc/params.opt b/gcc/params.opt
index e178dec1600..3454700eb91 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -327,11 +327,11 @@ Common Joined UInteger Var(param_iv_max_considered_uses) Init(250) Param Optimiz
 Bound on number of iv uses in loop optimized in iv optimizations.
 
 -param=jump-table-max-growth-ratio-for-size=
-Common Joined UInteger Var(param_jump_table_max_growth_ratio_for_size) Init(300) Param Optimization
+Common Joined UInteger Var(param_jump_table_max_growth_ratio_for_size) Init(300) IntegerRange(0, 1) Param Optimization
 The maximum code size growth ratio when expanding into a jump table (in percent).  The parameter is used when optimizing for size.
 
 -param=jump-table-max-growth-ratio-for-speed=
-Common Joined UInteger Var(param_jump_table_max_growth_ratio_for_speed) Init(800) Param Optimization
+Common Joined UInteger Var(param_jump_table_max_growth_ratio_for_speed) Init(800) IntegerRange(0, 1) Param Optimization
 The maximum code size growth ratio when expanding into a jump table (in percent).  The parameter is used when optimizing for speed.
 
 -param=l1-cache-line-size=
diff --git a/gcc/stmt.cc b/gcc/stmt.cc
index 82a3e1035ec..b239c02018a 100644
--- a/gcc/stmt.cc
+++ b/gcc/stmt.cc
@@ -746,7 +746,7 @@ emit_case_dispatch_table (tree index_expr, tree index_type,
  tree range, basic_block stmt_bb)
 {
   int i, ncases;
-  rtx *labelvec;
+  auto_vec<rtx> labelvec;
   rtx_insn *fallback_label = label_rtx (case_list[0].m_code_label);
   rtx_code_label *table_label = gen_label_rtx ();
   bool has_gaps = false;
@@ -779,8 +779,7 @@ emit_case_dispatch_table (tree index_expr, tree index_type,
   /* Get table of labels to jump to, in order of case index.  */
 
   ncases = tree_to_shwi (range) + 1;
-  labelvec = XALLOCAVEC (rtx, ncases);
-  memset (labelvec, 0, ncases * sizeof (rtx));
+  labelvec.safe_grow_cleared (ncases);
 
   for (unsigned j = 0; j < case_list.length (); j++)
 {
@@ -860,11 +859,11 @@ emit_case_dispatch_table (tree index_expr, tree index_type,
 emit_jump_table_data (gen_rtx_ADDR_DIFF_VEC (CASE_VECTOR_MODE,
 gen_rtx_LABEL_REF (Pmode,
    table_label),
-gen_rtvec_v (ncases, labelvec),
+gen_rtvec_v (ncases, labelvec.address ()),
 const0_rtx, const0_rtx));
   else
 emit_jump_table_data (gen_rtx_ADDR_VEC (CASE_VECTOR_MODE,
-   gen_rtvec_v (ncases, labelvec)));
+   gen_rtvec_v (ncases, labelvec.address ())));
 
   /* Record no drop-through after the table.  */
   emit_barrier ();
-- 
2.39.0
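
As a rough analogy for the second half of the change, the following C sketch contrasts with a stack allocation: the table is allocated on the heap and zero-initialised in one step, so a very large number of cases cannot overflow the stack. The function `alloc_label_table` is hypothetical and uses `void *` as a stand-in for rtx; it is not what auto_vec does internally, only the same idea in plain C.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the label table.  Returns NCASES
   zero-initialised slots from the heap, or NULL if the allocation
   fails.  The caller releases the table with free ().  */
static void **
alloc_label_table (size_t ncases)
{
  return calloc (ncases, sizeof (void *));
}
```

A stack allocation of the same size has no failure path at all, which is the situation the patch moves away from.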



Re: [PATCH 10/15 V7] arm: Implement cortex-M return signing address codegen

2023-01-11 Thread Richard Earnshaw via Gcc-patches




On 11/01/2023 09:58, Andrea Corallo via Gcc-patches wrote:

Richard Earnshaw  writes:

[...]



Otherwise ok with that change.

R.


Minor respin of this patch addressing the suggestion to have
'use_return_insn' return zero when PAC is enabled.

BR

   Andrea



+  /* Never use a return instruction when return address signing
+ mechanism is enabled.  */
+  if (arm_current_function_pac_enabled_p ())
+return 0;
+

I can see what it does.  It would be better to explain why it does it, i.e.
that return address authentication needs more than one instruction.


OK with that change.


Re: [PATCH 1/15 V2] arm: Make mbranch-protection opts parsing common to AArch32/64

2023-01-11 Thread Richard Earnshaw via Gcc-patches




On 22/12/2022 17:04, Andrea Corallo via Gcc-patches wrote:

Hi all,

respinning this as a rebase was necessary; it also now sets
'aarch_enable_bti' to zero by default for arm, as suggested during the
review of 12/15.

Best Regards

   Andrea




gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc: Include aarch-common.h.
(all_architectures): Fix comment.
(aarch64_parse_extension): Rename return type, enum value names.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Rename
factored out aarch_ra_sign_scope and aarch_ra_sign_key variables.
Also rename corresponding enum values.
* config/aarch64/aarch64-opts.h (aarch64_function_type): Factor
out aarch64_function_type and move it to common code as
aarch_function_type in aarch-common.h.
* config/aarch64/aarch64-protos.h: Include common types header,
move out types aarch64_parse_opt_result and aarch64_key_type to
aarch-common.h
* config/aarch64/aarch64.cc: Move mbranch-protection parsing types
and functions out into aarch-common.h and aarch-common.cc.  Fix up
all the name changes resulting from the move.
* config/aarch64/aarch64.md: Fix up aarch64_ra_sign_key type name change
and enum value.
* config/aarch64/aarch64.opt: Include aarch-common.h to import
type move.  Fix up name changes from factoring out common code and
data.
* config/arm/aarch-common-protos.h: Export factored out routines to both
backends.
* config/arm/aarch-common.cc: Include newly factored out types.
Move all mbranch-protection code and data structures from
aarch64.cc.
* config/arm/aarch-common.h: New header that declares types shared
between aarch32 and aarch64 backends.
* config/arm/arm-protos.h: Declare types and variables that are
made common to aarch64 and aarch32 backends - aarch_ra_sign_key,
aarch_ra_sign_scope and aarch_enable_bti.

I don't see an entry for config/arm/arm.opt.  Please make sure your
patches pass "git gcc-verify".


Otherwise, this is OK.

R.


[PATCH] tree-optimization/108353 - copyprop iteration order

2023-01-11 Thread Richard Biener via Gcc-patches
After recent improvements to copyprop to catch more constants,
it turns out that the current iteration order, preferring forward
progress over iterating, doesn't make much sense for an SSA
propagator.  The following instead first iterates cycles, which
makes sure we do not start with optimistically constant PHIs out
of cycles that optimistically do not exit.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
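
A minimal sketch of the single-worklist idea, in C and with hypothetical names, assuming at most 64 blocks identified by their RPO index: the lowest pending index is always visited next, so a block re-queued by a backedge (a lower index than the current one) is processed before any further forward progress.

```c
#include <stdint.h>

/* Hypothetical single worklist: bit I is set when the block with
   RPO index I is pending.  Assumes fewer than 64 blocks.  */
typedef struct
{
  uint64_t bits;
} worklist;

/* Queue the block with RPO index RPO (must be below 64).  */
static void
worklist_add (worklist *w, unsigned rpo)
{
  w->bits |= (uint64_t) 1 << rpo;
}

/* Remove and return the lowest pending RPO index, or -1 if the
   worklist is empty.  */
static int
worklist_pop (worklist *w)
{
  if (w->bits == 0)
    return -1;
  int i = 0;
  while (!((w->bits >> i) & 1))
    i++;
  w->bits &= ~((uint64_t) 1 << i);
  return i;
}
```

With blocks 2 and 5 pending, popping yields 2; if visiting 2 then queues block 1, the next pop yields 1 rather than 5. The removed two-worklist scheme would instead have deferred block 1 until the forward worklist was drained.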

PR tree-optimization/108353
* tree-ssa-propagate.cc (cfg_blocks_back, ssa_edge_worklist_back):
Remove.
(add_ssa_edge): Simplify.
(add_control_edge): Likewise.
(ssa_prop_init): Likewise.
(ssa_prop_fini): Likewise.
(ssa_propagation_engine::ssa_propagate): Likewise.

* gcc.dg/tree-ssa/ssa-copyprop-3.c: New testcase.
---
 .../gcc.dg/tree-ssa/ssa-copyprop-3.c  | 38 +++
 gcc/tree-ssa-propagate.cc | 35 ++---
 2 files changed, 42 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-3.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-3.c
new file mode 100644
index 000..d22b39294ab
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-3.c
@@ -0,0 +1,38 @@
+/* { dg-do link } */
+/* { dg-require-effective-target int32plus } */
+/* { dg-options "-O -fdump-tree-copyprop2" } */
+
+#include <limits.h>
+enum { a } b();
+int d;
+int e;
+int f;
+void foo();
+[[gnu::noipa]]
+void bar49_(void){}
+[[gnu::noipa]]
+void(c)(void){}
+static short g(int h, int i) {
+  int j = -1420678603, k = 1;
+  if (h)
+for (; j < INT_MAX-18; j = j + 9) {
+  f = 0;
+  for (; f <= 1; c())
+k = 90;
+}
+  i = k;
+  for (; e; ++e) {
+if (i)
+  continue;
+foo();
+i = b();
+  }
+  return 4;
+}
+int l() {
+  bar49_();
+  return 1;
+}
+int main() { d = d || g(d, l()); }
+
+/* { dg-final { scan-tree-dump-not "foo" "copyprop2" } } */
diff --git a/gcc/tree-ssa-propagate.cc b/gcc/tree-ssa-propagate.cc
index 472c4bcb540..76708ca185f 100644
--- a/gcc/tree-ssa-propagate.cc
+++ b/gcc/tree-ssa-propagate.cc
@@ -113,7 +113,6 @@
order by visiting in bit-order.  We use two worklists to
first make forward progress before iterating.  */
 static bitmap cfg_blocks;
-static bitmap cfg_blocks_back;
 static int *bb_to_cfg_order;
 static int *cfg_order_to_bb;
 
@@ -123,7 +122,6 @@ static int *cfg_order_to_bb;
UID in a bitmap.  UIDs order stmts in execution order.  We use
two worklists to first make forward progress before iterating.  */
 static bitmap ssa_edge_worklist;
-static bitmap ssa_edge_worklist_back;
 static vec<gimple *> uid_to_stmt;
 
 /* Current RPO index in the iteration.  */
@@ -159,12 +157,7 @@ add_ssa_edge (tree var)
   & EDGE_EXECUTABLE))
continue;
 
-  bitmap worklist;
-  if (bb_to_cfg_order[gimple_bb (use_stmt)->index] < curr_order)
-   worklist = ssa_edge_worklist_back;
-  else
-   worklist = ssa_edge_worklist;
-  if (bitmap_set_bit (worklist, gimple_uid (use_stmt)))
+  if (bitmap_set_bit (ssa_edge_worklist, gimple_uid (use_stmt)))
{
  uid_to_stmt[gimple_uid (use_stmt)] = use_stmt;
  if (dump_file && (dump_flags & TDF_DETAILS))
@@ -193,10 +186,7 @@ add_control_edge (edge e)
   e->flags |= EDGE_EXECUTABLE;
 
   int bb_order = bb_to_cfg_order[bb->index];
-  if (bb_order < curr_order)
-bitmap_set_bit (cfg_blocks_back, bb_order);
-  else
-bitmap_set_bit (cfg_blocks, bb_order);
+  bitmap_set_bit (cfg_blocks, bb_order);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 fprintf (dump_file, "Adding destination of edge (%d -> %d) to worklist\n",
@@ -380,9 +370,7 @@ ssa_prop_init (void)
 
   /* Worklists of SSA edges.  */
   ssa_edge_worklist = BITMAP_ALLOC (NULL);
-  ssa_edge_worklist_back = BITMAP_ALLOC (NULL);
   bitmap_tree_view (ssa_edge_worklist);
-  bitmap_tree_view (ssa_edge_worklist_back);
 
   /* Worklist of basic-blocks.  */
   bb_to_cfg_order = XNEWVEC (int, last_basic_block_for_fn (cfun) + 1);
@@ -392,7 +380,6 @@ ssa_prop_init (void)
   for (int i = 0; i < n; ++i)
 bb_to_cfg_order[cfg_order_to_bb[i]] = i;
   cfg_blocks = BITMAP_ALLOC (NULL);
-  cfg_blocks_back = BITMAP_ALLOC (NULL);
 
   /* Initially assume that every edge in the CFG is not executable.
  (including the edges coming out of the entry block).  Mark blocks
@@ -430,11 +417,9 @@ static void
 ssa_prop_fini (void)
 {
   BITMAP_FREE (cfg_blocks);
-  BITMAP_FREE (cfg_blocks_back);
   free (bb_to_cfg_order);
   free (cfg_order_to_bb);
   BITMAP_FREE (ssa_edge_worklist);
-  BITMAP_FREE (ssa_edge_worklist_back);
   uid_to_stmt.release ();
 }
 
@@ -453,8 +438,7 @@ ssa_propagation_engine::ssa_propagate (void)
   curr_order = 0;
 
   /* Iterate until the worklists are empty.  We iterate both blocks
- and stmts in RPO order, using sets of two worklists to first
- complete the current iteration before iterating over backedges

[gcc-12 backport] strlen: do not use cond_expr for boundaries

2023-01-11 Thread Martin Liška
Tested, I'm going to push it.

Martin

PR tree-optimization/108137

gcc/ChangeLog:

* tree-ssa-strlen.cc (get_range_strlen_phi): Reject anything
different from INTEGER_CST.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr108137.c: New test.

(cherry picked from commit ee6f262b87fef590729e96e999f1c3b207c251c0)
---
 gcc/testsuite/gcc.dg/tree-ssa/pr108137.c |  8 
 gcc/tree-ssa-strlen.cc   | 13 +++--
 2 files changed, 15 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr108137.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr108137.c b/gcc/testsuite/gcc.dg/tree-ssa/pr108137.c
new file mode 100644
index 000..f0cb71b2267
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr108137.c
@@ -0,0 +1,8 @@
+// PR tree-optimization/108137
+// { dg-do compile }
+// { dg-options "-Wformat-overflow" }
+
+void f(unsigned short x_port, unsigned int x_host)
+{
+__builtin_printf("missing %s", x_port ? "host" : &"host:port"[x_host ? 5 : 0]);
+}
diff --git a/gcc/tree-ssa-strlen.cc b/gcc/tree-ssa-strlen.cc
index 9ae25d1dde2..2d7db6da5bc 100644
--- a/gcc/tree-ssa-strlen.cc
+++ b/gcc/tree-ssa-strlen.cc
@@ -1136,14 +1136,15 @@ get_range_strlen_phi (tree src, gphi *phi,
 
   /* Adjust the minimum and maximum length determined so far and
 the upper bound on the array size.  */
-  if (!pdata->minlen
- || tree_int_cst_lt (argdata.minlen, pdata->minlen))
+  if (TREE_CODE (argdata.minlen) == INTEGER_CST
+ && (!pdata->minlen
+ || tree_int_cst_lt (argdata.minlen, pdata->minlen)))
pdata->minlen = argdata.minlen;
 
-  if (!pdata->maxlen
- || (argdata.maxlen
- && TREE_CODE (argdata.maxlen) == INTEGER_CST
- && tree_int_cst_lt (pdata->maxlen, argdata.maxlen)))
+  if (TREE_CODE (argdata.maxlen) == INTEGER_CST
+ && (!pdata->maxlen
+ || (argdata.maxlen
+ && tree_int_cst_lt (pdata->maxlen, argdata.maxlen))))
pdata->maxlen = argdata.maxlen;
 
   if (!pdata->maxbound
-- 
2.39.0



Re: [PATCH] libgcc: Fix uninitialized RA signing on AArch64 [PR107678]

2023-01-11 Thread Martin Liška
On 1/10/23 19:12, Jakub Jelinek via Gcc-patches wrote:
> Anyway, the sooner this makes it into gcc trunk, the better, it breaks quite
> a lot of stuff.

Yep, please, we're also waiting for this patch for pushing to our gcc13 package.

Cheers,
Martin


[PING^2] nvptx: Re-enable a number of test cases

2023-01-11 Thread Thomas Schwinge
Hi!

Ping this whole series.


Regards
 Thomas


On 2022-12-20T08:56:42+0100, I wrote:
> Hi!
>
> Ping this whole series.
>
>
> Regards
>  Thomas
>
>
> On 2022-12-02T13:03:06+0100, I wrote:
>> Hi!
>>
>> I'm proposing to re-enable a number of test cases for nvptx.  OK to push?
>>
>>
>> Regards
>>  Thomas


[PING] nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution (was: [committed][nvptx] Add uniform_warp_check insn)

2023-01-11 Thread Thomas Schwinge
Hi!

Ping.


Regards
 Thomas


On 2022-12-15T19:27:08+0100, I wrote:
> Hi Tom!
>
> First "a bit" of context; skip to "the proposed patch" if you'd like to
> see just that.
>
>
> On 2022-02-01T19:31:27+0100, Tom de Vries via Gcc-patches wrote:
>> On a GT 1030, with driver version 470.94 and -mptx=3.1 I run into:
>> ...
>> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
>>   -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
>>   -O2 execution test
>> ...
>> which minimizes to the same test-case as listed in commit "[nvptx]
>> Update default ptx isa to 6.3".
>>
>> The problem is again that the first diverging branch is not handled as such
>> in SASS, which causes problems with a subsequent shfl insn, but given that
>> we have -mptx=3.1 we can't use the bar.warp.sync insn.
>>
>> Given that the default is now -mptx=6.3, and consequently -mptx=3.1 is of a
>> lesser importance, implement the next best thing: abort when detecting
>> non-convergence using this insn:
>> ...
>>   { .reg.b32 act;
>> vote.ballot.b32 act,1;
>> .reg.pred uni;
>> setp.eq.b32 uni,act,0x;
>> @ !uni trap;
>> @ !uni exit;
>>   }
>> ...
>>
>> Interestingly, the effect of this is that rather than aborting, the test-case
>> now passes.
>
> (I suppose this "nudges" the PTX -> SASS compiler into the right
> direction?)
>
>
> For avoidance of doubt, my following discussion is not about the specific
> (first) use of 'nvptx_uniform_warp_check' introduced here in this
> commit r12-6971-gf32f74c2e8cef5fe37af6d4e8d7e8f6b4c8ae9a8
> "[nvptx] Add uniform_warp_check insn":
>
>> --- a/gcc/config/nvptx/nvptx.cc
>> +++ b/gcc/config/nvptx/nvptx.cc
>> @@ -4631,15 +4631,29 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
>>  if (tail_branch)
>>{
>>  label_insn = emit_label_before (label, before);
>> -if (TARGET_PTX_6_0 && mode == GOMP_DIM_VECTOR)
>> -  warp_sync = emit_insn_after (gen_nvptx_warpsync (), label_insn);
>> +if (mode == GOMP_DIM_VECTOR)
>> +  {
>> +if (TARGET_PTX_6_0)
>> +  warp_sync = emit_insn_after (gen_nvptx_warpsync (),
>> +   label_insn);
>> +else
>> +  warp_sync = emit_insn_after (gen_nvptx_uniform_warp_check (),
>> +   label_insn);
>> +  }
>>  before = label_insn;
>>}
>>  else
>>{
>>  label_insn = emit_label_after (label, tail);
>> -if (TARGET_PTX_6_0 && mode == GOMP_DIM_VECTOR)
>> -  warp_sync = emit_insn_after (gen_nvptx_warpsync (), label_insn);
>> +if (mode == GOMP_DIM_VECTOR)
>> +  {
>> +if (TARGET_PTX_6_0)
>> +  warp_sync = emit_insn_after (gen_nvptx_warpsync (),
>> +   label_insn);
>> +else
>> +  warp_sync = emit_insn_after (gen_nvptx_uniform_warp_check (),
>> +   label_insn);
>> +  }
>>  if ((mode == GOMP_DIM_VECTOR || mode == GOMP_DIM_WORKER)
>>  && CALL_P (tail) && find_reg_note (tail, REG_NORETURN, NULL))
>>emit_insn_after (gen_exit (), label_insn);
>
> Later, other uses have been added, for example in OpenMP '-muniform-simt'
> code generation.
>
> My following discussion is about the implementation of
> 'nvptx_uniform_warp_check', originally introduced as follows:
>
>> --- a/gcc/config/nvptx/nvptx.md
>> +++ b/gcc/config/nvptx/nvptx.md
>> @@ -57,6 +57,7 @@ (define_c_enum "unspecv" [
>> UNSPECV_XCHG
>> UNSPECV_BARSYNC
>> UNSPECV_WARPSYNC
>> +   UNSPECV_UNIFORM_WARP_CHECK
>> UNSPECV_MEMBAR
>> UNSPECV_MEMBAR_CTA
>> UNSPECV_MEMBAR_GL
>> @@ -1985,6 +1986,23 @@ (define_insn "nvptx_warpsync"
>>"\\tbar.warp.sync\\t0x;"
>>[(set_attr "predicable" "false")])
>>
>> +(define_insn "nvptx_uniform_warp_check"
>> +  [(unspec_volatile [(const_int 0)] UNSPECV_UNIFORM_WARP_CHECK)]
>> +  ""
>> +  {
>> +output_asm_insn ("{", NULL);
>> +output_asm_insn ("\\t"   ".reg.b32""\\t" "act;", NULL);
>> +output_asm_insn ("\\t"   "vote.ballot.b32" "\\t" "act,1;", NULL);
>> +output_asm_insn ("\\t"   ".reg.pred"   "\\t" "uni;", NULL);
>> +output_asm_insn ("\\t"   "setp.eq.b32" "\\t" "uni,act,0x;",
>> + NULL);
>> +output_asm_insn ("@ !uni\\t" "trap;", NULL);
>> +output_asm_insn ("@ !uni\\t" "exit;", NULL);
>> +output_asm_insn ("}", NULL);
>> +return "";
>> +  }
>> +  [(set_attr "predicable" "false")])
>
> Later adjusted, but the fundamental idea is still the same.
>
>
> Via temporarily disabling 'nvptx_uniform_warp_check':
>
>  (define_insn "nvptx_uniform_warp_check"
>[(unspec_volatile [(const_int 0)] UNSPECV_UNIFORM_WARP_CHECK)]
>""
>{
> +#if 0
>  const char *insns[] = {
>"{",
>"\\t

[PING] Add '-Wno-complain-wrong-lang', and use it in 'gcc/testsuite/lib/target-supports.exp:check_compile' and elsewhere

2023-01-11 Thread Thomas Schwinge
Hi!

Ping.


Regards
 Thomas


On 2022-12-16T15:10:12+0100, I wrote:
> Hi!
>
> On 2022-12-15T16:17:05+0100, Jakub Jelinek  wrote:
>> On Thu, Dec 15, 2022 at 04:01:33PM +0100, Thomas Schwinge wrote:
>>> Or, options are applicable to just one front end, and can just be a no-op
>>> for others, for shared-language compilation.  For example, '-nostdinc++',
>>> or '-frust-incomplete-and-experimental-compiler-do-not-use' need not
>>> necessarily emit a diagnostic, but can just just be ignored by 'cc1',
>>> 'f951', 'lto1'.
>>
>> One simple change could be to add a new warning option and use it for
>> complain_wrong_lang warnings:
>>   else if (ok_langs[0] != '\0')
>> /* Eventually this should become a hard error IMO.  */
>> warning (0, "command-line option %qs is valid for %s but not for %s",
>>  text, ok_langs, bad_lang);
>
> (By the way, that comment was originally added in 2003-06-07
> commit 2772ef3ef33609dd64209323e9418a847685971a
> "Move handling of lang-specific switches to toplev".)
>
>>   else
>> /* Happens for -Werror=warning_name.  */
>> warning (0, "%<-Werror=%> argument %qs is not valid for %s",
>>  text, bad_lang);
>> We could keep the existing behavior, but give users (and our testsuite)
>> a way to silence that warning if they are ok with it applying only to a
>> subset of languages.
>> Then one could use
>> -frust-incomplete-and-experimental-compiler-do-not-use -Wno-whatever
>> or add that -Wno-whatever in check_compile if the snippet is different
>> language from main language of the testsuite (or always) etc.
>
> Like in the attached
> "Add '-Wno-complain-wrong-lang', and use it in
> 'gcc/testsuite/lib/target-supports.exp:check_compile' and elsewhere",
> for example?
>
> Anything that 'gcc/opts-global.cc:complain_wrong_lang' might do is cut
> short by '-Wno-complain-wrong-lang', not just the one 'warning'
> diagnostic.  This corresponds to what already exists via
> 'lang_hooks.complain_wrong_lang_p'.
>
> The 'gcc/opts-common.cc:prune_options' changes follow the same rationale
> as PR67640 "driver passes -fdiagnostics-color= always last": we need to
> process '-Wno-complain-wrong-lang' early, so that it properly affects
> other options appearing before it on the command line.
>
>
> In the test suites, a number of existing test cases explicitly match the
> "command-line option [...] is valid for [...] but not for [...]"
> diagnostic with 'dg-warning'; I've left those alone.  On the other hand,
> I've changed 'dg-prune-output' of this diagnostic into
> '-Wno-complain-wrong-lang' usage.  I'm happy to adjust that in either way
> anyone may prefer.  I've not looked for test cases that use the more
> general 'dg-prune-output', 'dg-excess-errors', '-w', etc. just to
> silence this diagnostic.
>
> In the GCC/D test suite, I see a number of:
>
> cc1plus: warning: command-line option '-fpreview=in' is valid for D but not for C++
>
> cc1plus: warning: command-line option '-fextern-std=c++11' is valid for D but not for C++
>
> It's not clear to me how they, despite this, do achieve
> 'PASS: [...] (test for excess errors)'?  Maybe I haven't found where that
> gets pruned/ignored?
>
>
> In addition to the test suites, I'm also seeing:
>
> build-gcc/build-x86_64-pc-linux-gnu/libcpp/config.log:cc1: warning: command line option '-fno-rtti' is valid for C++/ObjC++ but not for C [enabled by default]
> build-gcc/gcc/config.log:cc1: warning: command-line option '-fno-rtti' is valid for C++/D/ObjC++ but not for C
> build-gcc/gcc/config.log:cc1: warning: command-line option '-fno-rtti' is valid for C++/D/ObjC++ but not for C
> build-gcc/libbacktrace/config.log:cc1: warning: command-line option '-fno-rtti' is valid for C++/D/ObjC++ but not for C
> build-gcc/libcc1/config.log:cc1: warning: command-line option '-fno-rtti' is valid for C++/D/ObjC++ but not for C
> build-gcc/libcpp/config.log:cc1: warning: command-line option '-fno-rtti' is valid for C++/D/ObjC++ but not for C
> build-gcc/lto-plugin/config.log:cc1: warning: command-line option '-fno-rtti' is valid for C++/D/ObjC++ but not for C
> build-gcc/x86_64-pc-linux-gnu/libatomic/config.log:cc1: warning: command-line option '-fno-rtti' is valid for C++/D/ObjC++ but not for C
> build-gcc/x86_64-pc-linux-gnu/libbacktrace/config.log:cc1: warning: command-line option '-fno-rtti' is valid for C++/D/ObjC++ but not for C
> build-gcc/x86_64-pc-linux-gnu/libffi/config.log:cc1: warning: command-line option '-fno-rtti' is valid for C++/D/ObjC++ but not for C
> build-gcc/x86_64-pc-linux-gnu/libgfortran/config.log:cc1: warning: command-line option '-fno-rtti' is valid for C++/D/ObjC++ but not for C
> build-gcc/x86_64-pc-linux-gnu/libgo/config.log:cc1: warning: command-line option '-fno-rtti' is valid for C++/D/ObjC++ but not for C
> build-gcc/x86_64-pc-linux-gnu/libgomp/config.log:cc1: warning: command-line option '-fno-rtti' is valid for C

[PATCH] [RISCV] Add 'Zfa' extension according to riscv-isa-manual

2023-01-11 Thread jinma via Gcc-patches
From e4ce8e825c145d74e6b9827f972629548e39f118 Mon Sep 17 00:00:00 2001
From: Jin Ma 
Date: Wed, 11 Jan 2023 19:13:27 +0800
Subject: [PATCH] [RISCV] Add 'Zfa' extension according to riscv-isa-manual

This patch adds the 'Zfa' extension for riscv, which is an implementation of
an unratified and unfrozen RISC-V extension.

Although binutils-gdb support for the 'Zfa' extension is not yet upstream, we
can start discussing it.  This also allows testing the new instructions in your
(possibly virtual) environment and reviewing early, for fast adoption after
ratification.

This is based on:
https://github.com/riscv/riscv-isa-manual/commit/d74d99e22d5f68832f70982d867614e2149a3bd7
latest 'Zfa' change on the master branch of the RISC-V ISA Manual as
of this writing.

The Wiki Page (details):
https://github.com/a4lg/binutils-gdb/wiki/riscv_zfa

The binutils-gdb for 'Zfa' extension:
(https://sourceware.org/pipermail/binutils/2022-September/122938.html)

gcc/ChangeLog:

 * common/config/riscv/riscv-common.cc:
 * config/riscv/constraints.md (Zf):
 * config/riscv/predicates.md:
 * config/riscv/riscv-builtins.cc (RISCV_FTYPE_NAME2):
 (AVAIL):
 (RISCV_ATYPE_SF):
 (RISCV_ATYPE_DF):
 (RISCV_FTYPE_ATYPES2):
 * config/riscv/riscv-ftypes.def (2):
 * config/riscv/riscv-opts.h (MASK_ZFA):
 (TARGET_ZFA):
 * config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli):
 * config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli):
 (riscv_cannot_force_const_mem):
 (riscv_const_insns):
 (riscv_legitimize_const_move):
 (riscv_split_64bit_move_p):
 (riscv_output_move):
 (riscv_memmodel_needs_release_fence):
 (riscv_print_operand):
 (riscv_secondary_memory_needed):
 * config/riscv/riscv.h (GP_REG_RTX_P):
 * config/riscv/riscv.md (riscv_fminm3):
 (riscv_fmaxm3):
 (fix_truncdfsi2_zfa):
 (round2):
 (rint2):
 (f_quiet4_zfa):
 * config/riscv/riscv.opt:

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/zfa-fcvtmod.c: New test.
 * gcc.target/riscv/zfa-fleq-fltq.c: New test.
 * gcc.target/riscv/zfa-fli-zfh.c: New test.
 * gcc.target/riscv/zfa-fli.c: New test.
 * gcc.target/riscv/zfa-fminm-fmaxm.c: New test.
 * gcc.target/riscv/zfa-fmovh-fmovp.c: New test.
 * gcc.target/riscv/zfa-fround.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc   |   4 +
 gcc/config/riscv/constraints.md   |   7 ++
 gcc/config/riscv/predicates.md|   4 +
 gcc/config/riscv/riscv-builtins.cc|  11 ++
 gcc/config/riscv/riscv-ftypes.def |   2 +
 gcc/config/riscv/riscv-opts.h |   3 +
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv.cc | 105 +++-
 gcc/config/riscv/riscv.h  |   1 +
 gcc/config/riscv/riscv.md | 114 ++
 gcc/config/riscv/riscv.opt|   4 +
 gcc/testsuite/gcc.target/riscv/zfa-fcvtmod.c  |  12 ++
 .../gcc.target/riscv/zfa-fleq-fltq.c  |  20 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c  |  42 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli.c  |  80 
 .../gcc.target/riscv/zfa-fminm-fmaxm.c|  25 
 .../gcc.target/riscv/zfa-fmovh-fmovp.c|  11 ++
 gcc/testsuite/gcc.target/riscv/zfa-fround.c   |  25 
 18 files changed, 448 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fcvtmod.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fminm-fmaxm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fround.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 0a89fdaffe2..cccec12975c 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -217,6 +217,8 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"zfh",   ISA_SPEC_CLASS_NONE, 1, 0},
   {"zfhmin",ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"zfa", ISA_SPEC_CLASS_NONE, 1, 0},
+
   {"zmmul", ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1242,6 +1244,8 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"zfhmin",&gcc_options::x_riscv_zf_subext, MASK_ZFHMIN},
   {"zfh",   &gcc_options::x_riscv_zf_subext, MASK_ZFH},
 
+  {"zfa",   &gcc_options::x_riscv_zf_subext, MASK_ZFA},
+
   {"zmmul", &gcc_options::x_riscv_zm_subext, MASK_ZMMUL},
 
   {"svinval", &gcc_options::x_riscv_sv_subext, MASK_SVINVAL},
diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 51cffb2bcb6..2fd407b1d9c 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -110,6 +110,13 @@ (define_constraint "T"
   (and (match_operand 0 "move_operand")
(match_test "CONSTANT_P (op)")))
 
+;

[PING] [PATCH 2/2] nvptx: Prevent emitting duplicate declarations for '__nvptx_stacks', '__nvptx_uni'

2023-01-11 Thread Thomas Schwinge
Hi!

Ping.


Grüße
 Thomas


On 2022-12-19T21:40:07+0100, I wrote:
> As I have reported to Nvidia in 2022-12-01 'NVIDIA Incident Report (3891704):
> ptxas: Duplicate declaration error: "cannot be resolved by a '.static'"',
> 'ptxas' has an inscrutable error mode for duplicate declarations:
>
> ptxas softstack-decl-1.o, line 11; error   : '.extern' variable 
> '__nvptx_stacks' cannot be resolved by a '.static'
> ptxas fatal   : Ptx assembly aborted due to errors
> nvptx-as: ptxas returned 255 exit status
>
> ptxas uniform-simt-decl-1.o, line 12; error   : '.extern' variable 
> '__nvptx_uni' cannot be resolved by a '.static'
> ptxas fatal   : Ptx assembly aborted due to errors
> nvptx-as: ptxas returned 255 exit status
>
> This is inscrutable, because (a) what is "cannot be resolved by a '.static'"
> supposed to tell me (there is no '.static' in PTX?), and (b) why aren't
> repeated declarations just verified to match the first, but otherwise a no-op
> (like in other programming languages)?
>
> gcc/
> * config/nvptx/nvptx.cc (nvptx_assemble_undefined_decl): Notice
> '__nvptx_stacks', '__nvptx_uni' declarations.
> (nvptx_file_end): Don't emit duplicate declarations for those.
> gcc/testsuite/
> * gcc.target/nvptx/softstack-decl-1.c: Make 'dg-do assemble',
> adjust.
> * gcc.target/nvptx/uniform-simt-decl-1.c: Likewise.
> ---
>  gcc/config/nvptx/nvptx.cc  | 14 --
>  gcc/testsuite/gcc.target/nvptx/softstack-decl-1.c  |  8 
>  .../gcc.target/nvptx/uniform-simt-decl-1.c |  8 
>  3 files changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
> index 8e49dd9c647..b93a253ab31 100644
> --- a/gcc/config/nvptx/nvptx.cc
> +++ b/gcc/config/nvptx/nvptx.cc
> @@ -180,9 +180,11 @@ static GTY(()) tree global_lock_var;
>
>  /* True if any function references __nvptx_stacks.  */
>  static bool need_softstack_decl;
> +static bool have_softstack_decl;
>
>  /* True if any function references __nvptx_uni.  */
>  static bool need_unisimt_decl;
> +static bool have_unisimt_decl;
>
>  static int nvptx_mach_max_workers ();
>
> @@ -2571,6 +2573,13 @@ nvptx_assemble_undefined_decl (FILE *file, const char 
> *name, const_tree decl)
>  TREE_TYPE (decl), size ? tree_to_shwi (size) : 0,
>  DECL_ALIGN (decl), true);
>nvptx_assemble_decl_end ();
> +
> +  static tree softstack_id = get_identifier ("__nvptx_stacks");
> +  static tree unisimt_id = get_identifier ("__nvptx_uni");
> +  if (DECL_NAME (decl) == softstack_id)
> +have_softstack_decl = true;
> +  else if (DECL_NAME (decl) == unisimt_id)
> +have_unisimt_decl = true;
>  }
>
>  /* Output a pattern for a move instruction.  */
> @@ -6002,7 +6011,7 @@ nvptx_file_end (void)
>  write_shared_buffer (asm_out_file, gang_private_shared_sym,
>  gang_private_shared_align, gang_private_shared_size);
>
> -  if (need_softstack_decl)
> +  if (need_softstack_decl && !have_softstack_decl)
>  {
>write_var_marker (asm_out_file, false, true, "__nvptx_stacks");
>/* 32 is the maximum number of warps in a block.  Even though it's an
> @@ -6011,7 +6020,8 @@ nvptx_file_end (void)
>fprintf (asm_out_file, ".extern .shared .u%d __nvptx_stacks[32];\n",
>POINTER_SIZE);
>  }
> -  if (need_unisimt_decl)
> +
> +  if (need_unisimt_decl && !have_unisimt_decl)
>  {
>write_var_marker (asm_out_file, false, true, "__nvptx_uni");
>fprintf (asm_out_file, ".extern .shared .u32 __nvptx_uni[32];\n");
> diff --git a/gcc/testsuite/gcc.target/nvptx/softstack-decl-1.c 
> b/gcc/testsuite/gcc.target/nvptx/softstack-decl-1.c
> index c502eacc1b3..2415f6adb1f 100644
> --- a/gcc/testsuite/gcc.target/nvptx/softstack-decl-1.c
> +++ b/gcc/testsuite/gcc.target/nvptx/softstack-decl-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */
> +/* { dg-do assemble } */
>  /* { dg-options {-save-temps -O0 -msoft-stack} } */
>
>  extern void *__nvptx_stacks[32] __attribute__((shared,nocommon));
> @@ -14,7 +14,7 @@ void *f()
>return stack_array[5];
>  }
>
> -/* The implicit (via 'need_softstack_decl') and explicit declarations of
> -   '__nvptx_stacks' are both emitted:
> -   { dg-final { scan-assembler-times {(?n)\.extern .* __nvptx_stacks\[32\];} 
> 2 } }
> +/* Of the implicit (via 'need_softstack_decl') and explicit declarations of
> +   '__nvptx_stacks', only one is emitted:
> +   { dg-final { scan-assembler-times {(?n)\.extern .* __nvptx_stacks\[32\];} 
> 1 } }
>  */
> diff --git a/gcc/testsuite/gcc.target/nvptx/uniform-simt-decl-1.c 
> b/gcc/testsuite/gcc.target/nvptx/uniform-simt-decl-1.c
> index 486456ab243..5a975bdb269 100644
> --- a/gcc/testsuite/gcc.target/nvptx/uniform-simt-decl-1.c
> +++ b/gcc/testsuite/gcc.target/nvptx/uniform-simt-decl-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } *

[PING^3] nvptx: stack size limits are relevant for execution only (was: [PATCH, testsuite] Add effective target stack_size)

2023-01-11 Thread Thomas Schwinge
Hi!

Ping.


Grüße
 Thomas


On 2022-12-20T08:55:08+0100, I wrote:
> Hi!
>
> Ping.
>
>
> Grüße
>  Thomas
>
>
> On 2022-11-25T12:09:36+0100, I wrote:
>> Hi!
>>
>> Ping.
>>
>>
>> Grüße
>>  Thomas
>>
>>
>> On 2022-11-08T21:29:49+0100, I wrote:
>>> Hi!
>>>
>>> On 2017-06-09T16:24:30+0200, Tom de Vries  wrote:
>>>> The patch defines an effective target stack_size, which is used in
>>>> individual test-cases to add -DSTACK_SIZE= [...]
>>>
>>>> gccint.info (edited for long lines):
>>>> ...
>>>> 7.2.3.12 Other attributes
>>>> .
>>>>
>>>> 'stack_size'
>>>>   Target has limited stack size.  [...]
>>>
>>> On top of that, OK to push the attached
>>> "nvptx: stack size limits are relevant for execution only"?
>>>
>>>
>>> Grüße
>>>  Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 158a077129cb1579b93ddf440a5bb60b457e4b7c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 8 Nov 2022 12:10:03 +0100
Subject: [PATCH] nvptx: stack size limits are relevant for execution only

For non-'dg-do run' test cases, that means: big 'dg-require-stack-size' need
not be UNSUPPORTED (and indeed now do all PASS), 'dg-add-options stack_size'
need not define (and thus limit) 'STACK_SIZE' (and still do all PASS).
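
For context, test cases typically consume 'STACK_SIZE' along the following
lines.  This is a generic sketch, not taken from a particular test; the names
'NELEMS' and 'sum_local' and the divisor are illustrative assumptions:

```c
#include <stddef.h>

/* If the harness passes -DSTACK_SIZE=<bytes>, keep the local array well
   within that limit; otherwise use the full-size variant.  */
#ifdef STACK_SIZE
# define NELEMS (STACK_SIZE / 16 / sizeof (int))
#else
# define NELEMS 100000
#endif

/* Fill a stack-allocated array with ones and add them up, so that the
   frame size of this function depends on NELEMS.  */
int
sum_local (void)
{
  int buf[NELEMS];
  int sum = 0;

  for (size_t i = 0; i < (size_t) NELEMS; i++)
    buf[i] = 1;
  for (size_t i = 0; i < (size_t) NELEMS; i++)
    sum += buf[i];
  return sum;
}
```

With the change described here, a non-'dg-do run' test on nvptx would see no
'STACK_SIZE' definition and therefore compile the '#else' variant.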

Re "Find 'dg-do-what' in an outer frame", currently (sources not completely
clean, though), we've got:

$ git grep -F 'check_effective_target_stack_size: found dg-do-what at level ' -- build-gcc/\*.log | sort | uniq -c
  6 build-gcc/gcc/testsuite/gcc/gcc.log:check_effective_target_stack_size: found dg-do-what at level 2
267 build-gcc/gcc/testsuite/gcc/gcc.log:check_effective_target_stack_size: found dg-do-what at level 3
239 build-gcc/gcc/testsuite/gcc/gcc.log:check_effective_target_stack_size: found dg-do-what at level 4

	gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_stack_size): For
	nvptx target, stack size limits are relevant for execution only.
	gcc/
	* doc/sourcebuild.texi (stack_size): Update.
---
 gcc/doc/sourcebuild.texi  |  4 
 gcc/testsuite/lib/target-supports.exp | 16 
 2 files changed, 20 insertions(+)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 137f00aadc1f..5bbf6fc55909 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2848,6 +2848,10 @@ Target has limited stack size.  The stack size limit can be obtained using the
 STACK_SIZE macro defined by @ref{stack_size_ao,,@code{dg-add-options} feature
 @code{stack_size}}.
 
+Note that for certain targets, stack size limits are relevant for
+execution only, and therefore considered only if @code{dg-do run} is
+in effect, otherwise unlimited.
+
 @item static
 Target supports @option{-static}.
 
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 750897d08548..39ed1723b03a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -625,6 +625,22 @@ proc check_effective_target_trampolines { } {
 # Return 1 if target has limited stack size.
 
 proc check_effective_target_stack_size { } {
+# For nvptx target, stack size limits are relevant for execution only.
+if { [istarget nvptx-*-*] } {
+	# Find 'dg-do-what' in an outer frame.
+	set level 1
+	while true {
+	upvar $level dg-do-what dg-do-what
+	if [info exists dg-do-what] then break
+	incr level
+	}
+	verbose "check_effective_target_stack_size: found dg-do-what at level $level" 2
+
+	if { ![string equal [lindex ${dg-do-what} 0] run] } {
+	return 0
+	}
+}
+
 if [target_info exists gcc,stack_size] {
 	return 1
 }
-- 
2.35.1



[PING^2] nvptx: Support global constructors/destructors via 'collect2'

2023-01-11 Thread Thomas Schwinge
Hi!

Ping.


Grüße
 Thomas


On 2022-12-20T09:03:51+0100, I wrote:
> Hi!
>
> Ping.
>
>
> Minor change in the attached
> "nvptx: Support global constructors/destructors via 'collect2'": for
> 'atexit', add '#include ' to 'libgcc/config/nvptx/crt0.c'.
>
>
> Grüße
>  Thomas
>
>
> On 2022-12-02T14:35:35+0100, I wrote:
>> Hi!
>>
>> On 2022-12-01T22:13:38+0100, I wrote:
>>> I'm working on support for global constructors/destructors with
>>> GCC/nvptx
>>
>> See "nvptx: Support global constructors/destructors via 'collect2'"
>> attached; OK to push?  (... with 'gcc/doc/install.texi' accordingly
>> updated once 
>> "'nm'" and newlib
>> 
>> "nvptx: Implement '_exit' instead of 'exit'" have been merged; any
>> comments to those?)
>>
>> Per my quick scanning of 'gcc/config.gcc' history, for more than two
>> decades, there was a clear trend to remove 'use_collect2=yes'
>> configurations; now finally a new one is being added -- making sure we're
>> not slowly dispensing with the need for the early 1990s piece of work
>> that 'gcc/collect2*' is...  ;'-P
>>
>>
>> Grüße
>>  Thomas


From 0e7cf5a9f83c3a82eafa126886e5d92651bfbb30 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Sun, 13 Nov 2022 14:19:30 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'

The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this.  Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more "sorry, unimplemented:
global constructors not supported on this target".
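
A minimal example of the kind of code this enables; it is a generic sketch
using the documented GCC function attributes, and the function names are made
up for illustration:

```c
#include <stdio.h>

static int initialized;

/* Runs before 'main' is entered.  */
__attribute__((constructor))
static void
setup (void)
{
  initialized = 1;
}

/* Runs after 'main' returns or 'exit' is called.  */
__attribute__((destructor))
static void
teardown (void)
{
  puts ("teardown");
}

/* Lets callers observe whether the constructor has already run.  */
int
is_initialized (void)
{
  return initialized;
}
```

Any caller reached from 'main' should see 'is_initialized ()' return 1,
because the constructor has already run by then.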

This depends on  "'nm'"
generally, and for global destructors support: newlib

"nvptx: Implement '_exit' instead of 'exit'".

	gcc/
	* collect2.cc (write_c_file_glob): Allow for
	'COLLECT2_MAIN_REFERENCE' override.
	* config.gcc : Set 'use_collect2=yes'.
	* config/nvptx/nvptx.h: Adjust.
	gcc/testsuite/
	* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
	'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may appear
	in identifiers.
	* lib/target-supports.exp
	(check_effective_target_global_constructor): Enable for nvptx.
	libgcc/
	* config.host : Add 'crtbegin.o',
	'crtend.o' to 'extra_parts'.
	* config/nvptx/crt0.c: Invoke '__do_global_ctors',
	'__do_global_dtors'.
	* config/nvptx/crtstuff.c: New.
	* config/nvptx/t-nvptx: Adjust.
---
 gcc/collect2.cc   |  4 ++
 gcc/config.gcc|  1 +
 gcc/config/nvptx/nvptx.h  | 35 ++-
 .../no_profile_instrument_function-attr-1.c   |  2 +-
 gcc/testsuite/lib/target-supports.exp |  3 +-
 libgcc/config.host|  2 +-
 libgcc/config/nvptx/crt0.c|  6 ++
 libgcc/config/nvptx/crtstuff.c| 58 +++
 libgcc/config/nvptx/t-nvptx   | 15 -
 9 files changed, 119 insertions(+), 7 deletions(-)
 create mode 100644 libgcc/config/nvptx/crtstuff.c

diff --git a/gcc/collect2.cc b/gcc/collect2.cc
index d81c7f28f16a..945a9ff86dda 100644
--- a/gcc/collect2.cc
+++ b/gcc/collect2.cc
@@ -2238,8 +2238,12 @@ write_c_file_glob (FILE *stream, const char *name ATTRIBUTE_UNUSED)
 fprintf (stream, "\tdereg_frame,\n");
   fprintf (stream, "\t0\n};\n\n");
 
+# ifdef COLLECT2_MAIN_REFERENCE
+  fprintf (stream, "%s\n\n", COLLECT2_MAIN_REFERENCE);
+# else
   fprintf (stream, "extern entry_pt %s;\n", NAME__MAIN);
   fprintf (stream, "entry_pt *__main_reference = %s;\n\n", NAME__MAIN);
+# endif
 }
 #endif /* ! LD_INIT_SWITCH */
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 951902338205..fec67d7b6e40 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2784,6 +2784,7 @@ nvptx-*)
 	tm_file="${tm_file} newlib-stdint.h"
 	use_gcc_stdint=wrap
 	tmake_file="nvptx/t-nvptx"
+	use_collect2=yes
 	if test x$enable_as_accelerator = xyes; then
 		extra_programs="${extra_programs} mkoffload\$(exeext)"
 		tm_file="${tm_file} nvptx/offload.h"
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index dc676dcb5fc5..235c1e4d99d5 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -35,7 +35,39 @@
'../../gcc.cc:asm_options', 'HAVE_GNU_AS'.  */
 #define ASM_SPEC "%{v}"
 
-#define STARTFILE_SPEC "%{mmainkernel:crt0.o%s}"
+#define STARTFILE_SPEC \
+  STARTFILE_SPEC_MMAINKERNEL \
+  " " STARTFILE_SPEC_CDTOR
+
+#define ENDFILE_SPEC \
+  ENDF

[PING] nvptx: Support global constructors/destructors via 'collect2' for offloading (was: nvptx: Support global constructors/destructors via 'collect2')

2023-01-11 Thread Thomas Schwinge
Hi!

Ping.


Grüße
 Thomas


On 2022-12-23T14:37:47+0100, I wrote:
> Hi!
>
> On 2022-12-23T14:35:16+0100, I wrote:
>> On 2022-12-02T14:35:35+0100, I wrote:
>>> On 2022-12-01T22:13:38+0100, I wrote:
>>>> I'm working on support for global constructors/destructors with
>>>> GCC/nvptx
>>>
>>> See "nvptx: Support global constructors/destructors via 'collect2'"
>>> [posted before]
>>
>> Building on that, attached is now the additional "for offloading" piece:
>> "nvptx: Support global constructors/destructors via 'collect2' for 
>> offloading".
>> OK to push?
>
> Now really attached.
>
>> I did manually test this (by putting a few constructors/destructors into
>> 'libgomp/config/nvptx/oacc-parallel.c', and observing them be executed),
>> and also in my WIP development tree with standard libgfortran
>> constructors (with 'LIBGFOR_MINIMAL' disabled).
>
>
> Grüße
>  Thomas


From fb67006eeca0c8e2bfdf86576ed3109dacaf6868 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 30 Nov 2022 22:09:35 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'
 for offloading

This extends "nvptx: Support global constructors/destructors via 'collect2'"
for offloading.

	libgcc/
	* config/nvptx/crtstuff.c ["mgomp"]
	(__do_global_ctors__entry__mgomp)
	(__do_global_dtors__entry__mgomp): New.
	[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
	New.
	libgomp/
	* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
	(nvptx_close_device, GOMP_OFFLOAD_load_image)
	(GOMP_OFFLOAD_unload_image): Call it.
---
 libgcc/config/nvptx/crtstuff.c |  64 ++-
 libgomp/plugin/plugin-nvptx.c  | 113 -
 2 files changed, 175 insertions(+), 2 deletions(-)

diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
index 0823fc49901..8dc80687e0a 100644
--- a/libgcc/config/nvptx/crtstuff.c
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -29,6 +29,14 @@
files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
do so anyway, for symmetry with other configurations.  */
 
+
+/* See 'crt0.c', 'mgomp.c'.  */
+#if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+extern void *__nvptx_stacks[32] __attribute__((shared,nocommon));
+extern unsigned __nvptx_uni[32] __attribute__((shared,nocommon));
+#endif
+
+
 #ifdef CRT_BEGIN
 
 void
@@ -37,6 +45,33 @@ __do_global_ctors (void)
   DO_GLOBAL_CTORS_BODY;
 }
 
+/* Need '.entry' wrapper for offloading.  */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_ctors__entry__mgomp (void *);
+
+void
+__do_global_ctors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __do_global_ctors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_ctors__entry (void);
+
+void
+__do_global_ctors__entry (void)
+{
+  __do_global_ctors ();
+}
+
+# endif
+
 #elif defined(CRT_END) /* ! CRT_BEGIN */
 
 void
@@ -45,7 +80,7 @@ __do_global_dtors (void)
   /* In this configuration here, there's no way that "this routine is run more
  than once [...] when exit is called recursively": for nvptx target, the
  call to '__do_global_dtors' is registered via 'atexit', which doesn't
- re-enter a function already run.
+ re-enter a function already run, and neither does nvptx offload target.
  Therefore, we do *not* "arrange to remember where in the list we left off
  processing".  */
   func_ptr *p;
@@ -53,6 +88,33 @@ __do_global_dtors (void)
 (*p++) ();
 }
 
+/* Need '.entry' wrapper for offloading.  */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_dtors__entry__mgomp (void *);
+
+void
+__do_global_dtors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __do_global_dtors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_dtors__entry (void);
+
+void
+__do_global_dtors__entry (void)
+{
+  __do_global_dtors ();
+}
+
+# endif
+
 #else /* ! CRT_BEGIN && ! CRT_END */
 #error "One of CRT_BEGIN or CRT_END must be defined."
 #endif
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index fcc97c6e0d5..395639537e8 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -338,6 +338,11 @@ struct ptx_device
 
 static struct ptx_device **ptx_devices;
 
+static bool nvptx_do_global_cdtors (CUmodule, struct ptx_device *,
+const char *);
+static size_t nvptx_stacks_size ();
+static void *nvptx_stacks_acquire (struct ptx_device *, size_t, int);
+
 static inline struct nvptx_thread *
 nvptx_thread (void)
 {
@@ -557,6 +562,17 @@ nvptx_close_

[PATCH] tree-optimization/108352 - FSM threads creating irreducible loops

2023-01-11 Thread Richard Biener via Gcc-patches
The following relaxes a heuristic that prevents creating irreducible
loops from FSM threads not covering multi-way branches.  Instead of
allowing threads that adhere to

  && (n_insns * (unsigned) param_fsm_scale_path_stmts
  > (m_path.length () *
 (unsigned) param_fsm_scale_path_blocks))

with reasoning "We also consider it worth creating an irreducible inner loop if
the number of copied statement is low relative to the length of the path --
in that case there's little the traditional loop optimizer would have done
anyway, so an irreducible loop is not so bad.", which I cannot make much
sense of, the following patch changes that to only allow those after
loop optimization and when they are (scaled) short:

  && (!(cfun->curr_properties & PROP_loop_opts_done)
  || (m_n_insns * param_fsm_scale_path_stmts
  >= param_max_jump_thread_duplication_stmts)))

This allows us to get rid of --param fsm-scale-path-blocks which
previous to the bisected revision allowed an enlarged path covering
the original allowance (but we do not consider that enlarged path
now because enlarging it doesn't add any information).
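
To make the difference between the two tests concrete, here is a small
standalone C model of just the two quoted sub-conditions.  It is a sketch, not
the code in tree-ssa-threadbackward.cc: the struct and function names are made
up, 'loop_opts_done' stands in for
'cfun->curr_properties & PROP_loop_opts_done', and the value 15 used for
max-jump-thread-duplication-stmts in the note below is an assumption.  The
scale factors 2 and 3 are the Init() values from params.opt.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the --param values involved.  */
struct thread_params
{
  unsigned fsm_scale_path_stmts;              /* Init(2) in params.opt */
  unsigned fsm_scale_path_blocks;             /* Init(3); removed by the patch */
  unsigned max_jump_thread_duplication_stmts; /* assumed value, see above */
};

/* Old sub-condition: the scaled statement count exceeds the scaled
   number of blocks on the path.  */
bool
old_subcondition (const struct thread_params *p, unsigned n_insns,
                  unsigned path_length)
{
  return n_insns * p->fsm_scale_path_stmts
         > path_length * p->fsm_scale_path_blocks;
}

/* New sub-condition: loop optimizations are not done yet, or the scaled
   statement count reaches the duplication limit.  */
bool
new_subcondition (const struct thread_params *p, unsigned n_insns,
                  bool loop_opts_done)
{
  return !loop_opts_done
         || n_insns * p->fsm_scale_path_stmts
            >= p->max_jump_thread_duplication_stmts;
}
```

With { 2, 3, 15 }, a path of 4 statements over 3 blocks makes the old test
false (4 * 2 > 3 * 3 does not hold).  The new test is true while loop
optimizations are still pending and false once they are done, because
4 * 2 >= 15 does not hold; the path length no longer plays a role.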

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/108352
* tree-ssa-threadbackward.cc
(back_threader_profitability::profitable_path_p): Adjust
heuristic that allows non-multi-way branch threads creating
irreducible loops.
* doc/invoke.texi (--param fsm-scale-path-blocks): Remove.
(--param fsm-scale-path-stmts): Adjust.
* params.opt (--param=fsm-scale-path-blocks=): Remove.
(-param=fsm-scale-path-stmts=): Adjust description.

* gcc.dg/tree-ssa/ssa-thread-21.c: New testcase.
* gcc.dg/tree-ssa/vrp46.c: Remove --param fsm-scale-path-blocks=1.
---
 gcc/doc/invoke.texi   |  7 ++---
 gcc/params.opt|  6 +
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-21.c | 26 +++
 gcc/testsuite/gcc.dg/tree-ssa/vrp46.c |  2 +-
 gcc/tree-ssa-threadbackward.cc| 18 +
 5 files changed, 37 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-21.c

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 80d942917bd..701c228bd0a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15981,16 +15981,13 @@ Max. size of loc list for which reverse ops should be 
added.
 
 @item fsm-scale-path-stmts
 Scale factor to apply to the number of statements in a threading path
-when comparing to the number of (scaled) blocks.
+crossing a loop backedge when comparing to
+@option{--param=max-jump-thread-duplication-stmts}.
 
 @item uninit-control-dep-attempts
 Maximum number of nested calls to search for control dependencies
 during uninitialized variable analysis.
 
-@item fsm-scale-path-blocks
-Scale factor to apply to the number of blocks in a threading path
-when comparing to the number of (scaled) statements.
-
 @item sched-autopref-queue-depth
 Hardware autoprefetcher scheduler model control flag.
 Number of lookahead cycles the model looks into; at '
diff --git a/gcc/params.opt b/gcc/params.opt
index e178dec1600..929131254d2 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -134,13 +134,9 @@ Maximum number of basic blocks before EVRP uses a sparse 
cache.
 Common Joined UInteger Var(param_evrp_switch_limit) Init(50) Optimization Param
 Maximum number of outgoing edges in a switch before EVRP will not process it.
 
--param=fsm-scale-path-blocks=
-Common Joined UInteger Var(param_fsm_scale_path_blocks) Init(3) 
IntegerRange(1, 10) Param Optimization
-Scale factor to apply to the number of blocks in a threading path when 
comparing to the number of (scaled) statements.
-
 -param=fsm-scale-path-stmts=
 Common Joined UInteger Var(param_fsm_scale_path_stmts) Init(2) IntegerRange(1, 
10) Param Optimization
-Scale factor to apply to the number of statements in a threading path when 
comparing to the number of (scaled) blocks.
+Scale factor to apply to the number of statements in a threading path crossing 
a loop backedge when comparing to max-jump-thread-duplication-stmts.
 
 -param=gcse-after-reload-critical-fraction=
 Common Joined UInteger Var(param_gcse_after_reload_critical_fraction) Init(10) 
Param Optimization
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-21.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-21.c
new file mode 100644
index 000..16537ccfb61
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-21.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-thread2-stats -fdump-tree-optimized" } */
+
+long a;
+int b;
+void bar64_(void);
+void foo();
+int main() {
+  char c = 0;
+  unsigned d = 10;
+  int e = 2;
+  for (; d; d--) {
+bar64_();
+b = d;
+e && (c = (e = 0) != 4) > 1;
+  }
+  if (c < 1)
+foo();
+  a = b;
+}
+
+/* We need to perform a non-multi-way bran

Re: [PATCH] libgcc: Fix uninitialized RA signing on AArch64 [PR107678]

2023-01-11 Thread Wilco Dijkstra via Gcc-patches
Hi,

> On 1/10/23 19:12, Jakub Jelinek via Gcc-patches wrote:
>> Anyway, the sooner this makes it into gcc trunk, the better, it breaks quite
>> a lot of stuff.
>
> Yep, please, we're also waiting for this patch for pushing to our gcc13 
> package.

Well I'm waiting for an OK from a maintainer... I believe Jakub can approve it 
as well.

Cheers,
Wilco

Re: [PATCH] Fix PR tree-optimization/108199

2023-01-11 Thread Richard Biener via Gcc-patches
On Wed, Jan 11, 2023 at 11:00 AM Eric Botcazou via Gcc-patches
 wrote:
>
> Hi,
>
> this fixes the problematic interaction between bitfields, unions, SSO and SRA.
>
> Tested on x86-64/Linux and SPARC/Solaris, OK for all active branches?

OK.

Thanks,
Richard.

>
> 2023-01-11  Eric Botcazou  
> Andreas Krebbel  
>
> PR tree-optimization/108199
> * tree-sra.cc (sra_modify_expr): Deal with reverse storage order
> for bit-field references.
>
>
> 2023-01-11  Eric Botcazou  
>
> * gcc.dg/sso-17.c: New test.
>
> --
> Eric Botcazou


Re: [PATCH] switch expansion: limit JT growth param values

2023-01-11 Thread Richard Biener via Gcc-patches
On Wed, Jan 11, 2023 at 11:31 AM Martin Liška  wrote:
>
> Currently, one can request a huge jump table creation which
> leads to nonsensically huge output. Moreover, use auto_vec rather
> than a stack-allocated array.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

OK.

Thanks,
Richard.

> Thanks,
> Martin
>
> PR middle-end/107976
>
> gcc/ChangeLog:
>
> * params.opt: Limit JT params.
> * stmt.cc (emit_case_dispatch_table): Use auto_vec.
> ---
>  gcc/params.opt | 4 ++--
>  gcc/stmt.cc| 9 -
>  2 files changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/params.opt b/gcc/params.opt
> index e178dec1600..3454700eb91 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -327,11 +327,11 @@ Common Joined UInteger 
> Var(param_iv_max_considered_uses) Init(250) Param Optimiz
>  Bound on number of iv uses in loop optimized in iv optimizations.
>
>  -param=jump-table-max-growth-ratio-for-size=
> -Common Joined UInteger Var(param_jump_table_max_growth_ratio_for_size) 
> Init(300) Param Optimization
> +Common Joined UInteger Var(param_jump_table_max_growth_ratio_for_size) 
> Init(300) IntegerRange(0, 1) Param Optimization
>  The maximum code size growth ratio when expanding into a jump table (in 
> percent).  The parameter is used when optimizing for size.
>
>  -param=jump-table-max-growth-ratio-for-speed=
> -Common Joined UInteger Var(param_jump_table_max_growth_ratio_for_speed) 
> Init(800) Param Optimization
> +Common Joined UInteger Var(param_jump_table_max_growth_ratio_for_speed) 
> Init(800) IntegerRange(0, 1) Param Optimization
>  The maximum code size growth ratio when expanding into a jump table (in 
> percent).  The parameter is used when optimizing for speed.
>
>  -param=l1-cache-line-size=
> diff --git a/gcc/stmt.cc b/gcc/stmt.cc
> index 82a3e1035ec..b239c02018a 100644
> --- a/gcc/stmt.cc
> +++ b/gcc/stmt.cc
> @@ -746,7 +746,7 @@ emit_case_dispatch_table (tree index_expr, tree 
> index_type,
>   tree range, basic_block stmt_bb)
>  {
>int i, ncases;
> -  rtx *labelvec;
> +  auto_vec<rtx> labelvec;
>rtx_insn *fallback_label = label_rtx (case_list[0].m_code_label);
>rtx_code_label *table_label = gen_label_rtx ();
>bool has_gaps = false;
> @@ -779,8 +779,7 @@ emit_case_dispatch_table (tree index_expr, tree 
> index_type,
>/* Get table of labels to jump to, in order of case index.  */
>
>ncases = tree_to_shwi (range) + 1;
> -  labelvec = XALLOCAVEC (rtx, ncases);
> -  memset (labelvec, 0, ncases * sizeof (rtx));
> +  labelvec.safe_grow_cleared (ncases);
>
>for (unsigned j = 0; j < case_list.length (); j++)
>  {
> @@ -860,11 +859,11 @@ emit_case_dispatch_table (tree index_expr, tree 
> index_type,
>  emit_jump_table_data (gen_rtx_ADDR_DIFF_VEC (CASE_VECTOR_MODE,
>  gen_rtx_LABEL_REF (Pmode,
> 
> table_label),
> -gen_rtvec_v (ncases, 
> labelvec),
> +gen_rtvec_v (ncases, 
> labelvec.address ()),
>  const0_rtx, const0_rtx));
>else
>  emit_jump_table_data (gen_rtx_ADDR_VEC (CASE_VECTOR_MODE,
> -   gen_rtvec_v (ncases, labelvec)));
> +   gen_rtvec_v (ncases, 
> labelvec.address ())));
>
>/* Record no drop-through after the table.  */
>emit_barrier ();
> --
> 2.39.0
>


[PING] nvptx: '-mframe-malloc-threshold', '-Wframe-malloc-threshold' (was: Handling of large stack objects in GPU code generation -- maybe transform into heap allocation?)

2023-01-11 Thread Thomas Schwinge
Hi!

Ping -- the '-mframe-malloc-threshold' idea, at least.

Note that while this issue originally did pop up for Fortran I/O, it's
likewise relevant for other functions that maintain big frames, for
example in newlib:

libc/string/libc_a-memmem.o:.local .align 16 .b8 %frame_ar[2064];
libc/string/libc_a-strcasestr.o:.local .align 16 .b8 %frame_ar[2064];
libc/string/libc_a-strstr.o:.local .align 16 .b8 %frame_ar[2064];
libm/math/libm_a-k_rem_pio2.o:.local .align 16 .b8 %frame_ar[560];

Therefore a generic solution (or, workaround if you'd like) does seem
appropriate.
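
As a source-level illustration of the idea (a sketch only: the proposed option
would do this inside the compiler, and the function names and the reuse of the
2064-byte size from the listing above are assumptions made for the example), a
function with a large frame object can be rewritten to take that object from
the heap instead:

```c
#include <stdlib.h>
#include <string.h>

#define BIG_FRAME_SIZE 2064  /* size borrowed from the '%frame_ar' lines */

/* Original shape: the scratch buffer lives in the function's frame.  */
size_t
count_byte_stack (const char *s, size_t len, char c)
{
  char scratch[BIG_FRAME_SIZE];
  size_t n = len < sizeof scratch ? len : sizeof scratch;
  size_t count = 0;

  memcpy (scratch, s, n);
  for (size_t i = 0; i < n; i++)
    count += scratch[i] == c;
  return count;
}

/* Transformed shape: same logic, but the buffer is heap-allocated, so the
   frame stays small.  Returns (size_t) -1 if the allocation fails.  */
size_t
count_byte_heap (const char *s, size_t len, char c)
{
  char *scratch = malloc (BIG_FRAME_SIZE);
  size_t n = len < BIG_FRAME_SIZE ? len : BIG_FRAME_SIZE;
  size_t count = 0;

  if (!scratch)
    return (size_t) -1;
  memcpy (scratch, s, n);
  for (size_t i = 0; i < n; i++)
    count += scratch[i] == c;
  free (scratch);
  return count;
}
```

Both variants compute the same result; the second trades a 'malloc'/'free'
pair per call for a frame that no longer counts against the per-thread stack.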


Grüße
 Thomas


On 2022-12-23T15:08:06+0100, I wrote:
> Hi!
>
> On 2022-11-11T15:35:44+0100, Richard Biener via Fortran  
> wrote:
>> On Fri, Nov 11, 2022 at 3:13 PM Thomas Schwinge  
>> wrote:
>>> For example, for Fortran code like:
>>>
>>> write (*,*) "Hello world"
>>>
>>> ..., 'gfortran' creates:
>>>
>>> struct __st_parameter_dt dt_parm.0;
>>>
>>> try
>>>   {
>>> dt_parm.0.common.filename = 
>>> &"source-gcc/libgomp/testsuite/libgomp.oacc-fortran/print-1_.f90"[1]{lb: 1 
>>> sz: 1};
>>> dt_parm.0.common.line = 29;
>>> dt_parm.0.common.flags = 128;
>>> dt_parm.0.common.unit = 6;
>>> _gfortran_st_write (&dt_parm.0);
>>> _gfortran_transfer_character_write (&dt_parm.0, &"Hello 
>>> world"[1]{lb: 1 sz: 1}, 11);
>>> _gfortran_st_write_done (&dt_parm.0);
>>>   }
>>> finally
>>>   {
>>> dt_parm.0 = {CLOBBER(eol)};
>>>   }
>>>
>>> The issue: the stack object 'dt_parm.0' is a half-KiB in size (yes,
>>> really! -- there's a lot of state in Fortran I/O apparently).  That's a
>>> problem for GPU execution -- here: OpenACC/nvptx -- where typically you
>>> have small stacks.  (For example, GCC/OpenACC/nvptx: 1 KiB per thread;
>>> GCC/OpenMP/nvptx is an exception, because of its use of '-msoft-stack'
>>> "Use custom stacks instead of local memory for automatic storage".)
>>>
>>> Now, the Nvidia Driver tries to accommodate such largish stack usage,
>>> and dynamically increases the per-thread stack as necessary (thereby
>>> potentially reducing parallelism) -- if it manages to understand the call
>>> graph.  In case of libgfortran I/O, it evidently doesn't.  Not being able
>>> to disprove existence of recursion is the common problem, as I've read.
>>> At run time, via 'CU_JIT_INFO_LOG_BUFFER' you then get, for example:
>>>
>>> warning : Stack size for entry function 'MAIN__$_omp_fn$0' cannot be 
>>> statically determined
>>>
>>> That's still not an actual problem: if the GPU kernel's stack usage still
>>> fits into 1 KiB.  Very often it does, but if, as happens in libgfortran
>>> I/O handling, there is another such 'dt_parm' put onto the stack, the
>>> stack then overflows; device-side SIGSEGV.
>>>
>>> (There is, by the way, some similar analysis by Tom de Vries in
>>>  "[nvptx, openacc, openmp, testsuite]
>>> Recursive tests may fail due to thread stack limit".)
>>>
>>> Of course, you shouldn't really be doing I/O in GPU kernels, but people
>>> do like their occasional "'printf' debugging", so we ought to make that
>>> work (... without pessimizing any "normal" code).
>>>
>>> I assume that generally reducing the size of 'dt_parm' etc. is out of
>>> scope.
>>>
>>> There is a way to manually set a per-thread stack size, but it's not
>>> obvious which size to set: that size needs to work for the whole GPU
>>> kernel, and should be as low as possible (to maximize parallelism).
>>> I assume that even if GCC did an accurate call graph analysis of the GPU
>>> kernel's maximum stack usage, that still wouldn't help: that's before the
>>> PTX JIT does its own code transformations, including stack spilling.
>>>
>>> There exists a 'CU_JIT_LTO' flag to "Enable link-time optimization
>>> (-dlto) for device code".  This might help, assuming that it manages to
>>> simplify the libgfortran I/O code such that the PTX JIT then understands
>>> the call graph.  But: that's available only starting with recent
>>> CUDA 11.4, so not a general solution -- if it works at all, which I've
>>> not tested.
>>>
>>> Similarly, we could enable GCC's LTO for device code generation -- but
>>> that's a big project, out of scope at this time.  And again, we don't
>>> know if that at all helps this case.
>>>
>>> I see a few options:
>>>
>>> (a) Figure out what it is in the libgfortran I/O implementation that
>>> causes "Stack size [...] cannot be statically determined", and re-work
>>> that code to avoid that, or even disable certain things for nvptx, if
>>> feasible.
>
>> Shrink st_parameter_dt (it's part of the ABI though, kind of).  Lots of the
>> bloat is from things that are unused for simpler I/O cases (so some
>> "inheritance" could help), and lots of the bloat is from using
>> string/length pairs using char * + size_t for what looks like could be
>> encoded a lot more efficiently.
>>
>> There's probably not much low-hanging fruit.
>
> (Sim

Re: [PATCH] tree-optimization/107767 - not profitable switch conversion

2023-01-11 Thread Martin Liška
On 1/9/23 12:09, Richard Biener wrote:
> |Martin, OK with you?|

Yes, thanks for handling that.

Martin


Re: [committed] testsuite: Add testcases from PR108292 and PR108308

2023-01-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Jan 11, 2023 at 05:10:50AM -0500, NightStrike wrote:
> Ok, then:
> 
> /* { dg-do run { target { { ilp32 || lp64 } || llp64 } } } */
> 
> or even:
> 
> /* { dg-do run { target { ! int16 } } } */
> 
> Though I'd point out that in your original message, you only cared
> about the "important targets".  I don't think nonexistent ones where
> sizeof(int) == 8 qualifies :)

I've committed the following after regtesting it on x86_64-linux and i686-linux:

2023-01-11  Jakub Jelinek  

PR target/108308
* gcc.dg/pr108308.c: Use int32 target rather than { ilp32 || lp64 }.

--- gcc/testsuite/gcc.dg/pr108308.c.jj  2023-01-06 10:52:24.982461493 +0100
+++ gcc/testsuite/gcc.dg/pr108308.c 2023-01-11 13:04:51.036789536 +0100
@@ -1,5 +1,5 @@
 /* PR target/108308 */
-/* { dg-do run { target { ilp32 || lp64 } } } */
+/* { dg-do run { target int32 } } */
 /* { dg-options "-Os -fno-tree-ccp" } */
 
 int a = 1, *d = &a, f = 2766708631, h;


Jakub



Re: [committed] testsuite: Add testcases from PR108292 and PR108308

2023-01-11 Thread NightStrike via Gcc-patches
On Wed, Jan 11, 2023 at 7:14 AM Jakub Jelinek  wrote:

> I've committed the following after regtesting it on x86_64-linux and i686-linux:
...
> +/* { dg-do run { target int32 } } */

Ah, I didn't realize you meant literally int32.  I didn't see that as
a choice here:
https://gcc.gnu.org/onlinedocs/gccint/Effective-Target-Keywords.html

Is that a case of missing documentation?


Re: [PING] Add '-Wno-complain-wrong-lang', and use it in 'gcc/testsuite/lib/target-supports.exp:check_compile' and elsewhere

2023-01-11 Thread Jakub Jelinek via Gcc-patches
Hi!

On Wed, Jan 11, 2023 at 12:41:06PM +0100, Thomas Schwinge wrote:

I think this should be reviewed by Joseph as option handling maintainer.

> @@ -8896,6 +8897,13 @@ programs.
>  Warn for variables that might be changed by @code{longjmp} or
>  @code{vfork}.  This warning is also enabled by @option{-Wextra}.
>  
> +@item -Wno-complain-wrong-lang
> +@opindex Wcomplain-wrong-lang
> +@opindex Wno-complain-wrong-lang
> +By default, we complain about command-line options that are not valid
> +for this front end.
> +This may be disabled with @code{-Wno-complain-wrong-lang}.

I think this description is too short and confusing, it isn't clear what
"this" front end is, perhaps say "that are not valid for a front end
which compiles a particular source file"?
And certainly give an example and more explanation that the option is
mostly useful when a single compiler driver invocation is compiling
multiple sources written in different languages.

Jakub



Re: [PATCH] rs6000: Make P10_FUSION honour tuning setting

2023-01-11 Thread Kewen.Lin via Gcc-patches
on 2023/1/6 17:28, Kewen.Lin via Gcc-patches wrote:
> Hi Pat,
> 
> on 2023/1/6 03:30, Pat Haugen wrote:
>> On 1/4/23 3:20 AM, Kewen.Lin via Gcc-patches wrote:
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 88c865b6b4b..6fa084c0807 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -4378,9 +4378,15 @@ rs6000_option_override_internal (bool global_init_p)
>>>     rs6000_isa_flags &= ~OPTION_MASK_MMA;
>>>   }
>>>
>>> -  if (TARGET_POWER10
>>> -  && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0)
>>> -    rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
>>> +  /* Enable power10 fusion if we are tuning for power10, even if we aren't
>>> + generating power10 instructions.  */
>>> +  if (!(rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION))
>>> +    {
>>> +  if (processor_target_table[tune_index].processor == PROCESSOR_POWER10)
>>
>> You can use (rs6000_tune == PROCESSOR_POWER10) at this point.
> 
> Good catch, I will update it.  Thanks!

Committed the updated version (as attached) in r13-5107-g6224db0e4d6d3b.

BR,
Kewen



Subject: [PATCH] rs6000: Make P10_FUSION honour tuning setting

We noticed this issue when Segher reviewed the patch for
PR104024.  When there is no explicit setting for option
-mpower10-fusion, we enable OPTION_MASK_P10_FUSION for
TARGET_POWER10.  But that is not right; it should honour
the tuning setting instead.

This patch fixes it accordingly.  It was bootstrapped
and regtested on powerpc64-linux-gnu P8 and
powerpc64le-linux-gnu P9.

On powerpc64le-linux-gnu P10, however, it caused one regression
failure in the test case gcc.target/powerpc/pr105586.c.
I looked into it, confirmed that a latent bug was
exposed, and filed a separate bug, PR108273, for it.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_option_override_internal): Make
OPTION_MASK_P10_FUSION implicit setting honour Power10 tuning setting.
* config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Remove
OPTION_MASK_P10_FUSION.
---
 gcc/config/rs6000/rs6000-cpus.def |  3 +--
 gcc/config/rs6000/rs6000.cc   | 12 +---
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index c3825bcccd8..4d5544e927a 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -84,8 +84,7 @@
 
 #define ISA_3_1_MASKS_SERVER   (ISA_3_0_MASKS_SERVER   \
 | OPTION_MASK_POWER10  \
-| OTHER_POWER10_MASKS  \
-| OPTION_MASK_P10_FUSION)
+| OTHER_POWER10_MASKS)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS  (OPTION_MASK_FLOAT128_HW\
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 6ac3adcec6b..3baa2c3b7b0 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -4397,9 +4397,15 @@ rs6000_option_override_internal (bool global_init_p)
   rs6000_isa_flags &= ~OPTION_MASK_MMA;
 }
 
-  if (TARGET_POWER10
-  && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0)
-rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
+  /* Enable power10 fusion if we are tuning for power10, even if we aren't
+ generating power10 instructions.  */
+  if (!(rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION))
+{
+  if (rs6000_tune == PROCESSOR_POWER10)
+   rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
+  else
+   rs6000_isa_flags &= ~OPTION_MASK_P10_FUSION;
+}
 
   /* MMA requires SIMD support as ISA 3.1 claims and our implementation
  such as "*movoo" uses vector pair access which use VSX registers.
-- 
2.34.1
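
The default selection made by the hunk above can be modelled in isolation.
The following is only a minimal sketch under stated assumptions: the names
MASK_P10_FUSION, struct fusion_opts, enum tune_cpu and apply_fusion_default
are hypothetical stand-ins, not the real rs6000 definitions.  It shows the
intended rule: an explicit user setting is left alone, otherwise the fusion
bit simply follows the tuning CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the rs6000 option machinery.  */
#define MASK_P10_FUSION (UINT64_C(1) << 0)

enum tune_cpu { TUNE_POWER9, TUNE_POWER10 };

struct fusion_opts {
  uint64_t isa_flags;          /* Effective flags.  */
  uint64_t isa_flags_explicit; /* Bits the user set or cleared explicitly.  */
  enum tune_cpu tune;          /* CPU we are tuning for.  */
};

/* If the user said nothing about fusion, make the bit follow the tuning
   CPU, regardless of which instructions we are allowed to generate.  */
void apply_fusion_default(struct fusion_opts *o)
{
  assert(o != NULL);
  if (!(o->isa_flags_explicit & MASK_P10_FUSION)) {
    if (o->tune == TUNE_POWER10)
      o->isa_flags |= MASK_P10_FUSION;
    else
      o->isa_flags &= ~MASK_P10_FUSION;
  }
}
```

In this model a Power9 tuning clears a fusion bit that was only inherited
from a CPU mask, while an explicitly disabled bit stays disabled even when
tuning for Power10.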



[PATCH, committed] rs6000/test: Make ppc-fortran.exp only available for PowerPC target

2023-01-11 Thread Kewen.Lin via Gcc-patches
Hi,

When testing a patch that adds a Fortran test case to the
test bucket powerpc/ppc-fortran/, I found one unexpected
failure on a non-PowerPC target.  The cause is that
ppc-fortran.exp does not exit early if the testing target
isn't a PowerPC target.  This patch makes it exit
immediately in that case.

Tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu.

Committed in r13-5108-gde99049f6fe534.

BR,
Kewen
-
gcc/testsuite/ChangeLog:

* gcc.target/powerpc/ppc-fortran/ppc-fortran.exp: Exit immediately if
the testing target isn't a PowerPC target.
---
 gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp
index bd7ad95ad0d..ded643b56bf 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp
@@ -16,6 +16,11 @@

 # GCC testsuite that uses the `dg.exp' driver.

+# Exit immediately if this isn't a PowerPC target.
+if { ![istarget powerpc*-*-*] && ![istarget rs6000-*-*] } then {
+  return
+}
+
 # Load support procs.
 load_lib gfortran-dg.exp

--
2.34.1


[ping2][PATCH 0/2] __bos and flex arrays

2023-01-11 Thread Siddhesh Poyarekar

Ping!

On 2022-12-21 17:25, Siddhesh Poyarekar wrote:

Hi,

The first patch in the series is just a minor test cleanup that I did to
make sure all tests in a test case run (instead of aborting at first
failure) and print the ones that failed.  The second patch is the actual
fix.

The patch intends to make __bos/__bdos do the right thing with structs
containing flex arrays, either directly or within nested structs and
unions.  This should improve minimum object size estimation in some
cases and also bail out more consistently so that flex arrays don't
cause false positives in fortification.

I've tested this with a bootstrap on x86_64 and also with
--with-build-config=bootstrap-ubsan to make sure that there are no new
failures due to this change.
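
For readers less familiar with the case being discussed, here is a minimal
sketch of the kind of query involved.  The struct and function names
(struct packet, packet_alloc_size, probe_flex_size) are made up for
illustration and are not taken from the patch or its tests.  The value
returned by the builtin depends on the compiler version and optimization
level, so none is claimed here; when the size cannot be determined the
builtin yields (size_t)-1 for this mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A struct ending in a flexible array member.  */
struct packet {
  size_t len;
  char data[];
};

/* Bytes to allocate for a packet carrying N payload bytes.  */
size_t packet_alloc_size(size_t n)
{
  return sizeof(struct packet) + n;
}

/* Ask the compiler for the maximum size estimate of the flexible array
   subobject (mode 1).  The answer is either (size_t)-1, meaning
   "unknown", or an estimate of the bytes available from p->data.  */
size_t probe_flex_size(size_t n)
{
  struct packet *p = malloc(packet_alloc_size(n));
  assert(p != NULL);
  p->len = n;
  size_t estimate = __builtin_object_size(p->data, 1);
  free(p);
  return estimate;
}
```

Fortified functions compare such an estimate against the requested length,
which is why an estimate that is too small for a flex array shows up as a
false positive.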

Siddhesh Poyarekar (2):
   testsuite: Run __bos tests to completion
   tree-object-size: More consistent behaviour with flex arrays

  .../g++.dg/ext/builtin-object-size1.C | 267 
  .../g++.dg/ext/builtin-object-size2.C | 267 
  .../gcc.dg/builtin-dynamic-object-size-0.c|  14 +-
  gcc/testsuite/gcc.dg/builtin-object-size-1.c  | 263 
  gcc/testsuite/gcc.dg/builtin-object-size-12.c |  12 +-
  gcc/testsuite/gcc.dg/builtin-object-size-13.c |  17 +-
  gcc/testsuite/gcc.dg/builtin-object-size-15.c |  11 +-
  gcc/testsuite/gcc.dg/builtin-object-size-2.c  | 287 +-
  gcc/testsuite/gcc.dg/builtin-object-size-3.c  | 263 
  gcc/testsuite/gcc.dg/builtin-object-size-4.c  | 267 
  gcc/testsuite/gcc.dg/builtin-object-size-6.c  | 267 
  gcc/testsuite/gcc.dg/builtin-object-size-7.c  |  52 ++--
  gcc/testsuite/gcc.dg/builtin-object-size-8.c  |  17 +-
  .../gcc.dg/builtin-object-size-common.h   |  12 +
  .../gcc.dg/builtin-object-size-flex-common.h  |  90 ++
  ...n-object-size-flex-nested-struct-nonzero.c |   6 +
  ...ltin-object-size-flex-nested-struct-zero.c |   6 +
  .../builtin-object-size-flex-nested-struct.c  |  22 ++
  ...in-object-size-flex-nested-union-nonzero.c |   6 +
  ...iltin-object-size-flex-nested-union-zero.c |   6 +
  .../builtin-object-size-flex-nested-union.c   |  28 ++
  .../gcc.dg/builtin-object-size-flex-nonzero.c |   6 +
  .../gcc.dg/builtin-object-size-flex-zero.c|   6 +
  .../gcc.dg/builtin-object-size-flex.c |  18 ++
  gcc/testsuite/gcc.dg/pr101836.c   |  11 +-
  gcc/testsuite/gcc.dg/strict-flex-array-3.c|  11 +-
  gcc/tree-object-size.cc   | 150 -
  27 files changed, 1275 insertions(+), 1107 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-common.h
  create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-flex-common.h
  create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-flex-nested-struct-nonzero.c
  create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-flex-nested-struct-zero.c
  create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-flex-nested-struct.c
  create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-flex-nested-union-nonzero.c
  create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-flex-nested-union-zero.c
  create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-flex-nested-union.c
  create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-flex-nonzero.c
  create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-flex-zero.c
  create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-flex.c



[PATCH] rs6000: Imply VSX early to adopt some checkings on conflict [PR108240]

2023-01-11 Thread Kewen.Lin via Gcc-patches
Hi,

As PR108240 shows, some options like -mmodulo can enable some
flags implicitly, including OPTION_MASK_VSX.  But the enabled
flag can conflict with an existing setting such as soft float,
which results in unexpected cases and a consequent ICE.
There are already some checks for VSX vs. soft
float, no altivec etc., but unfortunately they happen
ahead of the implicit enablements, and we cannot postpone them
since they can affect the generation of ignore_masks, which is
used in the following implicit enablements.

This patch implies OPTION_MASK_VSX early for those options
which would enable it implicitly later, right before the
checks for possible conflicts with TARGET_VSX.  The checks can
then detect such conflicts and prevent
the unexpected (incompatible) cases; otherwise the result is
the same as before (VSX is implied either way, just at
a different time).
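
The ordering problem can be illustrated with a toy model.  This is only a
sketch under simplifying assumptions: F_MODULO, F_VSX, struct opts and all
function names are hypothetical, and the real code in
rs6000_option_override_internal handles many more flags.  It shows why a
conflict check that runs before the implicit enablement misses the
conflict, and how implying the flag early lets the existing check catch it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits.  */
#define F_MODULO 0x1u
#define F_VSX    0x2u

struct opts {
  unsigned int flags;          /* Effective flags.  */
  unsigned int explicit_flags; /* Flags given explicitly by the user.  */
  unsigned int ignore_masks;   /* Flags later steps must not enable.  */
  bool soft_float;
};

/* Existing check: VSX cannot be combined with soft float.  */
void check_vsx_conflicts(struct opts *o)
{
  if ((o->flags & F_VSX) && o->soft_float) {
    o->flags &= ~F_VSX;
    o->ignore_masks |= F_VSX;
  }
}

/* Later implicit enablement, filtered by ignore_masks.  */
void imply_vsx_late(struct opts *o)
{
  if (o->flags & F_MODULO)
    o->flags |= F_VSX & ~o->ignore_masks & ~o->explicit_flags;
}

/* New step: imply VSX before the conflict check runs.  */
void imply_vsx_early(struct opts *o)
{
  if (!(o->explicit_flags & F_VSX) && (o->flags & F_MODULO))
    o->flags |= F_VSX;
}

/* Old order: the check sees no VSX yet, so the late step enables it.  */
unsigned int resolve_old(struct opts o)
{
  check_vsx_conflicts(&o);
  imply_vsx_late(&o);
  return o.flags;
}

/* New order: the check sees the implied VSX and can reject it.  */
unsigned int resolve_new(struct opts o)
{
  imply_vsx_early(&o);
  check_vsx_conflicts(&o);
  imply_vsx_late(&o);
  assert(!(o.soft_float && (o.flags & F_VSX)));
  return o.flags;
}
```

With soft float and only the modulo flag given, the old order ends up with
VSX enabled and the new order does not; with hard float both orders agree.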

Bootstrapped and regtested on powerpc64-linux-gnu P8 and
powerpc64le-linux-gnu P9 and P10.

Is it ok for trunk?

BR,
Kewen
-

PR target/108240

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_option_override_internal): Enable
OPTION_MASK_VSX early to make the checkings on VSX conflicts take
effect.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/ppc-fortran/pr108240.f90: New test.
---
 gcc/config/rs6000/rs6000.cc   | 17 
 .../powerpc/ppc-fortran/pr108240.f90  | 27 +++
 2 files changed, 44 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr108240.f90

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 6ac3adcec6b..c3582976521 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3794,6 +3794,23 @@ rs6000_option_override_internal (bool global_init_p)
   & OPTION_MASK_DIRECT_MOVE))
 rs6000_isa_flags |= ~rs6000_isa_flags_explicit & OPTION_MASK_STRICT_ALIGN;

+  /* Some options can enable OPTION_MASK_VSX implicitly, but the implicit
+ enablement is after the below checking and adjustment for TARGET_VSX,
+ it can result in some unexpected situation, like PR108240 without
+ hard float but vsx support, which is supposed to be checked and then
+ prevented in the below handlings for TARGET_VSX.  So for any options
+ which can imply OPTION_MASK_VSX later, we want to imply it first here
+ to make the following checking take effects.  */
+  if (!(rs6000_isa_flags_explicit & OPTION_MASK_VSX)
+  && (TARGET_P9_VECTOR
+ || TARGET_MODULO
+ || TARGET_P9_MISC
+ || TARGET_P9_MINMAX
+ || TARGET_P8_VECTOR
+ || TARGET_DIRECT_MOVE
+ || TARGET_CRYPTO))
+rs6000_isa_flags |= OPTION_MASK_VSX;
+
   /* Add some warnings for VSX.  */
   if (TARGET_VSX)
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr108240.f90 b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr108240.f90
new file mode 100644
index 000..d5ae80321ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr108240.f90
@@ -0,0 +1,27 @@
+! { dg-options "-mmodulo -mcpu=401" }
+! This need one explicit 64 bit option on 64 bit environment
+! to avoid possible error or warning message.
+! { dg-additional-options "-m64" { target lp64 } }
+
+! Verify there is no ICE on 64 bit environment.
+
+program main
+  implicit none
+  integer, parameter :: n=4
+  character(len=4), dimension(n,n) :: c
+  integer, dimension(n,n) :: a
+  integer, dimension(2) :: res1, res2
+  real, dimension(n,n) :: r
+  integer :: i,j
+  character(len=4,kind=4), dimension(n,n) :: c4
+
+  call random_number (r)
+  a = int(r*100)
+
+  do j=1,n
+ do i=1,n
+write (*,*) a(i,j)
+ end do
+  end do
+
+end program main
--
2.37.0


Re: [PATCH v3] Add pattern to convert vector shift + bitwise and + multiply to vector compare in some cases.

2023-01-11 Thread Manolis Tsamis
Hi Richard and Tamar,

I just wanted to ping you about this patch. Is there a chance to get
this into GCC13?

Thanks,
Manolis
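
For anyone skimming the thread, the scalar idiom that the pattern
recognizes can be sketched as follows.  This is a minimal illustration
assuming two 16-bit lanes packed in a 32-bit word; the function names are
made up here and are not the ones used in the patch's test.  The constants
follow the rule in the pattern: shift by lane width minus one, mask the
lowest bit of each lane, multiply by the all-ones lane value.

```c
#include <assert.h>
#include <stdint.h>

/* SWAR form: move each lane's sign bit to the lane's lowest bit, keep
   only that bit, then multiply by 0xffff to spread it over the lane.
   Each lane becomes 0xffff if it was negative as int16_t, else 0.  */
uint32_t swar_cmplt0_16x2(uint32_t x)
{
  return ((x >> 15) & UINT32_C(0x00010001)) * UINT32_C(0xffff);
}

/* Reference form: an explicit per-lane "less than zero" comparison.  */
uint32_t lanewise_cmplt0_16x2(uint32_t x)
{
  uint32_t out = 0;
  for (int lane = 0; lane < 2; lane++) {
    uint32_t bits = (x >> (16 * lane)) & UINT32_C(0xffff);
    if (bits & UINT32_C(0x8000)) /* Sign bit of the 16-bit lane.  */
      out |= UINT32_C(0xffff) << (16 * lane);
  }
  return out;
}

/* Check that both forms agree on a few representative words.  */
void swar_selfcheck(void)
{
  static const uint32_t samples[] = {
    UINT32_C(0x00000000), UINT32_C(0x00008000), UINT32_C(0x80000000),
    UINT32_C(0x7fff7fff), UINT32_C(0x80008000), UINT32_C(0xffffffff),
    UINT32_C(0x12348765)
  };
  for (unsigned int i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert(swar_cmplt0_16x2(samples[i]) == lanewise_cmplt0_16x2(samples[i]));
}
```

Once a loop over such words is vectorized, the per-lane comparison is what
a single vector compare on the narrower element type computes directly.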

On Tue, Dec 20, 2022 at 2:31 PM Manolis Tsamis  wrote:
>
> On Tue, Dec 20, 2022 at 2:23 PM Manolis Tsamis wrote:
> >
> > When using SWAR (SIMD in a register) techniques a comparison operation within
> > such a register can be made by using a combination of shifts, bitwise and and
> > multiplication. If code using this scheme is vectorized then there is potential
> > to replace all these operations with a single vector comparison, by reinterpreting
> > the vector types to match the width of the SWAR register.
> >
> > For example, for the test function packed_cmp_16_32, the original generated code is:
> > ldr q0, [x0]
> > add w1, w1, 1
> > ushr v0.4s, v0.4s, 15
> > and v0.16b, v0.16b, v2.16b
> > shl v1.4s, v0.4s, 16
> > sub v0.4s, v1.4s, v0.4s
> > str q0, [x0], 16
> > cmp w2, w1
> > bhi .L20
> >
> > with this pattern the above can be optimized to:
> >
> > ldr q0, [x0]
> > add w1, w1, 1
> > cmlt v0.8h, v0.8h, #0
> > str q0, [x0], 16
> > cmp w2, w1
> > bhi .L20
> >
> > The effect is similar for x86-64.
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Simplify vector shift + bit_and + multiply in some cases.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/swar_to_vec_cmp.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
> >
> > ---
> >
> > Changes in v3:
> > - Changed pattern to use vec_cond_expr.
> > - Changed pattern to work with VLA vector.
> > - Added both expand_vec_cmp_expr_p and
> >   expand_vec_cond_expr_p check.
> > - Fixed type compatibility issues.
> >
> >  gcc/match.pd  | 61 
> >  .../gcc.target/aarch64/swar_to_vec_cmp.c  | 72 +++
> >  2 files changed, 133 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/swar_to_vec_cmp.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 67a0a682f31..320437f8aa3 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -301,6 +301,67 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (view_convert (bit_and:itype (view_convert @0)
> >  (ne @1 { build_zero_cst (type); })))
> >
> > +/* In SWAR (SIMD within a register) code a signed comparison of packed data
> > +   can be constructed with a particular combination of shift, bitwise and,
> > +   and multiplication by constants.  If that code is vectorized we can
> > +   convert this pattern into a more efficient vector comparison.  */
> > +(simplify
> > + (mult (bit_and (rshift @0 uniform_integer_cst_p@1)
> > +   uniform_integer_cst_p@2)
> > +uniform_integer_cst_p@3)
> > + (with {
> > +   tree rshift_cst = uniform_integer_cst_p (@1);
> > +   tree bit_and_cst = uniform_integer_cst_p (@2);
> > +   tree mult_cst = uniform_integer_cst_p (@3);
> > +  }
> > +  /* Make sure we're working with vectors and uniform vector constants.  */
> > +  (if (VECTOR_TYPE_P (type)
> > +   && tree_fits_uhwi_p (rshift_cst)
> > +   && tree_fits_uhwi_p (mult_cst)
> > +   && tree_fits_uhwi_p (bit_and_cst))
> > +   /* Compute what constants would be needed for this to represent a packed
> > +  comparison based on the shift amount denoted by RSHIFT_CST.  */
> > +   (with {
> > + HOST_WIDE_INT vec_elem_bits = vector_element_bits (type);
> > + poly_int64 vec_nelts = TYPE_VECTOR_SUBPARTS (type);
> > + poly_int64 vec_bits = vec_elem_bits * vec_nelts;
> > + unsigned HOST_WIDE_INT cmp_bits_i, bit_and_i, mult_i;
> > + unsigned HOST_WIDE_INT target_mult_i, target_bit_and_i;
> > + cmp_bits_i = tree_to_uhwi (rshift_cst) + 1;
> > + mult_i = tree_to_uhwi (mult_cst);
> > + target_mult_i = (HOST_WIDE_INT_1U << cmp_bits_i) - 1;
> > + bit_and_i = tree_to_uhwi (bit_and_cst);
> > + target_bit_and_i = 0;
> > +
> > + /* The bit pattern in BIT_AND_I should be a mask for the least
> > +   significant bit of each packed element that is CMP_BITS wide.  */
> > + for (unsigned i = 0; i < vec_elem_bits / cmp_bits_i; i++)
> > +   target_bit_and_i = (target_bit_and_i << cmp_bits_i) | 1U;
> > +}
> > +(if ((exact_log2 (cmp_bits_i)) >= 0
> > +&& cmp_bits_i < HOST_BITS_PER_WIDE_INT
> > +&& multiple_p (vec_bits, cmp_bits_i)
> > +&& vec_elem_bits <= HOST_BITS_PER_WIDE_INT
> > +&& target_mult_i == mult_i
> > +&& target_bit_and_i == bit_and_i)
> > + /* Compute the vector shape for the comparison and check if the target is
> > +   able to expand the comparison with that type.  */
> > + (with {
> > +   /* We're doing a signed comparison.  */
> > +   tree cmp_type = build_nonstandard_integer_type (cmp

[PATCH v3 2/2] aarch64: Fix bit-field alignment in param passing [PR105549]

2023-01-11 Thread Christophe Lyon via Gcc-patches
While working on enabling DFP for AArch64, I noticed new failures in
gcc.dg/compat/struct-layout-1.exp (t028) which were not actually
caused by DFP types handling. These tests are generated during 'make
check' and enabling DFP made generation different (not sure if new
non-DFP tests are generated, or if existing ones are generated
differently, the tests in question are huge and difficult to compare).

Anyway, I reduced the problem to what I attach at the end of the new
gcc.target/aarch64/aapcs64/va_arg-17.c test and rewrote it in the same
scheme as other va_arg* AArch64 tests.  Richard Sandiford further
reduced this to a non-vararg function, added as a second testcase.

This is a tough case mixing bit-fields and alignment, where
aarch64_function_arg_alignment did not follow what its descriptive
comment says: we want to use the natural alignment of the bit-field
type only if the user didn't reduce the alignment for the bit-field
itself.

The patch also adds a comment and assert that would help someone who
has to look at this area again.

The fix would be very small, except that this introduces a new ABI
break, and we have to warn about that.  Since this actually fixes a
problem introduced in GCC 9.1, we keep the old computation to detect
when we now behave differently.

This patch adds two new tests (va_arg-17.c and
pr105549.c). va_arg-17.c contains the reduced offending testcase from
struct-layout-1.exp for reference.  We update some tests introduced by
the previous patch, where parameters with bit-fields and packed
attribute now emit a different warning.

v2->v3: testcase update

2022-11-28  Christophe Lyon  
Richard Sandiford  

gcc/
PR target/105549
* config/aarch64/aarch64.cc (aarch64_function_arg_alignment):
Check DECL_PACKED for bitfield.
(aarch64_layout_arg): Warn when parameter passing ABI changes.
(aarch64_function_arg_boundary): Do not warn here.
(aarch64_gimplify_va_arg_expr): Warn when parameter passing ABI
changes.

gcc/testsuite/
PR target/105549
* gcc.target/aarch64/bitfield-abi-warning-align16-O2.c: Update.
* gcc.target/aarch64/bitfield-abi-warning-align16-O2-extra.c: Update.
* gcc.target/aarch64/bitfield-abi-warning-align32-O2.c: Update.
* gcc.target/aarch64/bitfield-abi-warning-align32-O2-extra.c: Update.
* gcc.target/aarch64/aapcs64/va_arg-17.c: New test.
* gcc.target/aarch64/pr105549.c: New test.
* g++.target/aarch64/bitfield-abi-warning-align16-O2.C: Update.
* g++.target/aarch64/bitfield-abi-warning-align16-O2-extra.C: Update.
* g++.target/aarch64/bitfield-abi-warning-align32-O2.C: Update.
* g++.target/aarch64/bitfield-abi-warning-align32-O2-extra.C: Update.
---
 gcc/config/aarch64/aarch64.cc | 148 ++
 .../bitfield-abi-warning-align16-O2-extra.C   |  64 
 .../aarch64/bitfield-abi-warning-align16-O2.C |  48 +++---
 .../bitfield-abi-warning-align32-O2-extra.C   | 131 +++-
 .../aarch64/bitfield-abi-warning-align32-O2.C | 132 
 .../gcc.target/aarch64/aapcs64/va_arg-17.c| 105 +
 .../bitfield-abi-warning-align16-O2-extra.c   |  64 
 .../aarch64/bitfield-abi-warning-align16-O2.c |  48 +++---
 .../bitfield-abi-warning-align32-O2-extra.c   | 131 +++-
 .../aarch64/bitfield-abi-warning-align32-O2.c | 132 
 gcc/testsuite/gcc.target/aarch64/pr105549.c   |  12 ++
 11 files changed, 587 insertions(+), 428 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/aapcs64/va_arg-17.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr105549.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 3623df5bd94..a6d95dd85bf 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -7265,14 +7265,18 @@ aarch64_vfp_is_call_candidate (cumulative_args_t pcum_v, machine_mode mode,
bits.  The idea is to suppress any stronger alignment requested by
the user and opt for the natural alignment (specified in AAPCS64 \S
4.1).  ABI_BREAK is set to the old alignment if the alignment was
-   incorrectly calculated in versions of GCC prior to GCC-9.  This is
-   a helper function for local use only.  */
+   incorrectly calculated in versions of GCC prior to GCC-9.
+   ABI_BREAK_PACKED is set to the old alignment if it was incorrectly
+   calculated in versions between GCC-9 and GCC-13.  This is a helper
+   function for local use only.  */
 
 static unsigned int
 aarch64_function_arg_alignment (machine_mode mode, const_tree type,
-   unsigned int *abi_break)
+   unsigned int *abi_break,
+   unsigned int *abi_break_packed)
 {
   *abi_break = 0;
+  *abi_break_packed = 0;
   if (!type)
 return GET_MODE_ALIGNMENT (mode);
 
@@ -7288,6 +7292,7 @@ aarch64_function_arg_alignment (m

[PATCH v3 1/2] aarch64: fix warning emission for ABI break since GCC 9.1

2023-01-11 Thread Christophe Lyon via Gcc-patches
While looking at PR 105549, which is about fixing the ABI break
introduced in GCC 9.1 in parameter alignment with bit-fields, we
noticed that the GCC 9.1 warning is not emitted in all the cases where
it should be.  This patch fixes that and the next patch in the series
fixes the GCC 9.1 break.

We split this into two patches since patch #2 introduces a new ABI
break starting with GCC 13.1.  This way, patch #1 can be back-ported
to release branches if needed to fix the GCC 9.1 warning issue.

The main idea is to add a new global boolean that indicates whether
we're expanding the start of a function, so that aarch64_layout_arg
can emit warnings for callees as well as callers.  This removes the
need for aarch64_function_arg_boundary to warn (with its incomplete
information).  However, in the first patch there are still cases where
we emit warnings where we should not; this is fixed in patch #2 where
we can distinguish between GCC 9.1 and GCC 13.1 ABI breaks properly.

The fix in aarch64_function_arg_boundary (replacing & with &&) looks
like an oversight of a previous commit in this area which changed
'abi_break' from a boolean to an integer.
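
To make the '&' vs '&&' point concrete, here is a small sketch.  The
function names are hypothetical and the condition is reduced to its two
operands; it is not the actual aarch64.cc code.  Once abi_break carries an
alignment in bits rather than a 0/1 flag, a bitwise AND with a 0/1 warning
flag tests bit 0 of the alignment, which is clear for any power-of-two
alignment above one.

```c
#include <assert.h>
#include <stdbool.h>

/* abi_break: old alignment in bits, or 0 if there was no ABI break.
   warn_psabi: 0/1 flag saying whether the user wants psABI warnings.  */

/* Bitwise form: only correct while abi_break was itself a 0/1 flag.  */
bool warn_with_bitand(unsigned int abi_break, int warn_psabi)
{
  return (abi_break & warn_psabi) != 0;
}

/* Logical form: true whenever both operands are nonzero.  */
bool warn_with_logand(unsigned int abi_break, int warn_psabi)
{
  return abi_break && warn_psabi;
}

/* A 128-bit alignment with warnings enabled: the bitwise form stays
   silent, the logical form warns.  */
void abi_break_selfcheck(void)
{
  assert(!warn_with_bitand(128, 1));
  assert(warn_with_logand(128, 1));
  assert(!warn_with_logand(0, 1));
  assert(!warn_with_logand(128, 0));
}
```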

We also take the opportunity to fix the comment above
aarch64_function_arg_alignment since the value of the abi_break
parameter was changed in a previous commit, no longer matching the
description.

v2->v3: removed a bogus comment, added C++ tests (copied from the C
ones)

2022-11-28  Christophe Lyon  
Richard Sandiford  

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_function_arg_alignment): Fix
comment.
(aarch64_layout_arg): Factorize warning conditions.
(aarch64_function_arg_boundary): Fix typo.
* function.cc (currently_expanding_function_start): New variable.
(expand_function_start): Handle
currently_expanding_function_start.
* function.h (currently_expanding_function_start): Declare.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/bitfield-abi-warning-align16-O2.c: New test.
* gcc.target/aarch64/bitfield-abi-warning-align16-O2-extra.c: New
test.
* gcc.target/aarch64/bitfield-abi-warning-align32-O2.c: New test.
* gcc.target/aarch64/bitfield-abi-warning-align32-O2-extra.c: New
test.
* gcc.target/aarch64/bitfield-abi-warning-align8-O2.c: New test.
* gcc.target/aarch64/bitfield-abi-warning.h: New test.
* g++.target/aarch64/bitfield-abi-warning-align16-O2.C: New test.
* g++.target/aarch64/bitfield-abi-warning-align16-O2-extra.C: New
test.
* g++.target/aarch64/bitfield-abi-warning-align32-O2.C: New test.
* g++.target/aarch64/bitfield-abi-warning-align32-O2-extra.C: New
test.
* g++.target/aarch64/bitfield-abi-warning-align8-O2.C: New test.
* g++.target/aarch64/bitfield-abi-warning.h: New test.
---
 gcc/config/aarch64/aarch64.cc |  28 +++-
 gcc/function.cc   |   5 +
 gcc/function.h|   2 +
 .../bitfield-abi-warning-align16-O2-extra.C   |  86 
 .../aarch64/bitfield-abi-warning-align16-O2.C |  87 
 .../bitfield-abi-warning-align32-O2-extra.C   | 119 +
 .../aarch64/bitfield-abi-warning-align32-O2.C | 119 +
 .../aarch64/bitfield-abi-warning-align8-O2.C  |  16 +++
 .../g++.target/aarch64/bitfield-abi-warning.h | 125 ++
 .../bitfield-abi-warning-align16-O2-extra.c   |  86 
 .../aarch64/bitfield-abi-warning-align16-O2.c |  87 
 .../bitfield-abi-warning-align32-O2-extra.c   | 119 +
 .../aarch64/bitfield-abi-warning-align32-O2.c | 119 +
 .../aarch64/bitfield-abi-warning-align8-O2.c  |  16 +++
 .../gcc.target/aarch64/bitfield-abi-warning.h | 125 ++
 15 files changed, 1132 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/bitfield-abi-warning-align16-O2-extra.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/bitfield-abi-warning-align16-O2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/bitfield-abi-warning-align32-O2-extra.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/bitfield-abi-warning-align32-O2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/bitfield-abi-warning-align8-O2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/bitfield-abi-warning.h
 create mode 100644 gcc/testsuite/gcc.target/aarch64/bitfield-abi-warning-align16-O2-extra.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/bitfield-abi-warning-align16-O2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/bitfield-abi-warning-align32-O2-extra.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/bitfield-abi-warning-align32-O2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/bitfield-abi-warning-align8-O2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/bitfield-abi-warning.h

diff --git a/gcc/config/aar

Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-11 Thread Gerald Pfeifer
On Thu, 5 Jan 2023, Segher Boessenkool wrote:
> Happy new year everyone.
> 
> Is this patch okay to commit?

From a wwwdocs perspective, yes.

Are you also *asking* from an architectural/"strategic" perspective, 
or simply *informing*? :-)  The former I cannot approve, the latter I 
certainly can.

Gerald


Re: [PATCH,WWWDOCS] htdocs: news: GCC BPF in Compiler Explorer

2023-01-11 Thread Gerald Pfeifer
On Fri, 23 Dec 2022, Jose E. Marchesi via Gcc-patches wrote:
> This patch adds an entry to the News section in index.html, announcing
> the availability of a nightly build of bpf-unknown-none-gcc.

Nice!

> +https://godbolt.org";>GCC BPF in Compiler 
> Explorer
> + [2022-12-23]
> +Support for a nightly build of the bpf-unknown-none-gcc compiler
> +  has been contributed to Compiler Explorer (aka godbolt.org) by Marc
> +  Poulhiès

Usually I recommend active voice, something like "Compiler Explorer (aka 
godbolt.org) now supports nightly builds of the bpf-unknown-none-gcc 
compiler thanks to Marc Poulhiès", but your proposal is perfectly fine, 
too.

Which means only change if you like the alternative approach better 
yourself; otherwise simply use the existing one.

Either way: Okay, and thank you!

Gerald


[PATCH 2/2 v2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2023-01-11 Thread Stam Markianos-Wright via Gcc-patches

-  Respin of the below patch -

In this 2/2 patch, from v1 to v2 I have:

* Removed the modification of the interface of the doloop_end target-insn
(so I no longer need to touch any other target backends)


* Added more modes to `arm_get_required_vpr_reg` to make it flexible
between searching: all operands/only input arguments/only outputs. Also
added helpers:
`arm_get_required_vpr_reg_ret_val`
`arm_get_required_vpr_reg_param`

* Added support for the use of other VPR predicate values within
a dlstp/letp loop, as long as they don't originate from the vctp-generated
VPR value. Also changed `arm_mve_get_loop_unique_vctp` to the simpler
`arm_mve_get_loop_vctp` since now we can support other VCTP insns
within the loop.

* Added support for loops of the form:
     int num_of_iters = (num_of_elem + num_of_lanes - 1) / num_of_lanes;
     for (i = 0; i < num_of_iters; i++)
       {
         p = vctp (num_of_elem);
         num_of_elem -= num_of_lanes;
       }
   to be transformed into dlstp/letp loops.

* Changed the VCTP look-ahead for SIGN_EXTEND and SUBREG insns to
use df def/use chains instead of `next_nonnote_nondebug_insn_bb`.

* Added support for using unpredicated (but predicable) insns
within the dlstp/letp loop. These need to meet some specific conditions,
because they _will_ become implicitly tail predicated by the dlstp/letp
transformation.

* Added a df chain check to any other instructions to make sure that they
don't USE the VCTP-generated VPR value.

* Added testing of all these various edge cases.
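
As a rough illustration of the loop form mentioned in the list above (a sketch in plain C, not code from the patch): the count-based loop and an element-count loop of the kind dlstp/letp run execute the same number of iterations, and only the last iteration has fewer than `num_of_lanes` active lanes. All helper names here are hypothetical, and `active_lanes` only models what a vctp-style predicate would enable.

```c
#include <assert.h>

/* Hypothetical model: number of lanes a vctp-style predicate enables
   when REMAINING elements are left and the vector has LANES lanes.  */
static int
active_lanes (int remaining, int lanes)
{
  if (remaining <= 0)
    return 0;
  return remaining < lanes ? remaining : lanes;
}

/* Iteration count as pre-computed by the source loop (ceiling division).  */
static int
iters_by_count (int num_of_elem, int num_of_lanes)
{
  assert (num_of_lanes > 0);
  return (num_of_elem + num_of_lanes - 1) / num_of_lanes;
}

/* Iteration count when the loop counter is the number of elements,
   decremented by the lane count on every iteration.  */
static int
iters_by_elements (int num_of_elem, int num_of_lanes)
{
  assert (num_of_lanes > 0);
  int iters = 0;
  for (int remaining = num_of_elem; remaining > 0; remaining -= num_of_lanes)
    iters++;
  return iters;
}
```

For example, 10 elements with 4 lanes take 3 iterations either way, and the last one has only 2 active lanes.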


Original email with updated Changelog at the end:



Hi all,

This is the 2/2 patch that contains the functional changes needed
for MVE Tail Predicated Low Overhead Loops.  See my previous email
for a general introduction of MVE LOLs.

This support is added through the already existing loop-doloop
mechanisms that are used for non-MVE dls/le looping.

Changes are:

1) Relax the loop-doloop mechanism in the mid-end to allow for
   decrement numbers other than -1 and for `count` to be an
   rtx containing the number of elements to be processed, rather
   than an expression for calculating the number of iterations.
2) Add an `allow_elementwise_doloop` target hook. This allows the
   target backend to manipulate the iteration count as it needs:
   in our case to change it from a pre-calculation of the number
   of iterations to the number of elements to be processed.
3) The doloop_end target-insn now had an additional parameter:
   the `count` (note: this is before it gets modified to just be
   the number of elements), so that the decrement value is
   extracted from that parameter.

And many things in the backend to implement the above optimisation:

4)  Appropriate changes to the define_expand of doloop_end and new
    patterns for dlstp and letp.
5) `arm_attempt_dlstp_transform`: (called from the define_expand of
    doloop_end) this function checks for the loop's suitability for
    dlstp/letp transformation and then implements it, if possible.
6) `arm_mve_get_loop_unique_vctp`: A function that loops through
    the loop contents and returns the vctp VPR-generating operation
    within the loop, if it is unique and there is exclusively one
    vctp within the loop.
7) A couple of utility functions: `arm_mve_get_vctp_lanes` to map
   from vctp unspecs to number of lanes, and `arm_get_required_vpr_reg`
   to check an insn to see if it requires the VPR or not.
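
The mapping from a vctp variant to a lane count mentioned in item 7 can be sketched as follows. This is only an illustration under the assumption of 128-bit MVE vectors; the enum and function names are hypothetical and are not GCC's unspec identifiers.

```c
#include <assert.h>

/* Hypothetical stand-ins for the vctp variants; not GCC's identifiers.  */
enum vctp_kind { VCTP8, VCTP16, VCTP32, VCTP64 };

/* MVE vector registers are 128 bits wide, so the lane count is
   128 divided by the element width in bits.  */
static int
lanes_for_element_bits (int element_bits)
{
  assert (element_bits > 0 && 128 % element_bits == 0);
  return 128 / element_bits;
}

/* Map a vctp variant to the number of lanes it predicates.  */
static int
vctp_lanes (enum vctp_kind kind)
{
  switch (kind)
    {
    case VCTP8:  return lanes_for_element_bits (8);
    case VCTP16: return lanes_for_element_bits (16);
    case VCTP32: return lanes_for_element_bits (32);
    case VCTP64: return lanes_for_element_bits (64);
    }
  return 0; /* Unknown variant: report no lanes.  */
}
```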

No regressions on arm-none-eabi with various targets and on
aarch64-none-elf. Thoughts on getting this into trunk?

Thank you,
Stam Markianos-Wright

gcc/ChangeLog:

    * config/arm/arm-protos.h (arm_attempt_dlstp_transform): New.
    * config/arm/arm.cc (TARGET_ALLOW_ELEMENTWISE_DOLOOP): New.
    (arm_mve_get_vctp_lanes): New.
    (arm_get_required_vpr_reg): New.
    (arm_get_required_vpr_reg_ret_val): New.
    (arm_get_required_vpr_reg_param): New.
    (arm_mve_get_loop_vctp): New.
    (arm_attempt_dlstp_transform): New.
    (arm_allow_elementwise_doloop): New.
    * config/arm/iterators.md (DLSTP): New.
    (mode1): Add DLSTP mappings.
    * config/arm/mve.md (*predicated_doloop_end_internal): New.
    (dlstp_insn): New.
    * config/arm/thumb2.md (doloop_end): Update for MVE LOLs.
    * config/arm/unspecs.md: New unspecs.
    * tm.texi: Document new hook.
    * tm.texi.in: Likewise.
    * loop-doloop.cc (doloop_condition_get): Relax conditions.
    (doloop_optimize): Add support for elementwise LoLs.
    * target.def (allow_elementwise_doloop): New hook.
    * targhooks.cc (default_allow_elementwise_doloop): New.
    * targhooks.h (default_allow_elementwise_doloop): New.

gcc/testsuite/ChangeLog:

    * gcc.target/arm/lob.h: Update framework.
    * gcc.target/arm/lob1.c: Likewise.
    * gcc.target/arm/lob6.c: Likewise.
    * gcc.target/arm/dlstp-int16x8.c: New test.
    * gcc.target/arm/dlstp-int32x4.c: New test.
    * gcc.target/arm/dlstp-

Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-11 Thread Richard Biener via Gcc-patches
On Wed, Jan 11, 2023 at 3:22 PM Gerald Pfeifer  wrote:
>
> On Thu, 5 Jan 2023, Segher Boessenkool wrote:
> > Happy new year everyone.
> >
> > Is this patch okay to commit?
>
> From a wwwdocs perspective, yes.
>
> Are you also *asking* from an architectural/"strategic" perspective,
> or simply *informing*? :-)  The former I cannot approve, the latter I
> certainly can.

Note this is more info for port maintainers not for users and
changes.html is for users.  "In a future release" is also quite vague.

Richard.

>
> Gerald


[ping][PATCH 1/1] docs: Add link to gmplib.org

2023-01-11 Thread Benson Muite via Gcc-patches
Improvement to documentation from a new contributor without commit rights.

On 1/5/23 06:38, Benson Muite wrote:
> Link is missing from install documentation
> 
> Signed-off-by: Benson Muite 
> ---
>  gcc/doc/install.texi | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index ccc8d15fd08..18e8709a169 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -396,7 +396,8 @@ install the libraries.
>  @table @asis
>  @item GNU Multiple Precision Library (GMP) version 4.3.2 (or later)
>  
> -Necessary to build GCC@.  If a GMP source distribution is found in a
> +Necessary to build GCC@.  It can be downloaded from
> +@uref{https://gmplib.org/}.  If a GMP source distribution is found in a
>  subdirectory of your GCC sources named @file{gmp}, it will be built
>  together with GCC.  Alternatively, if GMP is already installed but it
>  is not in your library search path, you will have to configure with the



Re: [PATCH 12/15 V5] arm: implement bti injection

2023-01-11 Thread Richard Earnshaw via Gcc-patches




On 22/12/2022 17:13, Andrea Corallo via Gcc-patches wrote:

Richard Earnshaw  writes:


On 14/12/2022 17:00, Richard Earnshaw via Gcc-patches wrote:

On 14/12/2022 16:40, Andrea Corallo via Gcc-patches wrote:

Hi Richard,

thanks for reviewing.

Richard Earnshaw  writes:


On 28/10/2022 17:40, Andrea Corallo via Gcc-patches wrote:

Hi all,
please find attached the third iteration of this patch addressing
review
comments.
Thanks
     Andrea



@@ -23374,12 +23374,6 @@ output_probe_stack_range (rtx reg1, rtx reg2)
     return "";
   }

-static bool
-aarch_bti_enabled ()
-{
-  return false;
-}
-
   /* Generate the prologue instructions for entry into an ARM or Thumb-2
  function.  */
   void
@@ -32992,6 +32986,61 @@ arm_current_function_pac_enabled_p (void)
     && !crtl->is_leaf));
   }

+/* Return TRUE if Branch Target Identification Mechanism is
enabled.  */
+bool
+aarch_bti_enabled (void)
+{
+  return aarch_enable_bti == 1;
+}

See comment in earlier patch about the location of this function
moving.   Can aarch_enable_bti take values other than 0 and 1?


Yes default is 2.

It shouldn't be by this point, because, hopefully you've gone
through the equivalent of this hunk (from aarch64) somewhere in
arm_override_options:
     if (aarch_enable_bti == 2)
   {
   #ifdef TARGET_ENABLE_BTI
     aarch_enable_bti = 1;
   #else
     aarch_enable_bti = 0;
   #endif
   }
And after this point the '2' should never be seen again.  We use
this trick to permit the user to force a default that differs from
the configuration.
However, I don't see a hunk to do this in patch 3, so perhaps that
needs updating to fix this.


I've just remembered that the above is to support a configure-time
option of the compiler to enable branch protection.  But perhaps we
don't want to have that in AArch32, in which case it would be better
not to have the default be 2 anyway, just default to off (0).

R.


Done in 1/15 (needs approval again now).




[...]


+  return GET_CODE (pat) == UNSPEC_VOLATILE && XINT (pat, 1) ==
UNSPEC_BTI_NOP;

I'm not sure where this crept in, but UNSPEC and UNSPEC_VOLATILE have
separate enums in the backend, so UNSPEC_BTI_NOP should really be
VUNSPEC_BTI_NOP and defined in the enum "unspecv".


Done


+aarch_pac_insn_p (rtx x)
+{
+  if (!x || !INSN_P (x))
+    return false;
+
+  rtx pat = PATTERN (x);
+
+  if (GET_CODE (pat) == SET)
+    {
+  rtx tmp = XEXP (pat, 1);
+  if (tmp
+  && GET_CODE (tmp) == UNSPEC
+  && (XINT (tmp, 1) == UNSPEC_PAC_NOP
+  || XINT (tmp, 1) == UNSPEC_PACBTI_NOP))
+    return true;
+    }
+

This will also need updating (see review on earlier patch) because
PACBTI needs to be unspec_volatile, while PAC doesn't.


Done


+/* The following two functions are for code compatibility with aarch64
+   code, this even if in arm we have only one bti instruction.  */
+

I'd just write
   /* Target specific mapping for aarch_gen_bti_c and
   aarch_gen_bti_j. For Arm, both of these map to a simple BTI
instruction.  */


Done



@@ -162,6 +162,7 @@ (define_c_enum "unspec" [
     UNSPEC_PAC_NOP    ; Represents PAC signing LR
     UNSPEC_PACBTI_NOP    ; Represents PAC signing LR + valid landing pad
     UNSPEC_AUT_NOP    ; Represents PAC verifying LR
+  UNSPEC_BTI_NOP    ; Represent BTI
   ])

BTI is an unspec volatile, so this should be in the "vunspec" enum and
renamed accordingly (see above).


Done.

Please find attached the updated version of this patch.

BR

    Andrea


Apart from that, this is OK.
R.


Cool, attached the updated patch.

Also I added some error handling not to run the bti pass if the march
selected does not support bti.

BR

   Andrea




OK.

R.


Re: PING: New reg note REG_CFA_NORESTORE

2023-01-11 Thread Andreas Krebbel via Gcc-patches
On 12/27/22 19:23, Jeff Law wrote:
> 
> 
> On 12/13/22 01:55, Andreas Krebbel via Gcc-patches wrote:
>> Hi,
>>
>> I need a way to save registers on the stack and generate proper CFI for it. 
>> Since I do not intend to
>> restore them I needed a way to tell the CFI generation step about it:
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606128.html
>>
>> Is this ok for mainline?
> Presumably there's validation bits that want to validate that everything 
> saved eventually gets restored?
> 
> There's only one call to dwarf2out_frame_debug_cfa_restore, so ISTM that 
> providing an initializer for the argument isn't needed and just creates 
> an overload (and associated code) that isn't needed.  Why not just 
> remove the default initializer?
> 
> Ok with that change or a good reason why you need to keep the initializer.

Right. I'll remove it. Thanks for having a look!

Bye,

Andreas



Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-11 Thread Segher Boessenkool
On Wed, Jan 11, 2023 at 03:34:45PM +0100, Richard Biener wrote:
> On Wed, Jan 11, 2023 at 3:22 PM Gerald Pfeifer  wrote:
> >
> > On Thu, 5 Jan 2023, Segher Boessenkool wrote:
> > > Happy new year everyone.
> > >
> > > Is this patch okay to commit?
> >
> > From a wwwdocs perspective, yes.
> >
> > Are you also *asking* from an architectural/"strategic" perspective,
> > or simply *informing*? :-)  The former I cannot approve, the latter I
> > certainly can.

Strategic, yes.  Good way of phrasing it, thanks :-)

> Note this is more info for port maintainers not for users and
> changes.html is for users.

And users will notice some ports will have to be removed, because those
ports are not maintained / not maintained enough.  Some ports will not
work with LRA, most will be easy to fix, but someone will have to do
that.  If no one does so the port works sufficiently well it will have
to be removed before release.

> "In a future release" is also quite vague.

It's what we usually say in changes.html .  "In GCC 14" if you want?

I can add some stuff on how this will benefit users?


Segher


Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-11 Thread Richard Biener via Gcc-patches



> On 11.01.2023 at 16:17, Segher Boessenkool wrote:
> 
> On Wed, Jan 11, 2023 at 03:34:45PM +0100, Richard Biener wrote:
>>> On Wed, Jan 11, 2023 at 3:22 PM Gerald Pfeifer  wrote:
>>> 
>>> On Thu, 5 Jan 2023, Segher Boessenkool wrote:
>>>> Happy new year everyone.
>>>> 
>>>> Is this patch okay to commit?
>>> 
>>> From a wwwdocs perspective, yes.
>>> 
>>> Are you also *asking* from an architectural/"strategic" perspective,
>>> or simply *informing*? :-)  The former I cannot approve, the latter I
>>> certainly can.
> 
> Strategic, yes.  Good way of phrasing it, thanks :-)
> 
>> Note this is more info for port maintainers not for users and
>> changes.html is for users.
> 
> And users will notice some ports will have to be removed, because those
> ports are not maintained / not maintained enough.  Some ports will not
> work with LRA, most will be easy to fix, but someone will have to do
> that.  If no one does so the port works sufficiently well it will have
> to be removed before release.
> 
>> "In a future release" is also quite vague.
> 
> It's what we usually say in changes.html .  "In GCC 14" if you want?
> 
> I can add some stuff on how this will benefit users?

I guess listing the ports without LRA support might be a first step for 
clarification?


> 
> 
> Segher


RE: [GCC][PATCH v2] arm: Add cde feature support for Cortex-M55 CPU.

2023-01-11 Thread Srinath Parvathaneni via Gcc-patches
Ping!!
-
From: Srinath Parvathaneni  
Sent: Tuesday, December 6, 2022 11:32 AM
To: gcc-patches@gcc.gnu.org; Richard Earnshaw 
Cc: Christophe Lyon 
Subject: Re: [GCC][PATCH v2] arm: Add cde feature support for Cortex-M55 CPU.

Ping!!

From: Srinath Parvathaneni
Sent: 31 October 2022 12:38
To: mailto:gcc-patches@gcc.gnu.org 
Cc: Richard Earnshaw ; Christophe Lyon 

Subject: RE: [GCC][PATCH v2] arm: Add cde feature support for Cortex-M55 CPU. 
 
Hi,

> -----Original Message-----
> From: Christophe Lyon 
> Sent: Monday, October 17, 2022 2:30 PM
> To: Srinath Parvathaneni ; gcc-
> mailto:patc...@gcc.gnu.org
> Cc: Richard Earnshaw 
> Subject: Re: [GCC][PATCH] arm: Add cde feature support for Cortex-M55
> CPU.
> 
> Hi Srinath,
> 
> 
> On 10/10/22 10:20, Srinath Parvathaneni via Gcc-patches wrote:
> > Hi,
> >
> > This patch adds cde feature (optional) support for Cortex-M55 CPU,
> > please refer [1] for more details. To use this feature we need to
> > specify +cdecpN (e.g. -mcpu=cortex-m55+cdecp), where N is the
> coprocessor number 0 to 7.
> >
> > Bootstrapped for arm-none-linux-gnueabihf target, regression tested on
> > arm-none-eabi target and found no regressions.
> >
> > [1] https://developer.arm.com/documentation/101051/0101/?lang=en
> (version: r1p1).
> >
> > Ok for master?
> >
> > Regards,
> > Srinath.
> >
> > gcc/ChangeLog:
> >
> > 2022-10-07  Srinath Parvathaneni  
> >
> >  * common/config/arm/arm-common.cc (arm_canon_arch_option_1):
> Ignore cde
> >  options for mlibarch.
> >  * config/arm/arm-cpus.in (begin cpu cortex-m55): Add cde options.
> >  * doc/invoke.texi (CDE): Document options for Cortex-M55 CPU.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2022-10-07  Srinath Parvathaneni  
> >
> >  * gcc.target/arm/multilib.exp: Add multilib tests for Cortex-M55 
> >CPU.
> >
> >
> > ### Attachment also inlined for ease of reply
> ###
> >
> >
> > diff --git a/gcc/common/config/arm/arm-common.cc
> > b/gcc/common/config/arm/arm-common.cc
> > index
> >
> c38812f1ea6a690cd19b0dc74d963c4f5ae155ca..b6f955b3c012475f398382e72
> c9a
> > 3966412991ec 100644
> > --- a/gcc/common/config/arm/arm-common.cc
> > +++ b/gcc/common/config/arm/arm-common.cc
> > @@ -753,6 +753,15 @@ arm_canon_arch_option_1 (int argc, const char
> **argv, bool arch_for_multilib)
> > arm_initialize_isa (target_isa, selected_cpu->common.isa_bits);
> > arm_parse_option_features (target_isa, &selected_cpu->common,
> >   strchr (cpu, '+'));
> > +  if (arch_for_multilib)
> > +   {
> > + const enum isa_feature removable_bits[] =
> {ISA_IGNORE_FOR_MULTILIB,
> > +    isa_nobit};
> > + sbitmap isa_bits = sbitmap_alloc (isa_num_bits);
> > + arm_initialize_isa (isa_bits, removable_bits);
> > + bitmap_and_compl (target_isa, target_isa, isa_bits);
> > +   }
> > +
> 
> I can see the piece of code you add here is exactly the same as the one a few
> lines above when handling "if (arch)". Can this be moved below and thus be
> common to the two cases, or does it have to be performed before
> bitmap_ior of fpu_isa?

Thanks for pointing this out. I have moved the common code below the arch and
cpu if blocks in the attached patch.
 
> Also, IIUC, CDE was already optional for other CPUs (M33, M35P, star-mc1),
> so the hunk above fixes a latent bug when handling multilibs for these CPUs
> too? If so, maybe worth splitting the patch into two parts since the above is
> not strictly related to M55?
>
Even though CDE is optional for the mentioned CPUs as per the specs, the code to
enable CDE as an optional feature is missing in the current compiler.
GCC currently supports CDE as an optional feature only with -march options, and
this patch adds CDE as optional for M55, so this is not a bug fix.

> But I'm not a maintainer ;-)
> 
> Thanks,
> 
> Christophe
> 
> > if (fpu && strcmp (fpu, "auto") != 0)
> >  {
> >    /* The easiest and safest way to remove the default fpu diff
> > --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in index
> >
> 5a63bc548e54dbfdce5d1df425bd615d81895d80..aa02c04c4924662f3ddd58e
> 69673
> > 92ba3f4b4a87 100644
> > --- a/gcc/config/arm/arm-cpus.in
> > +++ b/gcc/config/arm/arm-cpus.in
> > @@ -1633,6 +1633,14 @@ begin cpu cortex-m55
> >    option nomve remove mve mve_float
> >    option nofp remove ALL_FP mve_float
> >    option nodsp remove MVE mve_float
> > + option cdecp0 add cdecp0
> > + option cdecp1 add cdecp1
> > + option cdecp2 add cdecp2
> > + option cdecp3 add cdecp3
> > + option cdecp4 add cdecp4
> > + optio

[PATCH] c++: Avoid incorrect shortening of divisions [PR108365]

2023-01-11 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase is miscompiled, because we shorten the division
in a case where it should not be shortened.
Divisions (and modulos) can be shortened if it is unsigned division/modulo,
or if it is signed division/modulo where we can prove the dividend will
not be the minimum signed value or divisor will not be -1, because e.g.
on sizeof(long long)==sizeof(int)*2 && __INT_MAX__ == 0x7fffffff targets
(-2147483647 - 1) / -1 is UB
but
(int) (-2147483648LL / -1LL) is not, it is -2147483648.
The primary aim of both the C and C++ FE division/modulo shortening I assume
was for the implicit integral promotions of {,signed,unsigned} {char,short}
and because at this point we have no VRP information etc., the shortening
is done if the integral promotion is from unsigned type for the dividend
or if the divisor is an integer constant other than -1.
This works fine for char/short -> int promotions when char/short have
smaller precision than int - unsigned char -> int or unsigned short -> int
will always be a positive int, so never the most negative.

Now, the C FE checks whether orig_op0 is TYPE_UNSIGNED where op0 is either
the same as orig_op0 or that promoted to int, I think that works fine,
if it isn't promoted, either the division/modulo common type will have the
same precision as op0 but then the division/modulo is unsigned and so
without UB, or it will be done in wider precision (e.g. because op1 has
wider precision), but then op0 can't be minimum signed value.  Or it has
been promoted to int, but in that case it was again from narrower type and
so never minimum signed int.

But the C++ FE was checking if op0 is a NOP_EXPR from TYPE_UNSIGNED.
First of all, not sure if the operand of NOP_EXPR couldn't be non-integral
type where TYPE_UNSIGNED wouldn't be meaningful, but more importantly,
even if it is a cast from unsigned integral type, we only know it can't be
minimum signed value if it is a widening cast, if it is same precision or
narrowing cast, we know nothing.

So, the following patch for the NOP_EXPR cases checks just in case that
it is from integral type and more importantly checks it is a widening
conversion, and then next to it also allows op0 to be just unsigned,
promoted or not, as that is what the C FE will do for those cases too
and I believe it must work - either the division/modulo common type
will be that unsigned type, then we can shorten and don't need to worry
about UB, or it will be some wider signed type but then it can't be most
negative value of the wider type.
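
To make the condition described above concrete, here is a minimal model of it in plain C. It is a sketch only: the structs and the function name are hypothetical summaries of what the front end knows about the operands, not GCC data structures, and the predicate mirrors the prose rather than the actual tree checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what is known about the dividend (op0).  */
struct dividend_info
{
  bool type_unsigned;   /* type0 itself is unsigned */
  bool is_cast;         /* op0 is a conversion (NOP_EXPR) */
  bool src_integral;    /* the cast's source type is integral */
  bool src_unsigned;    /* ... and unsigned */
  int src_precision;    /* precision of the source type, in bits */
  int precision;        /* precision of type0, in bits */
};

/* Hypothetical summary of what is known about the divisor (op1).  */
struct divisor_info
{
  bool is_constant;     /* op1 is an integer constant */
  long long value;      /* only meaningful if is_constant */
};

/* Shortening is allowed if the dividend is unsigned, or was widened
   from a narrower unsigned integral type, or the divisor is a
   constant other than -1.  */
static bool
may_shorten (struct dividend_info op0, struct divisor_info op1)
{
  assert (op0.precision > 0);
  if (op0.type_unsigned)
    return true;
  if (op0.is_cast && op0.src_integral && op0.src_unsigned
      && op0.src_precision < op0.precision)
    return true;
  return op1.is_constant && op1.value != -1;
}
```

A cast from an unsigned type of the same precision (for example unsigned int to int) no longer qualifies, which is the case the old check got wrong.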

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-01-11  Jakub Jelinek  

PR c++/108365
* typeck.cc (cp_build_binary_op): For integral division or modulo,
shorten if type0 is unsigned, or op0 is cast from narrower unsigned
integral type or stripped_op1 is INTEGER_CST other than -1.

* g++.dg/opt/pr108365.C: New test.
* g++.dg/warn/pr108365.C: New test.

--- gcc/cp/typeck.cc.jj 2022-12-15 19:17:37.828072458 +0100
+++ gcc/cp/typeck.cc 2023-01-11 12:15:25.195284107 +0100
@@ -5455,8 +5455,15 @@ cp_build_binary_op (const op_location_t
 point, so we have to dig out the original type to find out if
 it was unsigned.  */
  tree stripped_op1 = tree_strip_any_location_wrapper (op1);
- shorten = ((TREE_CODE (op0) == NOP_EXPR
- && TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (op0, 0))))
+ shorten = (TYPE_UNSIGNED (type0)
+|| (TREE_CODE (op0) == NOP_EXPR
+&& INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (op0,
+ 0)))
+&& TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (op0,
+   0)))
+&& (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (op0,
+ 0)))
+< TYPE_PRECISION (type0)))
 || (TREE_CODE (stripped_op1) == INTEGER_CST
 && ! integer_all_onesp (stripped_op1)));
}
@@ -5491,8 +5498,12 @@ cp_build_binary_op (const op_location_t
 quotient can't be represented in the computation mode.  We shorten
 only if unsigned or if dividing by something we know != -1.  */
  tree stripped_op1 = tree_strip_any_location_wrapper (op1);
- shorten = ((TREE_CODE (op0) == NOP_EXPR
- && TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (op0, 0))))
+ shorten = (TYPE_UNSIGNED (type0)
+|| (TREE_CODE (op0) == NOP_EXPR
+&& INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (op0, 0)))
+&& TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (op0, 0)))
+&& (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (op0, 0

[PATCH] c: Don't emit DEBUG_BEGIN_STMTs for K&R function argument declarations [PR105972]

2023-01-11 Thread Jakub Jelinek via Gcc-patches
Hi!

K&R function parameter declarations are handled by calling
recursively c_parser_declaration_or_fndef in a loop, where each such
call will add_debug_begin_stmt at the start.
Now, if the K&R function definition is not a nested function,
building_stmt_list_p () is false and so we don't emit the DEBUG_BEGIN_STMTs
anywhere, but if it is a nested function, we emit it in the containing
function at the point of the nested function definition.
As the following testcase shows, it can cause ICEs if the containing
function has var-tracking disabled but nested function has them enabled,
as the DEBUG_BEGIN_STMTs are added to the containing function which
shouldn't have them but MAY_HAVE_DEBUG_MARKER_STMTS is checked already
for the nested function, or just wrong experience in the debugger.

The following patch ensures we don't emit any such DEBUG_BEGIN_STMTs for the
K&R function parameter declarations even in nested functions.
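
The save/disable/restore pattern the patch relies on can be sketched in isolation like this. It is an illustration only: the flag and functions below are hypothetical stand-ins for `debug_nonbind_markers_p` and the parser routines, not the GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the debug_nonbind_markers_p flag.  */
static bool markers_enabled = true;
/* Counts how many begin-statement markers were emitted.  */
static int markers_emitted = 0;

/* Stand-in for a parser routine that emits a marker on entry.  */
static void
parse_declaration (void)
{
  if (markers_enabled)
    markers_emitted++;
}

/* Parse COUNT K&R-style parameter declarations with markers turned
   off, then restore whatever setting was in effect before.  */
static void
parse_knr_parameter_declarations (int count)
{
  assert (count >= 0);
  bool saved = markers_enabled;
  markers_enabled = false;
  for (int i = 0; i < count; i++)
    parse_declaration ();
  markers_enabled = saved;
}
```

Saving the old value, rather than unconditionally re-enabling the flag, keeps the setting intact when markers were already disabled by the caller.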

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-01-11  Jakub Jelinek  

PR c/105972
* c-parser.cc (c_parser_declaration_or_fndef): Disable debug non-bind
markers for K&R function parameter declarations of nested functions.

* gcc.dg/pr105972.c: New test.

--- gcc/c/c-parser.cc.jj 2023-01-09 13:30:46.873347238 +0100
+++ gcc/c/c-parser.cc   2023-01-11 14:55:39.161287717 +0100
@@ -2804,10 +2804,13 @@ c_parser_declaration_or_fndef (c_parser
 declarator with a nonempty identifier list in a definition;
 and postfix attributes have never been accepted here in
 function definitions either.  */
+  int save_debug_nonbind_markers_p = debug_nonbind_markers_p;
+  debug_nonbind_markers_p = 0;
   while (c_parser_next_token_is_not (parser, CPP_EOF)
 && c_parser_next_token_is_not (parser, CPP_OPEN_BRACE))
c_parser_declaration_or_fndef (parser, false, false, false,
   true, false);
+  debug_nonbind_markers_p = save_debug_nonbind_markers_p;
   store_parm_decls ();
   if (omp_declare_simd_clauses)
c_finish_omp_declare_simd (parser, current_function_decl, NULL_TREE,
--- gcc/testsuite/gcc.dg/pr105972.c.jj  2023-01-11 15:06:55.377366557 +0100
+++ gcc/testsuite/gcc.dg/pr105972.c 2023-01-11 15:04:47.817238069 +0100
@@ -0,0 +1,15 @@
+/* PR c/105972 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -g" } */
+
+__attribute__((optimize (0))) int
+foo (void)
+{
+  int
+  bar (x)
+int x;
+  {
+return x;
+  }
+  return bar (0);
+}

Jakub



[OG12][committed] amdgcn, libgomp: custom USM allocator

2023-01-11 Thread Andrew Stubbs
This patch fixes a runtime issue I encountered with the AMD GCN Unified 
Shared Memory implementation.


We were using regular malloc'd memory configured into USM mode, but 
there were random intermittent crashes. I can't be completely sure, but 
my best guess is that the HSA driver is using malloc internally from the 
same heap, and therefore using memory on the same page as the offload 
kernel. What I do know is that I could make the crashes go away by 
simply padding the USM allocations before and after.


With this patch USM allocations are now completely separated from the 
system heap. The custom allocator is probably less optimal in some 
use-cases, but does have the advantage that all the metadata is stored 
in a side-table that won't ever cause any pages to migrate back to 
main-memory unnecessarily. It's still possible for the user program to 
use USM memory in a way that causes it to thrash, and this might have 
been the ultimate cause of the crashes, but there's not much we can do 
about that here.
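
As a small illustration of keeping such a heap on its own pages (a sketch assuming a POSIX/Linux system, not the plugin's code): the request is rounded up to a whole number of pages and obtained with an anonymous mmap, so it never shares a page with malloc'd data. The function names are hypothetical, and the side-table bookkeeping is not shown. Note that mmap reports failure with MAP_FAILED rather than NULL.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round SIZE up to a multiple of PAGESIZE.  */
static size_t
round_up_to_page (size_t size, size_t pagesize)
{
  assert (pagesize > 0);
  size_t misalignment = size % pagesize;
  return misalignment ? size + (pagesize - misalignment) : size;
}

/* Reserve at least SIZE bytes of anonymous, page-aligned memory.
   Returns NULL on failure; on success stores the rounded size in
   *ACTUAL_SIZE so the caller can munmap the region later.  */
static void *
reserve_private_heap (size_t size, size_t *actual_size)
{
  long pagesize = sysconf (_SC_PAGESIZE);
  if (pagesize <= 0)
    return NULL;
  size = round_up_to_page (size, (size_t) pagesize);
  void *pages = mmap (NULL, size, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (pages == MAP_FAILED)
    return NULL;
  *actual_size = size;
  return pages;
}
```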


I've broken the allocator out into a new file because I anticipate it 
being needed in more than one place, but I didn't put full 
data-isolation on it yet.


I'll rebase, merge, and repost all of the OpenMP memory patches sometime 
soonish.


Andrew

amdgcn, libgomp: custom USM allocator

There were problems with critical driver data sharing pages with USM data, so
this new allocator implementation moves USM to entirely different pages.

libgomp/ChangeLog:

* plugin/plugin-gcn.c: Include sys/mman.h and unistd.h.
(usm_heap_create): New function.
(struct usm_splay_tree_key_s): Delete function.
(usm_splay_compare): Delete function.
(splay_tree_prefix): Delete define.
(GOMP_OFFLOAD_usm_alloc): Use new allocator.
(GOMP_OFFLOAD_usm_free): Likewise.
(GOMP_OFFLOAD_is_usm_ptr): Likewise.
(gomp_fatal): Delete macro.
(splay_tree_c): Delete.
* usm-allocator.c: New file.

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 3c0404c09b2..36fab3951d5 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -48,6 +48,8 @@
 #include "oacc-plugin.h"
 #include "oacc-int.h"
 #include 
+#include <sys/mman.h>
+#include <unistd.h>
 
 /* These probably won't be in elf.h for a while.  */
 #ifndef R_AMDGPU_NONE
@@ -3071,6 +3073,102 @@ wait_queue (struct goacc_asyncqueue *aq)
 }
 
 /* }}}  */
+/* {{{ Unified Shared Memory
+
+   Normal heap memory is already enabled for USM, but by default it is "fine-
+   grained" memory, meaning that the GPU must access it via the system bus,
+   slowly.  Changing the page to "coarse-grained" mode means that the page
+   is migrated on-demand and can therefore be accessed quickly by both CPU and
+   GPU (although care should be taken to prevent thrashing the page back and
+   forth).
+
+   GOMP_OFFLOAD_alloc also allocates coarse-grained memory, but in that case
+   the initial location is GPU memory; GOMP_OFFLOAD_usm_alloc returns system
+   memory configured coarse-grained.
+
+   The USM memory space is allocated as a largish block and then subdivided
+   via a custom allocator.  (It would be possible to reconfigure regular
+   "malloc'd" memory, but if it ends up on the same page as memory used by
+   the HSA driver then bad things happen.)  */
+
+#include "../usm-allocator.c"
+
+/* Record a list of the memory blocks configured for USM.  */
+static struct usm_heap_pages {
+  void *start;
+  void *end;
+  struct usm_heap_pages *next;
+} *usm_heap_pages = NULL;
+
+/* Initialize or extend the USM memory space.  This is called whenever
+   allocation fails.  SIZE is the minimum size required for the failed
+   allocation to succeed; the function may choose a larger size.
+   Note that Linux lazy allocation means that the memory returned isn't
+   guaranteed to actually exist.  */
+
+static bool
+usm_heap_create (size_t size)
+{
+  static int lock = 0;
+  while (__atomic_exchange_n (&lock, 1, MEMMODEL_ACQUIRE) != 0)
+;
+
+  size_t default_size = 1L * 1024 * 1024 * 1024; /* 1GB */
+  if (size < default_size)
+size = default_size;
+
+  /* Round up to a whole page.  */
+  int pagesize = getpagesize ();
+  int misalignment = size % pagesize;
+  if (misalignment > 0)
+size += pagesize - misalignment;
+
+  /* Try to get contiguous memory, but it might not be possible.
+ The most recent previous allocation is at the head of the list.  */
+  void *addrhint = (usm_heap_pages ? usm_heap_pages->end : NULL);
+  void *new_pages = mmap (addrhint, size, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+  if (!new_pages)
+{
+  GCN_DEBUG ("Could not allocate Unified Shared Memory heap.");
+  __atomic_store_n (&lock, 0, MEMMODEL_RELEASE);
+  return false;
+}
+
+  /* Register the heap allocation as coarse grained, which implies USM.  */
+  struct hsa_amd_svm_attribute_pair_s attr = {
+HSA_AMD_SVM_ATTRIB_GLOBAL_FLAG,
+HSA_AMD

Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-11 Thread Segher Boessenkool
On Wed, Jan 11, 2023 at 05:27:36PM +0100, Richard Biener wrote:
> > On 11.01.2023 at 16:17, Segher Boessenkool wrote:
> >> Note this is more info for port maintainers not for users and
> >> changes.html is for users.
> > 
> > And users will notice some ports will have to be removed, because those
> > ports are not maintained / not maintained enough.  Some ports will not
> > work with LRA, most will be easy to fix, but someone will have to do
> > that.  If no one does so the port works sufficiently well it will have
> > to be removed before release.
> > 
> >> "In a future release" is also quite vague.
> > 
> > It's what we usually say in changes.html .  "In GCC 14" if you want?
> > 
> > I can add some stuff on how this will benefit users?
> 
> I guess listing the ports without LRA support might be a first step for 
> clarification?

Every port has LRA support.

Some ports will not build later when we delete old reload, because they
use some functions and/or data structures unique to that.

But all that is easily fixed (for the port maintainers at least,
assuming they understand what their code does ;-) ).  The bigger problem
is that if the port has never been tested with LRA the chances of it
working in all cases are not great (say 50%), so likely some attention
will be needed to get the compiler back to release quality.  And some
ports will even not work for the simplest pieces of source code.  Those
are the problematic cases.

Usually not hard to fix -- all the more complicated targets already run
LRA always, the hard work is done already -- but it still requires a
target maintainer (with a suitable testing environment, hopefully even
hardware) to do the work.  This is what I want to alert people to, and
get agreement that this will happen next major release.


Segher


Re: [PATCH] [PR40457] [arm] expand SI-aligned movdi into pair of movsi

2023-01-11 Thread Richard Earnshaw via Gcc-patches




On 02/12/2022 09:29, Alexandre Oliva via Gcc-patches wrote:


When expanding a misaligned DImode move, emit aligned SImode moves if
the parts are sufficiently aligned.  This enables neighboring stores
to be peephole-combined into stm, as expected by the PR40457 testcase,
even after SLP vectorizes the originally aligned SImode stores into a
misaligned DImode store.

Regstrapped on x86_64-linux-gnu, also tested with crosses to riscv64-elf
and arm-eabi (tms570).  Ok to install?


for  gcc/ChangeLog

PR target/40457
* config/arm/arm.md (movmisaligndi): Prefer aligned SImode
moves.


OK.

R.


---
  gcc/config/arm/arm.md |   12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 69bf343fb0ed6..a9eb0299aa761 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -12783,8 +12783,16 @@ (define_expand "movmisaligndi"
rtx hi_op0 = gen_highpart_mode (SImode, DImode, operands[0]);
rtx hi_op1 = gen_highpart_mode (SImode, DImode, operands[1]);
  
-  emit_insn (gen_movmisalignsi (lo_op0, lo_op1));
-  emit_insn (gen_movmisalignsi (hi_op0, hi_op1));
+  if (aligned_operand (lo_op0, SImode) && aligned_operand (lo_op1, SImode))
+{
+  emit_move_insn (lo_op0, lo_op1);
+  emit_move_insn (hi_op0, hi_op1);
+}
+  else
+{
+  emit_insn (gen_movmisalignsi (lo_op0, lo_op1));
+  emit_insn (gen_movmisalignsi (hi_op0, hi_op1));
+}
DONE;
  })
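
For readers less familiar with the expander, the idea behind the change can be illustrated outside of GCC. The following is only a sketch under stated assumptions, not the arm.md code: `is_aligned` and `copy_u64` are hypothetical names, and the "aligned path" is modelled with plain 4-byte copies. When both halves are word-aligned, the 64-bit move is done as two 32-bit moves; otherwise it falls back to a generic unaligned copy.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if P is aligned to ALIGN bytes.
// ALIGN must be a non-zero power of two.
inline bool is_aligned(const void *p, std::uintptr_t align)
{
  assert(align != 0 && (align & (align - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(p) & (align - 1)) == 0;
}

// Copy 8 bytes from SRC to DST.  Returns true if the "two aligned
// 32-bit moves" path was taken, false if the unaligned fallback was used.
inline bool copy_u64(void *dst, const void *src)
{
  if (is_aligned(dst, 4) && is_aligned(src, 4))
    {
      // Both halves are word-aligned: move the low and high words
      // separately (memcpy keeps this free of aliasing problems).
      std::uint32_t lo, hi;
      std::memcpy(&lo, src, 4);
      std::memcpy(&hi, static_cast<const unsigned char *>(src) + 4, 4);
      std::memcpy(dst, &lo, 4);
      std::memcpy(static_cast<unsigned char *>(dst) + 4, &hi, 4);
      return true;
    }
  // Fallback: generic copy that makes no alignment assumption.
  std::memcpy(dst, src, 8);
  return false;
}
```

Note that an address that is 4-byte aligned but not 8-byte aligned still takes the aligned path, which is the case the PR40457 testcase cares about.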
  



Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-11 Thread Richard Biener via Gcc-patches



> On 11.01.2023 at 19:34, Segher Boessenkool wrote:
> 
> On Wed, Jan 11, 2023 at 05:27:36PM +0100, Richard Biener wrote:
>>> On 11.01.2023 at 16:17, Segher Boessenkool wrote:
>>>> Note this is more info for port maintainers not for users and
>>>> changes.html is for users.
>>> 
>>> And users will notice some ports will have to be removed, because those
>>> ports are not maintained / not maintained enough.  Some ports will not
>>> work with LRA, most will be easy to fix, but someone will have to do
>>> that.  If no one does so the port works sufficiently well it will have
>>> to be removed before release.
>>> 
>>>> "In a future release" is also quite vague.
>>> 
>>> It's what we usually say in changes.html .  "In GCC 14" if you want?
>>> 
>>> I can add some stuff on how this will benefit users?
>> 
>> I guess listing the ports without LRA support might be a first step for 
>> clarification?
> 
> Every port has LRA support.
> 
> Some ports will not build later when we delete old reload, because they
> use some functions and/or data structures unique to that.
> 
> But all that is easily fixed (for the port maintainers at least,
> assuming they understand what their code does ;-) ).  The bigger problem
> is that if the port has never been tested with LRA the chances of it
> working in all cases are not great (say 50%), so likely some attention
> will be needed to get the compiler back to release quality.  And some
> ports will even not work for the simplest pieces of source code.  Those
> are the problematic cases.

Like if they cannot even build their target libraries aka their build will 
fail.  It would be nice to identify those and, say, make at least -mlra 
available to all ports that currently do not have a way to enable LRA?

Richard 

> Usually not hard to fix -- all the more complicated targets already run
> LRA always, the hard work is done already -- but it still requires a
> target maintainer (with a suitable testing environment, hopefully even
> hardware) to do the work.  This is what I want to alert people to, and
> get agreement that this will happen next major release.
> 
> 
> Segher


Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-11 Thread Paul Koning via Gcc-patches



> On Jan 11, 2023, at 1:32 PM, Segher Boessenkool  
> wrote:
> 
> On Wed, Jan 11, 2023 at 05:27:36PM +0100, Richard Biener wrote:
>>> On 11.01.2023 at 16:17, Segher Boessenkool wrote:
>>>> Note this is more info for port maintainers not for users and
>>>> changes.html is for users.
>>> 
>>> And users will notice some ports will have to be removed, because those
>>> ports are not maintained / not maintained enough.  Some ports will not
>>> work with LRA, most will be easy to fix, but someone will have to do
>>> that.  If no one does so the port works sufficiently well it will have
>>> to be removed before release.
>>> 
>>>> "In a future release" is also quite vague.
>>> 
>>> It's what we usually say in changes.html .  "In GCC 14" if you want?
>>> 
>>> I can add some stuff on how this will benefit users?
>> 
>> I guess listing the ports without LRA support might be a first step for 
>> clarification?
> 
> Every port has LRA support.
> 
> Some ports will not build later when we delete old reload, because they
> use some functions and/or data structures unique to that.

Or, as in my case, because building with LRA as the default triggers an ICE 
that I don't understand.  I posted a note to the GCC list about what I saw, but 
have received no reaction.

If anyone can help me understand how LRA can generate RTL with register choices 
that violate the constraints listed in the MD file, I would be grateful.

paul



Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-11 Thread Segher Boessenkool
On Wed, Jan 11, 2023 at 07:39:29PM +0100, Richard Biener wrote:
> Like if they cannot even build their target libraries aka their build will 
> fail.  It would be nice to identify those and, say, make at least -mlra 
> available to all ports that currently do not have a way to enable LRA?

It is up to the target maintainers to make such support, it is a machine
flag after all (-m are machine flags, -f are more general flags).

There has been ample warning, see 
for example.  GCC 13 release will be six years after that, I'd hope that
that is enough.

Just using
  targetm.lra_p = default_lra_p;
is enough to test.  I don't have a setup to build all targets (that
requires target headers, to begin with), and it is up to the target
maintainers to decide how they want things fixed anyway.
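
To make the one-line test above a bit more concrete, here is a toy model of the hook mechanism. It is only a sketch: `target_hooks`, `old_reload_port_lra_p` and `choose_allocator` are hypothetical names, the real `targetm` structure has many more members, and the assumption that the default hook selects LRA is stated in the comments rather than taken from the GCC sources.

```cpp
#include <cassert>

// Which register allocator back end a (modelled) port ends up with.
enum class allocator { reload, lra };

// Toy stand-in for GCC's target hook table.
struct target_hooks
{
  bool (*lra_p) ();
};

// Assumption for this sketch: the default hook selects LRA.
bool default_lra_p () { return true; }

// Hypothetical port-specific hook that still asks for old reload.
bool old_reload_port_lra_p () { return false; }

// The middle end only looks at the hook, never at the port itself.
allocator choose_allocator (const target_hooks &t)
{
  assert (t.lra_p != nullptr);
  return t.lra_p () ? allocator::lra : allocator::reload;
}
```

In this model, swapping the port's hook for the default one is all it takes to exercise the LRA path, which is the experiment suggested above.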

I'll put up a preliminary branch for the generic patches, but let me
update it to trunk first :-)


Segher


Re: [PATCH] preprocessor: Don't register pragmas in directives-only mode [PR108244]

2023-01-11 Thread Jakub Jelinek via Gcc-patches
On Fri, Dec 30, 2022 at 12:21:37PM -0500, Lewis Hyatt via Gcc-patches wrote:
> libcpp's directives-only mode does not expect deferred pragmas to be
> registered, but to date the c-family registration process has not checked for
> this case. That issue became more visible since r13-1544, which added the
> commonly used GCC diagnostic pragmas to the set of those registered in
> preprocessing modes. Fix it by checking for directives-only mode in
> c-family/c-pragma.cc.
> 
> gcc/c-family/ChangeLog:
> 
>   PR preprocessor/108244
>   * c-pragma.cc (c_register_pragma_1): Don't attempt to register any
>   deferred pragmas if -fdirectives-only.
>   (init_pragma): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/cpp/pr108244-1.c: New test.
>   * c-c++-common/cpp/pr108244-2.c: New test.
>   * c-c++-common/cpp/pr108244-3.c: New test.

Ok, with a nit:

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/cpp/pr108244-3.c
> @@ -0,0 +1,6 @@
> +/* { dg-do preprocess } */
> +/* { dg-additional-options "-fdirectives-only -fopenmp" } */
> +/* { dg-require-effective-target "fopenmp" } */
> +#pragma omp parallel
> +#ifdef t
> +#endif

This test should be in gcc/testsuite/c-c++-common/gomp/
directory instead, without the dg-require-effective-target.

Jakub



Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-11 Thread Segher Boessenkool
On Wed, Jan 11, 2023 at 01:42:22PM -0500, Paul Koning wrote:
> Or, as in my case, because building with LRA as the default triggers an ICE 
> that I don't understand.  I posted a note to the GCC list about what I saw, 
> but have received no reaction.

?

I would say your predicates are way too lenient here (general_operand),
but this needs more info.  A testcase to reproduce the problem, to
start with :-)


Segher


Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299

2023-01-11 Thread Michael Meissner via Gcc-patches
On Tue, Nov 01, 2022 at 10:42:30PM -0400, Michael Meissner wrote:
> This patch fixes the issue that GCC cannot build when the default long double
> is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
> to build the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
> during the evrp pass.  Ultimately it is failing because the code declared the
> type to use TFmode but it used F128 functions (i.e. KFmode).

Unfortunately, this patch no longer works against the trunk.  I have a simpler
patch to libgcc that uses the _Complex _Float128 and _Float128 types for
building the IEEE 128-bit support in libgcc.  It doesn't fix the problem in the
compiler, but it will allow us to go forward and build GCC on targets that have
IEEE 128-bit floating point support (i.e. Fedora 36).

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] c: Don't emit DEBUG_BEGIN_STMTs for K&R function argument declarations [PR105972]

2023-01-11 Thread Joseph Myers
On Wed, 11 Jan 2023, Jakub Jelinek via Gcc-patches wrote:

> The following patch ensures we don't emit any such DEBUG_BEGIN_STMTs for the
> K&R function parameter declarations even in nested functions.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2023-01-11  Jakub Jelinek  
> 
>   PR c/105972
>   * c-parser.cc (c_parser_declaration_or_fndef): Disable debug non-bind
>   markers for K&R function parameter declarations of nested functions.
> 
>   * gcc.dg/pr105972.c: New test.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC/PATCH] Remove the workaround for _Float128 precision [PR107299]

2023-01-11 Thread Michael Meissner via Gcc-patches
On Tue, Jan 10, 2023 at 07:23:23PM +0100, Jakub Jelinek wrote:
> On Mon, Jan 09, 2023 at 10:21:52PM -0500, Michael Meissner wrote:
> > I had the patches to change the precision to 128, and I just ran them.  C and
> > C++ do not seem to be bothered by changing the precision to 128 (once I got it
> > to build, etc.).  But Fortran on the other hand does actually use the precision
> > to differentiate between IBM extended double and IEEE 128-bit.  In particular,
> > the following 3 tests fail when long double is IBM extended double:
> > 
> > gfortran.dg/PR100914.f90
> > gfortran.dg/c-interop/typecodes-array-float128.f90
> > gfortran.dg/c-interop/typecodes-scalar-float128.f90
> > 
> > I tried adding code to use the old precisions for Fortran, but not for C/C++,
> > but it didn't seem to work.
> > 
> > So while it might be possible to use a single 128 for the precision, it needs
> > more work and attention, particularly on the Fortran side.
> 
> Can't be more than a few lines changed in the fortran FE.
> Yes, the FE needs to know if it is IBM extended double or IEEE 128-bit so
> that it can decide on the mangling - where to use the artificial kind 17 and
> where to use 16.  But as long as it can figure that out, it doesn't need to
> rely on a particular precision.

I agree that in theory it should be simple to fix.  Unfortunately the patches
that I was working on cause some other failures that I need to investigate.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[committed] analyzer: fix leak false positives on "*UNKNOWN = PTR; " [PR108252]

2023-01-11 Thread David Malcolm via Gcc-patches
PR analyzer/108252 reports a false positive from -Wanalyzer-malloc-leak on
code like this:

  *ptr_ptr = strdup(EXPR);

where ptr_ptr is an UNKNOWN_VALUE.

When we handle:
  *UNKNOWN = PTR;
store::set_value normally marks *PTR as having escaped, and this means
we don't report PTR as leaking when the last usage of PTR is lost.

However this only works for cases where PTR is a region_svalue.
In the example in the bug, it's a conjured_svalue, rather than a
region_svalue.  A similar problem can arise for FDs, which aren't
pointers.

This patch fixes the bug by updating store::set_value to mark any
values stored via *UNKNOWN = VAL as not leaking.

Additionally, sm-malloc.cc's known_allocator_p hardcodes strdup and
strndup as allocators (and thus transitioning their result to
"unchecked"), but we don't implement known_functions for these, leading
to the LHS being a CONJURED_SVALUE, rather than a region_svalue to a
heap-allocated region.  A similar issue happens with functions marked
with __attribute__((malloc)).  As part of a "belt and braces" fix, the
patch also updates the handling of these functions, so that they use
heap-allocated regions.
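
For illustration, the statement shape discussed above looks roughly like the following. This is a minimal sketch with hypothetical function names (`store_copy`, `roundtrip_ok`); it is not one of the new testcases, and no claim is made about which analyzer versions warn on this exact snippet. The point is only that once the allocation is stored through a pointer the callee knows nothing about, ownership has left the function, so reporting a leak there would be a false positive.

```cpp
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The allocation is stored through PTR_PTR, which the callee knows
   nothing about; whoever owns *PTR_PTR is now responsible for it.  */
void store_copy (char **ptr_ptr, const char *s)
{
  *ptr_ptr = strdup (s);
}

/* A caller that takes ownership and frees the copy, so nothing leaks.
   Returns 1 if the copy matches the input, 0 otherwise.  */
int roundtrip_ok (const char *s)
{
  assert (s != NULL);
  char *copy = NULL;
  store_copy (&copy, s);
  if (!copy)
    return 0;
  int same = strcmp (copy, s) == 0;
  free (copy);
  return same;
}
```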

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-5113-g688fc162b76dc6.

gcc/analyzer/ChangeLog:
PR analyzer/108252
* kf.cc (class kf_strdup): New.
(class kf_strndup): New.
(register_known_functions): Register them.
* region-model.cc (region_model::on_call_pre): Use
&HEAP_ALLOCATED_REGION for the default result of an external
function with the "malloc" attribute, rather than CONJURED_SVALUE.
(region_model::get_or_create_region_for_heap_alloc): Allow
"size_in_bytes" to be NULL.
* store.cc (store::set_value): When handling *UNKNOWN = VAL,
mark VAL as "maybe bound".

gcc/testsuite/ChangeLog:
PR analyzer/108252
* gcc.dg/analyzer/attr-malloc-pr108252.c: New test.
* gcc.dg/analyzer/fd-leak-pr108252.c: New test.
* gcc.dg/analyzer/flex-with-call-summaries.c: Remove xfail from
warning false +ve directives.
* gcc.dg/analyzer/pr103217-2.c: Add -Wno-analyzer-too-complex.
* gcc.dg/analyzer/pr103217-3.c: Likewise.
* gcc.dg/analyzer/strdup-pr108252.c: New test.
* gcc.dg/analyzer/strndup-pr108252.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/kf.cc| 56 +++
 gcc/analyzer/region-model.cc  | 32 +++
 gcc/analyzer/store.cc |  2 +
 .../gcc.dg/analyzer/attr-malloc-pr108252.c| 25 +
 .../gcc.dg/analyzer/fd-leak-pr108252.c| 15 +
 .../analyzer/flex-with-call-summaries.c   |  6 +-
 gcc/testsuite/gcc.dg/analyzer/pr103217-2.c|  2 +
 gcc/testsuite/gcc.dg/analyzer/pr103217-3.c|  2 +
 .../gcc.dg/analyzer/strdup-pr108252.c | 19 +++
 .../gcc.dg/analyzer/strndup-pr108252.c| 21 +++
 10 files changed, 166 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-malloc-pr108252.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/fd-leak-pr108252.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strdup-pr108252.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strndup-pr108252.c

diff --git a/gcc/analyzer/kf.cc b/gcc/analyzer/kf.cc
index 6088bfc72c0..53190c51772 100644
--- a/gcc/analyzer/kf.cc
+++ b/gcc/analyzer/kf.cc
@@ -851,6 +851,32 @@ kf_strcpy::impl_call_pre (const call_details &cd) const
   model->set_value (sized_dest_reg, src_contents_sval, cd.get_ctxt ());
 }
 
+/* Handler for "strdup" and "__builtin_strdup".  */
+
+class kf_strdup : public known_function
+{
+public:
+  bool matches_call_types_p (const call_details &cd) const final override
+  {
+return (cd.num_args () == 1 && cd.arg_is_pointer_p (0));
+  }
+  void impl_call_pre (const call_details &cd) const final override
+  {
+region_model *model = cd.get_model ();
+region_model_manager *mgr = cd.get_manager ();
+/* Ideally we'd get the size here, and simulate copying the bytes.  */
+const region *new_reg
+  = model->get_or_create_region_for_heap_alloc (NULL, cd.get_ctxt ());
+model->mark_region_as_unknown (new_reg, NULL);
+if (cd.get_lhs_type ())
+  {
+   const svalue *ptr_sval
+ = mgr->get_ptr_svalue (cd.get_lhs_type (), new_reg);
+   cd.maybe_set_lhs (ptr_sval);
+  }
+  }
+};
+
 /* Handle the on_call_pre part of "strlen".  */
 
 class kf_strlen : public known_function
@@ -892,6 +918,32 @@ kf_strlen::impl_call_pre (const call_details &cd) const
   /* Otherwise a conjured value.  */
 }
 
+/* Handler for "strndup" and "__builtin_strndup".  */
+
+class kf_strndup : public known_function
+{
+public:
+  bool matches_call_types_p (const call_details &cd) const final override
+  {
+return (cd.num_args () == 2 && cd.arg_is_pointer_p (0));
+  }
+  void impl_call_pre (const call_details &cd) const final overri

Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-11 Thread Paul Koning via Gcc-patches



> On Jan 11, 2023, at 2:28 PM, Segher Boessenkool  
> wrote:
> 
> On Wed, Jan 11, 2023 at 01:42:22PM -0500, Paul Koning wrote:
>> Or, as in my case, because building with LRA as the default triggers an ICE 
>> that I don't understand.  I posted a note to the GCC list about what I saw, 
>> but have received no reaction.
> 
> ?

I just saw that, thanks!

> I would say your predicates are way too lenient here (general_operand),
> but this needs more info.  A testcase to reproduce the problem, to
> start with :-)

I'll try to trim it down.

What do you mean "too lenient"?  The first input operand (which is supposed to 
be the same as the output since the instruction set is 2-address) is 
"general_operand".  The destination is "nonimmediate_operand" which fits the 
constraints applied to it.

paul



Re: [PATCH,WWWDOCS] htdocs: add an Atom feed for GCC news

2023-01-11 Thread Thomas Schwinge
Hi!

On 2022-12-23T10:50:13+0100, "Jose E. Marchesi via Gcc-patches" 
 wrote:
> This patch adds an Atom feed for GCC news, which can then be easily
> aggregated in other sites, such as the GNU planet
> (https://planet.gnu.org).
>
> The feed lives in a file news.xml, and this patch initializes it with
> the latest entry in News as an example.

I absolutely agree that providing such an RSS feed is a good thing
(..., and that we generally should make better use of our News section,
and other "PR"...) -- but I'm less convinced by the prospect of manually
editing the RSS 'news.xml' file, duplicating in a (potentially) different
format what we've got in the HTML News section.  :-|

Ideally, there'd be some simple files for News items (Markdown, or
similar), which are then converted into HTML News as well as RSS feed.
Obviously, there needs to be some consensus on what to use, and somebody
needs to set up the corresponding machinery...

Or do others think that manual 'news.xml' maintenance is not so bad (for
now)?
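
To give a rough idea of how small such machinery could be, here is a sketch that renders one news record into both an RSS 2.0 `<item>` and an HTML entry. Everything in it is an assumption for illustration: the `news_item` structure, the function names and the HTML layout are hypothetical and do not reflect how wwwdocs is actually organised.

```cpp
#include <string>

// One news entry, kept in a single place (hypothetical structure).
struct news_item
{
  std::string title, link, summary, date;
};

// Escape the characters that are special in XML/HTML text and attributes.
std::string xml_escape(const std::string &in)
{
  std::string out;
  for (char c : in)
    switch (c)
      {
      case '&': out += "&amp;"; break;
      case '<': out += "&lt;"; break;
      case '>': out += "&gt;"; break;
      case '"': out += "&quot;"; break;
      default: out += c;
      }
  return out;
}

// Render the entry as an RSS 2.0 <item>.
std::string to_rss_item(const news_item &n)
{
  return "<item>\n"
         "  <title>" + xml_escape(n.title) + "</title>\n"
         "  <link>" + xml_escape(n.link) + "</link>\n"
         "  <description>" + xml_escape(n.summary) + "</description>\n"
         "  <pubDate>" + xml_escape(n.date) + "</pubDate>\n"
         "</item>\n";
}

// Render the same entry as an HTML definition-list item (layout assumed).
std::string to_html_entry(const news_item &n)
{
  return "<dt><a href=\"" + xml_escape(n.link) + "\">"
         + xml_escape(n.title) + "</a> [" + xml_escape(n.date) + "]</dt>\n"
         "<dd>" + xml_escape(n.summary) + "</dd>\n";
}
```

Whether this is worth maintaining compared to editing two files by hand is exactly the open question.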


Regards
 Thomas


> ---
>  htdocs/index.html |  9 -
>  htdocs/news.xml   | 28 
>  2 files changed, 36 insertions(+), 1 deletion(-)
>  create mode 100644 htdocs/news.xml
>
> diff --git a/htdocs/index.html b/htdocs/index.html
> index e91fadf1..2ddee6f6 100644
> --- a/htdocs/index.html
> +++ b/htdocs/index.html
> @@ -6,6 +6,9 @@
>   content="FUv_3eEIkimd6LAoWned4TPMqmKKQmw3aA2_PBJ5SAY">
>  GCC, the GNU Compiler Collection
>  https://gcc.gnu.org/gcc.css";>
> + +  title="News about the GNU Compiler Collection"
> +  href="news.xml"/>
>  
>
>  
> @@ -48,7 +51,11 @@ mission statement.
>
>  
>
>  
> diff --git a/htdocs/news.xml b/htdocs/news.xml
> new file mode 100644
> index ..bebcaa66
> --- /dev/null
> +++ b/htdocs/news.xml
> @@ -0,0 +1,28 @@
> +
> +
> +
> +  
> +News about the GNU Compiler Collection
> +https://gcc.gnu.org
> +
> +  The GNU Compiler Collection includes front ends for C, C++,
> +  Objective-C, Fortran, Ada, Go, and D, as well as libraries for
> +  these languages (libstdc++,...). GCC was originally written as
> +  the compiler for the GNU operating system. The GNU system was
> +  developed to be 100% free software, free in the sense that it
> +  respects the user's freedom.
> +
> +
> +
> +  GCC BPF in Compiler Explorer
> +  https://godbolt.org
> +  
> +Support for a nightly build of the bpf-unknown-none-gcc
> +compiler has been contributed to Compiler Explorer (aka
> +godbolt.org) by Marc Poulhiès
> +  
> +  Fri, 23 December 2022 11:00:00 CET
> +
> +
> +  
> +
> --
> 2.30.2
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
München; limited liability company (Gesellschaft mit beschränkter Haftung); 
Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; 
Commercial register: München, HRB 106955


Re: [PATCH,WWWDOCS] htdocs: rotate news

2023-01-11 Thread Gerald Pfeifer
On Fri, 23 Dec 2022, Jose E. Marchesi via Gcc-patches wrote:
>  htdocs/index.html | 24 
>  htdocs/news.html  | 24 
>  2 files changed, 24 insertions(+), 24 deletions(-)

Okay, thank you.

And you can consider this kind of change preapproved. Or falling under 
our "obvious rule". Whichever you prefer. :-)

Gerald


[committed] wwwdocs: gcc-8: Properly spell "command-line option"

2023-01-11 Thread Gerald Pfeifer
On the way add some missing "the"s.

Pushed.

Gerald
---
 htdocs/gcc-8/changes.html | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/htdocs/gcc-8/changes.html b/htdocs/gcc-8/changes.html
index 73ccd07d..c329a509 100644
--- a/htdocs/gcc-8/changes.html
+++ b/htdocs/gcc-8/changes.html
@@ -972,17 +972,17 @@ is now easier-to-read.
   
   
 Refactored small data feature implementation, controlled
-via -G command line option.
+via the -G command-line option.
   
   
 New support for reduced register set ARC architecture
-configurations, controlled via -mrf16 command line
+configurations, controlled via the -mrf16 command-line
 option.
   
   
 Refurbished and improved support for zero overhead loops.
-Introduced -mlpc-width command line option to control the
-width of lp_count register.
+Introduced -mlpc-width command-line option to control
+the width of the lp_count register.
   
 
 
-- 
2.38.1


Re: [PATCH,WWWDOCS] htdocs: add an Atom feed for GCC news

2023-01-11 Thread Jose E. Marchesi via Gcc-patches


> Hi!
>
> On 2022-12-23T10:50:13+0100, "Jose E. Marchesi via Gcc-patches" 
>  wrote:
>> This patch adds an Atom feed for GCC news, which can then be easily
>> aggregated in other sites, such as the GNU planet
>> (https://planet.gnu.org).
>>
>> The feed lives in a file news.xml, and this patch initializes it with
>> the latest entry in News as an example.
>
> I absolutely agree that providing such an RSS feed is a good thing
> (..., and that we generally should make better use of our News section,
> and other "PR"...) -- but I'm less convinced by the prospect of manually
> editing the RSS 'news.xml' file, duplicating in a (potentially) different
> format what we've got in the HTML News section.  :-|
>
> Ideally, there'd be some simple files for News items (Markdown, or
> similar), which are then converted into HTML News as well as RSS feed.
> Obviously, there needs to be some consensus on what to use, and somebody
> needs to set up the corresponding machinery...
>
> Or do others think that manual 'news.xml' maintenance is not so bad (for
> now)?

I would like to point out that I have maintained this kind of feed for
my own sites for years, and that in my humble personal experience, unless
there are a lot of updates, like more than a couple of new entries per
month, any automated scheme would be overkill, prone to rot, and not
really worth the effort.

I strongly suggest not overengineering here [or anywhere else :)]

>
> Regards
>  Thomas
>
>
>> ---
>>  htdocs/index.html |  9 -
>>  htdocs/news.xml   | 28 
>>  2 files changed, 36 insertions(+), 1 deletion(-)
>>  create mode 100644 htdocs/news.xml
>>
>> diff --git a/htdocs/index.html b/htdocs/index.html
>> index e91fadf1..2ddee6f6 100644
>> --- a/htdocs/index.html
>> +++ b/htdocs/index.html
>> @@ -6,6 +6,9 @@
>>  > content="FUv_3eEIkimd6LAoWned4TPMqmKKQmw3aA2_PBJ5SAY">
>>  GCC, the GNU Compiler Collection
>>  https://gcc.gnu.org/gcc.css";>
>> +> +  title="News about the GNU Compiler Collection"
>> +  href="news.xml"/>
>>  
>>
>>  
>> @@ -48,7 +51,11 @@ mission statement.
>>
>>  
>>
>>  
>> diff --git a/htdocs/news.xml b/htdocs/news.xml
>> new file mode 100644
>> index ..bebcaa66
>> --- /dev/null
>> +++ b/htdocs/news.xml
>> @@ -0,0 +1,28 @@
>> +
>> +
>> +
>> +  
>> +News about the GNU Compiler Collection
>> +https://gcc.gnu.org
>> +
>> +  The GNU Compiler Collection includes front ends for C, C++,
>> +  Objective-C, Fortran, Ada, Go, and D, as well as libraries for
>> +  these languages (libstdc++,...). GCC was originally written as
>> +  the compiler for the GNU operating system. The GNU system was
>> +  developed to be 100% free software, free in the sense that it
>> +  respects the user's freedom.
>> +
>> +
>> +
>> +  GCC BPF in Compiler Explorer
>> +  https://godbolt.org
>> +  
>> +Support for a nightly build of the bpf-unknown-none-gcc
>> +compiler has been contributed to Compiler Explorer (aka
>> +godbolt.org) by Marc Poulhiès
>> +  
>> +  Fri, 23 December 2022 11:00:00 CET
>> +
>> +
>> +  
>> +
>> --
>> 2.30.2


Re: [PATCH,WWWDOCS] htdocs: add an Atom feed for GCC news

2023-01-11 Thread Jose E. Marchesi via Gcc-patches


>> Hi!
>>
>> On 2022-12-23T10:50:13+0100, "Jose E. Marchesi via Gcc-patches" 
>>  wrote:
>>> This patch adds an Atom feed for GCC news, which can then be easily
>>> aggregated in other sites, such as the GNU planet
>>> (https://planet.gnu.org).
>>>
>>> The feed lives in a file news.xml, and this patch initializes it with
>>> the latest entry in News as an example.
>>
>> I absolutely agree that providing such an RSS feed is a good thing
>> (..., and that we generally should make better use of our News section,
>> and other "PR"...) -- but I'm less convinced by the prospect of manually
>> editing the RSS 'news.xml' file, duplicating in a (potentially) different
>> format what we've got in the HTML News section.  :-|
>>
>> Ideally, there'd be some simple files for News items (Markdown, or
>> similar), which are then converted into HTML News as well as RSS feed.
>> Obviously, there needs to be some consensus on what to use, and somebody
>> needs to set up the corresponding machinery...
>>
>> Or do others think that manual 'news.xml' maintenance is not so bad (for
>> now)?
>
> I would like to point out that I have maintained this kind of feed for
> my own sites for years, and that in my humble personal experience, unless
> there are a lot of updates, like more than a couple of new entries per
> month, any automated scheme would be overkill, prone to rot, and not
> really worth the effort.
>
> I strongly suggest not overengineering here [or anywhere else :)]

I forgot to mention that it is also useful to have fine-grained control of
what you publish on which feed.

Not all the news may be appropriate for all feeds.  For example, I have
a separate feed on my site for entries I want to aggregate in the GNU
Planet.  Other stuff, which is more personal in nature, is included in a
more general feed, or not included in a feed at all.

Not sure if this really applies to the case at hand, which is the GCC
News, but that is another reason why I maintain my feeds manually as
proposed in the patch.

>
>>
>> Regards
>>  Thomas
>>
>>
>>> ---
>>>  htdocs/index.html |  9 -
>>>  htdocs/news.xml   | 28 
>>>  2 files changed, 36 insertions(+), 1 deletion(-)
>>>  create mode 100644 htdocs/news.xml
>>>
>>> diff --git a/htdocs/index.html b/htdocs/index.html
>>> index e91fadf1..2ddee6f6 100644
>>> --- a/htdocs/index.html
>>> +++ b/htdocs/index.html
>>> @@ -6,6 +6,9 @@
>>>  >> content="FUv_3eEIkimd6LAoWned4TPMqmKKQmw3aA2_PBJ5SAY">
>>>  GCC, the GNU Compiler Collection
>>>  https://gcc.gnu.org/gcc.css";>
>>> +>> +  title="News about the GNU Compiler Collection"
>>> +  href="news.xml"/>
>>>  
>>>
>>>  
>>> @@ -48,7 +51,11 @@ mission statement.
>>>
>>>  
>>>
>>>  
>>> diff --git a/htdocs/news.xml b/htdocs/news.xml
>>> new file mode 100644
>>> index ..bebcaa66
>>> --- /dev/null
>>> +++ b/htdocs/news.xml
>>> @@ -0,0 +1,28 @@
>>> +
>>> +
>>> +
>>> +  
>>> +News about the GNU Compiler Collection
>>> +https://gcc.gnu.org
>>> +
>>> +  The GNU Compiler Collection includes front ends for C, C++,
>>> +  Objective-C, Fortran, Ada, Go, and D, as well as libraries for
>>> +  these languages (libstdc++,...). GCC was originally written as
>>> +  the compiler for the GNU operating system. The GNU system was
>>> +  developed to be 100% free software, free in the sense that it
>>> +  respects the user's freedom.
>>> +
>>> +
>>> +
>>> +  GCC BPF in Compiler Explorer
>>> +  https://godbolt.org
>>> +  
>>> +Support for a nightly build of the bpf-unknown-none-gcc
>>> +compiler has been contributed to Compiler Explorer (aka
>>> +godbolt.org) by Marc Poulhiès
>>> +  
>>> +  Fri, 23 December 2022 11:00:00 CET
>>> +
>>> +
>>> +  
>>> +
>>> --
>>> 2.30.2


[committed] config-list.mk: Remove obsolete FreeBSD targets

2023-01-11 Thread Gerald Pfeifer
ia64-freebsd is officially dead, and sparc64-freebsd has not been able
to build GCC for half a dozen years (or so) and is essentially end of
life.

The default per gcc/config/i386/freebsd.h has been i586 for a while,
so i486-freebsd can go as well. (We still have i686-freebsd.)

Pushed, obvious rule and such.

Gerald


contrib/ChangeLog:

* config-list.mk: Remove i486-freebsd4, ia64-freebsd6, and
sparc64-freebsd6.
---
 contrib/config-list.mk | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index 2056a221ac2..05184eaa701 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -48,14 +48,14 @@ LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
   hppa64-hpux11.3 \
   hppa64-hpux11.0OPT-enable-sjlj-exceptions=yes \
   i686-pc-linux-gnu i686-apple-darwin i686-apple-darwin9 i686-apple-darwin10 \
-  i486-freebsd4 i686-freebsd6 i686-kfreebsd-gnu \
+  i686-freebsd6 i686-kfreebsd-gnu \
   i686-netbsdelf9 \
   i686-openbsd i686-elf i686-kopensolaris-gnu i686-symbolics-gnu \
   i686-pc-msdosdjgpp i686-lynxos i686-nto-qnx \
   i686-rtems i686-solaris2.11 i686-wrs-vxworks \
   i686-wrs-vxworksae \
   i686-cygwinOPT-enable-threads=yes i686-mingw32crt ia64-elf \
-  ia64-freebsd6 ia64-linux ia64-hpux ia64-hp-vms iq2000-elf lm32-elf \
+  ia64-linux ia64-hpux ia64-hp-vms iq2000-elf lm32-elf \
   lm32-rtems lm32-uclinux \
   loongarch64-linux-gnuf64 loongarch64-linux-gnuf32 loongarch64-linux-gnusf \
   m32c-elf m32r-elf m32rle-elf \
@@ -93,7 +93,7 @@ LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
   sparc-leon-elf sparc-rtems sparc-linux-gnu \
   sparc-leon3-linux-gnuOPT-enable-target=all sparc-netbsdelf \
   sparc64-sun-solaris2.11OPT-with-gnu-ldOPT-with-gnu-asOPT-enable-threads=posix \
-  sparc-wrs-vxworks sparc64-elf sparc64-rtems sparc64-linux sparc64-freebsd6 \
+  sparc-wrs-vxworks sparc64-elf sparc64-rtems sparc64-linux \
   sparc64-netbsd sparc64-openbsd \
   v850e1-elf v850e-elf v850-elf v850-rtems vax-linux-gnu \
   vax-netbsdelf visium-elf x86_64-apple-darwin \
-- 
2.38.1


Re: [RFA] choosing __platform_wait_t on targets without lock-free 64 atomics

2023-01-11 Thread Thomas Rodgers via Gcc-patches
I agree with this change.

On Thu, Jan 5, 2023 at 4:22 PM Jonathan Wakely  wrote:

> How about this?
>
> I don't think we should worry about targets without atomic int, so don't
> bother using types smaller than int.
>
>
> -- >8 --
>
> For non-futex targets the __platform_wait_t type is currently uint64_t,
> but that requires a lock in libatomic for some 32-bit targets. We don't
> really need a 64-bit type, so use unsigned long if that is lock-free,
> and unsigned int otherwise. This should mean it's lock-free on a wider set of
> targets.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/atomic_wait.h (__detail::__platform_wait_t):
> Define as unsigned long if always lock-free, and unsigned int
> otherwise.
> ---
>  libstdc++-v3/include/bits/atomic_wait.h | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
> index bd1ed56d157..46f39f10cbc 100644
> --- a/libstdc++-v3/include/bits/atomic_wait.h
> +++ b/libstdc++-v3/include/bits/atomic_wait.h
> @@ -64,7 +64,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  // and __platform_notify() if there is a more efficient primitive supported
>  // by the platform (e.g. __ulock_wait()/__ulock_wake()) which is better than
>  // a mutex/condvar based wait.
> -using __platform_wait_t = uint64_t;
> +# if  ATOMIC_LONG_LOCK_FREE == 2
> +using __platform_wait_t = unsigned long;
> +# else
> +using __platform_wait_t = unsigned int;
> +# endif
>  inline constexpr size_t __platform_wait_alignment
>    = __alignof__(__platform_wait_t);
>  #endif
> --
> 2.39.0
>
>
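
For readers following along, the type selection in the quoted patch can be sketched outside libstdc++ roughly as below. This is only an illustration under assumptions: `wait_word_t` and `wait_word_is_lock_free` are hypothetical names that do not appear in the patch, and the sketch assumes a C++17 compiler so that `is_always_lock_free` is available.

```cpp
#include <atomic>

// Hypothetical stand-in for __detail::__platform_wait_t (not the real name).
// ATOMIC_LONG_LOCK_FREE == 2 means atomic operations on long never need a lock.
#if ATOMIC_LONG_LOCK_FREE == 2
using wait_word_t = unsigned long;
#else
using wait_word_t = unsigned int;   // fall back to int, as the patch does
#endif

// True when atomics on the chosen type are lock-free on every object.
constexpr bool wait_word_is_lock_free()
{
  return std::atomic<wait_word_t>::is_always_lock_free;
}
```

On a target where neither `long` nor `int` atomics are always lock-free, the function above would return false, which is the case the quoted message chooses not to worry about.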


[PATCH] gimple-fold.h: Add missing gimple-iterator.h

2023-01-11 Thread Palmer Dabbelt
As of 6f5b06032eb ("Finish gimple_build API enhancement") gimple-fold.h
uses some of the declarations from gimple-iterator.h, which causes
issues when building Linux's stackprotector plugin.

gcc/ChangeLog:

* gimple-fold.h: Add gimple-iterator.h include.

---

I'm not sure if this should instead be fixed in Linux by reordering the
includes along the lines of

diff --git a/scripts/gcc-plugins/gcc-common.h b/scripts/gcc-plugins/gcc-common.h
index 9a1895747b15..2c3a3079128a 100644
--- a/scripts/gcc-plugins/gcc-common.h
+++ b/scripts/gcc-plugins/gcc-common.h
@@ -72,6 +72,7 @@
 #include "stor-layout.h"
 #include "internal-fn.h"
 #include "gimple-expr.h"
+#include "gimple-iterator.h"
 #include "gimple-fold.h"
 #include "context.h"
 #include "tree-ssa-alias.h"
@@ -88,7 +89,6 @@
 #include "gimple.h"
 #include "tree-phinodes.h"
 #include "tree-cfg.h"
-#include "gimple-iterator.h"
 #include "gimple-ssa.h"
 #include "ssa-iterators.h"

but I figured it was slightly easier for users to keep these compatible.
It looks like many GCC-internal uses of gimple-fold.h already have the
gimple-iterator.h include right before, though, so not sure if that's
how things are meant to be.
---
 gcc/gimple-fold.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index 2fd58db9a2e..66bee2b75df 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -22,6 +22,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_GIMPLE_FOLD_H
 #define GCC_GIMPLE_FOLD_H
 
+#include "gimple-iterator.h"
+
 extern tree create_tmp_reg_or_ssa_name (tree, gimple *stmt = NULL);
 extern tree canonicalize_constructor_val (tree, tree);
 extern tree get_symbol_constant_value (tree);
-- 
2.39.0



[PATCH v2] [RISCV] Add 'Zfa' extension according to riscv-isa-manual

2023-01-11 Thread jinma via Gcc-patches
From e4ce8e825c145d74e6b9827f972629548e39f118 Mon Sep 17 00:00:00 2001
From: Jin Ma 
Date: Wed, 11 Jan 2023 19:13:27 +0800
Subject: [PATCH] [RISCV] Add 'Zfa' extension according to riscv-isa-manual

This patch adds the 'Zfa' extension for riscv, which is an implementation of an
unratified and unfrozen RISC-V extension.

Although binutils-gdb support for the 'Zfa' extension is not yet upstream, we
can start discussing it. The new instructions can be tested in your (possibly
virtual) environment, and early review should allow fast adoption after
ratification.

This is based on:
( https://github.com/riscv/riscv-isa-manual/commit/d74d99e22d5f68832f70982d867614e2149a3bd7 )
latest 'Zfa' change on the master branch of the RISC-V ISA Manual as
of this writing.

The Wiki Page (details):
( https://github.com/a4lg/binutils-gdb/wiki/riscv_zfa )

The binutils-gdb for 'Zfa' extension:
( https://sourceware.org/pipermail/binutils/2022-September/122938.html )

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc:
* config/riscv/constraints.md (Zf):
* config/riscv/predicates.md:
* config/riscv/riscv-builtins.cc (RISCV_FTYPE_NAME2):
(AVAIL):
(RISCV_ATYPE_SF):
(RISCV_ATYPE_DF):
(RISCV_FTYPE_ATYPES2):
* config/riscv/riscv-ftypes.def (2):
* config/riscv/riscv-opts.h (MASK_ZFA):
(TARGET_ZFA):
* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli):
* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli):
(riscv_cannot_force_const_mem):
(riscv_const_insns):
(riscv_legitimize_const_move):
(riscv_split_64bit_move_p):
(riscv_output_move):
(riscv_memmodel_needs_release_fence):
(riscv_print_operand):
(riscv_secondary_memory_needed):
* config/riscv/riscv.h (GP_REG_RTX_P):
* config/riscv/riscv.md (riscv_fminm3):
(riscv_fmaxm3):
(fix_truncdfsi2_zfa):
(round2):
(rint2):
(f_quiet4_zfa):
* config/riscv/riscv.opt:

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fcvtmod.c: New test.
* gcc.target/riscv/zfa-fleq-fltq.c: New test.
* gcc.target/riscv/zfa-fli-zfh.c: New test.
* gcc.target/riscv/zfa-fli.c: New test.
* gcc.target/riscv/zfa-fminm-fmaxm.c: New test.
* gcc.target/riscv/zfa-fmovh-fmovp.c: New test.
* gcc.target/riscv/zfa-fround.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc   |   4 +
 gcc/config/riscv/constraints.md   |   7 ++
 gcc/config/riscv/predicates.md|   4 +
 gcc/config/riscv/riscv-builtins.cc|  11 ++
 gcc/config/riscv/riscv-ftypes.def |   2 +
 gcc/config/riscv/riscv-opts.h |   3 +
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv.cc | 105 +++-
 gcc/config/riscv/riscv.h  |   1 +
 gcc/config/riscv/riscv.md | 114 ++
 gcc/config/riscv/riscv.opt|   4 +
 gcc/testsuite/gcc.target/riscv/zfa-fcvtmod.c  |  12 ++
 .../gcc.target/riscv/zfa-fleq-fltq.c  |  20 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c  |  42 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli.c  |  80 
 .../gcc.target/riscv/zfa-fminm-fmaxm.c|  25 
 .../gcc.target/riscv/zfa-fmovh-fmovp.c|  11 ++
 gcc/testsuite/gcc.target/riscv/zfa-fround.c   |  25 
 18 files changed, 448 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fcvtmod.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fminm-fmaxm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fround.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 0a89fdaffe2..cccec12975c 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -217,6 +217,8 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"zfh",   ISA_SPEC_CLASS_NONE, 1, 0},
   {"zfhmin",ISA_SPEC_CLASS_NONE, 1, 0},

+  {"zfa", ISA_SPEC_CLASS_NONE, 1, 0},
+
   {"zmmul", ISA_SPEC_CLASS_NONE, 1, 0},

   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1242,6 +1244,8 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"zfhmin",&gcc_options::x_riscv_zf_subext, MASK_ZFHMIN},
   {"zfh",   &gcc_options::x_riscv_zf_subext, MASK_ZFH},

+  {"zfa",   &gcc_options::x_riscv_zf_subext, 

Re: [PING] nvptx: '-mframe-malloc-threshold', '-Wframe-malloc-threshold' (was: Handling of large stack objects in GPU code generation -- maybe transform into heap allocation?)

2023-01-11 Thread Jerry D via Gcc-patches

On 1/11/23 4:06 AM, Thomas Schwinge wrote:

Hi!

Ping -- the '-mframe-malloc-threshold' idea, at least.

Note that while this issue originally did pop up for Fortran I/O, it's
likewise relevant for other functions that maintain big frames, for
example in newlib:

 libc/string/libc_a-memmem.o:.local .align 16 .b8 %frame_ar[2064];
 libc/string/libc_a-strcasestr.o:.local .align 16 .b8 %frame_ar[2064];
 libc/string/libc_a-strstr.o:.local .align 16 .b8 %frame_ar[2064];
 libm/math/libm_a-k_rem_pio2.o:.local .align 16 .b8 %frame_ar[560];

Therefore a generic solution (or, workaround if you'd like) does seem
appropriate.


---snip ---

As a gfortranner, I have to at least say that anyone doing Fortran I/O on a
GPU is nuts.


With that said, a configurable option to address the broader issue makes 
sense. Perhaps the default threshold should be whatever it is now and if 
someone has a real situation where it is needed, they can adjust.


Regards,

Jerry
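
To make the threshold idea concrete, here is a minimal hand-written sketch of the decision such an option would make per frame object: keep it in the frame when it is at or below the threshold, and move it to the heap otherwise. Everything in it is an assumption for illustration: `frame_object`, `frame_malloc_threshold` and the value 1024 are hypothetical, and the real option would act inside the compiler rather than through a class template. It assumes C++17.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical threshold in bytes; the thread does not fix a default value.
constexpr std::size_t frame_malloc_threshold = 1024;

// Models one frame object of N bytes: stored inline when N is at or below
// the threshold, otherwise allocated on the heap.
template <std::size_t N, std::size_t Threshold = frame_malloc_threshold>
class frame_object
{
  static constexpr bool on_heap = (N > Threshold);
  using inline_storage = std::array<unsigned char, N>;
  using heap_storage = std::unique_ptr<unsigned char[]>;

  // Value-initialized: zeroed array, or a null pointer filled in below.
  std::conditional_t<on_heap, heap_storage, inline_storage> storage_{};

public:
  frame_object()
  {
    if constexpr (on_heap)
      storage_ = std::make_unique<unsigned char[]>(N);
  }

  unsigned char *data()
  {
    if constexpr (on_heap)
      return storage_.get();
    else
      return storage_.data();
  }

  static constexpr bool uses_heap() { return on_heap; }
};
```

With the assumed threshold of 1024, the 560-byte frame quoted above for k_rem_pio2 would stay inline, while the 2064-byte frames of memmem, strcasestr and strstr would be heap-allocated.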



Re: [PATCH] ifcvt.cc: Prevent excessive if-conversion for conditional moves

2023-01-11 Thread Takayuki 'January June' Suwa via Gcc-patches
On 2023/01/11 17:02, Robin Dapp wrote:
> Hi,
Hi!

>  
>> On optimizing for speed, default_noce_conversion_profitable_p() allows
>> plenty of headroom, so this patch has little impact.
>>
>> Also, if the target-specific cost estimate is accurate or allows for
>> margins, the impact should be similarly small.
> I believe this part of ifcvt does/did not use the costing on purpose.
> It will generally convert more sequences than other paths that compare
> before and after costs since we just count the number of converted
> insns comparing them against the "branch costs".  Similar to rtx costs
> they are kind of relative to a single insn but AFAIK it's not used
> consistently everywhere.  All the major platforms have low branch costs
> nowadays (0 or 1?) thus we won't emit too many conditional moves here.
> 
> In general I agree that we should compare costs everywhere and not just
> count (the costing should include the branch costs as well) but this would
> be a major overhaul.  For your case (assuming xtensa), could you not
> tune xtensa_branch_cost?  It is currently 3 allowing up to 4 conditional
> moves to be generated.  optimize_function_for_speed_p is already being
> passed to the hook so you could make use of that and decrease branch
> costs when optimizing for size only.
> 
> Regards
>  Robin

Thank you for your detailed explanation.

In my case (for Xtensa), the cost of branching isn't really the issue.
The actual problem, I think, is the cost of the sequence itself before and
after conversion.
This is because ifcvt's internal estimate is based on PATTERN(insn), so the
instruction lengths (the "length" attribute) associated with the insns are not
well reflected.
This is especially noticeable when optimizing for size (the original cost is
overestimated).

Currently, in addition to the patch, I have implemented the following code, and
so far it seems to work reasonably well (fine adjustments are still required).

/* Return true if the instruction sequence seq is a good candidate as a
   replacement for the if-convertible sequence described in if_info.  */

static bool
xtensa_noce_conversion_profitable_p (rtx_insn *seq,
                                     struct noce_if_info *if_info)
{
  unsigned int cost, original_cost;
  bool speed_p;
  rtx_insn *insn;

  speed_p = if_info->speed_p;  /* of TEST_BB */

  /* Estimate the cost for the replacing sequence.  */
  cost = 0;
  for (insn = seq; insn; insn = NEXT_INSN (insn))
    if (active_insn_p (insn))
      cost += xtensa_insn_cost (insn, speed_p);

  /* Short circuit and margins if optimizing for speed.  */
  if (speed_p)
    return cost <= if_info->max_seq_cost;

  /* Estimate the cost for the original sequence if optimizing for
     size.  */
  original_cost = xtensa_insn_cost (if_info->jump, speed_p);
  speed_p = optimize_bb_for_speed_p (if_info->then_bb);
  FOR_BB_INSNS (if_info->then_bb, insn)
    if (active_insn_p (insn))
      original_cost += xtensa_insn_cost (insn, speed_p);
  if (if_info->else_bb)
    {
      speed_p = optimize_bb_for_speed_p (if_info->else_bb);
      FOR_BB_INSNS (if_info->else_bb, insn)
        if (active_insn_p (insn))
          original_cost += xtensa_insn_cost (insn, speed_p);
    }

  return cost <= original_cost;
}
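
The decision logic of the hook above can be modelled outside GCC as a small sketch, which may help when reasoning about concrete numbers. The struct, field and function names below are hypothetical and the per-insn costs are plain integers; nothing here uses GCC's RTL types or the real xtensa_insn_cost.

```cpp
#include <numeric>
#include <vector>

// Simplified stand-in for the data the hook sees; all names are hypothetical.
struct if_conversion_model
{
  bool speed_p;                      // is the test block optimized for speed?
  unsigned max_seq_cost;             // budget used in the speed case
  unsigned jump_cost;                // cost of the conditional jump
  std::vector<unsigned> then_costs;  // per-insn costs of the THEN block
  std::vector<unsigned> else_costs;  // per-insn costs of the ELSE block, may be empty
};

static unsigned sum_costs (const std::vector<unsigned> &costs)
{
  return std::accumulate (costs.begin (), costs.end (), 0u);
}

// Mirrors the structure of the hook: compare against the budget when
// optimizing for speed, otherwise against the cost of the original code.
bool conversion_profitable_p (const std::vector<unsigned> &seq_costs,
                              const if_conversion_model &info)
{
  unsigned cost = sum_costs (seq_costs);
  if (info.speed_p)
    return cost <= info.max_seq_cost;

  unsigned original_cost = info.jump_cost + sum_costs (info.then_costs)
                           + sum_costs (info.else_costs);
  return cost <= original_cost;
}
```

For example, when optimizing for size with a 3-byte jump and a single 3-byte insn in the THEN block, a replacement sequence of two 3-byte insns is accepted (6 <= 6) while one of three such insns is rejected (9 > 6).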


Just test mailbox

2023-01-11 Thread ijinma--- via Gcc-patches
Sorry, I just want to test whether my mailbox function is correct. Please 
ignore it. Thank you.




Re: Just test mailbox

2023-01-11 Thread ijinma--- via Gcc-patches
new one.







Re: Just test mailbox

2023-01-11 Thread ijinma--- via Gcc-patches
one more.







Re: [PATCH v2] [RISCV] Add 'Zfa' extension according to riscv-isa-manual

2023-01-11 Thread jinma via Gcc-patches
From e4ce8e825c145d74e6b9827f972629548e39f118 Mon Sep 17 00:00:00 2001
From: Jin Ma 
Date: Wed, 11 Jan 2023 19:13:27 +0800
Subject: [PATCH] [RISCV] Add 'Zfa' extension according to riscv-isa-manual
This patch adds the 'Zfa' extension for riscv, which is based on:
( https://github.com/riscv/riscv-isa-manual/commit/d74d99e22d5f68832f70982d867614e2149a3bd7 )
latest 'Zfa' change on the master branch of the RISC-V ISA Manual as
of this writing.
The Wiki Page (details):
( https://github.com/a4lg/binutils-gdb/wiki/riscv_zfa )
The binutils-gdb for 'Zfa' extension:
( https://sourceware.org/pipermail/binutils/2022-September/122938.html )
gcc/ChangeLog:
 * common/config/riscv/riscv-common.cc:
 * config/riscv/constraints.md (Zf):
 * config/riscv/predicates.md:
 * config/riscv/riscv-builtins.cc (RISCV_FTYPE_NAME2):
 (AVAIL):
 (RISCV_ATYPE_SF):
 (RISCV_ATYPE_DF):
 (RISCV_FTYPE_ATYPES2):
 * config/riscv/riscv-ftypes.def (2):
 * config/riscv/riscv-opts.h (MASK_ZFA):
 (TARGET_ZFA):
 * config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli):
 * config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli):
 (riscv_cannot_force_const_mem):
 (riscv_const_insns):
 (riscv_legitimize_const_move):
 (riscv_split_64bit_move_p):
 (riscv_output_move):
 (riscv_memmodel_needs_release_fence):
 (riscv_print_operand):
 (riscv_secondary_memory_needed):
 * config/riscv/riscv.h (GP_REG_RTX_P):
 * config/riscv/riscv.md (riscv_fminm3):
 (riscv_fmaxm3):
 (fix_truncdfsi2_zfa):
 (round2):
 (rint2):
 (f_quiet4_zfa):
 * config/riscv/riscv.opt:
gcc/testsuite/ChangeLog:
 * gcc.target/riscv/zfa-fcvtmod.c: New test.
 * gcc.target/riscv/zfa-fleq-fltq.c: New test.
 * gcc.target/riscv/zfa-fli-zfh.c: New test.
 * gcc.target/riscv/zfa-fli.c: New test.
 * gcc.target/riscv/zfa-fminm-fmaxm.c: New test.
 * gcc.target/riscv/zfa-fmovh-fmovp.c: New test.
 * gcc.target/riscv/zfa-fround.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc | 4 +
 gcc/config/riscv/constraints.md | 7 ++
 gcc/config/riscv/predicates.md | 4 +
 gcc/config/riscv/riscv-builtins.cc | 11 ++
 gcc/config/riscv/riscv-ftypes.def | 2 +
 gcc/config/riscv/riscv-opts.h | 3 +
 gcc/config/riscv/riscv-protos.h | 1 +
 gcc/config/riscv/riscv.cc | 105 +++-
 gcc/config/riscv/riscv.h | 1 +
 gcc/config/riscv/riscv.md | 114 ++
 gcc/config/riscv/riscv.opt | 4 +
 gcc/testsuite/gcc.target/riscv/zfa-fcvtmod.c | 12 ++
 .../gcc.target/riscv/zfa-fleq-fltq.c | 20 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c | 42 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli.c | 80 
 .../gcc.target/riscv/zfa-fminm-fmaxm.c | 25 
 .../gcc.target/riscv/zfa-fmovh-fmovp.c | 11 ++
 gcc/testsuite/gcc.target/riscv/zfa-fround.c | 25 
 18 files changed, 448 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fcvtmod.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fminm-fmaxm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fround.c
diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 0a89fdaffe2..cccec12975c 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -217,6 +217,8 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
 {"zfh", ISA_SPEC_CLASS_NONE, 1, 0},
 {"zfhmin", ISA_SPEC_CLASS_NONE, 1, 0},
+ {"zfa", ISA_SPEC_CLASS_NONE, 1, 0},
+
 {"zmmul", ISA_SPEC_CLASS_NONE, 1, 0},
 {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1242,6 +1244,8 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
 {"zfhmin", &gcc_options::x_riscv_zf_subext, MASK_ZFHMIN},
 {"zfh", &gcc_options::x_riscv_zf_subext, MASK_ZFH},
+ {"zfa", &gcc_options::x_riscv_zf_subext, MASK_ZFA},
+
 {"zmmul", &gcc_options::x_riscv_zm_subext, MASK_ZMMUL},
 {"svinval", &gcc_options::x_riscv_sv_subext, MASK_SVINVAL},
diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 51cffb2bcb6..2fd407b1d9c 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -110,6 +110,13 @@ (define_constraint "T"
 (and (match_operand 0 "move_operand")
 (match_test "CONSTANT_P (op)")))
+;; Zfa constraints.
+
+(define_constraint "Zf"
+ "A floating point number that can be loaded using instruction `fli` in zfa."
+ (and (match_code "const_double")
+ (match_test "(riscv_float_const_rtx_index_for_fli (op) != -1)")))
+
 ;; Vector constraints.
 (define_register_constraint "vr" "TARGET_VECTOR ? V_REGS : NO_REGS"
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a5a49bf7c0..0e8cf3b3708 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -149,6 +149,10 @@ (define_predicate "move_operand"
 case CONST_PO

[PATCH 1/2] xtensa: Tune "*btrue" insn pattern

2023-01-11 Thread Takayuki 'January June' Suwa via Gcc-patches
This branch instruction has a short encoding for an EQ/NE comparison against
immediate zero when the Code Density Option is enabled, but its "length"
attribute only accounted for the normal encoding.  This patch fixes it.

This patch also prevents the postreload pass from undesirably replacing the
immediate zero in the comparison (short encoding, as mentioned above) with a
register that holds the value zero (normal encoding).

gcc/ChangeLog:

* config/xtensa/xtensa.md (*btrue):
Correct value of the attribute "length" that depends on
TARGET_DENSITY and operands, and add '?' character to the register
constraint of the compared operand.
---
 gcc/config/xtensa/xtensa.md | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index db1d68ee658..b4989832169 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -1679,7 +1679,7 @@
   [(set (pc)
(if_then_else (match_operator 3 "branch_operator"
[(match_operand:SI 0 "register_operand" "r,r")
-(match_operand:SI 1 "branch_operand" "K,r")])
+(match_operand:SI 1 "branch_operand" "K,?r")])
  (label_ref (match_operand 2 "" ""))
  (pc)))]
   ""
@@ -1688,7 +1688,14 @@
 }
   [(set_attr "type""jump,jump")
(set_attr "mode""none")
-   (set_attr "length"  "3,3")])
+   (set (attr "length")
+(if_then_else (match_test "TARGET_DENSITY
+  && CONST_INT_P (operands[1])
+  && INTVAL (operands[1]) == 0
+  && (GET_CODE (operands[3]) == EQ
+  || GET_CODE (operands[3]) == NE)")
+  (const_int 2)
+  (const_int 3)))])
 
 (define_insn "*ubtrue"
   [(set (pc)
-- 
2.30.2
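
As a reading aid, the condition in the new "length" attribute can be restated as an ordinary function. This is a sketch only: `cmp_code`, `btrue_operands` and `btrue_length` are hypothetical names, and the operands are reduced to the three properties the attribute test looks at.

```cpp
// Comparison codes relevant to the sketch; only EQ and NE get the short form.
enum class cmp_code { EQ, NE, LT, GE };

// Hypothetical model of the operands of "*btrue".
struct btrue_operands
{
  cmp_code code;      // operands[3]: the comparison operator
  bool rhs_is_const;  // CONST_INT_P (operands[1])
  long rhs_value;     // INTVAL (operands[1]) when rhs_is_const is true
};

// Returns the instruction length in bytes: 2 for the short encoding,
// 3 for the normal encoding.
int btrue_length (bool target_density, const btrue_operands &ops)
{
  bool short_form = target_density
                    && ops.rhs_is_const
                    && ops.rhs_value == 0
                    && (ops.code == cmp_code::EQ || ops.code == cmp_code::NE);
  return short_form ? 2 : 3;
}
```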


[PATCH 2/2] xtensa: Optimize ctzsi2 and ffssi2 a bit

2023-01-11 Thread Takayuki 'January June' Suwa via Gcc-patches
This patch saves one byte when the Code Density Option is enabled.

gcc/ChangeLog:

* config/xtensa/xtensa.md (ctzsi2, ffssi2):
Rearrange the emitting codes.
---
 gcc/config/xtensa/xtensa.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index b4989832169..764da63f91c 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -477,8 +477,8 @@
   emit_insn (gen_negsi2 (temp, operands[1]));
   emit_insn (gen_andsi3 (temp, temp, operands[1]));
   emit_insn (gen_clzsi2 (temp, temp));
-  emit_insn (gen_negsi2 (temp, temp));
-  emit_insn (gen_addsi3 (operands[0], temp, GEN_INT (31)));
+  emit_move_insn (operands[0], GEN_INT (31));
+  emit_insn (gen_subsi3 (operands[0], operands[0], temp));
   DONE;
 })
 
@@ -491,8 +491,8 @@
   emit_insn (gen_negsi2 (temp, operands[1]));
   emit_insn (gen_andsi3 (temp, temp, operands[1]));
   emit_insn (gen_clzsi2 (temp, temp));
-  emit_insn (gen_negsi2 (temp, temp));
-  emit_insn (gen_addsi3 (operands[0], temp, GEN_INT (32)));
+  emit_move_insn (operands[0], GEN_INT (32));
+  emit_insn (gen_subsi3 (operands[0], operands[0], temp));
   DONE;
 })
 
-- 
2.30.2
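
The rearrangement relies on `(-clz) + C` and `C - clz` producing the same value. A small sketch of both forms, written as plain C++ rather than RTL expanders, is given below. The function names are hypothetical, and `clz32` is a portable loop that is assumed, for the purpose of this sketch, to behave like clzsi2 on this target by returning 32 for a zero input.

```cpp
#include <cstdint>

// Portable count-leading-zeros; returns 32 for zero (assumption, see above).
int clz32 (std::uint32_t x)
{
  int n = 0;
  for (std::uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// clz of the lowest set bit, isolated with x & -x as in the expanders.
static int clz_of_lowest_bit (std::uint32_t x)
{
  return clz32 (x & (0u - x));
}

// Old form: negate, then add the constant.
int ctz_old (std::uint32_t x) { return -clz_of_lowest_bit (x) + 31; }
int ffs_old (std::uint32_t x) { return -clz_of_lowest_bit (x) + 32; }

// New form: load the constant, then subtract.
int ctz_new (std::uint32_t x) { return 31 - clz_of_lowest_bit (x); }
int ffs_new (std::uint32_t x) { return 32 - clz_of_lowest_bit (x); }
```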


Re: [PATCH v2] [RISCV] Add 'Zfa' extension according to riscv-isa-manual

2023-01-11 Thread jinma via Gcc-patches
From 4ee11f99d23d39d55bdadd86699ac35a60c79705 Mon Sep 17 00:00:00 2001
From: Jin Ma 
Date: Thu, 12 Jan 2023 12:51:37 +0800
Subject: [PATCH v2] [RISCV] Add 'Zfa' extension according to riscv-isa-manual
This patch adds the 'Zfa' extension for riscv, which is based on:
( https://github.com/riscv/riscv-isa-manual/commit/d74d99e22d5f68832f70982d867614e2149a3bd7 )
latest 'Zfa' change on the master branch of the RISC-V ISA Manual as
of this writing.
The Wiki Page (details):
( https://github.com/a4lg/binutils-gdb/wiki/riscv_zfa )
The binutils-gdb for 'Zfa' extension:
( https://sourceware.org/pipermail/binutils/2022-September/122938.html )
gcc/ChangeLog:
 * common/config/riscv/riscv-common.cc:
 * config/riscv/constraints.md (Zf):
 * config/riscv/predicates.md:
 * config/riscv/riscv-builtins.cc (RISCV_FTYPE_NAME2):
 (AVAIL):
 (RISCV_ATYPE_SF):
 (RISCV_ATYPE_DF):
 (RISCV_FTYPE_ATYPES2):
 * config/riscv/riscv-ftypes.def (2):
 * config/riscv/riscv-opts.h (MASK_ZFA):
 (TARGET_ZFA):
 * config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli):
 * config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli):
 (riscv_cannot_force_const_mem):
 (riscv_const_insns):
 (riscv_legitimize_const_move):
 (riscv_split_64bit_move_p):
 (riscv_output_move):
 (riscv_memmodel_needs_release_fence):
 (riscv_print_operand):
 (riscv_secondary_memory_needed):
 * config/riscv/riscv.h (GP_REG_RTX_P):
 * config/riscv/riscv.md (riscv_fminm3):
 (riscv_fmaxm3):
 (fix_truncdfsi2_zfa):
 (round2):
 (rint2):
 (f_quiet4_zfa):
 * config/riscv/riscv.opt:
gcc/testsuite/ChangeLog:
 * gcc.target/riscv/zfa-fcvtmod.c: New test.
 * gcc.target/riscv/zfa-fleq-fltq.c: New test.
 * gcc.target/riscv/zfa-fli-zfh.c: New test.
 * gcc.target/riscv/zfa-fli.c: New test.
 * gcc.target/riscv/zfa-fminm-fmaxm.c: New test.
 * gcc.target/riscv/zfa-fmovh-fmovp.c: New test.
 * gcc.target/riscv/zfa-fround.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc | 4 +
 gcc/config/riscv/constraints.md | 7 ++
 gcc/config/riscv/predicates.md | 4 +
 gcc/config/riscv/riscv-builtins.cc | 11 ++
 gcc/config/riscv/riscv-ftypes.def | 2 +
 gcc/config/riscv/riscv-opts.h | 3 +
 gcc/config/riscv/riscv-protos.h | 1 +
 gcc/config/riscv/riscv.cc | 109 -
 gcc/config/riscv/riscv.h | 1 +
 gcc/config/riscv/riscv.md | 114 ++
 gcc/config/riscv/riscv.opt | 4 +
 gcc/testsuite/gcc.target/riscv/zfa-fcvtmod.c | 12 ++
 .../gcc.target/riscv/zfa-fleq-fltq.c | 20 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c | 42 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli.c | 80 
 .../gcc.target/riscv/zfa-fminm-fmaxm.c | 25 
 .../gcc.target/riscv/zfa-fmovh-fmovp.c | 11 ++
 gcc/testsuite/gcc.target/riscv/zfa-fround.c | 25 
 18 files changed, 452 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fcvtmod.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fminm-fmaxm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fround.c
diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 0a89fdaffe2..cccec12975c 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -217,6 +217,8 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
 {"zfh", ISA_SPEC_CLASS_NONE, 1, 0},
 {"zfhmin", ISA_SPEC_CLASS_NONE, 1, 0},
+ {"zfa", ISA_SPEC_CLASS_NONE, 1, 0},
+
 {"zmmul", ISA_SPEC_CLASS_NONE, 1, 0},
 {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1242,6 +1244,8 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
 {"zfhmin", &gcc_options::x_riscv_zf_subext, MASK_ZFHMIN},
 {"zfh", &gcc_options::x_riscv_zf_subext, MASK_ZFH},
+ {"zfa", &gcc_options::x_riscv_zf_subext, MASK_ZFA},
+
 {"zmmul", &gcc_options::x_riscv_zm_subext, MASK_ZMMUL},
 {"svinval", &gcc_options::x_riscv_sv_subext, MASK_SVINVAL},
diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 51cffb2bcb6..2fd407b1d9c 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -110,6 +110,13 @@ (define_constraint "T"
 (and (match_operand 0 "move_operand")
 (match_test "CONSTANT_P (op)")))
+;; Zfa constraints.
+
+(define_constraint "Zf"
+ "A floating point number that can be loaded using instruction `fli` in zfa."
+ (and (match_code "const_double")
+ (match_test "(riscv_float_const_rtx_index_for_fli (op) != -1)")))
+
 ;; Vector constraints.
 (define_register_constraint "vr" "TARGET_VECTOR ? V_REGS : NO_REGS"
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a5a49bf7c0..0e8cf3b3708 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -149,6 +149,10 @@ (define_predicate "move_operand"
 case CONS

Re: [PATCH v2] [RISCV] Add 'Zfa' extension according to riscv-isa-manual

2023-01-11 Thread jinma via Gcc-patches
I am very sorry. There seems to be an unknown problem with my email, which
broke the formatting of the patch. I will deal with it as soon as possible.
I apologize for the trouble.


[PATCH v3] [RISCV] Add 'Zfa' extension according to riscv-isa-manual

2023-01-11 Thread jinma via Gcc-patches
From 4ee11f99d23d39d55bdadd86699ac35a60c79705 Mon Sep 17 00:00:00 2001
In-Reply-To: <77a18666-f71d-48e2-a502-a879b3eb6ccf.ji...@linux.alibaba.com>
References: <77a18666-f71d-48e2-a502-a879b3eb6ccf.ji...@linux.alibaba.com>
From: Jin Ma 
Date: Thu, 12 Jan 2023 12:51:37 +0800
Subject: [PATCH v3] [RISCV] Add 'Zfa' extension according to riscv-isa-manual
This patch adds the 'Zfa' extension for riscv, which is based on:
( https://github.com/riscv/riscv-isa-manual/commit/d74d99e22d5f68832f70982d867614e2149a3bd7 )
latest 'Zfa' change on the master branch of the RISC-V ISA Manual as
of this writing.
The Wiki Page (details):
( https://github.com/a4lg/binutils-gdb/wiki/riscv_zfa )
The binutils-gdb for 'Zfa' extension:
( https://sourceware.org/pipermail/binutils/2022-September/122938.html )
gcc/ChangeLog:
 * common/config/riscv/riscv-common.cc:
 * config/riscv/constraints.md (Zf):
 * config/riscv/predicates.md:
 * config/riscv/riscv-builtins.cc (RISCV_FTYPE_NAME2):
 (AVAIL):
 (RISCV_ATYPE_SF):
 (RISCV_ATYPE_DF):
 (RISCV_FTYPE_ATYPES2):
 * config/riscv/riscv-ftypes.def (2):
 * config/riscv/riscv-opts.h (MASK_ZFA):
 (TARGET_ZFA):
 * config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli):
 * config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli):
 (riscv_cannot_force_const_mem):
 (riscv_const_insns):
 (riscv_legitimize_const_move):
 (riscv_split_64bit_move_p):
 (riscv_output_move):
 (riscv_memmodel_needs_release_fence):
 (riscv_print_operand):
 (riscv_secondary_memory_needed):
 * config/riscv/riscv.h (GP_REG_RTX_P):
 * config/riscv/riscv.md (riscv_fminm3):
 (riscv_fmaxm3):
 (fix_truncdfsi2_zfa):
 (round2):
 (rint2):
 (f_quiet4_zfa):
 * config/riscv/riscv.opt:
gcc/testsuite/ChangeLog:
 * gcc.target/riscv/zfa-fcvtmod.c: New test.
 * gcc.target/riscv/zfa-fleq-fltq.c: New test.
 * gcc.target/riscv/zfa-fli-zfh.c: New test.
 * gcc.target/riscv/zfa-fli.c: New test.
 * gcc.target/riscv/zfa-fminm-fmaxm.c: New test.
 * gcc.target/riscv/zfa-fmovh-fmovp.c: New test.
 * gcc.target/riscv/zfa-fround.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc | 4 +
 gcc/config/riscv/constraints.md | 7 ++
 gcc/config/riscv/predicates.md | 4 +
 gcc/config/riscv/riscv-builtins.cc | 11 ++
 gcc/config/riscv/riscv-ftypes.def | 2 +
 gcc/config/riscv/riscv-opts.h | 3 +
 gcc/config/riscv/riscv-protos.h | 1 +
 gcc/config/riscv/riscv.cc | 109 -
 gcc/config/riscv/riscv.h | 1 +
 gcc/config/riscv/riscv.md | 114 ++
 gcc/config/riscv/riscv.opt | 4 +
 gcc/testsuite/gcc.target/riscv/zfa-fcvtmod.c | 12 ++
 .../gcc.target/riscv/zfa-fleq-fltq.c | 20 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c | 42 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli.c | 80 
 .../gcc.target/riscv/zfa-fminm-fmaxm.c | 25 
 .../gcc.target/riscv/zfa-fmovh-fmovp.c | 11 ++
 gcc/testsuite/gcc.target/riscv/zfa-fround.c | 25 
 18 files changed, 452 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fcvtmod.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fminm-fmaxm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fround.c
diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 0a89fdaffe2..cccec12975c 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -217,6 +217,8 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
 {"zfh", ISA_SPEC_CLASS_NONE, 1, 0},
 {"zfhmin", ISA_SPEC_CLASS_NONE, 1, 0},
+ {"zfa", ISA_SPEC_CLASS_NONE, 1, 0},
+
 {"zmmul", ISA_SPEC_CLASS_NONE, 1, 0},
 {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1242,6 +1244,8 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
 {"zfhmin", &gcc_options::x_riscv_zf_subext, MASK_ZFHMIN},
 {"zfh", &gcc_options::x_riscv_zf_subext, MASK_ZFH},
+ {"zfa", &gcc_options::x_riscv_zf_subext, MASK_ZFA},
+
 {"zmmul", &gcc_options::x_riscv_zm_subext, MASK_ZMMUL},
 {"svinval", &gcc_options::x_riscv_sv_subext, MASK_SVINVAL},
diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 51cffb2bcb6..2fd407b1d9c 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -110,6 +110,13 @@ (define_constraint "T"
 (and (match_operand 0 "move_operand")
 (match_test "CONSTANT_P (op)")))
+;; Zfa constraints.
+
+(define_constraint "Zf"
+ "A floating point number that can be loaded using instruction `fli` in zfa."
+ (and (match_code "const_double")
+ (match_test "(riscv_float_const_rtx_index_for_fli (op) != -1)")))
+
 ;; Vector constraints.
 (define_register_constraint "vr" "TARGET_VECTOR ? V_REGS : NO_REGS"
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a5a49bf7c0..0e8cf3

Re: libstdc++: Fix deadlock in debug iterator increment [PR108288]

2023-01-11 Thread François Dumont via Gcc-patches
Small update to fix an obvious compilation issue and to revise the new test
case, which could have led to an infinite loop if the increment issue was
not detected.


I also forgot to ask whether the instantiation is more likely to be
elided when it is implemented as in _Safe_local_iterator:

return { __cur, this->_M_sequence };

than as in _Safe_iterator:
return _Safe_iterator(__cur, this->_M_sequence);

in the case where user code does not use it?

Fully tested now, ok to commit?

François

On 11/01/23 07:03, François Dumont wrote:

Thanks for fixing this.

Here is the extension of the fix to all post-increment/decrement 
operators we have on _GLIBCXX_DEBUG iterators.


I prefer to somehow restore the previous implementation so that the 
_GLIBCXX_DEBUG post operators remain implemented in terms of the normal 
post operators.


I also plan to remove the debug check in the _Safe_iterator 
constructor from base iterator to avoid the redundant check we have 
now. But I need to make sure first that we are never calling it with 
an unchecked base iterator. And it might not be the right moment to make 
such a change.


    libstdc++: Fix deadlock in debug local_iterator increment [PR108288]

    Complete fix on all _Safe_iterator post-increment and post-decrement
    implementations and on _Safe_local_iterator.

    libstdc++-v3/ChangeLog:

    * include/debug/safe_iterator.h
    (_Safe_iterator<>::operator++(int)): Extend deadlock fix to
    other iterator category.
    (_Safe_iterator<>::operator--(int)): Likewise.
    * include/debug/safe_local_iterator.h
    (_Safe_local_iterator<>::operator++(int)): Fix deadlock.
    * testsuite/util/debug/unordered_checks.h
    (invalid_local_iterator_pre_increment): New.
    (invalid_local_iterator_post_increment): New.
    * testsuite/23_containers/unordered_map/debug/invalid_local_iterator_post_increment_neg.cc:
    New test.
    * testsuite/23_containers/unordered_map/debug/invalid_local_iterator_pre_increment_neg.cc:
    New test.

Tested under Linux x86_64.

Ok to commit ?

François

On 06/01/23 12:54, Jonathan Wakely via Libstdc++ wrote:

Tested x86_64-linux. Pushed to trunk.

I think we should backport this too, after some soak time on trunk.

-- >8 --

With -fno-elide-constructors the debug iterator post-increment and
post-decrement operators are susceptible to deadlock. They take a mutex
lock and then return a temporary, which also attempts to take a lock to
attach itself to the sequence. If the return value and *this happen to
collide and use the same mutex from the pool, then you get a deadlock
trying to lock a mutex that is already held by the current thread.

Note that the chosen mutex depends on the sequence, so there is no need 
for a conditional sentence here: it will necessarily be the same mutex.
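
For illustration only, the locking pattern the patch below moves to can be sketched in isolation. This is a minimal sketch, not the libstdc++ code: `Sequence` and `CheckedIter` are hypothetical stand-ins, and a single `std::mutex` per sequence is assumed in place of the mutex pool.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a debug-mode sequence: it owns the mutex that
// guards the bookkeeping of attached iterators.
struct Sequence {
  std::mutex mtx;
  int attached = 0;
};

// Hypothetical stand-in for a debug iterator. Its constructor attaches to
// the sequence, and attaching takes the sequence mutex.
struct CheckedIter {
  int pos;
  Sequence* seq;

  CheckedIter(int p, Sequence* s) : pos(p), seq(s) {
    std::lock_guard<std::mutex> l(seq->mtx);
    ++seq->attached;
  }

  // Post-increment. Holding the lock across the return statement would make
  // the constructor above lock the same non-recursive mutex again, which is
  // undefined behaviour and in practice a deadlock. So the lock is limited
  // to the inner scope and released before the return value is constructed.
  CheckedIter operator++(int) {
    int cur;
    {
      std::lock_guard<std::mutex> l(seq->mtx);
      cur = pos++;
    }
    return CheckedIter(cur, seq);
  }
};
```

With this shape, `CheckedIter old = it++;` leaves `old` at the previous position and attaches it to the sequence without ever locking the mutex twice on one thread.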


diff --git a/libstdc++-v3/include/debug/safe_iterator.h b/libstdc++-v3/include/debug/safe_iterator.h
index f9068eaf8d6..f8b46826b7c 100644
--- a/libstdc++-v3/include/debug/safe_iterator.h
+++ b/libstdc++-v3/include/debug/safe_iterator.h
@@ -129,14 +129,6 @@ namespace __gnu_debug
 	typename _Sequence::_Base::iterator,
 	typename _Sequence::_Base::const_iterator>::__type _OtherIterator;
 
-  struct _Attach_single
-  { };
-
-  _Safe_iterator(_Iterator __i, _Safe_sequence_base* __seq, _Attach_single)
-  _GLIBCXX_NOEXCEPT
-  : _Iter_base(__i)
-  { _M_attach_single(__seq); }
-
 public:
   typedef _Iterator	iterator_type;
   typedef typename _Traits::iterator_category	iterator_category;
@@ -347,8 +339,13 @@ namespace __gnu_debug
 	_GLIBCXX_DEBUG_VERIFY(this->_M_incrementable(),
 			  _M_message(__msg_bad_inc)
 			  ._M_iterator(*this, "this"));
-	__gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
-	return _Safe_iterator(base()++, this->_M_sequence, _Attach_single());
+	_Iter_base __cur;
+	{
+	  __gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
+	  __cur = base()++;
+	}
+
+	return _Safe_iterator(__cur, this->_M_sequence);
   }
 
   // -- Utilities --
@@ -520,12 +517,6 @@ namespace __gnu_debug
 
 protected:
   typedef typename _Safe_base::_OtherIterator _OtherIterator;
-  typedef typename _Safe_base::_Attach_single _Attach_single;
-
-  _Safe_iterator(_Iterator __i, _Safe_sequence_base* __seq, _Attach_single)
-  _GLIBCXX_NOEXCEPT
-  : _Safe_base(__i, __seq, _Attach_single())
-  { }
 
 public:
   /// @post the iterator is singular and unattached
@@ -609,9 +600,13 @@ namespace __gnu_debug
 	_GLIBCXX_DEBUG_VERIFY(this->_M_incrementable(),
 			  _M_message(__msg_bad_inc)
 			  ._M_iterator(*this, "this"));
-	__gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
-	return _Safe_iterator(this->base()++, this->_M_sequence,
-			  _Attach_single());
+	_Iterator __cur;
+	{
+	  __gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
+	  __cur = this->base()++;
+	}
+
+	return _Safe_iterator(__cur, th

[PATCH v4] [RISCV] Add 'Zfa' extension according to riscv-isa-manual

2023-01-11 Thread Jin Ma via Gcc-patches
This patch adds the 'Zfa' extension for riscv, based on the latest 'Zfa'
change on the master branch of the RISC-V ISA Manual as of this writing:
( https://github.com/riscv/riscv-isa-manual/commit/d74d99e22d5f68832f70982d867614e2149a3bd7 )

The Wiki Page (details):
( https://github.com/a4lg/binutils-gdb/wiki/riscv_zfa )

The binutils-gdb for 'Zfa' extension:
( https://sourceware.org/pipermail/binutils/2022-September/122938.html )

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc:
* config/riscv/constraints.md (Zf):
* config/riscv/predicates.md:
* config/riscv/riscv-builtins.cc (RISCV_FTYPE_NAME2):
(AVAIL):
(RISCV_ATYPE_SF):
(RISCV_ATYPE_DF):
(RISCV_FTYPE_ATYPES2):
* config/riscv/riscv-ftypes.def (2):
* config/riscv/riscv-opts.h (MASK_ZFA):
(TARGET_ZFA):
* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli):
* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli):
(riscv_cannot_force_const_mem):
(riscv_const_insns):
(riscv_legitimize_const_move):
(riscv_split_64bit_move_p):
(riscv_output_move):
(riscv_memmodel_needs_release_fence):
(riscv_print_operand):
(riscv_secondary_memory_needed):
* config/riscv/riscv.h (GP_REG_RTX_P):
* config/riscv/riscv.md (riscv_fminm3):
(riscv_fmaxm3):
(fix_truncdfsi2_zfa):
(round2):
(rint2):
(f_quiet4_zfa):
* config/riscv/riscv.opt:

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fcvtmod.c: New test.
* gcc.target/riscv/zfa-fleq-fltq.c: New test.
* gcc.target/riscv/zfa-fli-zfh.c: New test.
* gcc.target/riscv/zfa-fli.c: New test.
* gcc.target/riscv/zfa-fminm-fmaxm.c: New test.
* gcc.target/riscv/zfa-fmovh-fmovp.c: New test.
* gcc.target/riscv/zfa-fround.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc   |   4 +
 gcc/config/riscv/constraints.md   |   7 ++
 gcc/config/riscv/predicates.md|   4 +
 gcc/config/riscv/riscv-builtins.cc|  11 ++
 gcc/config/riscv/riscv-ftypes.def |   2 +
 gcc/config/riscv/riscv-opts.h |   3 +
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv.cc | 109 -
 gcc/config/riscv/riscv.h  |   1 +
 gcc/config/riscv/riscv.md | 114 ++
 gcc/config/riscv/riscv.opt|   4 +
 gcc/testsuite/gcc.target/riscv/zfa-fcvtmod.c  |  12 ++
 .../gcc.target/riscv/zfa-fleq-fltq.c  |  20 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c  |  42 +++
 gcc/testsuite/gcc.target/riscv/zfa-fli.c  |  80 
 .../gcc.target/riscv/zfa-fminm-fmaxm.c|  25 
 .../gcc.target/riscv/zfa-fmovh-fmovp.c|  11 ++
 gcc/testsuite/gcc.target/riscv/zfa-fround.c   |  25 
 18 files changed, 452 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fcvtmod.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fminm-fmaxm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fround.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 0a89fdaffe2..cccec12975c 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -217,6 +217,8 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"zfh",   ISA_SPEC_CLASS_NONE, 1, 0},
   {"zfhmin",ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"zfa", ISA_SPEC_CLASS_NONE, 1, 0},
+
   {"zmmul", ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1242,6 +1244,8 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"zfhmin",&gcc_options::x_riscv_zf_subext, MASK_ZFHMIN},
   {"zfh",   &gcc_options::x_riscv_zf_subext, MASK_ZFH},
 
+  {"zfa",   &gcc_options::x_riscv_zf_subext, MASK_ZFA},
+
   {"zmmul", &gcc_options::x_riscv_zm_subext, MASK_ZMMUL},
 
   {"svinval", &gcc_options::x_riscv_sv_subext, MASK_SVINVAL},
diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 51cffb2bcb6..2fd407b1d9c 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -110,6 +110,13 @@ (define_constraint "T"
   (and (match_operand 0 "move_operand")
(match_test "CONSTANT_P (op)")))
 
+;; Zfa constraints.
+
+(define_constraint "Zf"
+  "A floating point number that can be loaded using instruction `fli` in zfa."
+  (and (match_code "const_double")
+   (match_test "(riscv_float_const_rtx_index_for_fli (op) != -

Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-11 Thread Richard Biener via Gcc-patches
On Wed, Jan 11, 2023 at 8:09 PM Segher Boessenkool
 wrote:
>
> On Wed, Jan 11, 2023 at 07:39:29PM +0100, Richard Biener wrote:
> > Like if they cannot even build their target libraries aka their build will 
> > fail.  It would be nice to identify those and, say, make at least -mlra 
> > available to all ports that currently do not have a way to enable LRA?
>
> It is up to the target maintainers to make such support, it is a machine
> flag after all (-m are machine flags, -f are more general flags).
>
> There has been ample warning, see 
> for example.  GCC 13 release will be six years after that, I'd hope that
> that is enough.
>
> Just using
>   targetm.lra_p = default_lra_p;
> is enough to test.  I don't have a setup to build all targets (that
> requires target headers, to begin with), and it is up to the target
> maintainers to decide how they want things fixed anyway.
>
> I'll put up a preliminary branch for the generic patches, but let me
> update it to trunk first :-)

Just saying that the changes.html note does not carry much information and will
instead spread FUD, since it does not indicate which ports would be dysfunctional
after removing reload support (i.e. will no longer even build).  So I'd say we
don't want this note in changes.html in the proposed form.

Richard.

>
>
> Segher


Re: [PATCH] wwwdocs: Note that old reload is deprecated

2023-01-11 Thread Richard Biener via Gcc-patches
On Thu, Jan 12, 2023 at 8:44 AM Richard Biener
 wrote:
>
> On Wed, Jan 11, 2023 at 8:09 PM Segher Boessenkool
>  wrote:
> >
> > On Wed, Jan 11, 2023 at 07:39:29PM +0100, Richard Biener wrote:
> > > Like if they cannot even build their target libraries aka their build 
> > > will fail.  It would be nice to identify those and, say, make at least 
> > > -mlra available to all ports that currently do not have a way to enable 
> > > LRA?
> >
> > It is up to the target maintainers to make such support, it is a machine
> > flag after all (-m are machine flags, -f are more general flags).
> >
> > There has been ample warning, see 
> > for example.  GCC 13 release will be six years after that, I'd hope that
> > that is enough.
> >
> > Just using
> >   targetm.lra_p = default_lra_p;
> > is enough to test.  I don't have a setup to build all targets (that
> > requires target headers, to begin with), and it is up to the target
> > maintainers to decide how they want things fixed anyway.
> >
> > I'll put up a preliminary branch for the generic patches, but let me
> > update it to trunk first :-)
>
> Just saying that the changes.html note does not carry much information and will
> instead spread FUD, since it does not indicate which ports would be dysfunctional
> after removing reload support (i.e. will no longer even build).  So I'd say we
> don't want this note in changes.html in the proposed form.

Btw, the following are the ports that default to reload and have no command-line
option to switch to LRA.

config/alpha/alpha.cc:#define TARGET_LRA_P hook_bool_void_false
config/avr/avr.cc:#define TARGET_LRA_P hook_bool_void_false
config/bfin/bfin.cc:#define TARGET_LRA_P hook_bool_void_false
config/c6x/c6x.cc:#define TARGET_LRA_P hook_bool_void_false
config/cris/cris.cc:#define TARGET_LRA_P hook_bool_void_false
config/epiphany/epiphany.cc:#define TARGET_LRA_P hook_bool_void_false
config/fr30/fr30.cc:#define TARGET_LRA_P hook_bool_void_false
config/frv/frv.cc:#define TARGET_LRA_P hook_bool_void_false
config/h8300/h8300.cc:#define TARGET_LRA_P hook_bool_void_false
config/ia64/ia64.cc:#define TARGET_LRA_P hook_bool_void_false
config/iq2000/iq2000.cc:#define TARGET_LRA_P hook_bool_void_false
config/lm32/lm32.cc:#define TARGET_LRA_P hook_bool_void_false
config/m32c/m32c.cc:#define TARGET_LRA_P hook_bool_void_false
config/m32r/m32r.cc:#define TARGET_LRA_P hook_bool_void_false
config/m68k/m68k.cc:#define TARGET_LRA_P hook_bool_void_false
config/mcore/mcore.cc:#define TARGET_LRA_P hook_bool_void_false
config/microblaze/microblaze.cc:#define TARGET_LRA_P hook_bool_void_false
config/mmix/mmix.cc:#define TARGET_LRA_P hook_bool_void_false
config/mn10300/mn10300.cc:#define TARGET_LRA_P hook_bool_void_false
config/moxie/moxie.cc:#define TARGET_LRA_P hook_bool_void_false
config/msp430/msp430.cc:#define TARGET_LRA_P hook_bool_void_false
config/nvptx/nvptx.cc:#define TARGET_LRA_P hook_bool_void_false
config/pa/pa.cc:#define TARGET_LRA_P hook_bool_void_false
config/rl78/rl78.cc:#define TARGET_LRA_P hook_bool_void_false
config/stormy16/stormy16.cc:#define TARGET_LRA_P hook_bool_void_false
config/visium/visium.cc:#define TARGET_LRA_P hook_bool_void_false
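
As a rough model of what these defines mean and of the one-line test quoted above, here is a minimal sketch. It is not GCC source: `mini_target` is a hypothetical miniature of the target hook vector, and `default_lra_p` is assumed to simply return true.

```cpp
#include <cassert>

// Hook implementations. The names match those quoted in this thread; the
// bodies are assumptions made for this sketch.
static bool hook_bool_void_false() { return false; }
static bool default_lra_p() { return true; }

// A port that stays on old reload defines the macro in its own source file.
#define TARGET_LRA_P hook_bool_void_false

// Generic code falls back to the default when the port defines nothing.
#ifndef TARGET_LRA_P
#define TARGET_LRA_P default_lra_p
#endif

// Hypothetical miniature of the target hook vector: one function pointer
// per hook, initialised from the macro selected above.
struct mini_target {
  bool (*lra_p)();
};

static mini_target targetm = { TARGET_LRA_P };
```

In this model `targetm.lra_p()` is false for such a port, and the assignment `targetm.lra_p = default_lra_p;` flips it to true, which is the kind of override suggested for testing.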


Re: [PATCH] gimple-fold.h: Add missing gimple-iterator.h

2023-01-11 Thread Richard Biener via Gcc-patches
On Thu, Jan 12, 2023 at 2:46 AM Palmer Dabbelt  wrote:
>
> As of 6f5b06032eb ("Finish gimple_build API enhancement") gimple-fold.h
> uses some of the declarations from gimple-iterator.h, which causes
> issues when building Linux's stackprotector plugin.
>
> gcc/ChangeLog:
>
> * gimple-fold.h: Add gimple-iterator.h include.
>
> ---
>
> I'm not sure if this should instead be fixed in Linux by reordering the
> includes along the lines of
>
> diff --git a/scripts/gcc-plugins/gcc-common.h b/scripts/gcc-plugins/gcc-common.h
> index 9a1895747b15..2c3a3079128a 100644
> --- a/scripts/gcc-plugins/gcc-common.h
> +++ b/scripts/gcc-plugins/gcc-common.h
> @@ -72,6 +72,7 @@
>  #include "stor-layout.h"
>  #include "internal-fn.h"
>  #include "gimple-expr.h"
> +#include "gimple-iterator.h"
>  #include "gimple-fold.h"
>  #include "context.h"
>  #include "tree-ssa-alias.h"
> @@ -88,7 +89,6 @@
>  #include "gimple.h"
>  #include "tree-phinodes.h"
>  #include "tree-cfg.h"
> -#include "gimple-iterator.h"
>  #include "gimple-ssa.h"
>  #include "ssa-iterators.h"

The above change is OK.

> but I figured it was slightly easier for users to keep these compatible.
> It looks like many GCC-internal uses of gimple-fold.h already have the
> gimple-iterator.h include right before, though, so not sure if that's
> how things are meant to be.
> ---
>  gcc/gimple-fold.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
> index 2fd58db9a2e..66bee2b75df 100644
> --- a/gcc/gimple-fold.h
> +++ b/gcc/gimple-fold.h
> @@ -22,6 +22,8 @@ along with GCC; see the file COPYING3.  If not see
>  #ifndef GCC_GIMPLE_FOLD_H
>  #define GCC_GIMPLE_FOLD_H
>
> +#include "gimple-iterator.h"
> +

But this is not: we try to avoid #include directives in headers, since we want
the include dependencies to be "flat".
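
To make the "flat" convention concrete, here is a single-file sketch with the header boundaries marked in comments. All file, type and function names are hypothetical and only stand in for the roles of gimple-iterator.h and gimple-fold.h.

```cpp
#include <cassert>

// --- "iter.h" (hypothetical): declares the iterator type ---
struct stmt_iterator {
  int index;
};

// --- "fold.h" (hypothetical): uses stmt_iterator but does not include
// "iter.h" itself. Each source file is expected to include "iter.h" first.
inline int fold_at(const stmt_iterator& it) { return it.index * 2; }

// --- "user.cc" (hypothetical): lists every header it needs, in
// dependency order, instead of relying on nested includes:
//   #include "iter.h"
//   #include "fold.h"
inline int use_fold() {
  stmt_iterator it{21};
  return fold_at(it);
}
```

Swapping the two includes in the hypothetical user.cc would fail to compile, which is the situation the gcc-common.h reordering above addresses on the user side.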

>  extern tree create_tmp_reg_or_ssa_name (tree, gimple *stmt = NULL);
>  extern tree canonicalize_constructor_val (tree, tree);
>  extern tree get_symbol_constant_value (tree);
> --
> 2.39.0
>