Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-06 Thread Richard Biener via Gcc-patches
On Thu, 6 Jul 2023, Jan Hubicka wrote:

> Hi,
> The original scale_loop_profile was implemented to handle only the very simple
> loops produced by the vectorizer at that time (basically loops with only one
> exit and no subloops).  It also has not been updated to the new profile-count
> API very carefully.
> Since I want to use it from loop peeling and unlooping, I need the
> function to at least not make the profile worse on general loops.
> 
> The function does two things:
>  1) it scales down the loop profile by a given probability.
> This is useful, for example, to scale down the profile after peeling, when the
> loop body is executed less often than before;
>  2) after scaling is done, if the profile indicates too large an iteration
> count, it updates the profile to cap the iteration count by the ITERATION_BOUND parameter.
> 
> Step 1 is easy and unchanged.
> 
> I changed ITERATION_BOUND to be the actual bound on the number of iterations as
> used elsewhere (i.e. the number of executions of the latch edge) rather than
> the number of iterations + 1 as it was before.
> 
> To do 2) one needs to do the following:
>   a) scale the loop's own profile so the frequency of the header is at most
>  the sum of in-edge counts * (iteration_bound + 1);
>   b) update the loop exit probabilities so their counts are the same
>  as before scaling;
>   c) reduce the frequencies of basic blocks after the loop exit.
> 
> The old code did b) by setting the probability to 1 / iteration_bound, which is
> correct only if the basic block containing the exit executes precisely once per
> iteration (it is not inside another conditional or inner loop).  This is fixed
> now by using set_edge_probability_and_rescale_others.
> 
> Also, c) was implemented only for the special case when the exit was just
> before the latch basic block.  I now use dominance info to get some additional
> cases right.
> 
> I still did not try to do anything for multiple-exit loops, though the
> implementation could be generalized.
> 
> Bootstrapped/regtested x86_64-linux.  I plan to commit it tonight if there
> are no complaints.

Looks good, but I wonder what we can do to at least make the
multiple exit case behave reasonably?  The vectorizer keeps track
of a "canonical" exit, would it be possible to pass in the main
exit edge and use that instead of single_exit (), would other
exits then behave somewhat reasonably, or would we totally screw
things up here?  That is, the "canonical" exit would be the
counting exit while the other exits are on data driven conditions
and thus wouldn't change probability when we reduce the number
of iterations(?)

Richard.
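The capping arithmetic in steps a) and b) can be illustrated with a small C sketch. It is not the cfgloopmanip.cc implementation: plain int64_t counts stand in for profile_count, and cap_scale and single_exit_probability are hypothetical helper names.

```c
#include <stdint.h>

/* Hypothetical helper: factor by which the loop body must be scaled so
   that the header runs at most entry_count * (iteration_bound + 1)
   times, where iteration_bound bounds the number of latch executions.
   Returns 1.0 when the profile is already within the bound.  */
static double
cap_scale (int64_t entry_count, int64_t header_count, int64_t iteration_bound)
{
  int64_t cap = entry_count * (iteration_bound + 1);
  if (header_count <= cap)
    return 1.0;
  return (double) cap / (double) header_count;
}

/* Hypothetical helper: for a single exit whose block runs exactly once
   per header execution, keeping the exit count equal to entry_count
   after capping requires the probability
   entry_count / (entry_count * (iteration_bound + 1)).
   This assumption fails when the exit sits inside a conditional or an
   inner loop, which is why the patch rescales the edges instead.  */
static double
single_exit_probability (int64_t iteration_bound)
{
  return 1.0 / (double) (iteration_bound + 1);
}
```

For example, 100 loop entries with a header count of 10000 and iteration_bound = 9 give a cap of 1000, a scale factor of 0.1, and an exit probability of 0.1 for an exit taken once per iteration.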

> gcc/ChangeLog:
> 
>   * cfgloopmanip.cc (scale_loop_profile): Rewrite exit edge
>   probability update to be safe on loops with subloops.
>   Make bound parameter to be iteration bound.
>   * tree-ssa-loop-ivcanon.cc (try_peel_loop): Update call
>   of scale_loop_profile.
>   * tree-vect-loop-manip.cc (vect_do_peeling): Likewise.
> 
> diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
> index 6e09dcbb0b1..524b979a546 100644
> --- a/gcc/cfgloopmanip.cc
> +++ b/gcc/cfgloopmanip.cc
> @@ -499,7 +499,7 @@ scale_loop_frequencies (class loop *loop, profile_probability p)
>  }
>  
>  /* Scale profile in LOOP by P.
> -   If ITERATION_BOUND is non-zero, scale even further if loop is predicted
> +   If ITERATION_BOUND is not -1, scale even further if loop is predicted
> to iterate too many times.
> Before caling this function, preheader block profile should be already
> scaled to final count.  This is necessary because loop iterations are
> @@ -510,106 +510,123 @@ void
>  scale_loop_profile (class loop *loop, profile_probability p,
>   gcov_type iteration_bound)
>  {
> -  edge e, preheader_e;
> -  edge_iterator ei;
> -
> -  if (dump_file && (dump_flags & TDF_DETAILS))
> +  if (!(p == profile_probability::always ()))
>  {
> -  fprintf (dump_file, ";; Scaling loop %i with scale ",
> -loop->num);
> -  p.dump (dump_file);
> -  fprintf (dump_file, " bounding iterations to %i\n",
> -(int)iteration_bound);
> -}
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> +   fprintf (dump_file, ";; Scaling loop %i with scale ",
> +loop->num);
> +   p.dump (dump_file);
> +   fprintf (dump_file, "\n");
> + }
>  
> -  /* Scale the probabilities.  */
> -  scale_loop_frequencies (loop, p);
> +  /* Scale the probabilities.  */
> +  scale_loop_frequencies (loop, p);
> +}
>  
> -  if (iteration_bound == 0)
> +  if (iteration_bound == -1)
>  return;
>  
>gcov_type iterations = expected_loop_iterations_unbounded (loop, NULL, true);
> +  if (iterations == -1)
> +return;
>  
>if (dump_file && (dump_flags & TDF_DETAILS))
>  {
> -  fprintf (dump_file, ";; guessed iterations after scaling %i\n",
> -(int)iterations);
> +  fprintf (dump_file,
> +";; guessed iterations of loop %i:%i new upper bound %i:\n",
> +loop->num,
> +

[Bug libstdc++/110574] --enable-cstdio=stdio_pure is incompatible with LFS

2023-07-06 Thread keithp at keithp dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110574

--- Comment #5 from keithp at keithp dot com  ---
Seems like using fseeko would be a reasonable choice here -- while it's not in
ISO C, it is in POSIX 2017.
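Here is a sketch of that suggestion, assuming a POSIX system. fseeko/ftello take and return off_t instead of long, so with _FILE_OFFSET_BITS=64 (the usual LFS switch on 32-bit glibc-style targets) offsets beyond 2 GiB stay representable. The helper stream_size is hypothetical and is not taken from the libstdc++ sources.

```c
#define _FILE_OFFSET_BITS 64    /* request a 64-bit off_t (LFS) */
#define _POSIX_C_SOURCE 200809L /* expose fseeko/ftello */
#include <stdio.h>
#include <sys/types.h>

/* Return the size of STREAM in bytes, or -1 on error, restoring the
   original position afterwards.  Uses off_t rather than long, so large
   offsets are not truncated the way they could be with fseek/ftell.  */
static off_t
stream_size (FILE *stream)
{
  off_t saved = ftello (stream);
  if (saved == (off_t) -1)
    return -1;
  if (fseeko (stream, 0, SEEK_END) != 0)
    return -1;
  off_t end = ftello (stream);
  if (fseeko (stream, saved, SEEK_SET) != 0)
    return -1;
  return end;
}
```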

Re: [PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)

2023-07-06 Thread Richard Biener via Gcc-patches



> On 06.07.2023 at 19:50, Richard Sandiford wrote:
> 
> Richard Biener via Gcc-patches  writes:
>>> On Wed, Jul 5, 2023 at 8:44 AM Hao Liu OS via Gcc-patches
>>>  wrote:
>>> 
>>> Hi,
>>> 
>>> If a loop is unrolled n times during vectorization, two steps are used to
>>> calculate the induction variable:
>>>  - The small step for the unrolled ith-copy: vec_1 = vec_iv + (VF/n * Step)
>>>  - The large step for the whole loop: vec_loop = vec_iv + (VF * Step)
>>> 
>>> This patch calculates an extra vec_n to replace vec_loop:
>>>  vec_n = vec_prev + (VF/n * S) = vec_iv + (VF/n * S) * n = vec_loop.
>>> 
>>> So that we can save the large step register and related operations.
>> 
>> OK.  It would be nice to avoid the dead stmts created earlier though.
> 
> FWIW, I still don't think we should do this.  Part of the point of
> unrolling is to shorten loop-carried dependencies, whereas this patch
> is going in the opposite direction.

Note that ncopies can be >1 without additional unrolling.  With non-VLA vectors, all of the updates will be constant-folded, btw.

Richard 

> Richard
> 
>> 
>> Thanks,
>> Richard.
>> 
>>> gcc/ChangeLog:
>>> 
>>>PR tree-optimization/110449
>>>* tree-vect-loop.cc (vectorizable_induction): use vec_n to replace
>>>vec_loop for the unrolled loop.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>* gcc.target/aarch64/pr110449.c: New testcase.
>>> ---
>>> gcc/testsuite/gcc.target/aarch64/pr110449.c | 40 +
>>> gcc/tree-vect-loop.cc   | 21 +--
>>> 2 files changed, 58 insertions(+), 3 deletions(-)
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110449.c
>>> 
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/pr110449.c b/gcc/testsuite/gcc.target/aarch64/pr110449.c
>>> new file mode 100644
>>> index 000..bb3b6dcfe08
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/pr110449.c
>>> @@ -0,0 +1,40 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-Ofast -mcpu=neoverse-n2 --param aarch64-vect-unroll-limit=2" } */
>>> +/* { dg-final { scan-assembler-not "8.0e\\+0" } } */
>>> +
>>> +/* Calculate the vectorized induction with smaller step for an unrolled loop.
>>> +
>>> +   before (suggested_unroll_factor=2):
>>> + fmov s30, 8.0e+0
>>> + fmov s31, 4.0e+0
>>> + dup v27.4s, v30.s[0]
>>> + dup v28.4s, v31.s[0]
>>> + .L6:
>>> + mov v30.16b, v31.16b
>>> + fadd v31.4s, v31.4s, v27.4s
>>> + fadd v29.4s, v30.4s, v28.4s
>>> + stp q30, q29, [x0]
>>> + add x0, x0, 32
>>> + cmp x1, x0
>>> + bne .L6
>>> +
>>> +   after:
>>> + fmov s31, 4.0e+0
>>> + dup v29.4s, v31.s[0]
>>> + .L6:
>>> + fadd v30.4s, v31.4s, v29.4s
>>> + stp q31, q30, [x0]
>>> + add x0, x0, 32
>>> + fadd v31.4s, v29.4s, v30.4s
>>> + cmp x0, x1
>>> + bne .L6  */
>>> +
>>> +void
>>> +foo2 (float *arr, float freq, float step)
>>> +{
>>> +  for (int i = 0; i < 1024; i++)
>>> +{
>>> +  arr[i] = freq;
>>> +  freq += step;
>>> +}
>>> +}
>>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>>> index 3b46c58a8d8..706ecbffd0c 100644
>>> --- a/gcc/tree-vect-loop.cc
>>> +++ b/gcc/tree-vect-loop.cc
>>> @@ -10114,7 +10114,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>>>   new_vec, step_vectype, NULL);
>>> 
>>>   vec_def = induc_def;
>>> -  for (i = 1; i < ncopies; i++)
>>> +  for (i = 1; i < ncopies + 1; i++)
>>>{
>>>  /* vec_i = vec_prev + vec_step  */
>>>  gimple_seq stmts = NULL;
>>> @@ -10124,8 +10124,23 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>>>  vec_def = gimple_convert (, vectype, vec_def);
>>> 
>>>  gsi_insert_seq_before (, stmts, GSI_SAME_STMT);
>>> - new_stmt = SSA_NAME_DEF_STMT (vec_def);
>>> - STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
>>> + if (i < ncopies)
>>> +   {
>>> + new_stmt = SSA_NAME_DEF_STMT (vec_def);
>>> + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
>>> +   }
>>> + else
>>> +   {
>>> + /* vec_1 = vec_iv + (VF/n * S)
>>> +vec_2 = vec_1 + (VF/n * S)
>>> +...
>>> +vec_n = vec_prev + (VF/n * S) = vec_iv + VF * S = vec_loop
>>> +
>>> +vec_n is used as vec_loop to save the large step register and
>>> +related operations.  */
>>> + add_phi_arg (induction_phi, vec_def, loop_latch_edge (iv_loop),
>>> +  UNKNOWN_LOCATION);
>>> +   }
>>>}
>>> }
>>> 
>>> --
>>> 2.34.1
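The identity behind the patch, vec_n = vec_iv + (VF/n * S) * n = vec_loop, can be checked with a small scalar model. This is a sketch, not vectorizer code: the vec type, LANES and the helper functions are hypothetical, and integer lanes are used so the comparison is exact (with floating-point lanes the two forms need not be bit-identical because of rounding). The comments also point out the longer loop-carried chain that Richard Sandiford objects to.

```c
#include <stdbool.h>
#include <stddef.h>

#define LANES 4 /* elements per vector copy, i.e. VF / n */

typedef struct { long v[LANES]; } vec; /* integer lanes keep this exact */

static vec
vec_add (vec a, vec b)
{
  vec r;
  for (size_t i = 0; i < LANES; i++)
    r.v[i] = a.v[i] + b.v[i];
  return r;
}

static vec
splat (long x)
{
  vec r;
  for (size_t i = 0; i < LANES; i++)
    r.v[i] = x;
  return r;
}

/* Old scheme: the next-iteration IV comes straight from vec_iv with the
   large step VF * S, independent of the unrolled copies.  */
static vec
next_iv_large_step (vec vec_iv, long vf, long s)
{
  return vec_add (vec_iv, splat (vf * s));
}

/* Patched scheme: chain n small steps of (VF / n) * S and use the n-th
   copy as the next-iteration IV.  Each add depends on the previous one,
   so the loop-carried dependency chain grows with n.  */
static vec
next_iv_small_steps (vec vec_iv, long vf, long n, long s)
{
  vec step = splat ((vf / n) * s);
  vec cur = vec_iv;
  for (long i = 0; i < n; i++)
    cur = vec_add (cur, step);
  return cur;
}

static bool
vec_eq (vec a, vec b)
{
  for (size_t i = 0; i < LANES; i++)
    if (a.v[i] != b.v[i])
      return false;
  return true;
}
```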


[PATCH V2] [x86] Add pre_reload splitter to detect fp min/max pattern.

2023-07-06 Thread liuhongt via Gcc-patches
> Please split the above pattern into two, one emitting UNSPEC_IEEE_MAX
> and the other emitting UNSPEC_IEEE_MIN.
Split.

> The test involves blendv instruction, which is SSE4.1, so it is
> pointless to test it without -msse4.1. Please add -msse4.1 instead of
> -march=x86_64 and use sse4_runtime target selector, as is the case
> with gcc.target/i386/pr90358.c.
Changed.

> Please also use -msse4.1 instead of -march here. With -mfpmath=sse,
> the test is valid also for 32bit targets, you should use -msseregparm
> additional options for ia32 (please see gcc.target/i386/pr43546.c
> testcase) in the same way as -mregparm to pass SSE arguments in
> registers.
The 32-bit target still fails to do condition elimination for DFmode due to
the code below in rtx_cost:

  /* A size N times larger than UNITS_PER_WORD likely needs N times as
 many insns, taking N times as long.  */
  factor = mode_size > UNITS_PER_WORD ? mode_size / UNITS_PER_WORD : 1;

This looks like a separate issue for DFmode operations on 32-bit targets.

I've enabled 32-bit for the testcase, but currently it only scans for
minss/maxss.

Here's the updated patch.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

We have ix86_expand_sse_fp_minmax to detect min/max semantics, but
it requires rtx_equal_p for cmp_op0/cmp_op1 and if_true/if_false.  For
the testcase in the PR, there's an extra move from cmp_op0 to if_true,
so ix86_expand_sse_fp_minmax fails to match.

This patch adds a pre_reload splitter to detect the min/max pattern.

Operand order in MINSS matters for signed zeros and NaNs, since the
instruction always returns the second operand when either operand is a NaN
or both operands are zero.
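The operand-order rule can be modelled in C. This sketch assumes the documented SSE behaviour, MINSS(a, b) = a < b ? a : b and MAXSS(a, b) = a > b ? a : b, and assumes that UNSPEC_BLENDV [op1, op2, mask] selects op2 where the mask is set, as BLENDVPS/BLENDVPD do. The helper names are hypothetical. The sketch checks that each splitter's replacement, IEEE_MAX/IEEE_MIN with operands [op2, op1], yields the same bits as the blend it replaces, including for signed zeros and NaNs.

```c
#include <math.h>
#include <stdbool.h>
#include <string.h>

/* Scalar models of MINSS/MAXSS: the second operand is returned whenever
   the comparison is false, i.e. for NaN inputs or two zeros.  */
static double model_minss (double a, double b) { return a < b ? a : b; }
static double model_maxss (double a, double b) { return a > b ? a : b; }

/* Assumed blendv semantics: op2 where MASK is set, otherwise op1.  */
static double model_blendv (double op1, double op2, bool mask)
{
  return mask ? op2 : op1;
}

/* Bitwise comparison, so -0.0 and 0.0 differ and NaNs compare exactly.  */
static bool same_bits (double x, double y)
{
  return memcmp (&x, &y, sizeof x) == 0;
}

/* *ieee_max<mode>3_1: op1 == op3, op2 == op4, split to IEEE_MAX [op2, op1].  */
static bool max_split_ok (double op1, double op2)
{
  return same_bits (model_blendv (op1, op2, op1 < op2), model_maxss (op2, op1));
}

/* *ieee_min<mode>3_1: op1 == op4, op2 == op3, split to IEEE_MIN [op2, op1].  */
static bool min_split_ok (double op1, double op2)
{
  return same_bits (model_blendv (op1, op2, op2 < op1), model_minss (op2, op1));
}
```

Swapping the operands of model_minss changes the result for (-0.0, 0.0) and for a NaN input. For that reason the splitters emit [op2, op1] rather than [op1, op2].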

gcc/ChangeLog:

PR target/110170
* config/i386/i386.md (*ieee_max3_1): New pre_reload
splitter to detect fp max pattern.
(*ieee_min3_1): Ditto, but for fp min pattern.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr110170.C: New test.
* gcc.target/i386/pr110170.c: New test.
---
 gcc/config/i386/i386.md  | 43 +
 gcc/testsuite/g++.target/i386/pr110170.C | 78 
 gcc/testsuite/gcc.target/i386/pr110170.c | 21 +++
 3 files changed, 142 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/i386/pr110170.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr110170.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a82cc353cfd..6f415f899ae 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -23163,6 +23163,49 @@ (define_insn "*ieee_s3"
(set_attr "type" "sseadd")
(set_attr "mode" "")])
 
+;; Operands order in min/max instruction matters for signed zero and NANs.
+(define_insn_and_split "*ieee_max3_1"
+  [(set (match_operand:MODEF 0 "register_operand")
+   (unspec:MODEF
+ [(match_operand:MODEF 1 "register_operand")
+  (match_operand:MODEF 2 "register_operand")
+  (lt:MODEF
+(match_operand:MODEF 3 "register_operand")
+(match_operand:MODEF 4 "register_operand"))]
+ UNSPEC_BLENDV))]
+  "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
+  && (rtx_equal_p (operands[1], operands[3])
+  && rtx_equal_p (operands[2], operands[4]))
+  && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (unspec:MODEF
+ [(match_dup 2)
+  (match_dup 1)]
+UNSPEC_IEEE_MAX))])
+
+(define_insn_and_split "*ieee_min3_1"
+  [(set (match_operand:MODEF 0 "register_operand")
+   (unspec:MODEF
+ [(match_operand:MODEF 1 "register_operand")
+  (match_operand:MODEF 2 "register_operand")
+  (lt:MODEF
+(match_operand:MODEF 3 "register_operand")
+(match_operand:MODEF 4 "register_operand"))]
+ UNSPEC_BLENDV))]
+  "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
+  && (rtx_equal_p (operands[1], operands[4])
+  && rtx_equal_p (operands[2], operands[3]))
+  && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (unspec:MODEF
+ [(match_dup 2)
+  (match_dup 1)]
+UNSPEC_IEEE_MIN))])
+
 ;; Make two stack loads independent:
 ;;   fld aa  fld aa
 ;;   fld %st(0) ->   fld bb
diff --git a/gcc/testsuite/g++.target/i386/pr110170.C b/gcc/testsuite/g++.target/i386/pr110170.C
new file mode 100644
index 000..5d6842270d0
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr110170.C
@@ -0,0 +1,78 @@
+/* { dg-do run } */
+/* { dg-options " -O2 -msse4.1 -mfpmath=sse -std=gnu++20" } */
+#include 
+
+void
+__attribute__((noinline))
+__cond_swap(double* __x, double* __y) {
+  bool __r = (*__x < *__y);
+  auto __tmp = __r ? *__x : *__y;
+  *__y = __r ? *__y : *__x;
+  *__x = __tmp;
+}
+
+auto test1() {
+double nan = -0.0;
+double x = 0.0;
+__cond_swap(, );
+return x == -0.0 && nan == 0.0;
+}
+
+auto test1r() {
+double nan = NAN;
+double x = 1.0;
+__cond_swap(, );
+return isnan(x) && signbit(x) == 0 && nan == 1.0;
+}
+
+auto 

[Bug rtl-optimization/104914] [MIPS] wrong comparison with scrabbled int value

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104914

--- Comment #12 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #11)
> But I don't have any other notes on my change (nor a testcase).

So I found some notes and it is similar but still different.
We were expanding:
;; insn.j_format.target = D.21597_19;
Into
(insn 25 24 26 (set (reg:DI 220)
(lshiftrt:DI (reg:DI 196 [ D.21584 ])
(const_int 2 [0x2]))) arch/mips/kernel/jump_label.c:56 -1
 (nil))

(insn 26 25 0 (set (zero_extract:SI (reg/v:SI 208 [ insn ])
(const_int 26 [0x1a])
(const_int 6 [0x6]))
(subreg:SI (reg:DI 220) 4)) arch/mips/kernel/jump_label.c:56 -1
 (nil))

But the subreg there was incorrect.

In the case of this bug, the reg is DI rather than SI.  I wonder why we have
that in the first place, even though val is the size of SImode ...

Re: [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add return value

2023-07-06 Thread Kewen.Lin via Gcc-patches
on 2023/7/7 07:00, Peter Bergner wrote:
> On 7/6/23 5:54 PM, Peter Bergner wrote:
>> On 6/30/23 7:58 PM, Carl Love via Gcc-patches wrote:
>>> +++ b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c
>>> @@ -0,0 +1,153 @@
>>> +/* { dg-do run { target { powerpc*-*-* } } } */
>>
>> powerpc*-*-* is the default for this test directory, so you can drop that,
>> but you need to disable this test for soft-float systems, so you probably 
>> want:
>>
>>   /* { dg-do run { target powerpc_fprs } } */
> 
> We actually want something like powerpc_fprs_hw, but that doesn't exist.
> 

Yeah, good point!  I noticed that we have a few test cases which should
check for a soft-float environment as well but don't.  I didn't find any
related issues reported, so I assume there is very little actual testing
in this area.  Based on this, I'm not sure it's worth adding a new
effective target for it.  Personally I'm happy with just using
powerpc_fprs here to keep it simple. :)

BR,
Kewen


[Bug rtl-optimization/104914] [MIPS] wrong comparison with scrabbled int value

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104914

--- Comment #11 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #10)
> Created attachment 55496 [details]
> old patch against GCC 4.7
> 
> I am trying to find my notes on this old patch, but our internal bug system
> has moved a few times, and the project even looks archived.
> But I am pretty sure this is related to the problem at hand.

(note I had another patch before that which renamed store_bit_field_1 to
store_bit_field_2).

The code is now in store_bit_field_using_insv.
Here:
  else
{
  tmp = gen_lowpart_if_possible (op_mode, value1);
  if (! tmp)
tmp = gen_lowpart (op_mode, force_reg (value_mode, value1));
}
  value1 = tmp;
}

But I don't have any other notes on my change (nor a testcase).

[Bug rtl-optimization/104914] [MIPS] wrong comparison with scrabbled int value

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104914

--- Comment #10 from Andrew Pinski  ---
Created attachment 55496
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55496=edit
old patch against GCC 4.7

I am trying to find my notes on this old patch, but our internal bug system has
moved a few times, and the project even looks archived.
But I am pretty sure this is related to the problem at hand.

[Bug rtl-optimization/67736] Wrong optimization with -fexpensive-optimizations on mips64el

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67736

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |5.3

[Bug rtl-optimization/104914] [MIPS] wrong comparison with scrabbled int value

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104914

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=67736

--- Comment #9 from Andrew Pinski  ---
(In reply to YunQiang Su from comment #8)
> (In reply to Andrew Pinski from comment #7)
> > The initial RTL has a signed extend in there:
> > 
> > 
> > (insn 20 19 23 2 (set (reg/v:DI 200 [ val+-4 ])
> > (sign_extend:DI (subreg:SI (reg/v:DI 200 [ val+-4 ]) 4)))
> > "/app/example.cpp":7:29 -1
> >  (nil))
> > (jump_insn 23 20 24 2 (set (pc)
> > (if_then_else (le (subreg/s/u:SI (reg/v:DI 200 [ val+-4 ]) 4)
> > (const_int 0 [0]))
> > (label_ref 32)
> > (pc))) "/app/example.cpp":8:5 -1
> >  (int_list:REG_BR_PROB 440234148 (nil))
> >  -> 32)
> > 
> > 
> > Before combine also looks fine:
> > (insn 20 19 23 2 (set (reg/v:DI 200 [ val+-4 ])
> > (sign_extend:DI (subreg:SI (reg/v:DI 200 [ val+-4 ]) 4)))
> > "/app/example.cpp":7:29 235 {extendsidi2}
> >  (nil))
> 
> Yes, I noticed it.  However, in mips.md, extendsidi2 is expanded to no
> instructions at all.

Right, then the le should have had a truncation before the use of SImode here ...

[Bug rtl-optimization/104914] [MIPS] wrong comparison with scrabbled int value

2023-07-06 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104914

--- Comment #8 from YunQiang Su  ---
(In reply to Andrew Pinski from comment #7)
> The initial RTL has a signed extend in there:
> 
> 
> (insn 20 19 23 2 (set (reg/v:DI 200 [ val+-4 ])
> (sign_extend:DI (subreg:SI (reg/v:DI 200 [ val+-4 ]) 4)))
> "/app/example.cpp":7:29 -1
>  (nil))
> (jump_insn 23 20 24 2 (set (pc)
> (if_then_else (le (subreg/s/u:SI (reg/v:DI 200 [ val+-4 ]) 4)
> (const_int 0 [0]))
> (label_ref 32)
> (pc))) "/app/example.cpp":8:5 -1
>  (int_list:REG_BR_PROB 440234148 (nil))
>  -> 32)
> 
> 
> Before combine also looks fine:
> (insn 20 19 23 2 (set (reg/v:DI 200 [ val+-4 ])
> (sign_extend:DI (subreg:SI (reg/v:DI 200 [ val+-4 ]) 4)))
> "/app/example.cpp":7:29 235 {extendsidi2}
>  (nil))

Yes, I noticed it.  However, in mips.md, extendsidi2 is expanded to no
instructions at all.

```
;; Extension insns.
;; Those for integer source operand are ordered widest source type first.

;; When TARGET_64BIT, all SImode integer and accumulator registers
;; should already be in sign-extended form (see TARGET_TRULY_NOOP_TRUNCATION
;; and truncdisi2).  We can therefore get rid of register->register
;; instructions if we constrain the source to be in the same register as
;; the destination.
;;
;; Only the pre-reload scheduler sees the type of the register alternatives;
;; we split them into nothing before the post-reload scheduler runs.
;; These alternatives therefore have type "move" in order to reflect
;; what happens if the two pre-reload operands cannot be tied, and are
;; instead allocated two separate GPRs.  We don't distinguish between
;; the GPR and LO cases because we don't usually know during pre-reload
;; scheduling whether an operand will be LO or not.
(define_insn_and_split "extendsidi2"
  [(set (match_operand:DI 0 "register_operand" "=d,l,d")
(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "0,0,m")))]
  "TARGET_64BIT"
  "@
   #
   #
   lw\t%0,%1"
  "&& reload_completed && register_operand (operands[1], VOIDmode)"
  [(const_int 0)]
{
  emit_note (NOTE_INSN_DELETED);
  DONE;
}
  [(set_attr "move_type" "move,move,load")
   (set_attr "mode" "DI")])
```
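A small C model of a 64-bit GPR shows why the empty extendsidi2 split is only safe while the sign-extension invariant described in the mips.md comment holds. This is a hypothetical sketch: the helpers are invented, and evaluating the SImode `le` on the full 64-bit register is an illustrative assumption about how a non-canonical value goes wrong, not a diagnosis taken from the bug.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of REG to 64 bits using only unsigned
   arithmetic: flip the sign bit, then subtract it back.  */
static uint64_t
sext32 (uint64_t reg)
{
  uint64_t low = reg & UINT64_C (0xffffffff);
  return (low ^ UINT64_C (0x80000000)) - UINT64_C (0x80000000);
}

/* The invariant from mips.md: an SImode value held in a 64-bit GPR is
   already in sign-extended form.  */
static bool
is_canonical_si (uint64_t reg)
{
  return sext32 (reg) == reg;
}

/* "le (subreg:SI reg) 0" evaluated on the low 32 bits only.  */
static bool
si_le_zero (uint64_t reg)
{
  uint64_t low = reg & UINT64_C (0xffffffff);
  return low == 0 || (low & UINT64_C (0x80000000)) != 0;
}

/* The same test on the full 64-bit register; it agrees with
   si_le_zero only when is_canonical_si (reg) holds.  */
static bool
di_le_zero (uint64_t reg)
{
  return reg == 0 || (reg >> 63) != 0;
}
```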

Re: [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add return value

2023-07-06 Thread Kewen.Lin via Gcc-patches
Hi Carl,

Some more minor comments are inline below on top of Peter's insightful
review comments.

on 2023/7/1 08:58, Carl Love wrote:
> 
> GCC maintainers:
> 
> Ver 2,  Went back thru the requirements and emails.  Not sure where I
> came up with the requirement for an overloaded version with double
> argument.  Removed the overloaded version with the double argument. 
> Added the macro to announce if the __builtin_set_fpscr_rn returns a
> void or a double with the FPSCR bits.  Updated the documentation file. 
> Retested on Power 8 BE/LE, Power 9 BE/LE, Power 10 LE.  Redid the test
> file.  Per request, the original test file functionality was not
> changed.  Just changed the name from test_fpscr_rn_builtin.c to 
> test_fpscr_rn_builtin_1.c.  Put new tests for the return values into a
> new test file, test_fpscr_rn_builtin_2.c.
> 
> The GLibC team requested a builtin to replace the mffscrn and
> mffscrni inline asm instructions in the GLibC code.  Previously there
> was discussion on adding builtins for the mffscrn instructions.
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html
> 
> In the end, it was felt that it would be better to extend the existing
> __builtin_set_fpscr_rn builtin to return a double instead of a void
> type.  The desire is that we could have the functionality of the
> mffscrn and mffscrni instructions on older ISAs.  The two instructions
> were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has the
> needed functionality to set the RN field using the mffscrn and mffscrni
> instructions if ISA 3.0 is supported or fall back to using logical
> instructions to mask and set the bits for earlier ISAs.  The
> instructions return the current value of the FPSCR fields DRN, VE, OE,
> UE, ZE, XE, NI, RN bit positions then update the RN bit positions with
> the new RN value provided.
> 
> The current __builtin_set_fpscr_rn builtin has a return type of void. 
> So, changing the return type to double and returning the  FPSCR fields
> DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
> functional equivalent of the mffscrn and mffscrni instructions.  Any
> current uses of the builtin would just ignore the return value yet any
> new uses could use the return value.  So the requirement is for the
> change to the __builtin_set_fpscr_rn builtin to be backwardly
> compatible and work for all ISAs.
> 
> The following patch changes the return type of the
>  __builtin_set_fpscr_rn builtin from void to double.  The return value
> is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
> XE, NI, RN bit positions when the builtin is called.  The builtin then
> updates the RN field with the new value provided as an argument to the
> builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
> check that the builtin returns the current value of the FPSCR fields
> and then updates the RN field.
> 
> The GLibC team has reviewed the patch to make sure it met their needs
> as a drop-in replacement for the inline asm mffscrn and mffscrni
> statements in the GLibC code.
> 
> The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> LE.
> 
> Please let me know if the patch is acceptable for mainline.  Thanks.
> 
>Carl 
> 
> 
> --
> rs6000, __builtin_set_fpscr_rn add return value
> 
> Change the return value from void to double.  The return value consists of
> the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit positions.  Add an
> overloaded version which accepts a double argument.
> 
> The test powerpc/test_fpscr_rn_builtin.c is updated to add tests for the
> double return value and the new double argument.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Update
>   builtin definition return type.
>   * config/rs6000-c.cc (rs6000_target_modify_macros): Add check, define
>   __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
>   * config/rs6000/rs6000.md (rs6000_get_fpscr_fields): New
>   define_expand.
>   (rs6000_update_fpscr_rn_field): New define_expand.
>   (rs6000_set_fpscr_rn): Added return argument.  Updated to use new
>   rs6000_get_fpscr_fields and rs6000_update_fpscr_rn_field define
>_expands.
>   * doc/extend.texi (__builtin_set_fpscr_rn): Update description for
>   the return value and new double argument.  Add description for
>   __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
> 
> gcc/testsuite/ChangeLog:
>   gcc.target/powerpc/test_fpscr_rn_builtin.c: Renamed to
>   test_fpscr_rn_builtin_1.c.  Added comment.
>   gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the
>   return value of __builtin_set_fpscr_rn builtin.
> ---
>  gcc/config/rs6000/rs6000-builtins.def |   2 +-
>  gcc/config/rs6000/rs6000-c.cc |   4 +
>  gcc/config/rs6000/rs6000.md   |  87 +++---
>  gcc/doc/extend.texi   |  26 ++-
>  

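The intended contract can be modelled on a plain 64-bit image of the FPSCR: return the old FPSCR fields, then install the new RN value, which on pre-ISA 3.0 targets means masking and setting bits. This is a hypothetical sketch, not the rs6000 expander. The function name is invented, and it returns the bits as a uint64_t rather than a double. The field masks are converted from the Power ISA's MSB-0 bit numbers to LSB-0 masks and should be treated as assumptions to verify against the ISA.

```c
#include <stdint.h>

/* ASSUMED layout (MSB-0 in the ISA -> LSB-0 here):
   RN = bits 62:63, VE..NI,RN = bits 56:63, DRN = bits 29:31.  */
#define FPSCR_RN_MASK   UINT64_C (0x0000000000000003)
#define FPSCR_CTRL_MASK UINT64_C (0x00000000000000ff) /* VE OE UE ZE XE NI RN */
#define FPSCR_DRN_MASK  UINT64_C (0x0000000700000000)

/* Hypothetical model: return the DRN, VE, OE, UE, ZE, XE, NI and RN
   fields of the old FPSCR, then replace only the RN field with NEW_RN
   (the mask-and-set sequence a pre-ISA 3.0 fallback would need).  */
static uint64_t
set_fpscr_rn_model (uint64_t *fpscr, unsigned new_rn)
{
  uint64_t old = *fpscr;
  *fpscr = (old & ~FPSCR_RN_MASK) | ((uint64_t) new_rn & FPSCR_RN_MASK);
  return old & (FPSCR_DRN_MASK | FPSCR_CTRL_MASK);
}
```

Existing callers that ignore the result are unaffected in this model, which matches the backward-compatibility goal stated above.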
[PATCH] VECT: Add COND_LEN_* operations for loop control with length targets

2023-07-06 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.

This patch adds cond_len_* operation patterns for targets that support loop
control with length.

These patterns will be used in the following cases:

1. Integer division:
   void
   f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int n)
   {
 for (int i = 0; i < n; ++i)
  {
a[i] = b[i] / c[i];
  }
   }

  ARM SVE IR:
  
  ...
  max_mask_36 = .WHILE_ULT (0, bnd.5_32, { 0, ... });

  Loop:
  ...
  # loop_mask_29 = PHI 
  ...
  vect__4.8_28 = .MASK_LOAD (_33, 32B, loop_mask_29);
  ...
  vect__6.11_25 = .MASK_LOAD (_20, 32B, loop_mask_29);
  vect__8.12_24 = .COND_DIV (loop_mask_29, vect__4.8_28, vect__6.11_25, vect__4.8_28);
  ...
  .MASK_STORE (_1, 32B, loop_mask_29, vect__8.12_24);
  ...
  next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
  ...
  
  For targets like RVV that support loop control with length, we want to see the following IR:
  
  Loop:
  ...
  # loop_len_29 = SELECT_VL
  ...
  vect__4.8_28 = .LEN_MASK_LOAD (_33, 32B, loop_len_29);
  ...
  vect__6.11_25 = .LEN_MASK_LOAD (_20, 32B, loop_len_29);
  vect__8.12_24 = .COND_LEN_DIV (dummp_mask, vect__4.8_28, vect__6.11_25, vect__4.8_28, loop_len_29);
  ...
  .LEN_MASK_STORE (_1, 32B, loop_len_29, vect__8.12_24);
  ...
  next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
  ...
  
  Note that here we use dummp_mask = { -1, -1, ..., -1 }.

2. Integer conditional division:
   Similar to case (1), but with a condition:
   void
   f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int32_t *cond, int n)
   {
 for (int i = 0; i < n; ++i)
   {
 if (cond[i])
 a[i] = b[i] / c[i];
   }
   }
   
   ARM SVE:
   ...
   max_mask_76 = .WHILE_ULT (0, bnd.6_52, { 0, ... });

   Loop:
   ...
   # loop_mask_55 = PHI 
   ...
   vect__4.9_56 = .MASK_LOAD (_51, 32B, loop_mask_55);
   mask__29.10_58 = vect__4.9_56 != { 0, ... };
   vec_mask_and_61 = loop_mask_55 & mask__29.10_58;
   ...
   vect__6.13_62 = .MASK_LOAD (_24, 32B, vec_mask_and_61);
   ...
   vect__8.16_66 = .MASK_LOAD (_1, 32B, vec_mask_and_61);
   vect__10.17_68 = .COND_DIV (vec_mask_and_61, vect__6.13_62, vect__8.16_66, vect__6.13_62);
   ...
   .MASK_STORE (_2, 32B, vec_mask_and_61, vect__10.17_68);
   ...
   next_mask_77 = .WHILE_ULT (_3, bnd.6_52, { 0, ... });
   
   Here, ARM SVE uses vec_mask_and_61 = loop_mask_55 & mask__29.10_58; to guarantee the correct result.
   
   However, targets with length control cannot use this elegant flow.  For RVV, we would expect:
   
   Loop:
   ...
   loop_len_55 = SELECT_VL
   ...
   mask__29.10_58 = vect__4.9_56 != { 0, ... };
   ...
   vect__10.17_68 = .COND_LEN_DIV (mask__29.10_58, vect__6.13_62, vect__8.16_66, vect__6.13_62, loop_len_55);
   ...

   Here we expect COND_LEN_DIV to be predicated by a real mask, which is the outcome of the comparison mask__29.10_58 = vect__4.9_56 != { 0, ... };
   and by a real length, which is produced by loop control: loop_len_55 = SELECT_VL.
   
3. Conditional floating-point operations (no -ffast-math):
   
void
f (float *restrict a, float *restrict b, int32_t *restrict cond, int n)
{
  for (int i = 0; i < n; ++i)
{
  if (cond[i])
  a[i] = b[i] + a[i];
}
}
  
  ARM SVE IR:
  max_mask_70 = .WHILE_ULT (0, bnd.6_46, { 0, ... });

  ...
  # loop_mask_49 = PHI 
  ...
  mask__27.10_52 = vect__4.9_50 != { 0, ... };
  vec_mask_and_55 = loop_mask_49 & mask__27.10_52;
  ...
  vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, vect__6.13_56);
  ...
  next_mask_71 = .WHILE_ULT (_22, bnd.6_46, { 0, ... });
  ...
  
  For RVV, we would expect IR:
  
  ...
  loop_len_49 = SELECT_VL
  ...
  mask__27.10_52 = vect__4.9_50 != { 0, ... };
  ...
  vect__9.17_62 = .COND_LEN_ADD (mask__27.10_52, vect__6.13_56, vect__8.16_60, vect__6.13_56, loop_len_49);
  ...

4. Conditional un-ordered reduction:
   
   int32_t
   f (int32_t *restrict a, int32_t *restrict cond, int n)
   {
 int32_t result = 0;
 for (int i = 0; i < n; ++i)
   {
   if (cond[i])
 result += a[i];
   }
 return result;
   }
   
   ARM SVE IR:
 
 Loop:
 # vect_result_18.7_37 = PHI 
 ...
 # loop_mask_40 = PHI 
 ...
 mask__17.11_43 = vect__4.10_41 != { 0, ... };
 vec_mask_and_46 = loop_mask_40 & mask__17.11_43;
 ...
 vect__33.16_51 = .COND_ADD (vec_mask_and_46, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37);
 ...
 next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
 ...
   
 Epilogue:
 _53 = .REDUC_PLUS (vect__33.16_51); [tail call]
   
   For RVV, we expect:
 
Loop:
 # vect_result_18.7_37 = PHI 
 ...
 loop_len_40 = SELECT_VL
 ...
 mask__17.11_43 = vect__4.10_41 != { 0, ... };
 ...
 vect__33.16_51 = .COND_LEN_ADD (mask__17.11_43, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37, loop_len_40);
 ...
 next_mask_58 = .WHILE_ULT (_15, 

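The difference between the two predication styles can be sketched with scalar C models. These helpers are hypothetical and model only the intent described in the cases above. In particular, giving lanes at or beyond len the else value is an assumption for illustration, not a statement of the final internal-function semantics. The tail lanes in the usage check hold zero divisors, which shows why the division must stay predicated even in case 1, where the mask is all ones.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of .COND_DIV (mask, a, b, els): lanes whose mask
   bit is set get a / b, the others get the else value.  */
static void
cond_div (size_t nlanes, const bool *mask, const int32_t *a,
          const int32_t *b, const int32_t *els, int32_t *out)
{
  for (size_t i = 0; i < nlanes; i++)
    out[i] = mask[i] ? a[i] / b[i] : els[i];
}

/* Hypothetical model of .COND_LEN_DIV (mask, a, b, els, len): a lane is
   active only if its mask bit is set AND it lies below LEN.  Inactive
   lanes, including the tail, are ASSUMED to take ELS here.  */
static void
cond_len_div (size_t nlanes, const bool *mask, const int32_t *a,
              const int32_t *b, const int32_t *els, size_t len,
              int32_t *out)
{
  for (size_t i = 0; i < nlanes; i++)
    out[i] = (i < len && mask[i]) ? a[i] / b[i] : els[i];
}
```

With a real mask and a real length, cond_len_div computes what the SVE form computes with the combined mask (mask & (i < len)), so no separate vec_mask_and step is needed.
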
Re: [PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)

2023-07-06 Thread Hao Liu OS via Gcc-patches
Hi Jeff,

Thanks for your help.

Actually, I have write access, as I was added to the "contributor list".  Anyway,
it was very kind of you to commit the patch for me.

Thanks,
-Hao

From: Jeff Law 
Sent: Friday, July 7, 2023 0:06
To: Richard Biener; Hao Liu OS
Cc: GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)



On 7/6/23 06:44, Richard Biener via Gcc-patches wrote:
> On Wed, Jul 5, 2023 at 8:44 AM Hao Liu OS via Gcc-patches
>  wrote:
>>
>> Hi,
>>
>> If a loop is unrolled n times during vectorization, two steps are used to
>> calculate the induction variable:
>>   - The small step for each unrolled copy: vec_1 = vec_iv + (VF/n * Step)
>>   - The large step for the whole loop: vec_loop = vec_iv + (VF * Step)
>>
>> This patch calculates an extra vec_n to replace vec_loop:
>>   vec_n = vec_prev + (VF/n * Step) = vec_iv + (VF/n * Step) * n = vec_loop.
>>
>> This saves the large-step register and its related operations.
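A minimal sketch of why the two schemes agree, assuming a hypothetical 4-lane vector per unrolled copy (so VF/n = 4) and a scalar step broadcast to every lane. It is a scalar model of the arithmetic only, not the vectorizer's code; all names are made up.

```c
#include <assert.h>

#define NLANES 4 /* hypothetical lanes per unrolled copy, i.e. VF/n */

typedef struct { long lane[NLANES]; } vec_t;

vec_t vec_add_scalar (vec_t v, long s)
{
  for (int i = 0; i < NLANES; i++)
    v.lane[i] += s;
  return v;
}

/* Large-step scheme: next iteration's IV is vec_iv + VF * step.  */
vec_t next_iv_large_step (vec_t vec_iv, long step, int n)
{
  long vf = (long) NLANES * n;
  return vec_add_scalar (vec_iv, vf * step);
}

/* Small-step scheme: add (VF/n) * step once per unrolled copy.  The copies
   vec_1 .. vec_{n-1} are used in the body; one more addition gives vec_n,
   which equals vec_loop, so no separate large step is needed.  */
vec_t next_iv_small_step (vec_t vec_iv, long step, int n)
{
  vec_t v = vec_iv;
  for (int copy = 1; copy <= n; copy++)
    v = vec_add_scalar (v, (long) NLANES * step);
  return v;
}

int vec_equal (vec_t a, vec_t b)
{
  for (int i = 0; i < NLANES; i++)
    if (a.lane[i] != b.lane[i])
      return 0;
  return 1;
}
```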
>
> OK.  It would be nice to avoid the dead stmts created earlier though.
>
> Thanks,
> Richard.
>
>> gcc/ChangeLog:
>>
>>  PR tree-optimization/110449
>>  * tree-vect-loop.cc (vectorizable_induction): use vec_n to replace
>>  vec_loop for the unrolled loop.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/aarch64/pr110449.c: New testcase.
I didn't see Hao Liu in the MAINTAINERS file, so I assumed Hao didn't have
write access.  Therefore I went ahead and pushed this for Hao.

jeff


Re: [PATCH v4] rs6000: Update the vsx-vector-6.* tests.

2023-07-06 Thread Kewen.Lin via Gcc-patches
Hi Carl,

on 2023/7/6 23:33, Carl Love wrote:
> GCC maintainers:
> 
> Ver 4. Fixed a few typos.  Redid the tests to create separate run and
> compile tests.

Thanks!  This new version looks good, except that we need vsx_hw
for the run tests, plus two nits; see below.

> 
> Ver 3.  Added __attribute__ ((noipa)) to the test files.  Changed some
> of the scan-assembler-times checks to cover multiple similar
> instructions.  Change the function check macro to a macro to generate a
> function to do the test and check the results.  Retested on the various
> processor types and BE/LE versions.
> 
> Ver 2.  Switched to using code macros to generate the call to the
> builtin and test the results.  Added in instruction counts for the key
> instruction for the builtin.  Moved the tests into an additional
> function call to ensure the compiler doesn't replace the builtin call
> code with the statically computed results.  The compiler was doing this
> for a few of the simpler tests.
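A minimal sketch of the run-test pattern described above: a macro generates a `noipa` function for each operation, so the compiler cannot inline it or propagate the constant inputs into it and fold the result, plus a checker that compares against the expected value. The real tests exercise VSX built-ins on Power hardware; this sketch substitutes plain `double` expressions so it compiles anywhere, and the macro and function names are hypothetical. `noipa` is a GCC attribute; other compilers may ignore it with a warning.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical generator: NAME##_op wraps the operation under test in a
   noipa function; NAME##_check runs it and compares the result with
   EXPECTED, printing a diagnostic on mismatch.  */
#define DEFINE_UNARY_TEST(NAME, TYPE, EXPR)                         \
  __attribute__ ((noipa)) TYPE NAME##_op (TYPE x) { return EXPR; }  \
  int NAME##_check (TYPE input, TYPE expected)                      \
  {                                                                 \
    TYPE got = NAME##_op (input);                                   \
    if (got != expected)                                            \
      {                                                             \
        printf (#NAME ": got %g, expected %g\n", (double) got,      \
                (double) expected);                                 \
        return 0;                                                   \
      }                                                             \
    return 1;                                                       \
  }

/* Stand-ins for the VSX built-ins exercised by the real tests.  */
DEFINE_UNARY_TEST (negate, double, -x)
DEFINE_UNARY_TEST (twice, double, x * 2.0)
```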
> 
> The following patch takes the tests in vsx-vector-6-p7.h, vsx-vector-6-p8.h,
> vsx-vector-6-p9.h and reorganizes them into a series of smaller
> test files by functionality rather than processor version.
> 
> Tested the patch on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE with
> no regressions.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
>Carl
> 
> 
> 
> -
> rs6000: Update the vsx-vector-6.* tests.
> 
> The vsx-vector-6.h file is included into the processor specific test files
> vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
> contains a large number of vsx vector builtin tests.  The processor
> specific files contain the number of instructions that the tests are
> expected to generate for that processor.  The tests are compile only.
> 
> This patch reworks the tests into a series of files for related tests.
> The new tests consist of a runnable test to verify the builtin argument
> types and the functional correctness of each builtin.  There is also a
> compile only test that verifies the builtins generate the expected number
> of instructions for the various builtin tests.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/vsx-vector-6-func-1op.h: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-1op-run.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-1op-compile.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-2lop.h: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-2lop-run.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-2lop-compile.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-2op.h: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-2op-run.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-2op-compile.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-3op.h: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-3op-run.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-3op-compile.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-cmp-all.h: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-cmp-all-run.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-cmp-all-compile.c: New test
>   file.
>   * gcc.target/powerpc/vsx-vector-6-func-cmp.h: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-cmp-run.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-cmp-compile.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6.h: Remove test file.
>   * gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file.
>   * gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file.
>   * gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.
> ---
>  .../powerpc/vsx-vector-6-func-1op-compile.c   |  22 ++
>  .../powerpc/vsx-vector-6-func-1op-run.c   |  98 
>  .../powerpc/vsx-vector-6-func-1op.h   |  43 
>  .../powerpc/vsx-vector-6-func-2lop-compile.c  |  14 ++
>  .../powerpc/vsx-vector-6-func-2lop-run.c  | 177 ++
>  .../powerpc/vsx-vector-6-func-2lop.h  |  47 
>  .../powerpc/vsx-vector-6-func-2op-compile.c   |  21 ++
>  .../powerpc/vsx-vector-6-func-2op-run.c   |  96 
>  .../powerpc/vsx-vector-6-func-2op.h   |  42 
>  .../powerpc/vsx-vector-6-func-3op-compile.c   |  17 ++
>  .../powerpc/vsx-vector-6-func-3op-run.c   | 229 ++
>  .../powerpc/vsx-vector-6-func-3op.h   |  73 ++
>  .../vsx-vector-6-func-cmp-all-compile.c   |  17 ++
>  .../powerpc/vsx-vector-6-func-cmp-all-run.c   | 147 +++
>  .../powerpc/vsx-vector-6-func-cmp-all.h   |  76 ++
>  .../powerpc/vsx-vector-6-func-cmp-compile.c   |  16 ++
>  .../powerpc/vsx-vector-6-func-cmp-run.c   |  92 +++
>  .../powerpc/vsx-vector-6-func-cmp.h   |  40 +++
>  

RE: [PATCH v5] RISC-V: Fix one bug for floating-point static frm

2023-07-06 Thread Li, Pan2 via Gcc-patches
Committed, thanks Robin and Kito.

Pan

-----Original Message-----
From: Robin Dapp  
Sent: Thursday, July 6, 2023 11:30 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; jeffreya...@gmail.com; Wang, 
Yanzhang ; kito.ch...@gmail.com; Robin Dapp 

Subject: Re: [PATCH v5] RISC-V: Fix one bug for floating-point static frm

Hi Pan,

thanks, I think that works for me as I'm expecting these
parts to change a bit anyway in the near future.

There is no functional change to the last revision that
Kito already OK'ed so I think you can go ahead.

Regards
 Robin


[Bug tree-optimization/110538] [14 Regression] Dead Code Elimination Regression since r14-368-ge1366a7e4ce

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110538

--- Comment #2 from Andrew Pinski  ---
So dom3 was able to optimize that via jump threading before in GCC 13, but no
longer on the trunk (I don't understand why, though).

Anyway, the only pass that is able to optimize:
```
int f123(int a, int c, int i)
{
  int *d;
  if (a) d =  else d = 
  int e = d == 
  int f = d == 
  return e | f;
}
```
to `1` is PRE ...
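The archive has stripped the address operands from the testcase above, so the following is a hypothetical stand-in rather than the original: a pointer that can only hold one of two addresses is compared against both, so the OR of the comparisons is 1 on every path.

```c
#include <assert.h>

/* Hypothetical stand-in, not the original testcase: P points either at
   A or at B, so on the SEL != 0 path E is 1 and on the other path F is 1,
   hence E | F is 1 whichever way the branch goes.  */
int either_address (int sel, int a, int b)
{
  int *p;
  if (sel)
    p = &a;
  else
    p = &b;
  int e = p == &a;
  int f = p == &b;
  return e | f;
}
```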

But note
```
int g123();
int h123();
int f123_1(int a, int c, int i)
{
  int *d;
  if (a) d =  else d = 
  int e = d == 
  int f = d == 
  if (e | f)
return h123();
  return g123();
}
```
can still be optimized by dom2 on the trunk (via jump threading).

So how did dom3 miss the original testcase 

Re: Stepping down as maintainer for ARC and Epiphany

2023-07-06 Thread Jeff Law via Gcc




On 7/5/23 12:43, Joern Rennecke wrote:

I haven't worked with these targets in years and can't really do sensible 
maintenance or reviews of patches for them.
I am currently working on optimizations for other ports like RISC-V.

ARC has still an active maintainer in Claudiu Zissulescu, so is basically 
unaffected.
I am not aware of any ongoing development of or for Epiphany. We might consider 
deprecating it unless there are other takers.
I've suggested deprecating Epiphany in the past, as it would fault in 
reload after seemingly innocuous changes in the IL, making run-to-run 
comparisons of the testsuite virtually impossible.


I did convert it to LRA, but I didn't check whether that brought 
stability to the testsuite.


jeff


[Bug tree-optimization/110538] [14 Regression] Dead Code Elimination Regression since r14-368-ge1366a7e4ce

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110538

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-07-07
 Status|UNCONFIRMED |NEW

--- Comment #1 from Andrew Pinski  ---
Confirmed.

In .optimized we get:
  # j_24 = PHI <(7), (3)>
  _2 = j_24 == 
  _22 =  == j_24;
  _23 = _2 | _22;

Obviously, _23 is always 1.

committed: Stepping down as maintainer for ARC and Epiphany

2023-07-06 Thread Joern Wolfgang Rennecke

Stepping down as maintainer for ARC and Epiphany

* MAINTAINERS (CPU Port Maintainers): Remove myself as ARC end
epiphany maintainer.
(Write After Approval): Add myself.

commit b3f20dd75e9255fc9d56d4f020972469dd671a3a
Author: Joern Rennecke 
Date:   Fri Jul 7 01:02:28 2023 +0100

Stepping down as maintainer for ARC and Epiphany

* MAINTAINERS (CPU Port Maintainers): Remove myself as ARC end
epiphany maintainer.
(Write After Approval): Add myself.

diff --git a/ChangeLog b/ChangeLog
index 140127b851d..374a0a497c8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2023-07-07  Joern Rennecke  
+
+   * MAINTAINERS (CPU Port Maintainers): Remove myself as ARC end
+   epiphany maintainer.
+   (Write After Approval): Add myself.
+
 2023-06-30  Rishi Raj  
 
* MAINTAINERS: Added myself to Write After Approval and DCO
diff --git a/MAINTAINERS b/MAINTAINERS
index 2a0eb5b52b5..95228596628 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -56,7 +56,6 @@ aarch64 port  Kyrylo Tkachov  

 alpha port Richard Henderson   
 amdgcn portJulian Brown
 amdgcn portAndrew Stubbs   
-arc port   Joern Rennecke  
 arc port   Claudiu Zissulescu  
 arm port   Nick Clifton
 arm port   Richard Earnshaw
@@ -69,7 +68,6 @@ c6x port  Bernd Schmidt   

 cris port  Hans-Peter Nilsson  
 c-sky port Xianmiao Qu 
 c-sky port Yunhai Shang
-epiphany port  Joern Rennecke  
 fr30 port  Nick Clifton
 frv port   Nick Clifton
 frv port   Alexandre Oliva 
@@ -616,6 +614,7 @@ Joe Ramsay  

 Rolf Rasmussen 
 Fritz Reese
 Volker Reichelt

+Jörn Rennecke  
 Bernhard Reutner-Fischer   
 Tom Rix
 Thomas Rodgers 


[Bug target/106895] powerpc64 unable to specify even/odd register pairs in extended inline asm

2023-07-06 Thread npiggin at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106895

--- Comment #11 from Nicholas Piggin  ---
(In reply to Segher Boessenkool from comment #10)
> (In reply to Nicholas Piggin from comment #9)
> > I don't know why constraint is wrong and mode is right
> 
> Simple: you would need O(2**T*N) constraints for our existing N register
> constraints, together with T features like this.  But only O(2**T) modes at
> most.

I guess that would be annoying if you couldn't have modifiers on constraints or
a bad algorithm for working them out. Fair enough.

> 
> > or why TI doesn't work but PTI apparently would,
> 
> Because this is exactly what PTImode is *for*!

Right, I accept that it is; I meant I just would not have been able to work it out
(assuming that if PTI were documented, it would be "Partial Tetra Integer" and be no
more useful than the other P?I type documentation).

> 
> > but I'll take anything that works. Could we
> > get PTI implemented? Does it need a new issue opened?
> 
> It was implemented in 2013.  The restriction to only even pairs was a bugfix,
> also from 2013.
> 
> If you have code like
> 
>   typedef __int128 __attribute__((mode(PTI))) even;
> 
> you get an error like
> 
>   error: no data type for mode 'PTI'
> 
> This needs fixing.  You can keep it in this PR?

Sure, that would be much appreciated.

[Bug rtl-optimization/110573] branch delay slots are not filled with atomic stores

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110573

--- Comment #6 from Andrew Pinski  ---
(In reply to Luke Geeson from comment #4)
> I understand treating atomics as volatile has historical precedent but a
> case can be made, at least on modern architectures and with improved
> understanding of models, that atomics are not volatile and more
> optimisations can be applied.
> What do you think?

Not really. The problem is that you would need to add a new kind of memory
access type at the RTL level, and that is not something that can be done
without getting things wrong and/or forgetting to update every place that might
change volatileness (including the scheduler, which is itself hard to get
right). So treating them as volatile memory accesses at the RTL level is the
easiest and best approach here.

At the GIMPLE level, they are treated as function calls, which is itself
another can of worms.

[Bug rtl-optimization/110573] branch delay slots are not filled with atomic stores

2023-07-06 Thread luke.geeson at cs dot ucl.ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110573

--- Comment #5 from Luke Geeson  ---
For the record, the %registers are symbolic: simply replace them with concrete
registers holding the locations x, y, etc.

[Bug rtl-optimization/110573] branch delay slots are not filled with atomic stores

2023-07-06 Thread luke.geeson at cs dot ucl.ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110573

--- Comment #4 from Luke Geeson  ---
Ah, so since atomics are treated as volatile (as in LLVM), instructions that
access them cannot occupy a delay slot. Is it still valid to treat atomics as
volatile?

Consider the following MIPS litmus test:
```
{ %x0=x; %y0=y; %y1=y; %x1=x; }
 P0   | P1   ;
 lw $2,0(%x0) | lw $2,0(%y1) ;
 ori $3,$0,1  | ori $3,$0,1  ;
 sw $3,0(%y0) | sw $3,0(%x1) ;

exists (0:$2=1 /\ 1:$2=1)
```
When run under the MIPS model, we do not observe the outcome in the exists
clause; the observed outcomes are:
```
0:$2=0; 1:$2=0;
0:$2=0; 1:$2=1;
0:$2=1; 1:$2=0;
```
That is, from an ordering perspective, unexpected behaviours are unlikely to
occur; in this case putting sw in a delay slot should be OK (the same doesn't
hold for the RISC-V/Arm models, of course).
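The three outcomes listed are exactly the ones a sequentially consistent (SC) execution can produce. A minimal sketch that enumerates all six SC interleavings of the two threads; it models plain SC, not the MIPS model or a litmus tool, treats `ori $3,$0,1` as simply loading 1 into $3, and the function name is made up.

```c
#include <assert.h>

/* P0: r0 = x; y = 1;   (lw $2,0(%x0); sw of 1 to y)
   P1: r1 = y; x = 1;   (lw $2,0(%y1); sw of 1 to x)
   seen[r0][r1] becomes 1 if some SC interleaving produces that outcome.  */
void enumerate_sc_outcomes (int seen[2][2])
{
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
      seen[i][j] = 0;

  /* Bit S of SCHED says whether step S runs P0 (1) or P1 (0); each thread
     has two memory operations, so exactly two bits must be set.  */
  for (unsigned sched = 0; sched < 16; sched++)
    {
      unsigned p0_steps = (sched & 1) + ((sched >> 1) & 1)
                          + ((sched >> 2) & 1) + ((sched >> 3) & 1);
      if (p0_steps != 2)
        continue;

      int x = 0, y = 0;   /* shared locations, initially 0 */
      int r0 = 0, r1 = 0; /* $2 in P0 and in P1 */
      int pc0 = 0, pc1 = 0;
      for (int step = 0; step < 4; step++)
        {
          if (sched & (1u << step))
            {
              if (pc0++ == 0)
                r0 = x; /* P0 load */
              else
                y = 1;  /* P0 store */
            }
          else
            {
              if (pc1++ == 0)
                r1 = y; /* P1 load */
              else
                x = 1;  /* P1 store */
            }
        }
      seen[r0][r1] = 1;
    }
}
```

The outcome (1, 1) would need each load to follow the other thread's store, which together with program order forms a cycle, so no interleaving produces it.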

I understand treating atomics as volatile has historical precedent but a case
can be made, at least on modern architectures and with improved understanding
of models, that atomics are not volatile and more optimisations can be applied.
What do you think?

Re: [PATCH] rs6000: Don't ICE when generating vector pair load/store insns [PR110411]

2023-07-06 Thread Segher Boessenkool
On Thu, Jul 06, 2023 at 02:48:19PM -0500, Peter Bergner wrote:
> On 7/6/23 12:33 PM, Segher Boessenkool wrote:
> > On Wed, Jul 05, 2023 at 05:21:18PM +0530, P Jeevitha wrote:
> >> --- a/gcc/config/rs6000/rs6000.cc
> >> +++ b/gcc/config/rs6000/rs6000.cc
> >> @@ -9894,6 +9894,8 @@ rs6000_legitimate_address_p (machine_mode mode, rtx x, bool reg_ok_strict)
> >>  
> >>/* Handle unaligned altivec lvx/stvx type addresses.  */
> >>if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
> >> +  && mode !=  OOmode
> >> +  && mode !=  XOmode
> >>&& GET_CODE (x) == AND
> >>&& CONST_INT_P (XEXP (x, 1))
> >>&& INTVAL (XEXP (x, 1)) == -16)
> > 
> > Why do we need this for OOmode and XOmode here, but not for the other
> > modes that are equally not allowed?  That makes no sense.
> 
> VECTOR_MEM_ALTIVEC_OR_VSX_P (mode) already filters those modes out
> (eg, SImode, DFmode, etc.), just not OOmode and XOmode, since those both
> are modes used in/with VSX registers.

It does not filter anything out, no.  That simply checks if a datum of
that mode can be loaded into vector registers or not.  For example
SImode could very well be loaded into vector registers!  (It just is not
such a great idea).

And for some reason there is VECTOR_P8_VECTOR as well, which is mixing
multiple concepts already.  Let's not add more, _please_.

> > Should you check for anything that is more than a register, for example?
> > If so, do *that*?
> 
> Well rs6000_legitimate_address_p() is only passed the MEM rtx, so we have
> no idea if this is a load or store, so we're clueless on number of regs
> needed to hold this mode.  The best we could do is something like

That is *bigger than* a register.  It's the same in Dutch, sorry, I am
tired :-(

>   GET_MODE_SIZE (mode) == GET_MODE_SIZE (V16QImode)
> 
> or some such thing.  Would you prefer something like that?

That is even worse :-(

> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr110411.c
> >> @@ -0,0 +1,21 @@
> >> +/* PR target/110411 */
> >> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -S -mblock-ops-vector-pair" } */
> > 
> > -S in testcases is wrong.  Why do you want this?  It is *good* if this
> > is hauled through the assembler as well!  If you *really* want this you
> > use "dg-do assemble", but you shouldn't.
> 
> For test cases checking for ICEs, we don't need to assemble, so I agree,
> we just need to remove the -S option, which is implied by this being a
> dg-do compile test case (the default for this test directory).

We *do* want to assemble.  It is a general principle that we want to
test as much as possible whenever possible.  *Most* problems are found
with the help of testcases that were never designed for the problem
found!

dg-do compile *does* invoke the assembler, btw.  As it should.


Segher


semantics of uninitialized values in GIMPLE

2023-07-06 Thread Krister Walfridsson via Gcc
I have implemented support for uninitialized memory in my translation 
validator. But I am not sure how well this corresponds to the GIMPLE 
semantics, so I have some questions...


My implementation tracks uninitialized bits. Use of uninitialized bits is 
in general treated as UB (for example, `x + y` is UB if `x` or `y` has any 
uninitialized bits), but there are a few exceptions:


 * load/store: It is always OK to load/store values having uninitialized
   bits.
 * Phi nodes: Phi nodes propagate uninitialized bits.
 * selection: Instructions that are selecting an element (COND_EXPR,
   VEC_PERM_EXPR, etc.) may select between values having uninitialized
   bits, and the resulting value may have uninitialized bits. But the
   condition/mask must not have uninitialized bits.
 * Extraction: Instructions that are extracting bits (BIT_FIELD_REF etc.)
   may have uninitialized bits in both the input/output.
 * Insertion: Instructions that are constructing/inserting values
   (COMPLEX_EXPR, etc.) may have uninitialized bits in both the
   input/output.

All other uses of values having uninitialized bits are considered UB.

Does this behavior make sense?

The above seems to work fine so far, with one exception that can be seen 
in gcc.c-torture/execute/pr108498-1.c. The test has an uninitialized bit 
field


  unsigned char c6:1, c7:1, c8:1, c9:3, c10:1;

which is written to as

  x.c6 = 0;
  x.c7 = 0;
  x.c8 = 0;
  x.c9 = 7;

The store merging pass changes this to

  _71 = MEM  [(struct C *) + 8B];
  _42 = _71 & 128;
  _45 = _42 | 56;

and the translation validator is now complaining that the pass has 
introduced UB that was not in the original IR (because the most 
significant bit in _71 is uninitialized when passed to BIT_AND_EXPR).


I could solve this by allowing uninitialized bits in BIT_AND_EXPR and 
BIT_IOR_EXPR, and propagating each bit according to


  * `0 & uninit` is an initialized `0`
  * `1 & uninit` is uninitialized
  * `0 | uninit` is uninitialized
  * `1 | uninit` is an initialized `1`
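A small C sketch of the proposed propagation rules for BIT_AND_EXPR and BIT_IOR_EXPR, checked against the store-merging sequence from pr108498-1.c. It is a hypothetical model of the rules as proposed here, not confirmed GIMPLE semantics; the 8-bit width, the `val`/`init` representation and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* INIT has a 1 for each initialized bit; VAL holds the bit values and is
   kept at 0 wherever INIT is 0.  */
typedef struct
{
  uint8_t val;
  uint8_t init;
} tbyte;

tbyte tb_const (uint8_t c)
{
  tbyte r = { c, 0xFF };
  return r;
}

/* BIT_AND_EXPR: a result bit is initialized if both input bits are, or if
   either input bit is an initialized 0 (0 & uninit == 0).  */
tbyte tb_and (tbyte a, tbyte b)
{
  uint8_t init = (uint8_t) ((a.init & b.init) | (a.init & ~a.val)
                            | (b.init & ~b.val));
  tbyte r = { (uint8_t) (a.val & b.val & init), init };
  return r;
}

/* BIT_IOR_EXPR: a result bit is initialized if both input bits are, or if
   either input bit is an initialized 1 (1 | uninit == 1).  */
tbyte tb_or (tbyte a, tbyte b)
{
  uint8_t init = (uint8_t) ((a.init & b.init) | (a.init & a.val)
                            | (b.init & b.val));
  tbyte r = { (uint8_t) ((a.val | b.val) & init), init };
  return r;
}
```

With the most significant bit of `_71` uninitialized, `_71 & 128` leaves only that bit uninitialized and `| 56` keeps it so; under these rules the merged sequence is not UB, and the only uninitialized bit left is the one that was uninitialized to begin with.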

Is that the correct GIMPLE semantics?

   /Krister


Re: [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value

2023-07-06 Thread Peter Bergner via Gcc-patches
On 7/6/23 5:54 PM, Peter Bergner wrote:
> On 6/30/23 7:58 PM, Carl Love via Gcc-patches wrote:
>> +++ b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c
>> @@ -0,0 +1,153 @@
>> +/* { dg-do run { target { powerpc*-*-* } } } */
> 
> powerpc*-*-* is the default for this test directory, so you can drop that,
> but you need to disable this test for soft-float systems, so you probably 
> want:
> 
>   /* { dg-do run { target powerpc_fprs } } */

We actually want something like powerpc_fprs_hw, but that doesn't exist.

Peter




Re: [PATCH v4 4/9] MIPS: Add bitwise instructions for mips16e2

2023-07-06 Thread Jan-Benedict Glaw
Hi!

On Mon, 2023-06-19 16:29:53 +0800, Jie Mei  wrote:
> There are shortened bitwise instructions in the mips16e2 ASE,
> for instance, ANDI, ORI/XORI, EXT, INS etc. .
> 
> This patch adds these instrutions with corresponding tests.

[...]

Starting with this patch, I see some new warnings:

[all 2023-07-06 23:04:01] g++ -c   -g -O2   -DIN_GCC 
-DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Wconditionally-supported 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H  -DGENERATOR_FILE -I. 
-Ibuild -I../../gcc/gcc -I../../gcc/gcc/build -I../../gcc/gcc/../include  
-I../../gcc/gcc/../libcpp/include  \
[all 2023-07-06 23:04:01]  -o build/gencondmd.o build/gencondmd.cc
[all 2023-07-06 23:04:02] ../../gcc/gcc/config/mips/mips-msa.md:435:26: 
warning: 'and' of mutually exclusive equal-tests is always 0
[all 2023-07-06 23:04:02]   435 |   DONE;
[all 2023-07-06 23:04:02] ../../gcc/gcc/config/mips/mips-msa.md:435:26: 
warning: 'and' of mutually exclusive equal-tests is always 0
[all 2023-07-06 23:04:03] ../../gcc/gcc/config/mips/mips.md:822:1: warning: 
'and' of mutually exclusive equal-tests is always 0
[all 2023-07-06 23:04:03]   822 | ;; conditional-move-type condition is needed.
[all 2023-07-06 23:04:03]   | ^
[all 2023-07-06 23:04:03] g++   -g -O2   -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE   
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing 
-Wwrite-strings -Wcast-qual -Wmissing-format-attribute 
-Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long 
-Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H  
-DGENERATOR_FILE -static-libstdc++ -static-libgcc  -o build/gencondmd \
[all 2023-07-06 23:04:03] build/gencondmd.o build/errors.o 
../build-x86_64-pc-linux-gnu/libiberty/libiberty.a
[all 2023-07-06 23:04:03] build/gencondmd > tmp-cond.md


(Full build log available as eg. 
http://toolchain.lug-owl.de/laminar/jobs/gcc-mips-linux/76)

Thanks, JBG



Re: [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value

2023-07-06 Thread Peter Bergner via Gcc-patches
On 6/30/23 7:58 PM, Carl Love via Gcc-patches wrote:
> rs6000, __builtin_set_fpscr_rn add retrun value

s/retrun/return/

Maybe better written as:

rs6000: Add return value to __builtin_set_fpscr_rn


> Change the return value from void to double.  The return value consists of
> the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit positions.  Add an
> overloaded version which accepts a double argument.

You're not adding an overloaded version anymore, so I think you can just
remove the last sentence.



> The test powerpc/test_fpscr_rn_builtin.c is updated to add tests for the
> double reterun value and the new double argument.

s/reterun/return/   ...and there is no double argument anymore, so that
part can be removed.



>   * config/rs6000/rs6000.md ((rs6000_get_fpscr_fields): New
>   define_expand.

Too many '('.



>   (rs6000_set_fpscr_rn): Addedreturn argument.  Updated to use new

Looks like a  after Added instead of a space.


>   rs6000_get_fpscr_fields and rs6000_update_fpscr_rn_field define
>_expands.

Don't split define_expand across two lines.



>   * doc/extend.texi (__builtin_set_fpscr_rn): Update description for
>   the return value and new double argument.  Add descripton for
>   __SET_FPSCR_RN_RETURNS_FPSCR__ macro.

s/descripton/description/






> +  /* Tell the user the __builtin_set_fpscr_rn now returns the FPSCR fields
> + in a double.  Originally the builtin returned void.  */

Either:
  1) s/Tell the user the __builtin_set_fpscr_rn/Tell the user __builtin_set_fpscr_rn/
  2) s/the __builtin_set_fpscr_rn now/the __builtin_set_fpscr_rn built-in now/ 


> +  if ((flags & OPTION_MASK_SOFT_FLOAT) == 0)
> +  rs6000_define_or_undefine_macro (define_p, "__SET_FPSCR_RN_RETURNS_FPSCR__");

This doesn't look like it's indented correctly.




> +(define_expand "rs6000_get_fpscr_fields"
> + [(match_operand:DF 0 "gpc_reg_operand")]
> +  "TARGET_HARD_FLOAT"
> +{
> +  /* Extract fields bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI,
> + RN) from the FPSCR and return them.  */
> +  rtx tmp_df = gen_reg_rtx (DFmode);
> +  rtx tmp_di = gen_reg_rtx (DImode);
> +
> +  emit_insn (gen_rs6000_mffs (tmp_df));
> +  tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> +  emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0x000700FFULL)));
> +  rtx tmp_rtn = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
> +  emit_move_insn (operands[0], tmp_rtn);
> +  DONE;
> +})

This doesn't look correct.  You first set tmp_di to a new reg rtx but then
throw that away with the return value of simplify_gen_subreg().  I'm guessing
you want that tmp_di as a gen_reg_rtx for the destination of the gen_anddi3, so
you probably want a different rtx for the subreg that feeds the gen_anddi3.
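The comment in the expand names the fields by bit numbers in IBM (MSB-0) numbering. As a cross-check when choosing the AND constant, a small sketch that converts such a bit range into a conventional mask, assuming the numbering refers to the full 64-bit FPSCR image returned by mffs (bit 0 being the most significant bit). The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for bits FIRST..LAST of a 64-bit register in IBM (MSB-0) bit
   numbering, where bit 0 is the most significant bit and bit 63 the
   least significant.  */
uint64_t ibm_field_mask (unsigned first, unsigned last)
{
  assert (first <= last && last <= 63);
  unsigned width = last - first + 1;
  uint64_t ones = width == 64 ? ~UINT64_C (0) : (UINT64_C (1) << width) - 1;
  return ones << (63 - last);
}
```

Under that assumption, DRN (29:31) together with bits 56:63 gives 0x00000007000000FF, and RN alone (62:63) gives 0x3; the constant passed to gen_anddi3 can be compared against these.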



> +(define_expand "rs6000_update_fpscr_rn_field"
> + [(match_operand:DI 0 "gpc_reg_operand")]
> +  "TARGET_HARD_FLOAT"
> +{
> +  /* Insert the new RN value from operands[0] into FPSCR bit [62:63].  */
> +  rtx tmp_di = gen_reg_rtx (DImode);
> +  rtx tmp_df = gen_reg_rtx (DFmode);
> +
> +  emit_insn (gen_rs6000_mffs (tmp_df));
> +  tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);

Ditto.




> +The @code{__builtin_set_fpscr_rn} builtin allows changing both of the floating
> +point rounding mode bits and returning the various FPSCR fields before the RN
> +field is updated.  The builtin returns a double consisting of the initial value
> +of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, and RN bit positions with all
> +other bits set to zero. The builtin argument is a 2-bit value for the new RN
> +field value.  The argument can either be an @code{const int} or stored in a
> +variable.  Earlier versions of @code{__builtin_set_fpscr_rn} returned void.  A
> +@code{__SET_FPSCR_RN_RETURNS_FPSCR__} macro has been added.  If defined, then
> +the @code{__builtin_set_fpscr_rn} builtin returns the FPSCR fields.  If not
> +defined, the @code{__builtin_set_fpscr_rn} does not return a vaule.  If the
> +@option{-msoft-float} option is used, the @code{__builtin_set_fpscr_rn} builtin
> +will not return a value.

Multiple occurrences of "builtin" that should be spelled "built-in" (not in the
built-in function name itself though).



> +/* Originally the __builtin_set_fpscr_rn builtin was defined to return
> +   void.  It was later extended to return a double with the various
> +   FPSCR bits.  The extended builtin is inteded to be a drop in replacement
> +   for the original version.  This test is for the original version of the
> +   builtin and should work exactly as before.  */

Ditto.




> +++ b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c
> @@ -0,0 +1,153 @@
> +/* { dg-do run { target { powerpc*-*-* } } } */

powerpc*-*-* is the default for this test directory, so you can drop that,
but you need to disable this test for soft-float systems, so you probably want:

  /* { dg-do run { target powerpc_fprs } } */

I know you didn't write it, but 

[Bug tree-optimization/110539] [14 Regression] Dead Code Elimination Regression at since r14-338-g1dd154f6407

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110539

--- Comment #5 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #4)
> yes adding:
> /* (convert)(zeroone != 0) into (convert)zeroone */
> /* (convert)(zeroone == 0) into ((convert)zeroone)^1 */
> (for neeq (ne eq)
>  (simplify
>   (convert (neeq zero_one_valued_p@0 integer_zerop))
>   (if (neeq == NE_EXPR)
>(convert @0)
>(bit_xor (convert @0) { build_one_cst (type); }))))
> 
> 
> Fixes the original testcase.

One simple regression:
/* { dg-final { scan-tree-dump-times "Folded to: _\[0-9\]+ = 1;" 1 "dom3"} } */
/* { dg-final { scan-tree-dump-times "Folded to: _\[0-9\]+ = 0;" 1 "dom3"} } */

now fail, but only because the dump output no longer happens to match these
patterns; the folding itself still happens:

Before:
  Replaced 'bufferstep_36' with constant '0'
gimple_simplified to _5 = 1;
  Folded to: _5 = 1;
After:
  Replaced 'bufferstep_36' with constant '0'
gimple_simplified to bufferstep_23 = 1;
  Folded to: bufferstep_23 = 1;

gcc-11-20230706 is now available

2023-07-06 Thread GCC Administrator via Gcc
Snapshot gcc-11-20230706 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/11-20230706/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 11 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-11 revision 25ad73ecf3976abe4a2c36c337e4185d1beb1624

You'll find:

 gcc-11-20230706.tar.xz   Complete GCC

  SHA256=5bf70423620934e90a5cd69d1a5458f59ee7f6e8b8e63e0effcbdbe2022db1d2
  SHA1=cdab10e96c10feb2b97956ce55e6649dcdcff960

Diffs from 11-20230629 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-11
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


user sets ABI

2023-07-06 Thread André Albergaria Coelho via Gcc

What if the user could choose their own ABI, say by specifying a config file like

My abi

" Parameters = pushed in stack"


say

gcc -abi "My abi" some.c -o some

What would be the problems with letting the user specify an ABI? Would that 
make things easier for the user (less complex, simpler), say for a user who 
is used to writing asm in a particular way?


thanks

later


andre



[Bug tree-optimization/110539] [14 Regression] Dead Code Elimination Regression at since r14-338-g1dd154f6407

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110539

--- Comment #4 from Andrew Pinski  ---
yes adding:
/* (convert)(zeroone != 0) into (convert)zeroone */
/* (convert)(zeroone == 0) into ((convert)zeroone)^1 */
(for neeq (ne eq)
 (simplify
  (convert (neeq zero_one_valued_p@0 integer_zerop))
  (if (neeq == NE_EXPR)
   (convert @0)
   (bit_xor (convert @0) { build_one_cst (type); }))))


Fixes the original testcase.
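A quick exhaustive check of the two identities in the comments, over the only values a `zero_one_valued_p` operand can take. The target type `long` and the function name are arbitrary choices for the sketch. For a value such as 2, `x != 0` is 1 but `(T)x` is 2, which is why the pattern requires `zero_one_valued_p`.

```c
#include <assert.h>

/* Checks, for x in {0, 1}:
     (T)(x != 0) == (T)x
     (T)(x == 0) == ((T)x) ^ 1
   with T = long.  Returns 1 if both identities hold for every x.  */
int zero_one_identities_hold (void)
{
  for (int x = 0; x <= 1; x++)
    {
      if ((long) (x != 0) != (long) x)
        return 0;
      if ((long) (x == 0) != ((long) x ^ 1))
        return 0;
    }
  return 1;
}
```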

[Bug c++/81880] thread_local static member template initialisation fails

2023-07-06 Thread ttimo at valvesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81880

Timothee Besset  changed:

   What|Removed |Added

 CC||ttimo at valvesoftware dot com

--- Comment #5 from Timothee Besset  ---
We are observing this with GCC 10.3.0.

Re: PING^3 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare

2023-07-06 Thread Michael Meissner via Gcc-patches
I get the following warning which prevents gcc from bootstrapping due to
-Werror:

/home/meissner/fsf-src/work124-sfsplat/gcc/config/rs6000/rs6000-p10sfopt.cc: In 
function ‘void {anonymous}::process_chain_from_load(gimple*)’:
/home/meissner/fsf-src/work124-sfsplat/gcc/config/rs6000/rs6000-p10sfopt.cc:505:30:
 warning: zero-length gcc_dump_printf format string [-Wformat-zero-length]
  505 |   dump_printf (MSG_NOTE, "");
  |  ^~

I just commented out the dump_printf call.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[Bug c++/110580] New: [14 Regression] gcc fails to typecheck nix-2.16.1 source: error: invalid initialization of reference of type

2023-07-06 Thread slyfox at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110580

Bug ID: 110580
   Summary: [14 Regression] gcc fails to typecheck nix-2.16.1
source: error: invalid initialization of reference of
type
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: slyfox at gcc dot gnu.org
  Target Milestone: ---

Created attachment 55495
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55495=edit
nar-accessor.cc.cc.xz

Originally observed the build failure on nix-2.16.1 source when building with
gcc r14-2344-g9f4f833455bb35.

I suspect it might be a form of https://gcc.gnu.org/PR110523 regression. But
filing just in case.

Attached unmodified preprocessed example.

The build succeeds on gcc-13:

$ g++-13 -c nar-accessor.cc.cc -std=c++2a
# ok

$ g++-14 -c nar-accessor.cc.cc -std=c++2a
nar-accessor.cc.cc: In instantiation of 'std::pair, std::_Select1st >,
_Compare, typename
__gnu_cxx::__alloc_traits<_Allocator>::rebind
>::other>::iterator, bool> std::map<_Key, _Tp, _Compare,
_Alloc>::emplace(_Args&& ...) [with _Args = {std::basic_string_view >, nix::NarMember}; _Key =
std::__cxx11::basic_string; _Tp = nix::NarMember; _Compare =
std::less >; _Alloc =
std::allocator,
nix::NarMember> >; typename std::_Rb_tree<_Key, std::pair,
std::_Select1st >, _Compare, typename
__gnu_cxx::__alloc_traits<_Allocator>::rebind
>::other>::iterator = std::_Rb_tree_iterator, nix::NarMember> >; typename
__gnu_cxx::__alloc_traits<_Allocator>::rebind
>::other = std::allocator,
nix::NarMember> >; typename
__gnu_cxx::__alloc_traits<_Allocator>::rebind > =
__gnu_cxx::__alloc_traits, nix::NarMember> >, std::pair, nix::NarMember> >::rebind, nix::NarMember> >; typename
_Allocator::value_type = std::pair,
nix::NarMember>]':
nar-accessor.cc.cc:126951:62:   required from here
nar-accessor.cc.cc:33745:23: error: invalid initialization of reference of type
'const std::map, nix::NarMember>::key_type&'
{aka 'const std::__cxx11::basic_string&'} from expression of type
'std::basic_string_view'
33745 |   const key_type& __k = __a;
  |   ^~~

$ g++ -v |& unnix
Using built-in specs.
COLLECT_GCC=/<>/gcc-14.0.0/bin/g++
COLLECT_LTO_WRAPPER=/<>/gcc-14.0.0/libexec/gcc/x86_64-unknown-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with:
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0  (experimental) (GCC)

[Bug tree-optimization/110539] [14 Regression] Dead Code Elimination Regression at since r14-338-g1dd154f6407

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110539

--- Comment #3 from Andrew Pinski  ---
Here is a testcase for the missing optimization (at -O1) which is optimized at
the RTL level (for some targets but not all):
```
int f(int a)
{
int b = a & 1;
int c = b != 0;
return c == b;
}
```

Though it does optimize at -O2 because VRP changes `b != 0` into `(_Bool) b_4`.
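A brute-force sanity check (not part of the report) confirms that the testcase is dead code: `b` is already 0 or 1, so `c == b` holds for every input and `f` always returns 1. The helper `check_f_range` is a hypothetical name used only for this illustration.

```c
#include <assert.h>

/* Testcase from comment #3.  b = a & 1 is 0 or 1, so c = (b != 0)
   equals b and the whole function should fold to "return 1".  */
int f(int a)
{
    int b = a & 1;
    int c = b != 0;
    return c == b;
}

/* Hypothetical helper: call f on every int in [lo, hi] and assert the
   result is 1.  Returns the number of inputs checked.  The long long
   loop variable avoids overflow when hi is INT_MAX.  */
long long check_f_range(int lo, int hi)
{
    long long n = 0;
    for (long long a = lo; a <= hi; a++, n++)
        assert(f((int)a) == 1);
    return n;
}
```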

Re: abi

2023-07-06 Thread Jonathan Wakely via Gcc
On Thu, 6 Jul 2023, 22:20 André Albergaria Coelho via Gcc, 
wrote:

> Could gcc have an option to specify ABI?
>
> say
>
>
> gcc something.c -g -abi 1 -o something
>

Sure, it could do, but what would it do? What would "-abi 1" mean? Which
ABI would it relate to?

What are you actually asking about?


[Bug tree-optimization/110539] [14 Regression] Dead Code Elimination Regression at since r14-338-g1dd154f6407

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110539

--- Comment #2 from Andrew Pinski  ---
So the difference comes from the order. Before in phiopt we had:
-  /* Defer boolean x ? 0 : {1,-1} or x ? {1,-1} : 0 to
- match_simplify_replacement.  */
-  if (TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE
-  && (integer_zerop (arg0)
- || integer_zerop (arg1)
- || TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE
- || (TYPE_PRECISION (TREE_TYPE (arg0))
- <= TYPE_PRECISION (TREE_TYPE (lhs)))))
-return false;

But now the order is such that `?0:{1,-1}`, `?{1,-1}:0` is handled first.

So what we need to pattern match here is `(convert)zero_one_valued_p@0!=0` and
simplify that into just (convert)@0

(for neeq (ne eq)
 (simplify
  (convert (neeq zero_one_valued_p@0 integer_zerop))
  (if (neeq == NE_EXPR)
   (convert @0)
   (bit_xor (convert @0) { build_one_cst (type); }))))
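The two rewrites this pattern performs are valid only because the operand is known to be 0 or 1. A small C check of that reasoning follows; it is an illustration, not GCC code, `check_zero_one_rewrites` is a hypothetical name, and the match.pd `convert` is modelled as a cast from `int` to `long`.

```c
#include <assert.h>

/* For x restricted to {0, 1} (what zero_one_valued_p guarantees), check:
     (convert)(x != 0)  ==  (convert)x
     (convert)(x == 0)  ==  ((convert)x) ^ 1
   Returns the number of values checked.  */
int check_zero_one_rewrites(void)
{
    int checked = 0;
    for (int x = 0; x <= 1; x++) {
        assert((long)(x != 0) == (long)x);
        assert((long)(x == 0) == ((long)x ^ 1));
        checked++;
    }
    return checked;
}
```

For an operand outside {0, 1}, such as 2, `(long)(2 != 0)` is 1 rather than 2, which is why the predicate on `@0` is essential.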

[Bug tree-optimization/110311] [14 Regression] regression in tree-optimizer

2023-07-06 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110311

--- Comment #48 from anlauf at gcc dot gnu.org ---
(In reply to anlauf from comment #47)
> However, when I use -O2 together with an -march= flag, the code works.
> I've tested -march=sandybridge, -march=haswell, -march=skylake,
> -march=native.
> It FPEs without.

And it FPEs with core2,nehalem,westmere!

Next I tried:

-march=sandybridge -mno-avx  # FPE!
-march=sandybridge   # OK.

[PATCH 3/3] testsuite: Require vectors of doubles for pr97428.c

2023-07-06 Thread Maciej W. Rozycki
The pr97428.c test assumes support for vectors of doubles, but some 
targets only support vectors of floats, causing this test to fail with 
such targets.  Limit this test to targets that support vectors of 
doubles then.

gcc/testsuite/
* gcc.dg/vect/pr97428.c: Limit to `vect_double' targets.
---
 gcc/testsuite/gcc.dg/vect/pr97428.c |1 +
 1 file changed, 1 insertion(+)

gcc-test-pr97428-vect-double.diff
Index: gcc/gcc/testsuite/gcc.dg/vect/pr97428.c
===
--- gcc.orig/gcc/testsuite/gcc.dg/vect/pr97428.c
+++ gcc/gcc/testsuite/gcc.dg/vect/pr97428.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
 
 typedef struct { double re, im; } dcmlx_t;
 typedef struct { double re[4], im[4]; } dcmlx4_t;


[PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c

2023-07-06 Thread Maciej W. Rozycki
The bb-slp-pr95839.c test assumes quad-single float vector support, but 
some targets only support pairs of floats, causing this test to fail 
with such targets.  Limit this test to targets that support at least 
128-bit vectors then, and add a complementing test that can be run with 
targets that have support for 64-bit vectors only.  There is no need to 
adjust bb-slp-pr95839-2.c as 128 bits are needed even for the smallest 
vector of doubles, so support is implied by the presence of vectors of 
doubles.

gcc/testsuite/
* gcc.dg/vect/bb-slp-pr95839.c: Limit to `vect128' targets.
* gcc.dg/vect/bb-slp-pr95839-v8.c: New test.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c |   14 ++
 gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c|1 +
 2 files changed, 15 insertions(+)

gcc-test-bb-slp-pr95839-vect128.diff
Index: gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect64 } */
+/* { dg-additional-options "-w -Wno-psabi" } */
+
+typedef float __attribute__((vector_size(8))) v2f32;
+
+v2f32 f(v2f32 a, v2f32 b)
+{
+  /* Check that we vectorize this CTOR without any loads.  */
+  return (v2f32){a[0] + b[0], a[1] + b[1]};
+}
+
+/* { dg-final { scan-tree-dump "optimized: basic block" "slp2" } } */
Index: gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
===
--- gcc.orig/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
+++ gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect128 } */
 /* { dg-additional-options "-w -Wno-psabi" } */
 
 typedef float __attribute__((vector_size(16))) v4f32;
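
The size reasoning behind this patch can be spelled out with GCC's generic vector extension: a pair of floats fits in 64 bits, while a quad-single vector and even the smallest vector of doubles need 128 bits. The sketch below assumes 8-bit bytes, and `add_pairs` is a hypothetical name; it simply mirrors the shape of the new v8 testcase.

```c
#include <assert.h>

/* GCC generic vector types; vector_size is given in bytes.  */
typedef float  __attribute__((vector_size(8)))  v2f32; /* as in bb-slp-pr95839-v8.c */
typedef float  __attribute__((vector_size(16))) v4f32; /* as in bb-slp-pr95839.c */
typedef double __attribute__((vector_size(16))) v2f64; /* two doubles */

/* Compile-time checks of the bit widths (assumes CHAR_BIT == 8).  */
static_assert(sizeof(v2f32) * 8 == 64,  "a pair of floats is a 64-bit vector");
static_assert(sizeof(v4f32) * 8 == 128, "quad-single needs 128-bit vectors");
static_assert(sizeof(v2f64) * 8 == 128, "the smallest vector of doubles is 128 bits");

/* Same shape as the v8 testcase: a constructor built from element-wise
   sums, which SLP is expected to turn into a single 64-bit vector add
   on targets that support such vectors.  */
v2f32 add_pairs(v2f32 a, v2f32 b)
{
    return (v2f32){a[0] + b[0], a[1] + b[1]};
}
```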


[PATCH 1/3] testsuite: Add check for vectors of 128 bits being supported

2023-07-06 Thread Maciej W. Rozycki
Similarly to checks for vectors of 32 bits and 64 bits being supported 
add one for vectors of 128 bits.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect128): New 
procedure.
---
 gcc/testsuite/lib/target-supports.exp |6 ++
 1 file changed, 6 insertions(+)

gcc-test-effective-target-vect128.diff
Index: gcc/gcc/testsuite/lib/target-supports.exp
===
--- gcc.orig/gcc/testsuite/lib/target-supports.exp
+++ gcc/gcc/testsuite/lib/target-supports.exp
@@ -8599,6 +8599,12 @@ proc check_effective_target_vect_variabl
 return [expr { [lindex [available_vector_sizes] 0] == 0 }]
 }
 
+# Return 1 if the target supports vectors of 128 bits.
+
+proc check_effective_target_vect128 { } {
+return [expr { [lsearch -exact [available_vector_sizes] 128] >= 0 }]
+}
+
 # Return 1 if the target supports vectors of 64 bits.
 
 proc check_effective_target_vect64 { } {


[PATCH 0/3] testsuite: Exclude vector tests for unsupported targets

2023-07-06 Thread Maciej W. Rozycki
Hi,

 In the course of verifying an out-of-tree RISC-V target that has a vendor
extension providing hardware support for vector operations on pairs of 
single floating-point values (similar to MIPS paired-single or Power SPE 
vector types) I have come across a couple of tests that fail just because 
they expect GCC to produce code this particular hardware does not support.  
Therefore I have created this small patch series, which marks the features 
the test cases require to be relevant, so that they are reported as unsupported 
for the hardware concerned.  For further details see the individual change 
descriptions.

 This patch series has been verified with an `x86_64-linux-gnu' native 
configuration.  I could verify it with MIPS paired-single hw sometime, but 
I'm not currently set up for it and I think the changes are obvious enough 
regardless.

 OK to apply?  As testsuite fixes I think the changes also qualify for 
backporting to active release branches.

  Maciej


Re: abi

2023-07-06 Thread Paul Koning via Gcc
It does, for machine architectures that have multiple ABIs.  MIPS is an example 
where GCC has supported this for at least 20 years.

paul

> On Jul 6, 2023, at 5:19 PM, André Albergaria Coelho via Gcc  
> wrote:
> 
> Could gcc have an option to specify ABI?
> 
> say
> 
> 
> gcc something.c -g -abi 1 -o something
> 
> 
> thanks
> 
> 
> andre
> 
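Paul's point can be made concrete: GCC selects the ABI through target-specific options rather than a generic `-abi N`, for example `-mabi=32`, `-mabi=n32` and `-mabi=64` on MIPS, or `-m32`, `-m64` and `-mx32` on x86. The sketch below is only an illustration (`current_abi` is a hypothetical name, and it covers just a few targets): it reports the ABI a translation unit was compiled for from GCC's predefined macros, and checks the pointer size each x86 ABI implies.

```c
#include <assert.h>
#include <stddef.h>

/* Report the ABI this file was compiled for, based on GCC's predefined
   target macros.  Only a handful of targets are recognised.  */
const char *current_abi(void)
{
#if defined(__mips__) && defined(_MIPS_SIM)
#  if _MIPS_SIM == _ABIO32
    return "mips o32";
#  elif _MIPS_SIM == _ABIN32
    return "mips n32";
#  elif _MIPS_SIM == _ABI64
    return "mips n64";
#  else
    return "mips (other)";
#  endif
#elif defined(__x86_64__) && defined(__ILP32__)
    /* x32: 64-bit instruction set, but 32-bit pointers.  */
    static_assert(sizeof(void *) == 4, "x32 uses 32-bit pointers");
    return "x86-64 x32";
#elif defined(__x86_64__)
    static_assert(sizeof(void *) == 8, "LP64 uses 64-bit pointers");
    return "x86-64 LP64";
#elif defined(__i386__)
    static_assert(sizeof(void *) == 4, "i386 uses 32-bit pointers");
    return "i386";
#else
    return "unknown";
#endif
}
```

Compiling the same file with different ABI options changes which branch is taken, and with it the sizes of types such as pointers.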



[Bug tree-optimization/110539] [14 Regression] Dead Code Elimination Regression at since r14-338-g1dd154f6407

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110539

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org
   Last reconfirmed||2023-07-06

--- Comment #1 from Andrew Pinski  ---

  # RANGE [irange] int [0, 1] NONZERO 0x1
  i_7 = a.0_1 & 1;

  _17 = i_7 != 0;
  _12 = (int) _17;
  if (i_7 == _12)


So this should have been optimized to _17 = (bool) i_7;
and then if (1)

Maybe it is an order of the stuff in match.pd ...

abi

2023-07-06 Thread André Albergaria Coelho via Gcc

Could gcc have an option to specify ABI?

say


gcc something.c -g -abi 1 -o something


thanks


andre



[Bug tree-optimization/110311] [14 Regression] regression in tree-optimizer

2023-07-06 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110311

--- Comment #47 from anlauf at gcc dot gnu.org ---
(In reply to Jürgen Reuter from comment #46)
> The issue goes away with -O0, with -O1 and with -O2 -fno-tree-vectorize. 
> I might want to find the offending commit in the week of June 12-19 in the
> tree-optimizer, but I don't know whether I have time to do so. Hopefully,
> with this 
> smaller reproducer you can figure out what happens (and help solving it)

I recommend adding -ffpe-trap=zero,overflow,invalid to the flags.

It is code2.f90 that is sensitive to -ftree-vectorize; the other two files
can be compiled even with -O3.

However, when I use -O2 together with an -march= flag, the code works.
I've tested -march=sandybridge, -march=haswell, -march=skylake, -march=native.
It FPEs without.

Do you see the same?

Inquiry about SME support for gcov modifications

2023-07-06 Thread Daria Shatalinska via Gcc
Hello,

My name is Daria Shatalinska and I am a Project Manager at Freelancer. I am
contacting you to see if you might be interested in collaborating with us
on a project for NASA's Open Innovation Services program (NOIS2). As an
awardee of the $175 million NOIS2 contract,
we are one of a few approved vendors by NASA Tournament Labs to work on
this opportunity.


We received a new project from NASA's Orion Avionics, Power, and Software
(APS) Office that seeks to modify the open source GNU Coverage (gcov)
project to explicitly measure and report Modified Condition/Decision
Coverage (MC/DC). Many different NASA projects, including the Orion
Multipurpose Crew Vehicle (MPCV), use the gcc compiler and the gcov
profiling tool to provide coverage metrics for Unit Tests. NPR 7150.2
standards now require full MC/DC analysis and 100% coverage. Adding MC/DC
capability to gcov will benefit software development on many NASA programs
and projects across all mission directorates.


We are looking for someone from the GCC team to serve as a subject matter
expert for the software developer that will be executing this project. The
role as the subject matter expert (SME) would be to provide your knowledge
and expertise to help us and NASA develop the necessary modifications
that will meet the standards of contribution to the gcov project.


Is there anyone from the team who would be interested to partner with us on
this project? If this is something you are unable to assist with, is there
anyone you could recommend that might be suited to this project?

Thank you and best regards,

*Daria Shatalinska*
Project Manager, Freelancer Enterprise
Freelancer.com


Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

2023-07-06 Thread Martin Uecker via Gcc-patches
On Thursday, 06.07.2023 at 18:56 +, Qing Zhao wrote:
> Hi, Kees,
> 
> I have updated my V1 patch with the following changes:
> A. changed the name to "counted_by"
> B. changed the argument from a string to an identifier
> C. updated the documentation and testing cases accordingly.
> 
> And then used this new gcc to test 
> https://github.com/kees/kernel-tools/blob/trunk/fortify/array-bounds.c (with 
> the following change)
> [opc@qinzhao-ol8u3-x86 Kees]$ !1091
> diff array-bounds.c array-bounds.c.org
> 32c32
> < # define __counted_by(member)   __attribute__((counted_by (member)))
> ---
> > # define __counted_by(member)   __attribute__((__element_count__(#member)))
> 34c34
> < # define __counted_by(member)   __attribute__((counted_by (member)))
> ---
> > # define __counted_by(member)   /* __attribute__((__element_count__(#member))) */
> 
> Then I got the following result:
> [opc@qinzhao-ol8u3-x86 Kees]$ ./array-bounds 2>&1 | grep -v ^'#'
> TAP version 13
> 1..12
> ok 1 global.fixed_size_seen_by_bdos
> ok 2 global.fixed_size_enforced_by_sanitizer
> not ok 3 global.unknown_size_unknown_to_bdos
> not ok 4 global.unknown_size_ignored_by_sanitizer
> ok 5 global.alloc_size_seen_by_bdos
> ok 6 global.alloc_size_enforced_by_sanitizer
> not ok 7 global.element_count_seen_by_bdos
> ok 8 global.element_count_enforced_by_sanitizer
> not ok 9 global.alloc_size_with_smaller_element_count_seen_by_bdos
> not ok 10 global.alloc_size_with_smaller_element_count_enforced_by_sanitizer
> ok 11 global.alloc_size_with_bigger_element_count_seen_by_bdos
> ok 12 global.alloc_size_with_bigger_element_count_enforced_by_sanitizer
> 
> These are the same as your previous results. I then looked at all the failed
> tests (3, 4, 7, 9 and 10) and studied the reason for each failure.
> 
> In summary, there are two major issues:
> 1. Test 7 fails for the same reason as I observed in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109557,
> which is not a bug but expected behavior.
> 
> 2. The common issue for failed tests 3, 4, 9 and 10 is:
> 
> for the following annotated structure: 
> 
> 
> struct annotated {
> unsigned long flags;
> size_t foo;
> int array[] __attribute__((counted_by (foo)));
> };
> 
> 
> struct annotated *p;
> int index = 16;
> 
> p = malloc(sizeof(*p) + index * sizeof(*p->array));  // allocated real size
> 
> p->foo = index + 2;  // p->foo was set to a different value than the real size of p->array, as in tests 9 and 10
> or
> p->foo was not set to any value, as in tests 3 and 4
> 
> 
> 
> i.e., the value of p->foo is NOT in sync with the number of elements allocated
> for the array p->array.
> 
> I think this should be considered a user error, and the documentation of
> the attribute should state this requirement. (LLVM's RFC includes such a
> requirement in its programming model:
> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854#maintaining-correctness-of-bounds-annotations-18)
> 
> We can add a new warning option -Wcounted-by to report such user errors if
> needed.
> 
> What’s your opinion on this?


Additionally, we could also have a sanitizer that
checks this at run-time.


Personally, I am still not very happy that in the
following example the two 'n's refer to different
entities:

void f(int n)
{
struct foo {
int n;   
int (*p[])[n] [[counted_by(n)]];
};
}

But I guess it will be difficult to convince everybody
that it would be wise to use a new syntax for
disambiguation:

void f(int n)
{
struct foo {
int n;   
int (*p[])[n] [[counted_by(.n)]];
};
}

Martin
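The programming model discussed above, where the count member must be kept in sync with the allocation, can be sketched as follows. `COUNTED_BY`, `annotated_alloc` and `annotated_get` are hypothetical names; the macro expands to the attribute only when `__has_attribute(counted_by)` reports that the compiler supports it, and to nothing otherwise. The `assert` in the accessor is a manual stand-in for the kind of run-time check Martin suggests, not the sanitizer itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(__has_attribute)
#  if __has_attribute(counted_by)
#    define COUNTED_BY(member) __attribute__((counted_by(member)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(member)   /* attribute unavailable: no annotation */
#endif

struct annotated {
    unsigned long flags;
    size_t foo;                  /* number of elements in array[] */
    int array[] COUNTED_BY(foo);
};

/* Allocate room for COUNT elements and set foo in the same place, so the
   annotation never describes a size different from the allocation.  */
struct annotated *annotated_alloc(size_t count)
{
    if (count > (SIZE_MAX - sizeof(struct annotated)) / sizeof(int))
        return NULL;             /* size computation would overflow */
    struct annotated *p = malloc(sizeof(*p) + count * sizeof(*p->array));
    if (p == NULL)
        return NULL;
    p->flags = 0;
    p->foo = count;
    return p;
}

/* Element access checked against the in-struct count.  */
int annotated_get(const struct annotated *p, size_t i)
{
    assert(i < p->foo);
    return p->array[i];
}
```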


> 
> thanks.
> 
> Qing
> 
> 
> > On May 26, 2023, at 4:40 PM, Kees Cook  wrote:
> > 
> > On Thu, May 25, 2023 at 04:14:47PM +, Qing Zhao wrote:
> > > GCC will pass the number of elements info from the attached attribute to both
> > > __builtin_dynamic_object_size and bounds sanitizer to check the out-of-bounds
> > > or dynamic object size issues during runtime for flexible array members.
> > > 
> > > This new feature will provide nice protection to flexible array members (which
> > > currently are completely ignored by both __builtin_dynamic_object_size and
> > > bounds sanitizers).
> > 
> > Testing went pretty well, though I think I found some bdos issues:
> > 
> > - some things that bdos can't know the size of, and correctly returned
> >  SIZE_MAX in the past, now thinks are 0-sized.
> > - while bdos correctly knows the size of an element_count-annotated
> >  flexible array, it doesn't know the size of the containing object
> >  (i.e. it returns SIZE_MAX).
> > 
> > Also, I think I found a precedence issue:
> > 
> > - if both __alloc_size and 'element_count' are in use, the _smallest_
> >  of the two is what I would expect to be enforced by the sanitizer
> >  and reported by __bdos. As is, alloc_size appears to be used when
> >  

[Bug tree-optimization/110540] [14 Regression] Dead Code Elimination Regression since r14-1163-gd8b058d3ca4

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110540

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-07-06
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski  ---
Confirmed.

After threadfull2 we have:
  # VUSE <.MEM_17>
  _6 = *f.3_5;
  # VUSE <.MEM_17>
  _7 = *_6;
  if (_7 <= 0)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 55807731]:
  # .MEM_19 = VDEF <.MEM_17>
  *_6 = 0;
  # VUSE <.MEM_19>
  _10 = *f.3_5;
  if (_10 != 0B)
goto ; [0.00%]
  else
goto ; [100.00%]


But there is nothing which will optimize the load of *f.3_5 to be _6 before
vrp2 (which is the next pass).

There seem to be some other issues earlier on that get us to that IR so
late, but I am not sure how ...

[Bug tree-optimization/110579] O2, O1 opmtimizations cause a buffer overflow panic during a strcpy

2023-07-06 Thread gabriel.torres at ll dot mit.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110579

--- Comment #5 from Gabriel  ---
I see. That makes sense.

Our research project has a dataset with tar 1.14. Our plan is to compare our
work with existing work in the dataset and to be consistent, use tar 1.14. We
noticed our binary compiled with gcc would abort when creating an archive while
using clang was fine.

[Bug tree-optimization/110501] Invalid use-after-free / realloc with a store/load happening

2023-07-06 Thread cheyenne.wills at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110501

--- Comment #6 from Cheyenne Wills  ---
Just another bit of information.

Specifying just -Werror=use-after-free does not appear to be enough to trigger
the problem.  Using -Wall, however, does trigger the problem.

(tried on gcc-12 and gcc-13)

[Bug tree-optimization/110579] O2, O1 opmtimizations cause a buffer overflow panic during a strcpy

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110579

--- Comment #4 from Andrew Pinski  ---
All of these FORTIFY issues have been fixed for a long time now (over 10
years).

Why are you trying to use an old version of gnu tar?

e.g. https://lists.gnu.org/archive/html/bug-tar/2010-02/msg00010.html

[Bug tree-optimization/88443] [meta-bug] bogus/missing -Wstringop-overflow warnings

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88443
Bug 88443 depends on bug 110579, which changed state.

Bug 110579 Summary: O2, O1 opmtimizations cause a buffer overflow panic during a strcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110579

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

[Bug tree-optimization/110579] O2, O1 opmtimizations cause a buffer overflow panic during a strcpy

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110579

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Andrew Pinski  ---
The warning:
In function ‘strcpy’,
inlined from ‘start_header’ at create.c:695:7:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:79:10: warning:
‘__builtin___strcpy_chk’ writing 8 bytes into a region of size 6
[-Wstringop-overflow=]
   79 |   return __builtin___strcpy_chk (__dest, __src, __glibc_objsize (__dest));
      |          ^~~~

Which comes from:

  strcpy (header->header.magic, "ustar  "); //8


The code is not _FORTIFY_SOURCE=2 safe: that level requires strcpy to write
only within the field it targets, and does not allow adjacent character
fields to be treated as one combined buffer.


  char magic[6];
  char version[2];
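One way to store the 8-byte `"ustar  "` magic without any single copy crossing from `magic` into `version` is one copy per field, sketched below on a simplified two-field struct. `header_fields` and `set_ustar_magic` are hypothetical names, and this is not necessarily the fix tar itself adopted.

```c
#include <assert.h>
#include <string.h>

/* Simplified model of the two adjacent header fields quoted above.  */
struct header_fields {
    char magic[6];
    char version[2];
};

static_assert(sizeof(struct header_fields) == 8,
              "magic and version are laid out back to back");

/* Store "ustar  " plus its terminating NUL (8 bytes in total) with one
   copy per field, so no copy writes past the field it targets.  */
void set_ustar_magic(struct header_fields *h)
{
    memcpy(h->magic, "ustar ", sizeof h->magic);  /* 'u' 's' 't' 'a' 'r' ' ' */
    memcpy(h->version, " ", sizeof h->version);   /* ' ' '\0' */
}
```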

[Bug tree-optimization/110579] O2, O1 opmtimizations cause a buffer overflow panic during a strcpy

2023-07-06 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110579

--- Comment #2 from Sam James  ---
Could you give us a backtrace with -ggdb3 when it aborts at runtime?

[Bug c/110579] O2, O1 opmtimizations cause a buffer overflow panic during a strcpy

2023-07-06 Thread gabriel.torres at ll dot mit.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110579

--- Comment #1 from Gabriel  ---
Created attachment 55494
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55494=edit
Processed *.i files

[Bug c/110579] New: O2, O1 opmtimizations cause a buffer overflow panic during a strcpy

2023-07-06 Thread gabriel.torres at ll dot mit.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110579

Bug ID: 110579
   Summary: O2, O1 opmtimizations cause a buffer overflow panic
during a strcpy
   Product: gcc
   Version: 11.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabriel.torres at ll dot mit.edu
  Target Milestone: ---

Created attachment 55493
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55493=edit
Output of compiling the source code.

O2 and O1 optimizations of the attached .i file trigger a buffer overflow panic
during a strcpy.
The project being compiled is tar 1.14.
The unoptimized version does not panic and behaves as expected,
creating an archive.

* the exact version of GCC;
  - 11.3.0, 12.1.0, 9.5.0
* the system type;
  - Ubuntu 22.04.1
* the options given when GCC was configured/built;
  - 11.3.0: Configured with: ../src/configure -v --with-pkgversion='Ubuntu
11.3.0-1ubuntu1~22.04.1' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-11
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet
--with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32
--enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-11-aYxV0E/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-aYxV0E/gcc-11-11.3.0/debian/tmp-gcn/usr
--without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
--with-build-config=bootstrap-lto-lean --enable-link-serialization=2
  - 9.5.0: Configured with: ../src/configure -v --with-pkgversion='Ubuntu
9.5.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-9
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-9-5Q4PKF/gcc-9-9.5.0/debian/tmp-nvptx/usr,hsa
--without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
--with-build-config=bootstrap-lto-lean --enable-link-mutex
  - 12.1.0: Configured with: ../src/configure -v --with-pkgversion='Ubuntu
12.1.0-2ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-12-sZcx2y/gcc-12-12.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-sZcx2y/gcc-12-12.1.0/debian/tmp-gcn/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
* the complete command line that triggers the bug;
 - ./tar cf foo.tar bar
* the compiler output (error messages, warnings, etc.); and
 - See make_output file
* the preprocessed file (*.i*) that triggers the bug

Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]

2023-07-06 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 06, 2023 at 03:00:28PM +0200, Richard Biener via Gcc-patches wrote:
> > +  (if (types_match (type, @1))
> > +   (bit_not (bit_and @1 (convert @0)))
> > +   (if (types_match (type, @0))
> > +(bit_not (bit_and (convert @1) @0))
> > +(convert (bit_not (bit_and @0 (convert @1)))
> 
> You can elide the types_match checks and instead always emit
> 
>   (convert (bit_not (bit_and @0 (convert @1)))
> 
> the conversions are elided when the types match.

If all types match, sure, any of the variants will be good.
But if say @1 matches type and doesn't match @0, then
(convert (bit_not (bit_and @0 (convert @1)))
will result in 2 conversions instead of just 1.
Of course, it could be alternatively solved by some other simplify
that would reduce the number of conversions.

Jakub
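The identity in the subject line holds bit by bit: where a bit of X is 0 both sides give 1, and where it is 1 both give the complement of the corresponding bit of Y. A minimal exhaustive check in C follows; `check_pr109986_identity` is a hypothetical name, and it checks only the arithmetic, not the match.pd type and conversion handling discussed above.

```c
#include <assert.h>
#include <stdint.h>

/* Exhaustively compare both sides of (~X | Y) ^ X -> ~(X & Y) over all
   8-bit values.  Bitwise operators act on each bit independently, so the
   result carries over to wider types.  Returns the number of pairs.  */
long check_pr109986_identity(void)
{
    long pairs = 0;
    for (unsigned x = 0; x <= UINT8_MAX; x++)
        for (unsigned y = 0; y <= UINT8_MAX; y++) {
            uint8_t lhs = (uint8_t)((~x | y) ^ x);
            uint8_t rhs = (uint8_t)~(x & y);
            assert(lhs == rhs);
            pairs++;
        }
    return pairs;
}
```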



Re: [PATCH] rs6000: Don't ICE when generating vector pair load/store insns [PR110411]

2023-07-06 Thread Peter Bergner via Gcc-patches
On 7/6/23 12:33 PM, Segher Boessenkool wrote:
> On Wed, Jul 05, 2023 at 05:21:18PM +0530, P Jeevitha wrote:
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -9894,6 +9894,8 @@ rs6000_legitimate_address_p (machine_mode mode, rtx x, 
>> bool reg_ok_strict)
>>  
>>/* Handle unaligned altivec lvx/stvx type addresses.  */
>>if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
>> +  && mode !=  OOmode
>> +  && mode !=  XOmode
>>&& GET_CODE (x) == AND
>>&& CONST_INT_P (XEXP (x, 1))
>>&& INTVAL (XEXP (x, 1)) == -16)
> 
> Why do we need this for OOmode and XOmode here, but not for the other
> modes that are equally not allowed?  That makes no sense.

VECTOR_MEM_ALTIVEC_OR_VSX_P (mode) already filters those modes out
(e.g., SImode, DFmode, etc.), just not OOmode and XOmode, since those both
are modes used in/with VSX registers.



> Should you check for anything that is more than a register, for example?
> If so, do *that*?

Well rs6000_legitimate_address_p() is only passed the MEM rtx, so we have
no idea if this is a load or store, so we're clueless on number of regs
needed to hold this mode.  The best we could do is something like

  GET_MODE_SIZE (mode) == GET_MODE_SIZE (V16QImode)

or some such thing.  Would you prefer something like that?



>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr110411.c
>> @@ -0,0 +1,21 @@
>> +/* PR target/110411 */
>> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -S -mblock-ops-vector-pair" } */
> 
> -S in testcases is wrong.  Why do you want this?  It is *good* if this
> is hauled through the assembler as well!  If you *really* want this you
> use "dg-do assemble", but you shouldn't.

For test cases checking for ICEs, we don't need to assemble, so I agree,
we just need to remove the -S option, which is implied by this being a
dg-do compile test case (the default for this test directory).


Peter




[Bug tree-optimization/110311] [14 Regression] regression in tree-optimizer

2023-07-06 Thread juergen.reuter at desy dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110311

--- Comment #46 from Jürgen Reuter  ---
(In reply to Jürgen Reuter from comment #45)
> Created attachment 55492 [details]
> Smaller stand-alone reproducer
> 
> I will give more information in a comment, this contains 3 files and a
> Makefile.

This is a standalone reproducer with a total of 8k lines. It needs to be in
three different files, as fusing the 2nd and 3rd file eliminates the optimizer
problem of this issue, while fusing the 1st and the 2nd leads to an ICE in
trans-array.c (reported separately) and is independent of this problem here.
The issue goes away with -O0, with -O1 and with -O2 -fno-tree-vectorize.
I might want to find the offending commit in the week of June 12-19 in the
tree-optimizer, but I don't know whether I have time to do so. Hopefully, with
this smaller reproducer you can figure out what happens (and help solve it)

[Bug tree-optimization/110311] [14 Regression] regression in tree-optimizer

2023-07-06 Thread juergen.reuter at desy dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110311

--- Comment #45 from Jürgen Reuter  ---
Created attachment 55492
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55492=edit
Smaller stand-alone reproducer

I will give more information in a comment, this contains 3 files and a
Makefile.

[Bug analyzer/110578] New: Support dynamic_cast within the analyzer

2023-07-06 Thread vultkayn at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110578

Bug ID: 110578
   Summary: Support dynamic_cast within the analyzer
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: vultkayn at gcc dot gnu.org
  Target Milestone: ---

Created attachment 55491
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55491=edit
First draft of a test case for dynamic_cast

In the attached file you will find a first draft of test cases toward
supporting dynamic_cast in the analyzer. We [the analyzer] have already gained
RTTI capabilities thanks to PR97114, so supporting dynamic_cast no longer feels
that far away.
Test cases currently failing have been marked 'xfail'.

I didn't add any test cases for dynamic_cast failures with reference types,
but stuck to pointers instead, since we don't support exceptions at all:

  "If the cast fails and target-type is a reference type, it throws an
   exception that matches a handler of type std::bad_cast."

  https://en.cppreference.com/w/cpp/language/dynamic_cast

What are your thoughts on these tests? Do they feel complete enough for a
first implementation of dynamic_cast in the analyzer?

[Bug rtl-optimization/104914] [MIPS] wrong comparison with scrabbled int value

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104914

Andrew Pinski  changed:

   What|Removed |Added

  Component|target  |rtl-optimization
 Ever confirmed|0   |1
   Last reconfirmed||2023-07-06
 Status|UNCONFIRMED |NEW
  Known to fail||11.2.0

--- Comment #7 from Andrew Pinski  ---
The initial RTL has a sign extend in there:


(insn 20 19 23 2 (set (reg/v:DI 200 [ val+-4 ])
(sign_extend:DI (subreg:SI (reg/v:DI 200 [ val+-4 ]) 4)))
"/app/example.cpp":7:29 -1
 (nil))
(jump_insn 23 20 24 2 (set (pc)
(if_then_else (le (subreg/s/u:SI (reg/v:DI 200 [ val+-4 ]) 4)
(const_int 0 [0]))
(label_ref 32)
(pc))) "/app/example.cpp":8:5 -1
 (int_list:REG_BR_PROB 440234148 (nil))
 -> 32)


Before combine also looks fine:
(insn 20 19 23 2 (set (reg/v:DI 200 [ val+-4 ])
(sign_extend:DI (subreg:SI (reg/v:DI 200 [ val+-4 ]) 4)))
"/app/example.cpp":7:29 235 {extendsidi2}
 (nil))
(jump_insn 23 20 24 2 (set (pc)
(if_then_else (le (subreg/s/u:SI (reg/v:DI 200 [ val+-4 ]) 4)
(const_int 0 [0]))
(label_ref 32)
(pc))) "/app/example.cpp":8:5 471 {*branch_ordersi}
 (expr_list:REG_DEAD (reg/v:DI 200 [ val+-4 ])
(int_list:REG_BR_PROB 440234148 (nil)))
 -> 32)

But combine does the wrong thing:
Trying 20 -> 23:
   20: r200:DI=sign_extend(r200:DI#4)
   23: pc={(r200:DI#4<=0)?L32:pc}
  REG_DEAD r200:DI
  REG_BR_PROB 440234148
Successfully matched this instruction:
(set (pc)
(if_then_else (le (subreg:SI (reg/v:DI 200 [ valD.1959+-4 ]) 4)
(const_int 0 [0]))
(label_ref 32)
(pc)))
allowing combination of insns 20 and 23
original costs 4 + 16 = 20
replacement cost 16
deferring deletion of insn with uid = 20.
modifying insn i3    23: pc={(r200:DI#4<=0)?L32:pc}
  REG_BR_PROB 440234148
  REG_DEAD r200:DI
deferring rescan insn with uid = 23.

Instead of a subreg here, it should have been a truncate.
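Here is a plain-C model of why the missing truncation matters. It is a sketch, not GCC code. It rests on two assumptions. First, on MIPS64 an SImode value kept in a 64-bit register must be in sign-extended form. Second, a `blez`-style branch tests the whole 64-bit register. Under those assumptions, treating `(subreg:SI r200 4)` as a no-op lets the upper 32 bits leak into the comparison. The register value below has its low word equal to -1 and its upper word equal to 0.

```c
#include <stdint.h>

/* Model of a MIPS64 blez: the branch tests all 64 bits of the register.  */
static int
branch_if_le_zero (int64_t reg)
{
  return reg <= 0;
}

/* What insn 20 does: sign-extend the low 32 bits (SImode) to 64 bits.
   The int32_t conversion relies on GCC's modulo semantics.  */
static int64_t
sign_extend_si (int64_t reg)
{
  return (int64_t) (int32_t) (uint32_t) reg;
}

/* Original sequence: extend first, then branch on the canonical value.  */
static int
branch_with_extend (int64_t reg)
{
  return branch_if_le_zero (sign_extend_si (reg));
}

/* Combined insn if the subreg is treated as free: no extension happens,
   so a non-canonical register gives the wrong answer.  */
static int
branch_with_subreg_only (int64_t reg)
{
  return branch_if_le_zero (reg);
}
```

For `INT64_C (0x00000000FFFFFFFF)`, the extended form branches because -1 <= 0. The subreg-only form does not branch, because 4294967295 > 0. For a register that is already sign-extended, both forms agree.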

Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

2023-07-06 Thread Qing Zhao via Gcc-patches
Hi, Kees,

I have updated my V1 patch with the following changes:
A. changed the name to "counted_by"
B. changed the argument from a string to an identifier
C. updated the documentation and testing cases accordingly.

I then used this new GCC to test
https://github.com/kees/kernel-tools/blob/trunk/fortify/array-bounds.c (with
the following change):
[opc@qinzhao-ol8u3-x86 Kees]$ !1091
diff array-bounds.c array-bounds.c.org
32c32
< # define __counted_by(member) __attribute__((counted_by (member)))
---
> # define __counted_by(member) __attribute__((__element_count__(#member)))
34c34
< # define __counted_by(member)   __attribute__((counted_by (member)))
---
> # define __counted_by(member) /* __attribute__((__element_count__(#member))) */

Then I got the following result:
[opc@qinzhao-ol8u3-x86 Kees]$ ./array-bounds 2>&1 | grep -v ^'#'
TAP version 13
1..12
ok 1 global.fixed_size_seen_by_bdos
ok 2 global.fixed_size_enforced_by_sanitizer
not ok 3 global.unknown_size_unknown_to_bdos
not ok 4 global.unknown_size_ignored_by_sanitizer
ok 5 global.alloc_size_seen_by_bdos
ok 6 global.alloc_size_enforced_by_sanitizer
not ok 7 global.element_count_seen_by_bdos
ok 8 global.element_count_enforced_by_sanitizer
not ok 9 global.alloc_size_with_smaller_element_count_seen_by_bdos
not ok 10 global.alloc_size_with_smaller_element_count_enforced_by_sanitizer
ok 11 global.alloc_size_with_bigger_element_count_seen_by_bdos
ok 12 global.alloc_size_with_bigger_element_count_enforced_by_sanitizer

These are the same as your previous results. I then looked at all the failed
tests (3, 4, 7, 9, and 10) and studied the reason for each of them.

In summary, there are two major issues:
1. Test 7 fails for the same reason I observed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109557
which is not a bug but expected behavior.

2. Tests 3, 4, 9, and 10 share a common issue.

For the following annotated structure:

struct annotated {
  unsigned long flags;
  size_t foo;
  int array[] __attribute__((counted_by (foo)));
};

struct annotated *p;
int index = 16;

p = malloc(sizeof(*p) + index * sizeof(*p->array));  // allocated real size

p->foo = index + 2;  // p->foo set to a value different from the real size
                     // of p->array, as in tests 9 and 10

or p->foo is never set to any value, as in tests 3 and 4.

That is, the value of p->foo is NOT in sync with the number of elements
allocated for the array p->array.

I think this should be considered a user error, and the documentation of the
attribute should state this requirement.  (LLVM's RFC includes such a
requirement in its programming model:
https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854#maintaining-correctness-of-bounds-annotations-18)

We can add a new warning option, -Wcounted-by, to report such user errors if
needed.
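Here is a minimal sketch of the usage pattern this requirement implies. It assumes a compiler that accepts `counted_by` with an identifier argument; the `COUNTED_BY` macro expands to nothing when the attribute is unavailable. The helper `alloc_annotated` is hypothetical. It sets the counter field immediately after allocation, so `p->foo` always matches the number of elements actually allocated.

```c
#include <stddef.h>
#include <stdlib.h>

/* Use counted_by when the compiler knows it; otherwise expand to nothing.  */
#if defined (__has_attribute)
# if __has_attribute (counted_by)
#  define COUNTED_BY(member) __attribute__ ((counted_by (member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct annotated
{
  unsigned long flags;
  size_t foo;                   /* number of elements in array[] */
  int array[] COUNTED_BY (foo);
};

/* Hypothetical helper: allocate COUNT elements and record COUNT in the
   counter field before array[] is used, keeping the two in sync.  */
static struct annotated *
alloc_annotated (size_t count)
{
  struct annotated *p = malloc (sizeof (*p) + count * sizeof (*p->array));
  if (p != NULL)
    {
      p->flags = 0;
      p->foo = count;
    }
  return p;
}
```

Keeping allocation and counter assignment in a single helper is one way to avoid the situations in tests 3, 4, 9, and 10, where `p->foo` is either never set or set to a different value.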

What’s your opinion on this?

thanks.

Qing


> On May 26, 2023, at 4:40 PM, Kees Cook  wrote:
> 
> On Thu, May 25, 2023 at 04:14:47PM +, Qing Zhao wrote:
>> GCC will pass the number of elements info from the attached attribute to both
>> __builtin_dynamic_object_size and bounds sanitizer to check the out-of-bounds
>> or dynamic object size issues during runtime for flexible array members.
>> 
>> This new feature will provide nice protection to flexible array members (which
>> currently are completely ignored by both __builtin_dynamic_object_size and
>> bounds sanitizers).
> 
> Testing went pretty well, though I think I found some bdos issues:
> 
> - some things that bdos can't know the size of, and correctly returned
>  SIZE_MAX in the past, now thinks are 0-sized.
> - while bdos correctly knows the size of an element_count-annotated
>  flexible array, it doesn't know the size of the containing object
>  (i.e. it returns SIZE_MAX).
> 
> Also, I think I found a precedence issue:
> 
> - if both __alloc_size and 'element_count' are in use, the _smallest_
>  of the two is what I would expect to be enforced by the sanitizer
>  and reported by __bdos. As is, alloc_size appears to be used when
>  it is available, regardless of what 'element_count' shows.
> 
> I've updated my test cases to show it more clearly, but here is the
> before/after:
> 
> 
> GCC 13 (correctly does not implement "element_count"):
> 
> $ ./array-bounds 2>&1 | grep -v ^'#'
> TAP version 13
> 1..12
> ok 1 global.fixed_size_seen_by_bdos
> ok 2 global.fixed_size_enforced_by_sanitizer
> ok 3 global.unknown_size_unknown_to_bdos
> ok 4 global.unknown_size_ignored_by_sanitizer
> ok 5 global.alloc_size_seen_by_bdos
> ok 6 global.alloc_size_enforced_by_sanitizer
> not ok 7 global.element_count_seen_by_bdos
> not ok 8 global.element_count_enforced_by_sanitizer
> not ok 9 global.alloc_size_with_smaller_element_count_seen_by_bdos
> not ok 10 global.alloc_size_with_smaller_element_count_enforced_by_sanitizer
> ok 11 global.alloc_size_with_bigger_element_count_seen_by_bdos
> ok 12 

GGC: Remove 'const char *' 'gt_ggc_mx', 'gt_pch_nx' variants (was: [PATCH] support ggc hash_map and hash_set)

2023-07-06 Thread Thomas Schwinge
Hi!

On 2014-09-01T21:56:28-0400, tsaund...@mozilla.com wrote:
> [...] this part [...]

... became commit b086d5308de0d25444243f482f2f3d1dfd3a9a62
(Subversion r214834), which added GGC support to 'hash_map', 'hash_set',
and converted to those a number of 'htab' instances.

It doesn't really interfere with my ongoing work, but I have doubts about
two functions that were added here:

> --- a/gcc/ggc.h
> +++ b/gcc/ggc.h

> +static inline void
> +gt_ggc_mx (const char *s)
> +{
> +  ggc_test_and_set_mark (const_cast<char *> (s));
> +}
> +
> +static inline void
> +gt_pch_nx (const char *)
> +{
> +}

If (in current sources) I put '__builtin_abort' calls into these
functions, those don't trigger, so the functions are (currently) unused,
at least in my configuration.  Moreover, comparing these two to other
string-related 'gt_ggc_mx' functions in (nowadays) 'gcc/ggc-page.cc', and
string-related 'gt_pch_nx' functions in (nowadays) 'gcc/stringpool.cc'
(..., which already did exist back then in 2014), we find that this
'gt_ggc_mx' doesn't call 'gt_ggc_m_S', so doesn't get the special string
handling, and this 'gt_pch_nx' doesn't call 'gt_pch_n_S' and also doesn't
'gt_pch_note_object' manually, so I wonder how that ever worked?  So
maybe these two in fact never were used?  Should we dare to put in the
attached "GGC: Remove 'const char *' 'gt_ggc_mx', 'gt_pch_nx' variants"?


Grüße
 Thomas


From a1341d0e75ab20ee9ba09a1a8428c9d3dd2fd54a Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 6 Jul 2023 17:44:35 +0200
Subject: [PATCH] GGC: Remove 'const char *' 'gt_ggc_mx', 'gt_pch_nx' variants

Those were added in 2014 commit b086d5308de0d25444243f482f2f3d1dfd3a9a62
(Subversion r214834) "support ggc hash_map  and hash_set".

If (in current sources) I put '__builtin_abort' calls into these functions,
those don't trigger, so the functions are (currently) unused, at least in my
configuration.  Moreover, comparing these two to other string-related
'gt_ggc_mx' functions in (nowadays) 'gcc/ggc-page.cc', and string-related
'gt_pch_nx' functions in (nowadays) 'gcc/stringpool.cc' (..., which already did
exist back then in 2014), we find that this 'gt_ggc_mx' doesn't call
'gt_ggc_m_S', so doesn't get the special string handling, and this 'gt_pch_nx'
doesn't call 'gt_pch_n_S' and also doesn't 'gt_pch_note_object' manually, so I
wonder how that ever worked?  So maybe these two in fact never were used?

	gcc/
	* ggc.h (gt_ggc_mx (const char *s), gt_pch_nx (const char *)):
	Remove.
---
 gcc/ggc.h | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/gcc/ggc.h b/gcc/ggc.h
index 78eab7eaba6..1f3d665fc57 100644
--- a/gcc/ggc.h
+++ b/gcc/ggc.h
@@ -331,17 +331,6 @@ ggc_alloc_cleared_gimple_statement_stat (size_t s CXX_MEM_STAT_INFO)
   return (gimple *) ggc_internal_cleared_alloc (s PASS_MEM_STAT);
 }
 
-inline void
-gt_ggc_mx (const char *s)
-{
-  ggc_test_and_set_mark (const_cast<char *> (s));
-}
-
-inline void
-gt_pch_nx (const char *)
-{
-}
-
 inline void gt_pch_nx (bool) { }
 inline void gt_pch_nx (char) { }
 inline void gt_pch_nx (signed char) { }
-- 
2.34.1



Re: [PATCH] rs6000: Change GPR2 to volatile & non-fixed register for function that does not use TOC [PR110320]

2023-07-06 Thread Peter Bergner via Gcc-patches
On 6/28/23 3:07 AM, Kewen.Lin wrote:
> I think the reason why we need to check common_deferred_options is at this 
> time
> we can't distinguish the fixed_regs[2] is from the initialization or command 
> line
> user explicit specification.  But could we just update the FIXED_REGISTERS 
> without
> FIXED_R2 and set FIXED_R2 when it's needed in this function instead?  Then I'd
> expect that when we find fixed_regs[2] is set at the beginning of this 
> function, it
> would mean users specify it explicitly and then we don't need this option 
> checking?

Correct, rs6000_conditional_register_usage() is called after the handling of the
-ffixed-* options, so looking at fixed_regs[2] cannot tell us whether the user
used the -ffixed-r2 option or not if we initialize the FIXED_REGISTERS[2] slot
to 1.  I think we went this route for two reasons:

  1) We don't have to worry about anyone in the future adding more uses of
 FIXED_REGISTERS and needing to update the value depending on ABI, options,
 etc.
  2) The options in common_deferred_options are "rare" options, so the common
 case is that common_deferred_options will be NULL and we'll never drop
 into that section.

I believe the untested patch below should also work, without having to scan
the (uncommonly used) options.  Jeevitha, can you bootstrap and regtest the
patch below?



> Besides, IMHO we need a corresponding test case to cover this -ffixed-r2 handling.

Good idea.  I think we can duplicate the pr110320_2.c test case, replacing the
-mno-pcrel option with -ffixed-r2.  Jeevitha, can you give that a try?




>> +/* { dg-require-effective-target power10_ok } */
>> +/* { dg-require-effective-target powerpc_pcrel } */
> 
> Do we have some environment combination which supports powerpc_pcrel but not
> power10_ok?  I'd expect that only powerpc_pcrel is enough.

I think I agree testing for powerpc_pcrel should be enough.


Peter





diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index d197c3f3289..7c356a73ac6 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10160,9 +10160,13 @@ rs6000_conditional_register_usage (void)
     for (i = 32; i < 64; i++)
       fixed_regs[i] = call_used_regs[i] = 1;
 
+  /* For non PC-relative code, GPR2 is unavailable for register allocation.  */
+  if (FIXED_R2 && !rs6000_pcrel_p ())
+    fixed_regs[2] = 1;
+
   /* The TOC register is not killed across calls in a way that is
      visible to the compiler.  */
-  if (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)
+  if (fixed_regs[2] && (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2))
     call_used_regs[2] = 0;
 
   if (DEFAULT_ABI == ABI_V4 && flag_pic == 2)
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..2a24fbdf9fd 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -812,7 +812,7 @@ enum data_align { align_abi, align_opt, align_both };
 
 #define FIXED_REGISTERS  \
   {/* GPRs */ \
-   0, 1, FIXED_R2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, FIXED_R13, 0, 0, \
+   0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, FIXED_R13, 0, 0, \
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
/* FPRs */ \
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
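Here is a plain-C model of the decision logic in the untested patch above. It is a sketch, not GCC code, and it makes two assumptions. First, once FIXED_R2 is dropped from FIXED_REGISTERS, fixed_regs[2] is 1 on entry only if the user passed -ffixed-r2. Second, call_used_regs[2] starts out as 1 (volatile). The struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical summary of GPR2's final state.  */
struct gpr2_state
{
  bool fixed;     /* models fixed_regs[2] */
  bool call_used; /* models call_used_regs[2] */
};

static struct gpr2_state
model_gpr2_usage (bool user_ffixed_r2, bool fixed_r2, bool pcrel,
                  bool aix_or_elfv2)
{
  /* Assumed entry state: fixed only via -ffixed-r2, volatile by default.  */
  struct gpr2_state s = { user_ffixed_r2, true };

  /* For non PC-relative code, GPR2 is unavailable for register allocation.  */
  if (fixed_r2 && !pcrel)
    s.fixed = true;

  /* The TOC register is treated as preserved across calls only when fixed.  */
  if (s.fixed && aix_or_elfv2)
    s.call_used = false;

  return s;
}
```

In this model, PC-relative ELFv2 code without -ffixed-r2 leaves GPR2 non-fixed and volatile. Non-PC-relative code fixes it and keeps it preserved across calls. An explicit -ffixed-r2 is honoured without scanning common_deferred_options.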



[Bug target/110577] New: s390x: Some tests fail with -march=z13

2023-07-06 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110577

Bug ID: 110577
   Summary: s390x: Some tests fail with -march=z13
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mpolacek at gcc dot gnu.org
  Target Milestone: ---

These tests:

gcc.dg/vect/vect-cond-reduc-4.c
g++.dg/vect/pr89653.cc
gfortran.dg/vect/fast-math-pr38968.f90
gfortran.dg/vect/fast-math-rnflow-trs2a2.f90
gfortran.dg/vect/pr62283.f 
gcc.target/s390/vector/partial/s390-vec-length-epil-1.c
gcc.target/s390/vector/partial/s390-vec-length-epil-2.c
gcc.target/s390/vector/partial/s390-vec-length-epil-3.c
gcc.target/s390/vector/partial/s390-vec-length-full-1.c
gcc.target/s390/vector/partial/s390-vec-length-full-2.c
gcc.target/s390/vector/partial/s390-vec-length-full-3.c

work with -march=z14, but fail with -march=z13.  E.g.,

# gcc vect-cond-reduc-4.c -fdiagnostics-plain-output --param
min-vect-loop-bound=1 --param max-unrolled-insns=200 --param max-unroll-times=8
--param max-completely-peeled-insns=200 --param max-completely-peel-times=16
-march=z13 -mzarch -ftree-vectorize -fno-tree-loop-distribute-patterns
-fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details --param
vect-epilogues-nomask=0 -march=z14 ; grep "LOOP VECTORIZED"
a-vect-cond-reduc-4.c.172t.vect
vect-cond-reduc-4.c:19:21: note:  LOOP VECTORIZED
vect-cond-reduc-4.c:19:21: note:  LOOP VECTORIZED
# gcc vect-cond-reduc-4.c -fdiagnostics-plain-output --param
min-vect-loop-bound=1 --param max-unrolled-insns=200 --param max-unroll-times=8
--param max-completely-peeled-insns=200 --param max-completely-peel-times=16
-march=z13 -mzarch -ftree-vectorize -fno-tree-loop-distribute-patterns
-fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details --param
vect-epilogues-nomask=0 -march=z13 ; grep "LOOP VECTORIZED"
a-vect-cond-reduc-4.c.172t.vect
#

or

# gcc s390-vec-length-epil-1.c -fdiagnostics-plain-output  -O2 -ftree-vectorize
-fno-vect-cost-model -fno-unroll-loops -fno-trapping-math
--param=vect-partial-vector-usage=1 --param=min-vect-loop-bound=0
-ffat-lto-objects -fno-ident -S -march=z13 -o s390-vec-length-epil-1.s
# grep vll s390-vec-length-epil-1.s | wc -l
12
# grep vstl s390-vec-length-epil-1.s | wc -l
6
# gcc s390-vec-length-epil-1.c -fdiagnostics-plain-output  -O2 -ftree-vectorize
-fno-vect-cost-model -fno-unroll-loops -fno-trapping-math
--param=vect-partial-vector-usage=1 --param=min-vect-loop-bound=0
-ffat-lto-objects -fno-ident -S -march=z14 -o s390-vec-length-epil-1.s
# grep vll s390-vec-length-epil-1.s | wc -l
14
# grep vstl s390-vec-length-epil-1.s | wc -l
7

[Bug libstdc++/110574] --enable-cstdio=stdio_pure is incompatible with LFS

2023-07-06 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110574

Jonathan Wakely  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |redi at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2023-07-06
 Ever confirmed|0   |1
   Target Milestone|--- |11.5

--- Comment #4 from Jonathan Wakely  ---
I have a patch ...

Re: [PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)

2023-07-06 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Wed, Jul 5, 2023 at 8:44 AM Hao Liu OS via Gcc-patches
>  wrote:
>>
>> Hi,
>>
>> If a loop is unrolled by n times during vectorization, two steps are used to
>> calculate the induction variable:
>>   - The small step for the unrolled ith-copy: vec_1 = vec_iv + (VF/n * Step)
>>   - The large step for the whole loop: vec_loop = vec_iv + (VF * Step)
>>
>> This patch calculates an extra vec_n to replace vec_loop:
>>   vec_n = vec_prev + (VF/n * S) = vec_iv + (VF/n * S) * n = vec_loop.
>>
>> So that we can save the large step register and related operations.
>
> OK.  It would be nice to avoid the dead stmts created earlier though.

FWIW, I still don't think we should do this.  Part of the point of
unrolling is to shorten loop-carried dependencies, whereas this patch
is going in the opposite direction.
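Here is a small integer model of the two strategies from the patch description. It is not the vectorizer's code. `LANES` is an assumed vector width, and `vec`, `vadd`, and `vdup` are hypothetical helpers. With `ncopies` copies, VF = ncopies * LANES, so the small step VF/n * S is LANES * S.

Both strategies produce the same next IV. However, the chained form places `ncopies` dependent adds on the loop-carried path, whereas the large-step form places only one. That difference in dependency length is the concern raised above.

```c
#define LANES 4 /* assumed number of elements per vector */

typedef struct
{
  long lane[LANES];
} vec;

static vec
vadd (vec a, vec b)
{
  vec r;
  for (int i = 0; i < LANES; i++)
    r.lane[i] = a.lane[i] + b.lane[i];
  return r;
}

static vec
vdup (long x)
{
  vec r;
  for (int i = 0; i < LANES; i++)
    r.lane[i] = x;
  return r;
}

/* Before the patch: vec_loop = vec_iv + VF * S, a single add on the
   loop-carried path, at the cost of a separate large-step vector.  */
static vec
next_iv_large_step (vec iv, long s, int ncopies)
{
  return vadd (iv, vdup ((long) ncopies * LANES * s));
}

/* After the patch: vec_n = vec_prev + (VF/n * S), reached by chaining
   ncopies small steps, so ncopies dependent adds are loop-carried.  */
static vec
next_iv_chained (vec iv, long s, int ncopies)
{
  vec small_step = vdup ((long) LANES * s);
  vec v = iv;
  for (int i = 0; i < ncopies; i++)
    v = vadd (v, small_step);
  return v;
}
```

With integers the two results are identical. With floating point, as in the pr110449.c testcase, the two forms can round differently. That testcase is compiled with -Ofast.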

Richard

>
> Thanks,
> Richard.
>
>> gcc/ChangeLog:
>>
>> PR tree-optimization/110449
>> * tree-vect-loop.cc (vectorizable_induction): use vec_n to replace
>> vec_loop for the unrolled loop.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/aarch64/pr110449.c: New testcase.
>> ---
>>  gcc/testsuite/gcc.target/aarch64/pr110449.c | 40 +
>>  gcc/tree-vect-loop.cc   | 21 +--
>>  2 files changed, 58 insertions(+), 3 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110449.c
>>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/pr110449.c b/gcc/testsuite/gcc.target/aarch64/pr110449.c
>> new file mode 100644
>> index 000..bb3b6dcfe08
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/pr110449.c
>> @@ -0,0 +1,40 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-Ofast -mcpu=neoverse-n2 --param aarch64-vect-unroll-limit=2" } */
>> +/* { dg-final { scan-assembler-not "8.0e\\+0" } } */
>> +
>> +/* Calculate the vectorized induction with smaller step for an unrolled loop.
>> +
>> +   before (suggested_unroll_factor=2):
>> +        fmov    s30, 8.0e+0
>> +        fmov    s31, 4.0e+0
>> +        dup     v27.4s, v30.s[0]
>> +        dup     v28.4s, v31.s[0]
>> +   .L6:
>> +        mov     v30.16b, v31.16b
>> +        fadd    v31.4s, v31.4s, v27.4s
>> +        fadd    v29.4s, v30.4s, v28.4s
>> +        stp     q30, q29, [x0]
>> +        add     x0, x0, 32
>> +        cmp     x1, x0
>> +        bne     .L6
>> +
>> +   after:
>> +        fmov    s31, 4.0e+0
>> +        dup     v29.4s, v31.s[0]
>> +   .L6:
>> +        fadd    v30.4s, v31.4s, v29.4s
>> +        stp     q31, q30, [x0]
>> +        add     x0, x0, 32
>> +        fadd    v31.4s, v29.4s, v30.4s
>> +        cmp     x0, x1
>> +        bne     .L6  */
>> +
>> +void
>> +foo2 (float *arr, float freq, float step)
>> +{
>> +  for (int i = 0; i < 1024; i++)
>> +    {
>> +      arr[i] = freq;
>> +      freq += step;
>> +    }
>> +}
>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>> index 3b46c58a8d8..706ecbffd0c 100644
>> --- a/gcc/tree-vect-loop.cc
>> +++ b/gcc/tree-vect-loop.cc
>> @@ -10114,7 +10114,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>>                                        new_vec, step_vectype, NULL);
>>
>>        vec_def = induc_def;
>> -      for (i = 1; i < ncopies; i++)
>> +      for (i = 1; i < ncopies + 1; i++)
>>         {
>>           /* vec_i = vec_prev + vec_step  */
>>           gimple_seq stmts = NULL;
>> @@ -10124,8 +10124,23 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>>           vec_def = gimple_convert (&stmts, vectype, vec_def);
>>
>>           gsi_insert_seq_before (&si, stmts, GSI_SAME_STMT);
>> -         new_stmt = SSA_NAME_DEF_STMT (vec_def);
>> -         STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
>> +         if (i < ncopies)
>> +           {
>> +             new_stmt = SSA_NAME_DEF_STMT (vec_def);
>> +             STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
>> +           }
>> +         else
>> +           {
>> +             /* vec_1 = vec_iv + (VF/n * S)
>> +                vec_2 = vec_1 + (VF/n * S)
>> +                ...
>> +                vec_n = vec_prev + (VF/n * S) = vec_iv + VF * S = vec_loop
>> +
>> +                vec_n is used as vec_loop to save the large step register and
>> +                related operations.  */
>> +             add_phi_arg (induction_phi, vec_def, loop_latch_edge (iv_loop),
>> +                          UNKNOWN_LOCATION);
>> +           }
>>         }
>>      }
>>
>> --
>> 2.34.1


☝ Buildbot (Sourceware): gccrust - retry lost connection compile (retry) (master)

2023-07-06 Thread builder--- via Gcc-rust
A retry build has been detected on builder gccrust-gentoo-sparc while building 
gccrust.

Full details are available at:
https://builder.sourceware.org/buildbot/#builders/241/builds/791

Build state: retry lost connection compile (retry)
Revision: b9566fddf2915f68f050844df699389474c49ac4
Worker: gentoo-sparc
Build Reason: (unknown)
Blamelist: A. Wilcox , Abdul Rafey 
, Alan Modra , Aldy Hernandez 
, Alex Coplan , Alexander Monakov 
, Alexandre Oliva , Alexandre Oliva 
, Allan McRae , Andre Simoes Dias Vieira 
, Andre Vehreschild , Andre 
Vieira , Andrea Corallo 
, Andreas Krebbel , Andreas 
Schwab , Andreas Schwab , Andrew 
Carlotti , Andrew Carlotti , 
Andrew Jenner , Andrew MacLeod , 
Andrew Pinski , Andrew Pinski , Andrew 
Stubbs , Anthony Green , Antoni 
Boucher , Ard Biesheuvel , Arjun Shankar 
, Arnaud Charlet , Arsen Arsenovic 
, Arsen Arsenović , ArshErgon 
, Artem Klimov , Arthur Cohen 
, Avinash Sonawane , Benno Evers 
, Benson Muite , Bernd 
Kuhls , Bernhard Reutner-Fischer , 
Bernhard Reutner-Fischer , Bill Schmidt 
, Bill Seurer , Björn Schäpers 
, Bob Duff , Boris Yakobowski 
, Bruce Korb , Bruno Haible 
, Cedric Landet , Cesar Philippidis 
, Charalampos Mitrodimas , 
Charles-François Natali , Chenghua Xu 
, Chenghua Xu , Christoph 
Müllner , Christophe Lyon 
, Christophe Lyon , 
Chung-Ju Wu , Chung-Lin Tang , 
Claire Dross , Claudiu Zissulescu , 
Claudiu Zissulescu , Clément Chigot , 
Clément Chigot , CohenArthur , 
Costas Argyris , Cui,Lili , 
Cupertino Miranda , Dan Li 
, Daniel Mercier , Dave 
, Dave Evans , David Edelsohn 
, David Faust , David Malcolm 
, David Seifert , Detlef Vollmann 
, Dimitar Dimitrov , Dimitrij Mijoski 
, Dimitrije Milosevic , 
Dimitrije Milošević , Dmitriy Anisimkov 
, Dongsheng Song , Doug Rupp 
, Ed Catmur , Ed Schonberg 
, Ed Smith-Rowland , Emanuele 
Micheletti , Eric Biggers 
, Eric Botcazou , Eric Botcazou 
, Eric Gallager , Etienne Servais 
, Eugene Rozenfeld , Faisal Abbas 
<90.abbasfai...@gmail.com>, Faisal Abbas , Fedor 
Rybin , Fei Gao , Flavio Cruz 
, Florian Weimer , Francois-Xavier 
Coudert , Francois-Xavier Coudert , 
François Dumont , Frederik Harwath 
, Fritz Reese , Frolov Daniil 
, GCC Administrator , Gaius 
Mulley , Gary Dismukes , 
Georg-Johann Lay , Gerald Pfeifer , Ghjuvan 
Lacambre , Giuliano Belinassi , 
Guillaume Gomez , Guillermo E. Martinez 
, H.J. Lu , Hafiz Abid 
Qadeer , Hans-Peter Nilsson , Haochen 
Gui , Haochen Jiang , Harald 
Anlauf , Hongyu Wang , Hu, Lin1 
, Iain Buclaw , Iain Sandoe 
, Ian Lance Taylor , Ilya Leoshkevich 
, Immad Mir , Immad Mir 
, Indu Bhagat , Iskander 
Shakirzyanov , Jakob Hasse 
<0xja...@users.noreply.github.com>, Jakub Dupak , Jakub 
Jelinek , Jan Beulich , Jan Hubicka 
, Jan-Benedict Glaw , Jason Merrill 
, Javier Miranda , Jeff Chapman II 
, Jeff Law , Jeff Law 
, Jeff Law , Jerry DeLisle 
, Jia-Wei Chen , Jia-wei Chen 
, Jiakun Fan <120090...@link.cuhk.edu.cn>, Jiawei 
, Jin Ma , Jinyang He 
, Jiufu Guo , Joao Azevedo 
, Joel Brobecker , Joel Holdsworth 
, Joel Phillips , Joel 
Teichroeb , Joffrey Huguet , Johannes 
Kanig , Johannes Kliemann , John David 
Anglin , Jonathan Grant , Jonathan Wakely 
, Jonathan Yong <10wa...@gmail.com>, Jonny Grant 
, Jose E. Marchesi , Joseph Myers 
, Josue Nava Bello , José Rui 
Faustino de Sousa , Ju-Zhe Zhong , 
Julia Lapenko , Julian Brown 
, Julien Bortolussi , Junxian 
Zhu , Justin Squirek , 
Juzhe-Zhong , Jørgen Kvalsvik 
, Keef Aragon , 
Kewen Lin , Kewen.Lin , Kim Kuparinen 
, Kito Cheng , Kong 
Lingling , Kwok Cheung Yeung , 
Kyrylo Tkachov , Kévin Le Gouguec 
, LIU Hao , Lewis Hyatt 
, Li Xu , Liaiss Merzougue 
, Liao Shihua , LiaoShihua 
, Lili Cui , Lin Sinan 
, Lin Sinan , Liwei Xu 
, Lorenzo Salvadore , Lulu 
Cheng , Lyra , M V V S Manoj Kumar 
, MAHAD , Maciej W. Rozycki 
, Maciej W. Rozycki , Mahmoud Mohamed 
, Marc Nieper-Wißkirchen , 
Marc Poulhiès , Marc Poulhiès , Marcel 
Vollweiler , Marco Falke , 
Marek Polacek , Mark Mentovai , Mark 
Wielaard , Martin Jambor , Martin Liska 
, Martin Liška , Martin Sebor 
, Martin Uecker , Matthew Jasper 
, Matthias Kretz , Max Filippov 
, Mayshao , Meghan Denny 
, Michael Collison , Michael Eager 
, Michael Meissner , Mikael Morin 
, Mikhail Ablakatov , Monk Chiang 
, Muhammad Mahad , Murray Steele 
, Nathan Sidwell , Nathaniel Shead 
, Navid Rahimi , Nick 
Clifton , Nikos Alexandris , 
Nirmal Patel , Olivier Hainque , Owen 
Avery , Palmer Dabbelt , Pan 
Li , Parthib <94271200+parthib...@users.noreply.github.com>, 
Parthib , Pascal Obry , Pat Haugen 
, Patrick Bernardi , Patrick 
Palka , Paul A. Clarke , Paul Thomas 
, Paul-Antoine Arras , Pekka Seppänen 
, Peter Bergner , Peter Foley 
, Petter Tomner , Philip Herron 
, Philip Herron , 
Philipp Fent , Philipp Tomsich , 
Pierre-Emmanuel Patry , Pierre-Marie de 
Rodat , Piotr Trojanek , Prajwal S N 
, Prathamesh Kulkarni 
, Przemyslaw Wirkus 
, Qian Jianhua , Qian Jianhua 
, Qing Zhao , Quentin Ochem 
, Raiki Tamura , Rainer Orth 
, Rainer Orth , Ramana 

Re: [PATCH] rs6000: Don't ICE when generating vector pair load/store insns [PR110411]

2023-07-06 Thread Segher Boessenkool
Hi!

On Wed, Jul 05, 2023 at 05:21:18PM +0530, P Jeevitha wrote:
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
> 
> while generating vector pairs of load & store instruction, the src address
> was treated as an altivec type and that type of address is invalid for 
> lxvp and stxvp insns. The solution for this is to avoid altivec type address
> for OOmode and XOmode.

The mail message you send should be what will end up in the Git commit
message.  Your lines are too long for that (and the subject is much too
long btw), and the content isn't right either.

Maybe something like

"""
rs6000: Don't allow OOmode or XOmode in AltiVec addresses (PR110411)

There are no instructions that do traditional AltiVec addresses (i.e.
with the low four bits of the address masked off) for OOmode and XOmode
objects.  Don't allow those in rs6000_legitimate_address_p.
"""
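The "traditional AltiVec address" mentioned above is the `(and ADDR (const_int -16))` form in the RTL. It corresponds to lvx/stvx ignoring the low four bits of the effective address. Here is a minimal plain-C model of that masking, with an unmasked address alongside for contrast, standing in for lxvp/stxvp. The function names are hypothetical.

```c
#include <stdint.h>

/* Model of an lvx/stvx effective address: the low four bits are ignored,
   which GCC expresses as (and ADDR (const_int -16)).  */
static uint64_t
altivec_effective_address (uint64_t base, uint64_t offset)
{
  return (base + offset) & ~(uint64_t) 15; /* same bits as & -16 */
}

/* Model of an lxvp/stxvp effective address: no masking is applied.  */
static uint64_t
lxvp_effective_address (uint64_t base, uint64_t offset)
{
  return base + offset;
}
```

Because no vector-pair instruction performs this masking, an OOmode or XOmode access through an `(and ... -16)` address cannot be implemented directly. That is why the address should be rejected rather than merely avoided.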

> gcc/
>   PR target/110411
>   * config/rs6000/rs6000.cc (rs6000_legitimate_address_p): Avoid altivec
>   address for OOmode and XOmde.

(XOmode, sp.)

Not "avoid", disallow.  If you avoid something you still allow it, you
just prefer to see something else.

> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -9894,6 +9894,8 @@ rs6000_legitimate_address_p (machine_mode mode, rtx x, bool reg_ok_strict)
>  
>    /* Handle unaligned altivec lvx/stvx type addresses.  */
>    if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
> +      && mode !=  OOmode
> +      && mode !=  XOmode
>        && GET_CODE (x) == AND
>        && CONST_INT_P (XEXP (x, 1))
>        && INTVAL (XEXP (x, 1)) == -16)

Why do we need this for OOmode and XOmode here, but not for the other
modes that are equally not allowed?  That makes no sense.

Should you check for anything that is more than a register, for example?
If so, do *that*?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr110411.c
> @@ -0,0 +1,21 @@
> +/* PR target/110411 */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -S -mblock-ops-vector-pair" } */

-S in testcases is wrong.  Why do you want this?  It is *good* if this
is hauled through the assembler as well!  If you *really* want this you
use "dg-do assemble", but you shouldn't.


Segher


[Bug middle-end/110573] MIPS64: Enhancement PR of load of pointer to atomic

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110573

--- Comment #3 from Andrew Pinski  ---
See
https://inbox.sourceware.org/gcc/d7787b3f-9450-5642-ffac-21cf36176...@redhat.com/
also.

[Bug middle-end/110573] MIPS64: Enhancement PR of load of pointer to atomic

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110573

--- Comment #2 from Andrew Pinski  ---
volatile (atomics) stores are not considered for branch delay slots.

https://inbox.sourceware.org/gcc/3077458.gu9dx72...@arcturus.home/

[Bug c/110575] gcc: internal compiler error: tree check: expected class 'type', have 'exceptional' (error_mark) in build_aligned_type

2023-07-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110575

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-07-06
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski  ---
Confirmed.

[Bug tree-optimization/22401] DOM messes up the profiling info

2023-07-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22401

--- Comment #5 from Jan Hubicka  ---
This is now threaded by threadfull2:

Checking profitability of path (backwards):  bb:3 (2 insns) bb:2
  Control statement insns: 2
  Overall: 0 insns

path: 2->3->xx REJECTED
Checking profitability of path (backwards):  bb:3 (2 insns) bb:5 (latch)
  Control statement insns: 2
  Overall: 0 insns

Checking profitability of path (backwards):
  [1] Registering jump thread: (5, 3) incoming edge;  (3, 5) nocopy;
path: 5->3->5 SUCCESS
Merging blocks 2 and 3
Merging blocks 5 and 6
fix_loop_structure: fixing up loops for function
fix_loop_structure: removing loop 1
flow_loops_find: discovered new loop 2 with header 3

and we get correct profile:

void __show_backtrace (void * rw)
{
;;   basic block 2, loop depth 0, count 118111600 (estimated locally), maybe hot
;;    prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
;;    pred:       ENTRY [always]  count:118111600 (estimated locally) (FALLTHRU,EXECUTABLE)
  if (rw_1(D) != 0B)
    goto ; [0.00%]
  else
    goto ; [100.00%]
;;    succ:       3 [never (guessed)]  count:0 (estimated locally) (TRUE_VALUE,EXECUTABLE)
;;                4 [always (guessed)]  count:118111600 (estimated locally) (FALSE_VALUE,EXECUTABLE)

;;   basic block 3, loop depth 1, count 955630224 (estimated locally), maybe hot
;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
;;    pred:       2 [never (guessed)]  count:0 (estimated locally) (TRUE_VALUE,EXECUTABLE)
;;                3 [always]  count:955630224 (estimated locally) (FALLTHRU,DFS_BACK)
  goto ; [100.00%]
;;    succ:       3 [always]  count:955630224 (estimated locally) (FALLTHRU,DFS_BACK)

;;   basic block 4, loop depth 0, count 118111600 (estimated locally), maybe hot
;;    prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED)
;;    pred:       2 [always (guessed)]  count:118111600 (estimated locally) (FALSE_VALUE,EXECUTABLE)
  return;
;;    succ:       EXIT [always]  count:118111600 (estimated locally) (EXECUTABLE) q.c:4:1

}

[Bug tree-optimization/25623] jump threading/cfg cleanup messes up "incoming counts" for some BBs

2023-07-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25623

--- Comment #12 from CVS Commits  ---
The master branch has been updated by Jan Hubicka :

https://gcc.gnu.org/g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d

commit r14-2369-g3a61ca1b9256535e1bfb19b2d46cde21f3908a5d
Author: Jan Hubicka 
Date:   Thu Jul 6 18:56:22 2023 +0200

Improve profile updates after loop-ch and cunroll

Extend loop-ch and loop unrolling to fix the profile in case the loop is
known to not iterate at all (or to iterate only a few times) while the profile
claims it iterates more.  While this is kind of a symptomatic fix, it is the
best we can do in case the profile was originally estimated incorrectly.
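A minimal sketch of the capping idea, assuming plain 64-bit integer counts rather than GCC's profile_count type, and assuming the loop header runs at most (iterations + 1) times per entry into the loop. `capped_header_count` is a hypothetical name, not a function from tree-ssa-loop-ch.cc or tree-ssa-loop-ivcanon.cc.

```c
#include <stdint.h>

/* Hypothetical helper: cap a loop header's profile count when the loop is
   known to run at most MAX_LATCH_ITERS latch iterations per entry.
   Uses plain integer counts and ignores overflow of the product.  */
static uint64_t
capped_header_count (uint64_t entry_count, uint64_t header_count,
                     uint64_t max_latch_iters)
{
  /* The header executes once per entry plus once per latch iteration.  */
  uint64_t limit = entry_count * (max_latch_iters + 1);
  return header_count < limit ? header_count : limit;
}
```

For a loop entered 100 times whose header count claims 1000 executions but which is known not to iterate (bound 0), the header count drops to 100, so the body would be scaled by 100/1000.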

In the testcase the problematic loop is produced by the vectorizer, and I think
the vectorizer should know, and account in its costs, that the vectorized loop
and/or epilogue is not going to loop after the transformation.  So it would be
nice to fix it on that side, too.

The patch avoids about half of profile mismatches caused by cunroll.

Pass dump id and name |static mismatch|dynamic mismatch
                      |in count       |in count
107t cunrolli         |    3    +3    |     17251     +17251
115t threadfull       |    3          |     14376      -2875
116t vrp              |    5    +2    |     30908     +16532
117t dse              |    5          |     30908
118t dce              |    3    -2    |     17251     -13657
127t ch               |   13   +10    |     17251
131t dom              |   39   +26    |     17251
133t isolate-paths    |   47    +8    |     17251
134t reassoc          |   49    +2    |     17251
136t forwprop         |   53    +4    |    202501    +185250
159t cddce            |   61    +8    |    216211     +13710
161t ldist            |   62    +1    |    216211
172t ifcvt            |   66    +4    |    373711    +157500
173t vect             |  143   +77    |   9802097   +9428386
176t cunroll          |  221   +78    |  15639591   +5837494
183t loopdone         |  218    -3    |  15577640     -61951
195t fre              |  214    -4    |  15577640
197t dom              |  213    -1    |  16671606   +1093966
199t threadfull       |  215    +2    |  16879581    +207975
200t vrp              |  217    +2    |  17077750    +198169
204t dce              |  215    -2    |  17004486     -73264
206t sink             |  213    -2    |  17004486
211t cddce            |  219    +6    |  17005926      +1440
255t optimized        |  217    -2    |  17005926
256r expand           |  210    -7    |  19571573   +2565647
258r into_cfglayout   |  208    -2    |  19571573
275r loop2_unroll     |  212    +4    |  22992432   +3420859
291r ce2              |  210    -2    |  23011838
312r pro_and_epilogue |  230   +20    |  23073776     +61938
315r jump2            |  236    +6    |  27110534   +4036758
323r bbro             |  229    -7    |  21826835   -5283699

W/o the patch cunroll does:

176t cunroll          |  294  +151    | 126548439 +116746342

and we end up with 291 mismatches at bbro.

Bootstrapped/regtested x86_64-linux. Plan to commit it after the
scale_loop_frequency patch.

gcc/ChangeLog:

PR middle-end/25623
* tree-ssa-loop-ch.cc (ch_base::copy_headers): Scale loop frequency
to maximal number of iterations determined.
* tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely): Likewise.

gcc/testsuite/ChangeLog:

PR middle-end/25623
* gfortran.dg/pr25623-2.f90: New test.

RE: [PATCH] arm: Fix MVE intrinsics support with LTO (PR target/110268)

2023-07-06 Thread Kyrylo Tkachov via Gcc-patches
Hi Christophe,

> -Original Message-
> From: Christophe Lyon 
> Sent: Thursday, July 6, 2023 4:21 PM
> To: Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> 
> Subject: Re: [PATCH] arm: Fix MVE intrinsics support with LTO (PR
> target/110268)
> 
> 
> 
> On Wed, 5 Jul 2023 at 19:07, Kyrylo Tkachov wrote:
> 
> 
>   Hi Christophe,
> 
>   > -Original Message-
>   > From: Christophe Lyon
>   > Sent: Monday, June 26, 2023 4:03 PM
>   > To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov;
>   > Richard Sandiford
>   > Cc: Christophe Lyon
>   > Subject: [PATCH] arm: Fix MVE intrinsics support with LTO (PR target/110268)
>   >
>   > After the recent MVE intrinsics re-implementation, LTO stopped working
>   > because the intrinsics would no longer be defined.
>   >
>   > The main part of the patch is simple and similar to what we do for
>   > AArch64:
>   > - call handle_arm_mve_h() from arm_init_mve_builtins to declare the
>   >   intrinsics when the compiler is in LTO mode
>   > - actually implement arm_builtin_decl for MVE.
>   >
>   > It was just a bit tricky to handle __ARM_MVE_PRESERVE_USER_NAMESPACE:
>   > its value in the user code cannot be guessed at LTO time, so we always
>   > have to assume that it was not defined.  This led to a few fixes in the
>   > way we register MVE builtins as placeholders or not.  Without this
>   > patch, we would just omit some versions of the intrinsics when
>   > __ARM_MVE_PRESERVE_USER_NAMESPACE is true.  In fact, like for the C/C++
>   > placeholders, we need to always keep entries for all of them to ensure
>   > that we have a consistent numbering scheme.
>   >
>   >   2023-06-26  Christophe Lyon
>   >
>   >   PR target/110268
>   >   gcc/
>   >   * config/arm/arm-builtins.cc (arm_init_mve_builtins): Handle LTO.
>   >   (arm_builtin_decl): Handle MVE builtins.
>   >   * config/arm/arm-mve-builtins.cc (builtin_decl): New function.
>   >   (add_unique_function): Fix handling of
>   >   __ARM_MVE_PRESERVE_USER_NAMESPACE.
>   >   (add_overloaded_function): Likewise.
>   >   * config/arm/arm-protos.h (builtin_decl): New declaration.
>   >
>   >   gcc/testsuite/
>   >   * gcc.target/arm/pr110268-1.c: New test.
>   >   * gcc.target/arm/pr110268-2.c: New test.
>   > ---
>   >  gcc/config/arm/arm-builtins.cc| 11 +++-
>   >  gcc/config/arm/arm-mve-builtins.cc| 61 ---
>   >  gcc/config/arm/arm-protos.h   |  1 +
>   >  gcc/testsuite/gcc.target/arm/pr110268-1.c | 11 
>   >  gcc/testsuite/gcc.target/arm/pr110268-2.c | 22 
>   >  5 files changed, 76 insertions(+), 30 deletions(-)
>   >  create mode 100644 gcc/testsuite/gcc.target/arm/pr110268-1.c
>   >  create mode 100644 gcc/testsuite/gcc.target/arm/pr110268-2.c
>   >
>   > diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
>   > index 36365e40a5b..fca7dcaf565 100644
>   > --- a/gcc/config/arm/arm-builtins.cc
>   > +++ b/gcc/config/arm/arm-builtins.cc
>   > @@ -1918,6 +1918,15 @@ arm_init_mve_builtins (void)
>   >arm_builtin_datum *d = _builtin_data[i];
>   >arm_init_builtin (fcode, d, "__builtin_mve");
>   >  }
>   > +
>   > +  if (in_lto_p)
>   > +{
>   > +  arm_mve::handle_arm_mve_types_h ();
>   > +  /* Under LTO, we cannot know whether
>   > +  __ARM_MVE_PRESERVE_USER_NAMESPACE was defined, so assume it
>   > +  was not.  */
>   > +  arm_mve::handle_arm_mve_h (false);
>   > +}
>   >  }
>   >
>   >  /* Set up all the NEON builtins, even builtins for instructions that are not
>   > @@ -2723,7 +2732,7 @@ arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
>   >  case ARM_BUILTIN_GENERAL:
>   >return arm_general_builtin_decl (subcode);
>   >  case ARM_BUILTIN_MVE:
>   > -  return error_mark_node;
>   > +  return arm_mve::builtin_decl (subcode);
>   >  default:
>   >gcc_unreachable ();
>   >  }
>   > diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
>   > index 7033e41a571..e9a12f27411 100644
>   > --- a/gcc/config/arm/arm-mve-builtins.cc
>   > +++ b/gcc/config/arm/arm-mve-builtins.cc
>   > @@ -493,6 +493,16 @@ handle_arm_mve_h (bool preserve_user_namespace)
>   >  
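The "consistent numbering scheme" argument in the quoted patch can be illustrated with a toy model. It assumes function codes are handed out sequentially at registration time, so any entry that is skipped depending on __ARM_MVE_PRESERVE_USER_NAMESPACE would shift the codes of every later intrinsic between the original compile and the LTO link. The registry and all function names here are hypothetical, not the arm-mve-builtins.cc API.

```c
#include <stdbool.h>

#define MAX_FNS 16

/* Hypothetical registry: each registered name gets the next code.  */
struct registry
{
  const char *names[MAX_FNS];
  bool placeholder[MAX_FNS];
  unsigned count;
};

static unsigned
register_fn (struct registry *r, const char *name, bool placeholder)
{
  unsigned code = r->count++;
  r->names[code] = name;
  r->placeholder[code] = placeholder;
  return code;
}

/* Keeps an entry for both spellings; when the user namespace must be
   preserved, the short name is registered only as a placeholder.
   Returns the code of the __arm_-prefixed variant.  */
static unsigned
register_keeping (struct registry *r, const char *short_name,
                  const char *arm_name, bool preserve_user_namespace)
{
  unsigned code = register_fn (r, arm_name, false);
  register_fn (r, short_name, preserve_user_namespace);
  return code;
}

/* Omits the short name entirely when the namespace is preserved, so the
   codes of later intrinsics depend on a macro that LTO cannot see.  */
static unsigned
register_skipping (struct registry *r, const char *short_name,
                   const char *arm_name, bool preserve_user_namespace)
{
  unsigned code = register_fn (r, arm_name, false);
  if (!preserve_user_namespace)
    register_fn (r, short_name, false);
  return code;
}
```

With register_keeping, "__arm_vsubq" gets the same code whether or not the macro was defined; with register_skipping it does not.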

Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics

2023-07-06 Thread Olivier Dion via Gcc
On Tue, 04 Jul 2023, Alan Stern  wrote:
> On Tue, Jul 04, 2023 at 01:19:23PM -0400, Olivier Dion wrote:
>> On Mon, 03 Jul 2023, Alan Stern  wrote:
>> > On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:
[...]
> Oh, is that it?  Then I misunderstood entirely; I thought you were 
> talking about augmenting the set of functions or macros made available 
> in liburcu.  I did not realize you intended to change the compilers.

Yes.  We want to extend the atomic builtins API of the toolchains.

>> Indeed, our intent is to discuss the Userspace RCU uatomic API by extending
>> the toolchain's atomic builtins and not the LKMM itself.  The reason why
>> we've reached out to the Linux kernel developers is because the
>> original Userspace RCU uatomic API is based on the LKMM.
>
> But why do you want to change the compilers to better support urcu?  
> That seems like going about things backward; wouldn't it make more sense 
> to change urcu to better match the facilities offered by the current 
> compilers?

The initial motivation for the migration of the Userspace RCU atomics
API from custom inline assembler (mimicking the LKMM) to the C11/C++11
memory model was for supporting userspace tools such as TSAN.

We did that by porting everything to the compiler's atomic builtins API.
However, because of the "fully-ordered" atomic semantics of the LKMM, we
had no choice but to add memory fences, which are redundant on some
strongly ordered architectures.
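A minimal sketch of that pattern with C11 atomics, assuming a fully-ordered add-and-return is emulated by bracketing a seq_cst RMW with seq_cst fences. `fully_ordered_add_return` is a hypothetical name and this is not liburcu's actual code. On x86-64 a lock-prefixed RMW already acts as a full barrier, so fences like these can be pure overhead there.

```c
#include <stdatomic.h>

/* Hypothetical fully-ordered add-and-return built on C11 atomics.  The
   explicit seq_cst fences give full ordering around the RMW regardless
   of how the RMW itself is lowered; on x86-64 the locked RMW is already
   a full barrier, so the fences can be redundant there.  */
static long
fully_ordered_add_return (atomic_long *p, long v)
{
  atomic_thread_fence (memory_order_seq_cst);
  long ret = atomic_fetch_add_explicit (p, v, memory_order_seq_cst) + v;
  atomic_thread_fence (memory_order_seq_cst);
  return ret;
}
```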

> What if everybody started to do this: modifying the compilers to better 
> support their pet projects?  The end result would be chaos!

This is why we are starting this discussion which involves members of
the Kernel and toolchains communities.  We have prior experience, e.g. with
asm goto, which was implemented in GCC, and in Clang afterward, in
response to the Linux kernel's tracepoint requirements.

Note that the motivation for supporting TSAN in Userspace RCU comes
from the requirements of the ISC for the BIND 9 project.

[...]
>> If we go for the grouping in a), we have to take into account that the
>> barriers emitted need to cover the worst-case scenario.  As an example,
>> Clang can emit a store for an exchange with SEQ_CST on x86-64, if the
>> returned value is not used.
>> 
>> Therefore, for the grouping in a), all RMW would need to emit a memory
>> barrier (with Clang on x86-64).  But with the scheme in b), we can emit
>> the barrier explicitly for the exchange operation.  We however question
>> the usefulness of this kind of optimization made by the compiler, since
>> a user should use a store operation instead.
>
> So in the end you settled on a compromise?

We have not settled on anything yet.  Choosing between options a) and b)
is open to discussion.

[...]


Thanks,
Olivier
-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com


[Bug fortran/110576] New: ICE on compilation

2023-07-06 Thread juergen.reuter at desy dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110576

Bug ID: 110576
   Summary: ICE on compilation
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: juergen.reuter at desy dot de
  Target Milestone: ---

Created attachment 55490
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55490=edit
reproducer

The following reproducer leads to an ICE which I see already with gfortran
11.3. It was intended to become a reproducer for the optimization bug in 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110311
but this is a separate issue. I will work around this one in the reproducer for 
110311.
In the most recent master branch, 14.0.0, it leads to
internal compiler error: Segmentation fault
0xd6eabf crash_signal
../../gcc/toplev.cc:314
0x7fe2411f151f ???
./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x844f2b structure_alloc_comps
../../gcc/fortran/trans-array.cc:9228
0x8459bf structure_alloc_comps
../../gcc/fortran/trans-array.cc:9167
0x847e8c gfc_deallocate_alloc_comp(gfc_symbol*, tree_node*, int, int)
../../gcc/fortran/trans-array.cc:10265
0x86980a gfc_conv_procedure_call(gfc_se*, gfc_symbol*, gfc_actual_arglist*,
gfc_expr*, vec*)
../../gcc/fortran/trans-expr.cc:6940
0x8b1952 gfc_trans_call(gfc_code*, bool, tree_node*, tree_node*, bool)
../../gcc/fortran/trans-stmt.cc:424
0x82f93b trans_code
../../gcc/fortran/trans.cc:2297
0x8b5c30 gfc_trans_block_construct(gfc_code*)
../../gcc/fortran/trans-stmt.cc:2351
0x82f887 trans_code
../../gcc/fortran/trans.cc:2325
0x85da69 gfc_generate_function_code(gfc_namespace*)
../../gcc/fortran/trans-decl.cc:7717
0x833ec1 gfc_generate_module_code(gfc_namespace*)
../../gcc/fortran/trans.cc:2651
0x7d42f5 translate_all_program_units
../../gcc/fortran/parse.cc:6914
0x7d42f5 gfc_parse_file()
../../gcc/fortran/parse.cc:7233
0x82c6ef gfc_be_parse_file
../../gcc/fortran/f95-lang.cc:229
Please submit a full bug report, with preprocessed source.
Please include the complete backtrace with any bug report.

[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields

2023-07-06 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

--- Comment #6 from Xi Ruoyao  ---
(In reply to avieira from comment #5)
> Hi Xi,
> 
> Feel free to test your patch and submit it to the list for review. I had a
> look over and it looks correct to me.

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623782.html

The changes from the version posted here:

1. Add a test case (I already made the bit-field sandwiched between other
fields, because a very first, unposted version of the patch failed with such cases).
2. Slightly adjusted the comment.

There is another issue: if mask_width + shift_n == prec, we should omit the
AND_EXPR even for an unsigned bit-field.  For example

movq    $-256, %rax
vmovq   %rax, %xmm1
vpunpcklqdq %xmm1, %xmm1, %xmm1
vpand   (%rcx,%rdi,8), %xmm1, %xmm1
vpsrlq  $8, %xmm1, %xmm1

can be just

vmovdqu (%rcx,%rdi,8), %xmm1
vpsrlq  $8, %xmm1, %xmm1

But it's a different issue so we can fix it in a different patch.
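A scalar C sketch of why the AND is redundant in that case, assuming a 64-bit unsigned container and a field occupying bits shift_n..63 (so mask_width + shift_n == 64). The function names are hypothetical.

```c
#include <stdint.h>

/* Mask, then logically shift right: the form the vectorizer emits.  */
static uint64_t
extract_with_mask (uint64_t container, unsigned shift_n)
{
  /* mask_width = 64 - shift_n; for shift_n == 8 this is
     0xffffffffffffff00, i.e. the $-256 loaded above.  */
  uint64_t mask = ~UINT64_C (0) << shift_n;
  return (container & mask) >> shift_n;
}

/* The AND only clears bits that the logical shift discards anyway.  */
static uint64_t
extract_without_mask (uint64_t container, unsigned shift_n)
{
  return container >> shift_n;
}
```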

[Bug libstdc++/110574] --enable-cstdio=stdio_pure is incompatible with LFS

2023-07-06 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110574

--- Comment #3 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #0)
> Using --enable-cstdio=stdio_pure on x86_64-pc-linux-gnu results in test
> failures:
> 
> FAIL: 27_io/basic_filebuf/imbue/char/13171-2.cc execution test
> FAIL: 27_io/basic_filebuf/seekoff/12790-3.cc execution test
> FAIL: 27_io/basic_filebuf/seekoff/45628-2.cc execution test
> FAIL: 27_io/basic_filebuf/seekoff/char/1-in.cc execution test
> FAIL: 27_io/basic_filebuf/seekoff/char/1-io.cc execution test
> FAIL: 27_io/basic_filebuf/seekoff/char/1-out.cc execution test
> FAIL: 27_io/basic_filebuf/seekoff/char/2-in.cc execution test
> FAIL: 27_io/basic_filebuf/seekoff/char/2-io.cc execution test
> FAIL: 27_io/basic_filebuf/seekoff/char/2-out.cc execution test
> FAIL: 27_io/basic_filebuf/seekoff/char/26777.cc execution test
> FAIL: 27_io/basic_filebuf/seekoff/char/4.cc execution test
> FAIL: 27_io/basic_filebuf/seekoff/wchar_t/4.cc execution test
> FAIL: 27_io/basic_filebuf/seekpos/12790-2.cc execution test
> FAIL: 27_io/basic_filebuf/seekpos/12790-3.cc execution test
> FAIL: 27_io/basic_filebuf/seekpos/char/1-in.cc execution test
> FAIL: 27_io/basic_filebuf/seekpos/char/1-io.cc execution test
> FAIL: 27_io/basic_filebuf/seekpos/char/1-out.cc execution test
> FAIL: 27_io/basic_filebuf/seekpos/char/2-in.cc execution test
> FAIL: 27_io/basic_filebuf/seekpos/char/2-io.cc execution test
> FAIL: 27_io/basic_filebuf/seekpos/char/2-out.cc execution test
> FAIL: 27_io/basic_filebuf/seekpos/wchar_t/9874.cc execution test
> FAIL: 27_io/basic_filebuf/seekpos/wchar_t/9875_seekpos.cc execution test
> FAIL: 27_io/basic_filebuf/sgetn/char/2-in.cc execution test
> FAIL: 27_io/basic_filebuf/sgetn/char/2-io.cc execution test
> FAIL: 27_io/basic_filebuf/sputbackc/char/1-io.cc execution test
> FAIL: 27_io/basic_filebuf/sputbackc/char/2-io.cc execution test
> FAIL: 27_io/basic_filebuf/sungetc/char/1-io.cc execution test
> FAIL: 27_io/basic_filebuf/sungetc/char/2-io.cc execution test
> FAIL: 27_io/basic_filebuf/underflow/char/10097.cc execution test
> FAIL: 27_io/basic_filebuf/underflow/wchar_t/5.cc execution test
> FAIL: 27_io/basic_fstream/53984.cc execution test
> FAIL: 27_io/basic_istream/peek/char/6414.cc execution test
> FAIL: 27_io/basic_istream/peek/wchar_t/6414.cc execution test
> FAIL: 27_io/basic_istream/seekg/char/fstream.cc execution test
> FAIL: 27_io/basic_istream/seekg/wchar_t/fstream.cc execution test
> FAIL: 27_io/basic_istream/tellg/char/fstream.cc execution test
> FAIL: 27_io/basic_istream/tellg/wchar_t/fstream.cc execution test
> FAIL: 27_io/objects/wchar_t/12.cc execution test
> 
> This seems to be because of code like:
> 
>   streamoff
>   __basic_file::seekoff(streamoff __off, ios_base::seekdir __way)
> throw ()
>   {
> #ifdef _GLIBCXX_USE_LFS
> return lseek64(this->fd(), __off, __way);
> #else
> if (__off > numeric_limits::max()
>   || __off < numeric_limits::min())
>   return -1L;
> #ifdef _GLIBCXX_USE_STDIO_PURE
> return fseek(this->file(), __off, __way);

Oh, and fseek returns 0 or -1, not the position, so we shouldn't return its
value here.
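A minimal sketch of what the stdio_pure path would need to return instead, assuming plain `long` offsets and the standard fseek/ftell pair rather than any 64-bit variant. `seek_and_report` is a hypothetical name, not the libstdc++ fix.

```c
#include <stdio.h>

/* Hypothetical helper: perform the seek, then report the resulting
   offset, which is what a seekoff-style caller expects.  fseek itself
   only returns 0 on success and nonzero on failure.  */
static long
seek_and_report (FILE *f, long off, int whence)
{
  if (fseek (f, off, whence) != 0)
    return -1L;
  return ftell (f);
}
```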

[Bug tree-optimization/110557] [13/14 Regression] Wrong code for x86_64-linux-gnu with -O3 -mavx2: vectorized loop mishandles signed bit-fields

2023-07-06 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110557

Xi Ruoyao  changed:

   What|Removed |Added

URL||https://gcc.gnu.org/piperma
   ||il/gcc-patches/2023-July/62
   ||3782.html
   Keywords||patch
 Status|NEW |ASSIGNED

[PATCH] vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557]

2023-07-06 Thread Xi Ruoyao via Gcc-patches
If a bit-field is signed and the output type is wider than the bit-field,
we must ensure the extracted result is sign-extended.  But this was not
handled correctly.

For example:

int x : 8;
long y : 55;
bool z : 1;

The vectorized extraction of y was:

vect__ifc__49.29_110 =
  MEM  [(struct Item *)vectp_a.27_108];
vect_patt_38.30_112 =
  vect__ifc__49.29_110 & { 9223372036854775552, 9223372036854775552 };
vect_patt_39.31_113 = vect_patt_38.30_112 >> 8;
vect_patt_40.32_114 =
  VIEW_CONVERT_EXPR(vect_patt_39.31_113);

This is obviously incorrect: masking and then shifting right zero-extends y
instead of sign-extending it.  This patch implements it as:

vect__ifc__25.16_62 =
  MEM  [(struct Item *)vectp_a.14_60];
vect_patt_31.17_63 =
  VIEW_CONVERT_EXPR(vect__ifc__25.16_62);
vect_patt_32.18_64 = vect_patt_31.17_63 << 1;
vect_patt_33.19_65 = vect_patt_32.18_64 >> 9;
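A scalar C model of the two sequences for the Item example. It assumes the little-endian layout (x in bits 0-7, y in bits 8-62, z in bit 63) and relies on GCC's documented implementation-defined behaviour: conversion of an out-of-range uint64_t to int64_t wraps modulo 2^64, and >> on a negative value sign-extends. The left shift is done on the unsigned value because left-shifting a negative signed value is undefined in C. The helper names are hypothetical.

```c
#include <stdint.h>

/* Old sequence: mask, then shift right; zero-extends the field.  */
static int64_t
extract_masked (uint64_t container, unsigned bitpos, unsigned bitsize)
{
  uint64_t mask = ((UINT64_C (1) << bitsize) - 1) << bitpos;
  return (int64_t) ((container & mask) >> bitpos);
}

/* New sequence: move the field's top bit into bit 63, then shift right
   arithmetically.  For bitpos 8, bitsize 55 this is << 1 then >> 9.  */
static int64_t
extract_signed (uint64_t container, unsigned bitpos, unsigned bitsize)
{
  const unsigned prec = 64;
  int64_t shifted = (int64_t) (container << (prec - bitsize - bitpos));
  return shifted >> (prec - bitsize);
}
```

For x = 4, y = -4 the masked form yields 2^55 - 4, a large positive number, while the shift-pair form yields -4.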

gcc/ChangeLog:

PR tree-optimization/110557
* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern):
Ensure the output is sign-extended if necessary.

gcc/testsuite/ChangeLog:

PR tree-optimization/110557
* g++.dg/vect/pr110557.cc: New test.
---

Bootstrapped and regtested on x86_64-linux-gnu.  Ok for trunk and gcc-13
branch?

 gcc/testsuite/g++.dg/vect/pr110557.cc | 37 +
 gcc/tree-vect-patterns.cc | 58 ---
 2 files changed, 81 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr110557.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr110557.cc b/gcc/testsuite/g++.dg/vect/pr110557.cc
new file mode 100644
index 000..e1fbe1caac4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr110557.cc
@@ -0,0 +1,37 @@
+// { dg-additional-options "-mavx" { target { avx_runtime } } }
+
+static inline long
+min (long a, long b)
+{
+  return a < b ? a : b;
+}
+
+struct Item
+{
+  int x : 8;
+  long y : 55;
+  bool z : 1;
+};
+
+__attribute__ ((noipa)) long
+test (Item *a, int cnt)
+{
+  long size = 0;
+  for (int i = 0; i < cnt; i++)
+size = min ((long)a[i].y, size);
+  return size;
+}
+
+int
+main ()
+{
+  struct Item items[] = {
+{ 1, -1 },
+{ 2, -2 },
+{ 3, -3 },
+{ 4, -4 },
+  };
+
+  if (test (items, 4) != -4)
+__builtin_trap ();
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 1bc36b043a0..20412c27ead 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -2566,7 +2566,7 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
Widening with mask first, shift later:
container = (type_out) container;
masked = container & (((1 << bitsize) - 1) << bitpos);
-   result = patt2 >> masked;
+   result = masked >> bitpos;
 
Widening with shift first, mask last:
container = (type_out) container;
@@ -2578,6 +2578,15 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
result = masked >> bitpos;
result = (type_out) result;
 
+   If the bitfield is signed and it's wider than type_out, we need to
+   keep the result sign-extended:
+   container = (type) container;
+   masked = container << (prec - bitsize - bitpos);
+   result = (type_out) (masked >> (prec - bitsize));
+
+   Here type is the signed variant of the wider of type_out and the type
+   of container.
+
The shifting is always optional depending on whether bitpos != 0.
 
 */
@@ -2636,14 +2645,22 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
   if (BYTES_BIG_ENDIAN)
 shift_n = prec - shift_n - mask_width;
 
+  bool sign_ext = (!TYPE_UNSIGNED (TREE_TYPE (bf_ref)) &&
+  TYPE_PRECISION (ret_type) > mask_width);
+  bool widening = ((TYPE_PRECISION (TREE_TYPE (container)) <
+   TYPE_PRECISION (ret_type))
+  && !useless_type_conversion_p (TREE_TYPE (container),
+ ret_type));
+
   /* We move the conversion earlier if the loaded type is smaller than the
  return type to enable the use of widening loads.  */
-  if (TYPE_PRECISION (TREE_TYPE (container)) < TYPE_PRECISION (ret_type)
-  && !useless_type_conversion_p (TREE_TYPE (container), ret_type))
+  if (sign_ext || widening)
 {
-  pattern_stmt
-   = gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
-  NOP_EXPR, container);
+  tree type = widening ? ret_type : container_type;
+  if (sign_ext)
+   type = gimple_signed_type (type);
+  pattern_stmt = gimple_build_assign (vect_recog_temp_ssa_var (type),
+ NOP_EXPR, container);
   container = gimple_get_lhs (pattern_stmt);
   container_type = TREE_TYPE (container);
   prec = tree_to_uhwi (TYPE_SIZE (container_type));
@@ -2671,7 +2688,7 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 shift_first = true;
 
   tree result;
-  if (shift_first)
+  if (shift_first && !sign_ext)
 {
   tree shifted = container;
   if (shift_n)
@@ -2694,14 +2711,27 @@ 

Re: [PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)

2023-07-06 Thread Jeff Law via Gcc-patches




On 7/6/23 06:44, Richard Biener via Gcc-patches wrote:

On Wed, Jul 5, 2023 at 8:44 AM Hao Liu OS via Gcc-patches
 wrote:


Hi,

If a loop is unrolled n times during vectorization, two steps are used to
calculate the induction variable:
   - The small step for the unrolled ith-copy: vec_1 = vec_iv + (VF/n * Step)
   - The large step for the whole loop: vec_loop = vec_iv + (VF * Step)

This patch calculates an extra vec_n to replace vec_loop:
   vec_n = vec_prev + (VF/n * S) = vec_iv + (VF/n * S) * n = vec_loop.

This saves the large-step register and its related operations.
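A scalar model of that identity, assuming VF = 8 and n = 2, so each vector copy holds VF/n = 4 lanes, with plain arrays standing in for vectors. The function names are hypothetical and this is not the tree-vect-loop.cc code.

```c
#include <stdbool.h>
#include <string.h>

enum { VF = 8, N = 2, LANES = VF / N };  /* n = 2 copies of 4 lanes */

static void
add_scalar (const long in[LANES], long step, long out[LANES])
{
  for (int k = 0; k < LANES; k++)
    out[k] = in[k] + step;
}

/* Check that applying the small step VF/n * S n times reproduces
   vec_loop = vec_iv + VF * S.  */
static bool
small_steps_reach_vec_loop (long start, long s)
{
  long vec_iv[LANES], cur[LANES], next[LANES], vec_loop[LANES];

  for (int k = 0; k < LANES; k++)
    vec_iv[k] = start + k * s;

  /* Large step: vec_loop = vec_iv + VF * S.  */
  add_scalar (vec_iv, (long) VF * s, vec_loop);

  /* Small step applied n times: vec_1, ..., vec_n.  */
  memcpy (cur, vec_iv, sizeof cur);
  for (int i = 0; i < N; i++)
    {
      add_scalar (cur, (long) LANES * s, next);
      memcpy (cur, next, sizeof cur);
    }

  return memcmp (cur, vec_loop, sizeof cur) == 0;
}
```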


OK.  It would be nice to avoid the dead stmts created earlier though.

Thanks,
Richard.


gcc/ChangeLog:

 PR tree-optimization/110449
 * tree-vect-loop.cc (vectorizable_induction): use vec_n to replace
 vec_loop for the unrolled loop.

gcc/testsuite/ChangeLog:

 * gcc.target/aarch64/pr110449.c: New testcase.
I didn't see Hao Liu in the MAINTAINERS file, so Hao probably doesn't have
write access.  Therefore I went ahead and pushed this for Hao.


jeff


[Bug tree-optimization/110449] Vect: use a small step to calculate the loop induction if the loop is unrolled during loop vectorization

2023-07-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110449

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Jeff Law :

https://gcc.gnu.org/g:224fd59b2dc8a5fa78a309a09863afe9b3cf2111

commit r14-2367-g224fd59b2dc8a5fa78a309a09863afe9b3cf2111
Author: Hao Liu OS 
Date:   Thu Jul 6 10:04:46 2023 -0600

Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)

If a loop is unrolled n times during vectorization, two steps are used to
calculate the induction variable:
  - The small step for the unrolled ith-copy: vec_1 = vec_iv + (VF/n * Step)
  - The large step for the whole loop: vec_loop = vec_iv + (VF * Step)

This patch calculates an extra vec_n to replace vec_loop:
  vec_n = vec_prev + (VF/n * S) = vec_iv + (VF/n * S) * n = vec_loop.

So that we can save the large step register and related operations.

gcc/ChangeLog:

PR tree-optimization/110449
* tree-vect-loop.cc (vectorizable_induction): use vec_n to replace
vec_loop for the unrolled loop.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr110449.c: New testcase.

[PATCH] [og13] OpenMP: Expand "declare mapper" mappers for target {enter, exit, } data directives

2023-07-06 Thread Julian Brown
This patch allows 'declare mapper' mappers to be used on 'omp target
data', 'omp target enter data' and 'omp target exit data' directives.
For each of these, only explicit mappings are supported, unlike for
'omp target' directives where implicit uses of variables inside an
offload region might trigger mappers also.

C, C++ and Fortran are all supported.

The patch also adjusts 'map kind decay' to match OpenMP 5.2 semantics,
which is particularly important with regard to 'exit data' operations.
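As a reading aid, a C sketch of map kind decay under this reading of the OpenMP 5.2 mapper rule: the data motion requested inside the mapper is intersected with the motion requested by the invoking clause, and release/delete on an 'exit data' clause take precedence. This models the specification's decay table as understood here, not GCC's omp_map_decayed_kind, and the enum and function names are hypothetical.

```c
/* Map types; RELEASE and DELETE are only modelled on the invoking clause.  */
enum map_kind { MAP_ALLOC, MAP_TO, MAP_FROM, MAP_TOFROM,
                MAP_RELEASE, MAP_DELETE };

/* Data-motion directions as bits: to = 1, from = 2.  */
static unsigned
motion_bits (enum map_kind k)
{
  switch (k)
    {
    case MAP_TO: return 1;
    case MAP_FROM: return 2;
    case MAP_TOFROM: return 3;
    default: return 0;
    }
}

/* Decay the map type written inside a mapper (MAPPER_KIND) by the map
   type of the clause that invoked the mapper (CLAUSE_KIND).  */
static enum map_kind
decayed_kind (enum map_kind mapper_kind, enum map_kind clause_kind)
{
  if (clause_kind == MAP_RELEASE || clause_kind == MAP_DELETE)
    return clause_kind;
  switch (motion_bits (mapper_kind) & motion_bits (clause_kind))
    {
    case 1: return MAP_TO;
    case 2: return MAP_FROM;
    case 3: return MAP_TOFROM;
    default: return MAP_ALLOC;
    }
}
```

For example, a mapper member declared 'tofrom' used from a 'map(to: ...)' clause decays to 'to', and one declared 'from' used from the same clause decays to 'alloc'.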

Tested with offloading to AMD GCN.  I will apply (to the og13 branch)
shortly.

2023-07-06  Julian Brown  

gcc/c-family/
* c-common.h (c_omp_region_type): Add C_ORT_EXIT_DATA,
C_ORT_OMP_EXIT_DATA.
(c_omp_instantiate_mappers): Add region type parameter.
* c-omp.cc (omp_split_map_kind, omp_join_map_kind,
omp_map_decayed_kind): New functions.
(omp_instantiate_mapper): Add ORT parameter.  Implement map kind decay
for instantiated mapper clauses.
(c_omp_instantiate_mappers): Add ORT parameter, pass to
omp_instantiate_mapper.

gcc/c/
* c-parser.cc (c_parser_omp_target_data): Instantiate mappers for
'omp target data'.
(c_parser_omp_target_enter_data): Instantiate mappers for 'omp target
enter data'.
(c_parser_omp_target_exit_data): Instantiate mappers for 'omp target
exit data'.
(c_parser_omp_target): Add c_omp_region_type argument to
c_omp_instantiate_mappers call.
* c-tree.h (c_omp_instantiate_mappers): Remove spurious prototype.

gcc/cp/
* parser.cc (cp_parser_omp_target_data): Instantiate mappers for 'omp
target data'.
(cp_parser_omp_target_enter_data): Instantiate mappers for 'omp target
enter data'.
(cp_parser_omp_target_exit_data): Instantiate mappers for 'omp target
exit data'.
(cp_parser_omp_target): Add c_omp_region_type argument to
c_omp_instantiate_mappers call.
* pt.cc (tsubst_omp_clauses): Instantiate mappers for OMP regions other
than just C_ORT_OMP_TARGET.
(tsubst_expr): Update call to tsubst_omp_clauses for OMP_TARGET_UPDATE,
OMP_TARGET_ENTER_DATA, OMP_TARGET_EXIT_DATA stanza.
* semantics.cc (cxx_omp_map_array_section): Avoid calling
build_array_ref for non-array/non-pointer bases (error reported
already).

gcc/fortran/
* trans-openmp.cc (omp_split_map_op, omp_join_map_op,
omp_map_decayed_kind): New functions.
(gfc_trans_omp_instantiate_mapper): Add CD parameter.  Implement map
kind decay.
(gfc_trans_omp_instantiate_mappers): Add CD parameter.  Pass to above
function.
(gfc_trans_omp_target_data): Instantiate mappers for 'omp target data'.
(gfc_trans_omp_target_enter_data): Instantiate mappers for 'omp target
enter data'.
(gfc_trans_omp_target_exit_data): Instantiate mappers for 'omp target
exit data'.

gcc/testsuite/
* c-c++-common/gomp/declare-mapper-15.c: New test.
* c-c++-common/gomp/declare-mapper-16.c: New test.
* g++.dg/gomp/declare-mapper-1.C: Adjust expected scan output.
* gfortran.dg/gomp/declare-mapper-22.f90: New test.
* gfortran.dg/gomp/declare-mapper-23.f90: New test.
---
 gcc/c-family/c-common.h   |   4 +-
 gcc/c-family/c-omp.cc | 193 +++-
 gcc/c/c-parser.cc |  14 +-
 gcc/c/c-tree.h|   1 -
 gcc/cp/parser.cc  |  19 +-
 gcc/cp/pt.cc  |   8 +-
 gcc/cp/semantics.cc   |   5 +-
 gcc/fortran/trans-openmp.cc   | 209 --
 .../c-c++-common/gomp/declare-mapper-15.c |  59 +
 .../c-c++-common/gomp/declare-mapper-16.c |  39 
 gcc/testsuite/g++.dg/gomp/declare-mapper-1.C  |   2 +-
 .../gfortran.dg/gomp/declare-mapper-22.f90|  60 +
 .../gfortran.dg/gomp/declare-mapper-23.f90|  25 +++
 13 files changed, 600 insertions(+), 38 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/declare-mapper-15.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/declare-mapper-16.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-mapper-22.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-mapper-23.f90

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index ea6c479cd62..c805c8b2f7e 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1270,8 +1270,10 @@ enum c_omp_region_type
   C_ORT_ACC= 1 << 1,
   C_ORT_DECLARE_SIMD   = 1 << 2,
   C_ORT_TARGET = 1 << 3,
+  C_ORT_EXIT_DATA  = 1 << 4,
   C_ORT_OMP_DECLARE_SIMD   = C_ORT_OMP | C_ORT_DECLARE_SIMD,
   C_ORT_OMP_TARGET = C_ORT_OMP | C_ORT_TARGET,
+  C_ORT_OMP_EXIT_DATA  = C_ORT_OMP | 

[Bug c++/110555] internal compiler error: Segmentation fault when using std::ranges::range auto as a template parameter

2023-07-06 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110555

Marek Polacek  changed:

   What|Removed |Added

   Last reconfirmed||2023-07-06
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
 CC||mpolacek at gcc dot gnu.org

--- Comment #1 from Marek Polacek  ---
Confirmed, even 11 ICEs.

Re: [PATCH] Break false dependence for vpternlog by inserting vpxor.

2023-07-06 Thread simonaytes.yan--- via Gcc-patches

+; False dependency happens on destination register which is not really
+; used when moving all ones to vector register
+(define_split
+  [(set (match_operand:VMOVE 0 "register_operand")
+   (match_operand:VMOVE 1 "int_float_vector_all_ones_operand"))]
+  "TARGET_AVX512F && reload_completed
+  && ( == 64 || EXT_REX_SSE_REG_P (operands[0]))"
+  [(set (match_dup 0) (match_dup 2))
+   (parallel
+ [(set (match_dup 0) (match_dup 1))
+  (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)])]
+  "operands[2] = CONST0_RTX (mode);")


I think we shouldn't emit PXOR when optimizing for size, so the
define_split should be changed to:

(define_split
  [(set (match_operand:VMOVE 0 "register_operand")
(match_operand:VMOVE 1 "int_float_vector_all_ones_operand"))]
  "TARGET_AVX512F && reload_completed
  && ( == 64 || EXT_REX_SSE_REG_P (operands[0]))
  && optimize_insn_for_speed_p ()"
  [(set (match_dup 0) (match_dup 2))
   (parallel
 [(set (match_dup 0) (match_dup 1))
  (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)])]
  "operands[2] = CONST0_RTX (mode);")


[Bug libstdc++/104299] Doc: stdio is not the only option in --enable-cstdio=XXX

2023-07-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104299

--- Comment #4 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jonathan Wakely:

https://gcc.gnu.org/g:67bda4331dc4f548820ed2f3138aa7f64fd4c77d

commit r12-9757-g67bda4331dc4f548820ed2f3138aa7f64fd4c77d
Author: Jonathan Wakely 
Date:   Thu Jul 6 16:25:47 2023 +0100

libstdc++: Document --enable-cstdio=stdio_pure [PR104299]

libstdc++-v3/ChangeLog:

PR libstdc++/104299
* doc/xml/manual/configure.xml: Describe stdio_pure argument to
--enable-cstdio.
* doc/html/manual/configure.html: Regenerate.

(cherry picked from commit b90a70984a9beee39b41f842b56926f9db2069ca)

[Bug libstdc++/104299] Doc: stdio is not the only option in --enable-cstdio=XXX

2023-07-06 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104299

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|--- |11.5

[Bug libstdc++/104299] Doc: stdio is not the only option in --enable-cstdio=XXX

2023-07-06 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104299

--- Comment #3 from Jonathan Wakely  ---
Fixed on trunk and gcc-13 so far.

[Bug libstdc++/104299] Doc: stdio is not the only option in --enable-cstdio=XXX

2023-07-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104299

--- Comment #2 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Jonathan Wakely:

https://gcc.gnu.org/g:94d24f1af684d37b9e1c6ad9b54c98609140eb1f

commit r13-7537-g94d24f1af684d37b9e1c6ad9b54c98609140eb1f
Author: Jonathan Wakely 
Date:   Thu Jul 6 16:25:47 2023 +0100

libstdc++: Document --enable-cstdio=stdio_pure [PR104299]

libstdc++-v3/ChangeLog:

PR libstdc++/104299
* doc/xml/manual/configure.xml: Describe stdio_pure argument to
--enable-cstdio.
* doc/html/manual/configure.html: Regenerate.

(cherry picked from commit b90a70984a9beee39b41f842b56926f9db2069ca)

[Bug libstdc++/110574] --enable-cstdio=stdio_pure is incompatible with LFS

2023-07-06 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110574

--- Comment #2 from Jonathan Wakely  ---
Doh, I put the wrong PR number in that commit; it was meant to be for PR 104299.

[committed] libstdc++: Document --enable-cstdio=stdio_pure [PR110574]

2023-07-06 Thread Jonathan Wakely via Gcc-patches
Pushed to trunk. Backports to 11, 12 and 13 will follow.

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/110574
* doc/xml/manual/configure.xml: Describe stdio_pure argument to
--enable-cstdio.
* doc/html/manual/configure.html: Regenerate.
---
 libstdc++-v3/doc/html/manual/configure.html | 11 ---
 libstdc++-v3/doc/xml/manual/configure.xml   | 11 ---
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/doc/xml/manual/configure.xml b/libstdc++-v3/doc/xml/manual/configure.xml
index 7ff07aea886..1b8c37ce2a9 100644
--- a/libstdc++-v3/doc/xml/manual/configure.xml
+++ b/libstdc++-v3/doc/xml/manual/configure.xml
@@ -74,9 +74,14 @@
  
 
  --enable-cstdio=OPTION
- Select a target-specific I/O package. At the moment, the only
-   choice is to use 'stdio', a generic "C" abstraction.
-   The default is 'stdio'. This option can change the library ABI.
+ Select a target-specific I/O package. The choices are 'stdio'
+   which is a generic abstraction using POSIX file I/O APIs
+   (read, write,
+   lseek, etc.), and 'stdio_pure' which is similar
+   but only uses standard C file I/O APIs (fread,
+   fwrite, fseek, etc.).
+   The 'stdio_posix' choice is a synonym for 'stdio'.
+   The default is 'stdio'. This option can change the library ABI.
  
  
 
-- 
2.41.0
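The two documented options differ in which file API the I/O layer calls. The sketch below is not libstdc++'s implementation; it only contrasts the two API families the new text names. The hypothetical `roundtrip_posix` uses `write`, `lseek` and `read` on a file descriptor (the 'stdio' choice), and the hypothetical `roundtrip_stdio` uses only ISO C `fwrite`, `fseek` and `fread` on a `FILE *` (the 'stdio_pure' choice).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* 'stdio' style: POSIX calls on a file descriptor.  Writes MSG, seeks
   back to the start and reads up to OUTLEN bytes into OUT.  */
static ssize_t
roundtrip_posix (int fd, const char *msg, char *out, size_t outlen)
{
  size_t len = strlen (msg);
  if (write (fd, msg, len) != (ssize_t) len)
    return -1;
  if (lseek (fd, 0, SEEK_SET) != 0)   /* offset type is off_t */
    return -1;
  return read (fd, out, outlen);
}

/* 'stdio_pure' style: only ISO C calls on a FILE *.  The fseek between
   the write and the read is required on an update stream.  */
static size_t
roundtrip_stdio (FILE *f, const char *msg, char *out, size_t outlen)
{
  size_t len = strlen (msg);
  if (fwrite (msg, 1, len, f) != len)
    return 0;
  if (fseek (f, 0L, SEEK_SET) != 0)   /* offset type is long */
    return 0;
  return fread (out, 1, outlen, f);
}
```

One visible difference is the offset type: `lseek` takes an `off_t`, which can be 64-bit with large-file support, while `fseek` takes a `long`.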



[PATCH v4] rs6000: Update the vsx-vector-6.* tests.

2023-07-06 Thread Carl Love via Gcc-patches
GCC maintainers:

Ver 4. Fixed a few typos.  Redid the tests to create separate run and
compile tests.

Ver 3.  Added __attribute__ ((noipa)) to the test files.  Changed some
of the scan-assembler-times checks to cover multiple similar
instructions.  Changed the function check macro into a macro that generates a
function to do the test and check the results.  Retested on the various
processor types and BE/LE versions.

Ver 2.  Switched to using code macros to generate the call to the
builtin and test the results.  Added in instruction counts for the key
instruction for the builtin.  Moved the tests into an additional
function call to ensure the compiler doesn't replace the builtin call
code with the statically computed results.  The compiler was doing this
for a few of the simpler tests.  

The following patch takes the tests in vsx-vector-6.h, vsx-vector-6.p7.c,
vsx-vector-6.p8.c and vsx-vector-6.p9.c and reorganizes them into a series
of smaller test files grouped by functionality rather than by processor
version.

Tested the patch on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE with
no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl



-
rs6000: Update the vsx-vector-6.* tests.

The vsx-vector-6.h file is included into the processor-specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
contains a large number of vsx vector builtin tests.  The processor-specific
files contain the number of instructions that the tests are
expected to generate for that processor.  The tests are compile-only.

This patch reworks the tests into a series of files for related tests.
The new tests consist of a runnable test to verify the builtin argument
types and the functional correctness of each builtin.  There is also a
compile-only test that verifies the builtins generate the expected number
of instructions for the various builtin tests.
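The compile/run split described here (and suggested in the review of version 3) can be sketched portably. This is an illustration under stated assumptions, not the patch's code: a GCC/Clang generic vector stands in for `vector float`, and a hypothetical `my_abs` stands in for the `vec_abs` builtin from `<altivec.h>`, so the sketch builds on any target. The shared-header part defines, per builtin, a global result and an out-of-line function that calls the builtin; a compile-only file would include just that part and count instructions, while the run file adds the checks outside those functions.

```c
#include <assert.h>

/* Stand-in for "vector float": a GCC/Clang generic vector of 4 floats.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical stand-in for the vec_abs builtin.  */
static v4sf
my_abs (v4sf x)
{
  for (int i = 0; i < 4; i++)
    x[i] = x[i] < 0.0f ? -x[i] : x[i];
  return x;
}

/* Header part: a global result plus an out-of-line function per builtin.
   The real tests use __attribute__ ((noipa)); noinline is used here only
   so that the sketch also builds with Clang.  */
#define FLOAT_TEST(NAME)                                \
  v4sf f_##NAME##_result;                               \
  __attribute__ ((noinline)) void                       \
  test_##NAME (v4sf f_src)                              \
  {                                                     \
    f_##NAME##_result = my_##NAME (f_src);              \
  }

FLOAT_TEST (abs)

/* Run-file part: the checks live outside the generated function, so a
   compile-only file that includes the same header sees only the
   builtin code when counting instructions.  */
static void
run_abs_test (void)
{
  v4sf src = { -1.0f, 2.0f, -3.5f, 0.0f };
  v4sf expected = { 1.0f, 2.0f, 3.5f, 0.0f };
  test_abs (src);
  for (int i = 0; i < 4; i++)
    assert (f_abs_result[i] == expected[i]);
}
```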

gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-func-1op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all-compile.c: New test
file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.
---
 .../powerpc/vsx-vector-6-func-1op-compile.c   |  22 ++
 .../powerpc/vsx-vector-6-func-1op-run.c   |  98 
 .../powerpc/vsx-vector-6-func-1op.h   |  43 
 .../powerpc/vsx-vector-6-func-2lop-compile.c  |  14 ++
 .../powerpc/vsx-vector-6-func-2lop-run.c  | 177 ++
 .../powerpc/vsx-vector-6-func-2lop.h  |  47 
 .../powerpc/vsx-vector-6-func-2op-compile.c   |  21 ++
 .../powerpc/vsx-vector-6-func-2op-run.c   |  96 
 .../powerpc/vsx-vector-6-func-2op.h   |  42 
 .../powerpc/vsx-vector-6-func-3op-compile.c   |  17 ++
 .../powerpc/vsx-vector-6-func-3op-run.c   | 229 ++
 .../powerpc/vsx-vector-6-func-3op.h   |  73 ++
 .../vsx-vector-6-func-cmp-all-compile.c   |  17 ++
 .../powerpc/vsx-vector-6-func-cmp-all-run.c   | 147 +++
 .../powerpc/vsx-vector-6-func-cmp-all.h   |  76 ++
 .../powerpc/vsx-vector-6-func-cmp-compile.c   |  16 ++
 .../powerpc/vsx-vector-6-func-cmp-run.c   |  92 +++
 .../powerpc/vsx-vector-6-func-cmp.h   |  40 +++
 .../gcc.target/powerpc/vsx-vector-6.h | 154 
 .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 
 .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 
 .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 
 22 files changed, 1267 insertions(+), 282 deletions(-)
 create mode 100644 

Re: [PATCH ver 3] rs6000: Update the vsx-vector-6.* tests.

2023-07-06 Thread Carl Love via Gcc-patches
Kewen:

On Tue, 2023-07-04 at 10:49 +0800, Kewen.Lin wrote:
> 



> > 
> > The tests are broken up into a seriers of files for related
> > tests.  The
> 
> s/seriers/series/

Fixed

> 
> > new tests are runnable tests to verify the builtin argument types
> > and the
> > functional correctness of each test rather then verifying the type
> > and
> > number of instructions generated.
> > 
> > gcc/testsuite/
> > * gcc.target/powerpc/vsx-vector-6-1op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-2lop.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-2op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-3op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-cmp-all.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-cmp.c: New test file.
> 
> Missing "func-" in the names ...

Fixed.

> 
> > * gcc.target/powerpc/vsx-vector-6.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p7.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p8.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p9.h: Remove test file.
> 
> should be vsx-vector-6-p{7,8,9}.c, "git gcc-verify" should catch
> these.

Fixed.  I ran git gcc-verify, which found a couple more small file name
typos.
> 
> > ---
> >  .../powerpc/vsx-vector-6-func-1op.c   | 141 ++
> >  .../powerpc/vsx-vector-6-func-2lop.c  | 217 +++
> >  .../powerpc/vsx-vector-6-func-2op.c   | 133 +
> >  .../powerpc/vsx-vector-6-func-3op.c   | 257 ++
> >  .../powerpc/vsx-vector-6-func-cmp-all.c   | 211 ++
> >  .../powerpc/vsx-vector-6-func-cmp.c   | 121 +
> >  .../gcc.target/powerpc/vsx-vector-6.h | 154 ---
> >  .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 ---
> >  .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 ---
> >  .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 ---
> >  10 files changed, 1080 insertions(+), 282 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2lop.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-3op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp-all.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p7.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p8.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p9.c
> > 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
> > new file mode 100644
> > index 000..52c7ae3e983
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
> > @@ -0,0 +1,141 @@
> > +/* { dg-do run { target lp64 } } */
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> > +/* { dg-options "-O2 -save-temps" } */
> 
> I just noticed that we missed an effective target check here to
> ensure the support of those bifs during the test run, and since it's
> a runnable test case, also need to ensure the generated hw insn
> supported, it's "vsx_hw" like:
> 
> /* { dg-require-effective-target vsx_hw } */
> 
> And adding "-mvsx" to the dg-options.

Added the effective-target check and -mvsx to all of the tests.

> 
> This is also applied for the other test cases.
> 
> But as the discussion on xxlor and the different effective target
> requirements on compilation part and run part, I think we can
> separate each of these cases into two files, one for compilation and
> the other for run, for example, for this case, update FLOAT_TEST by
> adding one more global variable like
> 
> #define FLOAT_TEST(NAME)
>   vector float f_##NAME##_result; \
>   void ... \
>   f_##NAME##_result = vec_##NAME(f_src);\
>   }
>   // moving the checking code to its main.
> 
> move #include , FLOAT_TEST(NAME), DOUBLE_TEST(NAME) defines
> and their uses into vsx-vector-6-func-1op.h.
> 
> 
> **For compilation file vsx-vector-6-func-1op.c**:
> 
> Include this header file into vsx-vector-6-func-1op.c, which has the
> 
> /* { dg-do compile { target lp64 } } */
> /* { dg-require-effective-target powerpc_vsx_ok } */
> /* { dg-options "-O2 -mvsx" } */
> 
> #include "vsx-vector-6-func-1op.h"
> 
> Then put the expected insn check here, like 
> 
> /* { dg-final { scan-assembler-times {\mxvabssp\M} 1 } } */
> ...
> 
> By organizing it like this, these scan-assembler-times would only
> focus on what are generated for bifs (excluding possible noises from
> main function for running).
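For context on the `scan-assembler-times` checks: the directive counts regular-expression matches over the whole generated assembly file, which is why checking code in `main` could add noise to the counts. In Tcl regular expressions `\m` and `\M` match only at the start and end of a word, so `{\mxvabssp\M}` does not match inside a longer mnemonic. The hypothetical C function below is a rough model of that whole-word count, not DejaGnu's implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Word characters for \m and \M: letters, digits and underscore.  */
static int
is_word_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Count occurrences of WORD in TEXT that begin and end on word
   boundaries, roughly what a {\mWORD\M} pattern would match.  */
static int
count_whole_word (const char *text, const char *word)
{
  size_t len = strlen (word);
  int count = 0;
  assert (len > 0);
  for (const char *p = strstr (text, word); p; p = strstr (p + 1, word))
    {
      int starts_word = (p == text) || !is_word_char (p[-1]);
      int ends_word = !is_word_char (p[len]);
      if (starts_word && ends_word)
        count++;
    }
  return count;
}
```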
> 
> 
> **For runnable file 

Re: [PATCH v5] RISC-V: Fix one bug for floating-point static frm

2023-07-06 Thread Robin Dapp via Gcc-patches
Hi Pan,

thanks, I think that works for me, as I'm expecting these
parts to change a bit anyway in the near future.

There is no functional change to the last revision that
Kito already OK'ed, so I think you can go ahead.

Regards
 Robin


Re: [PATCH 10/11] riscv: thead: Add support for the XTheadMemIdx ISA extension

2023-07-06 Thread Jeff Law via Gcc-patches




On 7/6/23 00:48, Christoph Müllner wrote:

> Thanks for this!
> Of course I was "lucky" and ran into the issue that the patterns did not match,
> because of unexpected MULT insns where ASHIFTs were expected.
> But after reading enough of combiner.cc I understood that this is on purpose
> (for addresses) and I have to adjust my INSNs accordingly.

Yea, it's a wart that the same operation has two different canonical
forms depending on the context where it shows up :(  It's definitely a wart.

> I've changed the patches for XTheadMemIdx and XTheadFMemIdx and will
> send out a new series.

Sounds good.

Jeff
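The MULT-versus-ASHIFT point can be pictured with a small model. Inside a memory address, a scaled index such as `index << 3` is canonicalized as `(mult index 8)`, while elsewhere it stays `(ashift index 3)`, so an address pattern has to recognize the MULT form. The C sketch below uses a hypothetical, simplified operand representation (not GCC's rtx) and a hypothetical `max_shift` parameter standing in for whatever shift range the instruction encodes.

```c
#include <assert.h>

/* Hypothetical, simplified operand codes (not GCC's rtx_code values).  */
enum index_code { INDEX_ASHIFT, INDEX_MULT };

/* A scaled index as it might appear in an address:
   (ashift reg AMOUNT) or (mult reg AMOUNT).  */
struct scaled_index
{
  enum index_code code;
  long amount;  /* shift count for INDEX_ASHIFT, multiplier for INDEX_MULT */
};

/* Return log2 (X) if X is a power of two, otherwise -1.  */
static int
log2_if_pow2 (unsigned long x)
{
  int n = 0;
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  while (x >>= 1)
    n++;
  return n;
}

/* Return the shift count represented by OP if it lies in [0, MAX_SHIFT],
   otherwise -1.  Both canonical forms are accepted: inside a MEM the
   scale appears as MULT by a power of two, elsewhere as ASHIFT.  */
static int
scaled_index_shift (struct scaled_index op, int max_shift)
{
  int shift = -1;
  assert (max_shift >= 0);
  switch (op.code)
    {
    case INDEX_ASHIFT:
      shift = op.amount >= 0 && op.amount <= max_shift ? (int) op.amount : -1;
      break;
    case INDEX_MULT:
      shift = op.amount > 0 ? log2_if_pow2 ((unsigned long) op.amount) : -1;
      break;
    }
  return shift <= max_shift ? shift : -1;
}
```

Mapping both forms to a single shift count keeps the rest of the matcher independent of which canonical form the combiner produced.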

