Re: [PATCH 4/19] middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-06 Thread Richard Biener via Gcc-patches
On Thu, 6 Jul 2023, Jan Hubicka wrote:

> Hi,
> The original scale_loop_profile was implemented to handle only the very
> simple loops produced by the vectorizer at that time (basically loops with
> only one exit and no subloops).  It also has not been updated to the new
> profile-count API very carefully.
> Since I want to use it from loop peeling and unlooping, I need the
> function to at least not make the profile worse on general loops.
> 
> The function does two things
>  1) scales down the loop profile by a given probability.
> This is useful, for example, to scale down profile after peeling when loop
> body is executed less often than before
>  2) after scaling is done and if profile indicates too large iteration
> count update profile to cap iteration count by ITERATION_BOUND parameter.
> 
> Step 1 is easy and unchanged.
> 
> I changed ITERATION_BOUND to be the actual bound on the number of iterations
> as used elsewhere (i.e. the number of executions of the latch edge) rather
> than the number of iterations + 1 as it was before.
> 
> To do 2) one needs to do the following
>   a) scale the loop's own profile so the frequency of the header is at most
>  the sum of in-edge counts * (iteration_bound + 1)
>   b) update loop exit probabilities so their count is the same
>  as before scaling.
>   c) reduce frequencies of basic blocks after the loop exit
> 
> The old code did b) by setting the probability to 1 / iteration_bound, which
> is correct only if the basic block containing the exit executes precisely
> once per iteration (i.e. it is not inside another conditional or an inner
> loop).  This is fixed now by using set_edge_probability_and_rescale_others.
> 
> Also c) was implemented only for the special case when the exit was just
> before the latch basic block.  I now use dominance info to get some of the
> additional cases right.
> 
> I still did not try to do anything for multiple-exit loops, though the
> implementation could be generalized.
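The capping arithmetic in step 2 above can be sketched in scalar C.  This is a minimal illustration of the idea, not the actual GCC API; the function names and the use of plain doubles instead of profile_count are assumptions for the example.

```c
#include <assert.h>

/* Illustrative model of step 2: if the profile implies more than
   ITERATION_BOUND average latch executions per loop entry, scale the
   header count down so it is at most preheader_count * (bound + 1).  */
double
capped_header_count (double preheader_count, double header_count,
                     double iteration_bound)
{
  /* Average number of latch executions per entry into the loop.  */
  double iterations = header_count / preheader_count - 1;
  if (iterations <= iteration_bound)
    return header_count;        /* Profile already within the bound.  */
  /* Cap: header executes at most (bound + 1) times per entry.  */
  return preheader_count * (iteration_bound + 1);
}
```

For example, a loop entered 10 times whose header count is 1000 implies 99 iterations; with a bound of 4 the header count is capped at 50.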
> 
> Bootstrapped/regtested x86_64-linux.  Plan to commit it tonight if there
> are no complaints.

Looks good, but I wonder what we can do to at least make the
multiple-exit case behave reasonably?  The vectorizer keeps track
of a "canonical" exit; would it be possible to pass in the main
exit edge and use that instead of single_exit ()?  Would other
exits then behave somewhat reasonably, or would we totally screw
things up here?  That is, the "canonical" exit would be the
counting exit while the other exits are on data-driven conditions
and thus wouldn't change probability when we reduce the number
of iterations(?)

Richard.

> gcc/ChangeLog:
> 
>   * cfgloopmanip.cc (scale_loop_profile): Rewrite exit edge
>   probability update to be safe on loops with subloops.
>   Make bound parameter to be iteration bound.
>   * tree-ssa-loop-ivcanon.cc (try_peel_loop): Update call
>   of scale_loop_profile.
>   * tree-vect-loop-manip.cc (vect_do_peeling): Likewise.
> 
> diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
> index 6e09dcbb0b1..524b979a546 100644
> --- a/gcc/cfgloopmanip.cc
> +++ b/gcc/cfgloopmanip.cc
> @@ -499,7 +499,7 @@ scale_loop_frequencies (class loop *loop, 
> profile_probability p)
>  }
>  
>  /* Scale profile in LOOP by P.
> -   If ITERATION_BOUND is non-zero, scale even further if loop is predicted
> +   If ITERATION_BOUND is not -1, scale even further if loop is predicted
> to iterate too many times.
> Before caling this function, preheader block profile should be already
> scaled to final count.  This is necessary because loop iterations are
> @@ -510,106 +510,123 @@ void
>  scale_loop_profile (class loop *loop, profile_probability p,
>   gcov_type iteration_bound)
>  {
> -  edge e, preheader_e;
> -  edge_iterator ei;
> -
> -  if (dump_file && (dump_flags & TDF_DETAILS))
> +  if (!(p == profile_probability::always ()))
>  {
> -  fprintf (dump_file, ";; Scaling loop %i with scale ",
> -loop->num);
> -  p.dump (dump_file);
> -  fprintf (dump_file, " bounding iterations to %i\n",
> -(int)iteration_bound);
> -}
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> +   fprintf (dump_file, ";; Scaling loop %i with scale ",
> +loop->num);
> +   p.dump (dump_file);
> +   fprintf (dump_file, "\n");
> + }
>  
> -  /* Scale the probabilities.  */
> -  scale_loop_frequencies (loop, p);
> +  /* Scale the probabilities.  */
> +  scale_loop_frequencies (loop, p);
> +}
>  
> -  if (iteration_bound == 0)
> +  if (iteration_bound == -1)
>  return;
>  
>gcov_type iterations = expected_loop_iterations_unbounded (loop, NULL, 
> true);
> +  if (iterations == -1)
> +return;
>  
>if (dump_file && (dump_flags & TDF_DETAILS))
>  {
> -  fprintf (dump_file, ";; guessed iterations after scaling %i\n",
> -(int)iterations);
> +  fprintf (dump_file,
> +";; guessed iterations of loop %i:%i new upper bound %i:\n",
> +loop->num,
> +

Re: [PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)

2023-07-06 Thread Richard Biener via Gcc-patches



> Am 06.07.2023 um 19:50 schrieb Richard Sandiford :
> 
> Richard Biener via Gcc-patches  writes:
>>> On Wed, Jul 5, 2023 at 8:44 AM Hao Liu OS via Gcc-patches
>>>  wrote:
>>> 
>>> Hi,
>>> 
>>> If a loop is unrolled n times during vectorization, two steps are used to
>>> calculate the induction variable:
>>>  - The small step for the unrolled ith-copy: vec_1 = vec_iv + (VF/n * Step)
>>>  - The large step for the whole loop: vec_loop = vec_iv + (VF * Step)
>>> 
>>> This patch calculates an extra vec_n to replace vec_loop:
>>>  vec_n = vec_prev + (VF/n * S) = vec_iv + (VF/n * S) * n = vec_loop.
>>> 
>>> So that we can save the large step register and related operations.
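The equivalence the patch relies on can be checked in scalar C: applying the small step VF/n * S a total of n times from vec_iv lands exactly on vec_iv + VF * S, i.e. on vec_loop.  This is an illustrative check only, not the vectorizer code.

```c
#include <assert.h>

/* Apply the small step (VF/n * STEP) n times to IV, which should
   equal one application of the large step (VF * STEP).  */
double
apply_small_steps (double iv, double step, int vf, int n)
{
  double small = (double) vf / n * step;   /* VF/n * Step */
  for (int i = 0; i < n; i++)
    iv += small;
  return iv;                               /* == iv + VF * Step */
}
```

With VF = 8, n = 2, step = 4.0 and iv = 1.0 this yields 1.0 + 8 * 4.0 = 33.0, the same value the large-step update would produce.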
>> 
>> OK.  It would be nice to avoid the dead stmts created earlier though.
> 
> FWIW, I still don't think we should do this.  Part of the point of
> unrolling is to shorten loop-carried dependencies, whereas this patch
> is going in the opposite direction.

Note ncopies can be >1 without additional unrolling.  With non-VLA vectors all 
of the updates will be constant-folded, btw.

Richard 

> Richard
> 
>> 
>> Thanks,
>> Richard.
>> 
>>> gcc/ChangeLog:
>>> 
>>>PR tree-optimization/110449
>>>* tree-vect-loop.cc (vectorizable_induction): use vec_n to replace
>>>vec_loop for the unrolled loop.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>* gcc.target/aarch64/pr110449.c: New testcase.
>>> ---
>>> gcc/testsuite/gcc.target/aarch64/pr110449.c | 40 +
>>> gcc/tree-vect-loop.cc   | 21 +--
>>> 2 files changed, 58 insertions(+), 3 deletions(-)
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110449.c
>>> 
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/pr110449.c 
>>> b/gcc/testsuite/gcc.target/aarch64/pr110449.c
>>> new file mode 100644
>>> index 000..bb3b6dcfe08
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/pr110449.c
>>> @@ -0,0 +1,40 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-Ofast -mcpu=neoverse-n2 --param 
>>> aarch64-vect-unroll-limit=2" } */
>>> +/* { dg-final { scan-assembler-not "8.0e\\+0" } } */
>>> +
>>> +/* Calcualte the vectorized induction with smaller step for an unrolled 
>>> loop.
>>> +
>>> +   before (suggested_unroll_factor=2):
>>> + fmovs30, 8.0e+0
>>> + fmovs31, 4.0e+0
>>> + dup v27.4s, v30.s[0]
>>> + dup v28.4s, v31.s[0]
>>> + .L6:
>>> + mov v30.16b, v31.16b
>>> + faddv31.4s, v31.4s, v27.4s
>>> + faddv29.4s, v30.4s, v28.4s
>>> + stp q30, q29, [x0]
>>> + add x0, x0, 32
>>> + cmp x1, x0
>>> + bne .L6
>>> +
>>> +   after:
>>> + fmovs31, 4.0e+0
>>> + dup v29.4s, v31.s[0]
>>> + .L6:
>>> + faddv30.4s, v31.4s, v29.4s
>>> + stp q31, q30, [x0]
>>> + add x0, x0, 32
>>> + faddv31.4s, v29.4s, v30.4s
>>> + cmp x0, x1
>>> + bne .L6  */
>>> +
>>> +void
>>> +foo2 (float *arr, float freq, float step)
>>> +{
>>> +  for (int i = 0; i < 1024; i++)
>>> +{
>>> +  arr[i] = freq;
>>> +  freq += step;
>>> +}
>>> +}
>>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>>> index 3b46c58a8d8..706ecbffd0c 100644
>>> --- a/gcc/tree-vect-loop.cc
>>> +++ b/gcc/tree-vect-loop.cc
>>> @@ -10114,7 +10114,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>>>   new_vec, step_vectype, NULL);
>>> 
>>>   vec_def = induc_def;
>>> -  for (i = 1; i < ncopies; i++)
>>> +  for (i = 1; i < ncopies + 1; i++)
>>>{
>>>  /* vec_i = vec_prev + vec_step  */
>>>  gimple_seq stmts = NULL;
>>> @@ -10124,8 +10124,23 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>>>  vec_def = gimple_convert (, vectype, vec_def);
>>> 
>>>  gsi_insert_seq_before (, stmts, GSI_SAME_STMT);
>>> - new_stmt = SSA_NAME_DEF_STMT (vec_def);
>>> - STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
>>> + if (i < ncopies)
>>> +   {
>>> + new_stmt = SSA_NAME_DEF_STMT (vec_def);
>>> + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
>>> +   }
>>> + else
>>> +   {
>>> + /* vec_1 = vec_iv + (VF/n * S)
>>> +vec_2 = vec_1 + (VF/n * S)
>>> +...
>>> +vec_n = vec_prev + (VF/n * S) = vec_iv + VF * S = vec_loop
>>> +
>>> +vec_n is used as vec_loop to save the large step register 
>>> and
>>> +related operations.  */
>>> + add_phi_arg (induction_phi, vec_def, loop_latch_edge 
>>> (iv_loop),
>>> +  UNKNOWN_LOCATION);
>>> +   }
>>>}
>>> }
>>> 
>>> --
>>> 2.34.1


[PATCH V2] [x86] Add pre_reload splitter to detect fp min/max pattern.

2023-07-06 Thread liuhongt via Gcc-patches
> Please split the above pattern into two, one emitting UNSPEC_IEEE_MAX
> and the other emitting UNSPEC_IEEE_MIN.
Split.

> The test involves blendv instruction, which is SSE4.1, so it is
> pointless to test it without -msse4.1. Please add -msse4.1 instead of
> -march=x86_64 and use sse4_runtime target selector, as is the case
> with gcc.target/i386/pr90358.c.
Changed.

> Please also use -msse4.1 instead of -march here. With -mfpmath=sse,
> the test is valid also for 32bit targets, you should use -msseregparm
> additional options for ia32 (please see gcc.target/i386/pr43546.c
> testcase) in the same way as -mregparm to pass SSE arguments in
> registers.
The 32-bit target still fails to do condition elimination for DFmode due to
the code below in rtx_cost:

  /* A size N times larger than UNITS_PER_WORD likely needs N times as
 many insns, taking N times as long.  */
  factor = mode_size > UNITS_PER_WORD ? mode_size / UNITS_PER_WORD : 1;

It looks like a separate issue for DFmode operations on 32-bit targets.

I've enabled 32-bit for the testcase, but it only scans for minss/maxss
currently.

Here's updated patch.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

We have ix86_expand_sse_fp_minmax to detect min/max semantics, but
it requires rtx_equal_p for cmp_op0/cmp_op1 and if_true/if_false; for
the testcase in the PR, there's an extra move from cmp_op0 to if_true,
so it fails ix86_expand_sse_fp_minmax.

This patch adds a pre_reload splitter to detect the min/max pattern.

Operand order in MINSS matters for signed zeros and NaNs, since the
instruction always returns the second operand when either operand is a NaN
or when both operands are zero.
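The operand-order property can be modeled in scalar C.  This sketch mirrors the documented MINSS selection rule (DEST = SRC1 < SRC2 ? SRC1 : SRC2); it is illustrative, not the .md pattern itself.

```c
#include <assert.h>
#include <math.h>

/* Scalar model of x86 MINSS: the comparison SRC1 < SRC2 is false for
   NaN operands and for -0.0 vs +0.0, so in those cases the result is
   always SRC2 -- which is why swapping operands changes behavior.  */
float
minss_model (float src1, float src2)
{
  return src1 < src2 ? src1 : src2;
}
```

For example, minss_model (NAN, 1.0f) yields 1.0f, and minss_model (-0.0f, 0.0f) yields +0.0f, matching the sensitivity to operand order described above.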

gcc/ChangeLog:

PR target/110170
* config/i386/i386.md (*ieee_max3_1): New pre_reload
splitter to detect fp max pattern.
(*ieee_min3_1): Ditto, but for fp min pattern.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr110170.C: New test.
* gcc.target/i386/pr110170.c: New test.
---
 gcc/config/i386/i386.md  | 43 +
 gcc/testsuite/g++.target/i386/pr110170.C | 78 
 gcc/testsuite/gcc.target/i386/pr110170.c | 21 +++
 3 files changed, 142 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/i386/pr110170.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr110170.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a82cc353cfd..6f415f899ae 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -23163,6 +23163,49 @@ (define_insn "*ieee_s3"
(set_attr "type" "sseadd")
(set_attr "mode" "")])
 
+;; Operands order in min/max instruction matters for signed zero and NANs.
+(define_insn_and_split "*ieee_max3_1"
+  [(set (match_operand:MODEF 0 "register_operand")
+   (unspec:MODEF
+ [(match_operand:MODEF 1 "register_operand")
+  (match_operand:MODEF 2 "register_operand")
+  (lt:MODEF
+(match_operand:MODEF 3 "register_operand")
+(match_operand:MODEF 4 "register_operand"))]
+ UNSPEC_BLENDV))]
+  "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
+  && (rtx_equal_p (operands[1], operands[3])
+  && rtx_equal_p (operands[2], operands[4]))
+  && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (unspec:MODEF
+ [(match_dup 2)
+  (match_dup 1)]
+UNSPEC_IEEE_MAX))])
+
+(define_insn_and_split "*ieee_min3_1"
+  [(set (match_operand:MODEF 0 "register_operand")
+   (unspec:MODEF
+ [(match_operand:MODEF 1 "register_operand")
+  (match_operand:MODEF 2 "register_operand")
+  (lt:MODEF
+(match_operand:MODEF 3 "register_operand")
+(match_operand:MODEF 4 "register_operand"))]
+ UNSPEC_BLENDV))]
+  "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
+  && (rtx_equal_p (operands[1], operands[4])
+  && rtx_equal_p (operands[2], operands[3]))
+  && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (unspec:MODEF
+ [(match_dup 2)
+  (match_dup 1)]
+UNSPEC_IEEE_MIN))])
+
 ;; Make two stack loads independent:
 ;;   fld aa  fld aa
 ;;   fld %st(0) ->   fld bb
diff --git a/gcc/testsuite/g++.target/i386/pr110170.C 
b/gcc/testsuite/g++.target/i386/pr110170.C
new file mode 100644
index 000..5d6842270d0
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr110170.C
@@ -0,0 +1,78 @@
+/* { dg-do run } */
+/* { dg-options " -O2 -msse4.1 -mfpmath=sse -std=gnu++20" } */
+#include 
+
+void
+__attribute__((noinline))
+__cond_swap(double* __x, double* __y) {
+  bool __r = (*__x < *__y);
+  auto __tmp = __r ? *__x : *__y;
+  *__y = __r ? *__y : *__x;
+  *__x = __tmp;
+}
+
+auto test1() {
+double nan = -0.0;
+double x = 0.0;
+__cond_swap(, );
+return x == -0.0 && nan == 0.0;
+}
+
+auto test1r() {
+double nan = NAN;
+double x = 1.0;
+__cond_swap(, );
+return isnan(x) && signbit(x) == 0 && nan == 1.0;
+}
+
+auto 

Re: [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add return value

2023-07-06 Thread Kewen.Lin via Gcc-patches
on 2023/7/7 07:00, Peter Bergner wrote:
> On 7/6/23 5:54 PM, Peter Bergner wrote:
>> On 6/30/23 7:58 PM, Carl Love via Gcc-patches wrote:
>>> +++ b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c
>>> @@ -0,0 +1,153 @@
>>> +/* { dg-do run { target { powerpc*-*-* } } } */
>>
>> powerpc*-*-* is the default for this test directory, so you can drop that,
>> but you need to disable this test for soft-float systems, so you probably 
>> want:
>>
>>   /* { dg-do run { target powerpc_fprs } } */
> 
> We actually want something like powerpc_fprs_hw, but that doesn't exist.
> 

Yeah, good point!  I noticed that we have a few test cases which need to
check the soft-float env as well but don't; I didn't find any related
issues reported, so I would assume that there is very little actual
testing in this area.  Based on this, I'm not sure if it's worth adding
a new effective target for it.  Personally I'm happy with just using
powerpc_fprs here to keep it simple. :)

BR,
Kewen


Re: [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add return value

2023-07-06 Thread Kewen.Lin via Gcc-patches
Hi Carl,

Some more minor comments are inline below on top of Peter's insightful
review comments.

on 2023/7/1 08:58, Carl Love wrote:
> 
> GCC maintainers:
> 
> Ver 2.  Went back through the requirements and emails.  Not sure where I
> came up with the requirement for an overloaded version with double
> argument.  Removed the overloaded version with the double argument. 
> Added the macro to announce if the __builtin_set_fpscr_rn returns a
> void or a double with the FPSCR bits.  Updated the documentation file. 
> Retested on Power 8 BE/LE, Power 9 BE/LE, Power 10 LE.  Redid the test
> file.  Per request, the original test file functionality was not
> changed.  Just changed the name from test_fpscr_rn_builtin.c to 
> test_fpscr_rn_builtin_1.c.  Put new tests for the return values into a
> new test file, test_fpscr_rn_builtin_2.c.
> 
> The GLibC team requested a builtin to replace the mffscrn and
> mffscrni inline asm instructions in the GLibC code.  Previously there
> was discussion on adding builtins for the mffscrn instructions.
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html
> 
> In the end, it was felt that it would be best to extend the existing
> __builtin_set_fpscr_rn builtin to return a double instead of a void
> type.  The desire is that we could have the functionality of the
> mffscrn and mffscrni instructions on older ISAs.  The two instructions
> were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has the
> needed functionality to set the RN field using the mffscrn and mffscrni
> instructions if ISA 3.0 is supported or fall back to using logical
> instructions to mask and set the bits for earlier ISAs.  The
> instructions return the current value of the FPSCR fields DRN, VE, OE,
> UE, ZE, XE, NI, RN bit positions then update the RN bit positions with
> the new RN value provided.
> 
> The current __builtin_set_fpscr_rn builtin has a return type of void. 
> So, changing the return type to double and returning the FPSCR fields
> DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
> functional equivalent of the mffscrn and mffscrni instructions.  Any
> current uses of the builtin would just ignore the return value yet any
> new uses could use the return value.  So the requirement is for the
> change to the __builtin_set_fpscr_rn builtin to be backwardly
> compatible and work for all ISAs.
> 
> The following patch changes the return type of the
>  __builtin_set_fpscr_rn builtin from void to double.  The return value
> is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
> XE, NI, RN bit positions when the builtin is called.  The builtin then
> updates the RN field with the new value provided as an argument to the
> builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
> check that the builtin returns the current value of the FPSCR fields
> and then updates the RN field.
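The contract described above can be modeled with a simulated FPSCR in plain C.  This is NOT the rs6000 implementation or the real builtin; the function name, the global variable, and the bit layout (RN in the low two bits) are assumptions made purely for illustration of "return old field bits, then store the new RN value".

```c
#include <assert.h>

unsigned long long fpscr;   /* simulated FPSCR register (hypothetical) */

/* Model of the changed builtin's contract: return the previous
   DRN/VE/OE/UE/ZE/XE/NI/RN bits, then update the 2-bit RN field.
   Existing callers may simply ignore the return value.  */
unsigned long long
set_fpscr_rn_model (unsigned long long new_rn)
{
  unsigned long long old = fpscr;            /* value before the update */
  fpscr = (fpscr & ~3ULL) | (new_rn & 3ULL); /* replace RN field only */
  return old;
}
```

For instance, with the simulated register holding 0x12, setting RN to 1 returns 0x12 and leaves the register at 0x11, illustrating why the change is backward compatible.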
> 
> The GLibC team has reviewed the patch to make sure it met their needs
> as a drop-in replacement for the inline asm mffscrn and mffscrni
> statements in the GLibC code.
> 
> The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> LE.
> 
> Please let me know if the patch is acceptable for mainline.  Thanks.
> 
>Carl 
> 
> 
> --
> rs6000, __builtin_set_fpscr_rn add return value
> 
> Change the return value from void to double.  The return value consists of
> the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit positions.  Add an
> overloaded version which accepts a double argument.
> 
> The test powerpc/test_fpscr_rn_builtin.c is updated to add tests for the
> double return value and the new double argument.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Update
>   builtin definition return type.
>   * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Add check, define
>   __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
>   * config/rs6000/rs6000.md (rs6000_get_fpscr_fields): New
>   define_expand.
>   (rs6000_update_fpscr_rn_field): New define_expand.
>   (rs6000_set_fpscr_rn): Added return argument.  Updated to use new
>   rs6000_get_fpscr_fields and rs6000_update_fpscr_rn_field
>   define_expands.
>   * doc/extend.texi (__builtin_set_fpscr_rn): Update description for
>   the return value and new double argument.  Add description for
>   __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
> 
> gcc/testsuite/ChangeLog:
>   gcc.target/powerpc/test_fpscr_rn_builtin.c: Renamed to
>   test_fpscr_rn_builtin_1.c.  Added comment.
>   gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the
>   return value of __builtin_set_fpscr_rn builtin.
> ---
>  gcc/config/rs6000/rs6000-builtins.def |   2 +-
>  gcc/config/rs6000/rs6000-c.cc |   4 +
>  gcc/config/rs6000/rs6000.md   |  87 +++---
>  gcc/doc/extend.texi   |  26 ++-
>  

[PATCH] VECT: Add COND_LEN_* operations for loop control with length targets

2023-07-06 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.

This patch adds cond_len_* operation patterns for targets that support loop 
control with length.

These patterns will be used in the following cases:

1. Integer division:
   void
   f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int n)
   {
 for (int i = 0; i < n; ++i)
  {
a[i] = b[i] / c[i];
  }
   }

  ARM SVE IR:
  
  ...
  max_mask_36 = .WHILE_ULT (0, bnd.5_32, { 0, ... });

  Loop:
  ...
  # loop_mask_29 = PHI 
  ...
  vect__4.8_28 = .MASK_LOAD (_33, 32B, loop_mask_29);
  ...
  vect__6.11_25 = .MASK_LOAD (_20, 32B, loop_mask_29);
  vect__8.12_24 = .COND_DIV (loop_mask_29, vect__4.8_28, vect__6.11_25, 
vect__4.8_28);
  ...
  .MASK_STORE (_1, 32B, loop_mask_29, vect__8.12_24);
  ...
  next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
  ...
  
  For targets like RVV that support loop control with length, we want to see 
IR as follows:
  
  Loop:
  ...
  # loop_len_29 = SELECT_VL
  ...
  vect__4.8_28 = .LEN_MASK_LOAD (_33, 32B, loop_len_29);
  ...
  vect__6.11_25 = .LEN_MASK_LOAD (_20, 32B, loop_len_29);
  vect__8.12_24 = .COND_LEN_DIV (dummp_mask, vect__4.8_28, vect__6.11_25, 
vect__4.8_28, loop_len_29);
  ...
  .LEN_MASK_STORE (_1, 32B, loop_len_29, vect__8.12_24);
  ...
  next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
  ...
  
  Notice here, we use dummp_mask = { -1, -1,  , -1 }
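The intended COND_LEN_DIV semantics can be modeled in scalar C.  This is an illustrative sketch of the proposed optab's behavior, not its implementation: active lanes are those below the length whose mask bit is set; all other lanes take the fallback ("else") value.

```c
#include <assert.h>

/* Scalar model of COND_LEN_DIV: for each lane i, compute a[i] / b[i]
   when i < len and mask[i] is set, otherwise keep the fallback.  */
void
cond_len_div (int *res, const int *mask, const int *a, const int *b,
              const int *fallback, int len, int nlanes)
{
  for (int i = 0; i < nlanes; i++)
    res[i] = (i < len && mask[i]) ? a[i] / b[i] : fallback[i];
}
```

With an all-ones mask this reduces to pure length control, which is exactly the case 1 flow above where a dummy mask is passed.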

2. Integer conditional division:
   Similar to case (1), but with a condition:
   void
   f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int32_t * 
cond, int n)
   {
 for (int i = 0; i < n; ++i)
   {
 if (cond[i])
 a[i] = b[i] / c[i];
   }
   }
   
   ARM SVE:
   ...
   max_mask_76 = .WHILE_ULT (0, bnd.6_52, { 0, ... });

   Loop:
   ...
   # loop_mask_55 = PHI 
   ...
   vect__4.9_56 = .MASK_LOAD (_51, 32B, loop_mask_55);
   mask__29.10_58 = vect__4.9_56 != { 0, ... };
   vec_mask_and_61 = loop_mask_55 & mask__29.10_58;
   ...
   vect__6.13_62 = .MASK_LOAD (_24, 32B, vec_mask_and_61);
   ...
   vect__8.16_66 = .MASK_LOAD (_1, 32B, vec_mask_and_61);
   vect__10.17_68 = .COND_DIV (vec_mask_and_61, vect__6.13_62, vect__8.16_66, 
vect__6.13_62);
   ...
   .MASK_STORE (_2, 32B, vec_mask_and_61, vect__10.17_68);
   ...
   next_mask_77 = .WHILE_ULT (_3, bnd.6_52, { 0, ... });
   
   Here, ARM SVE uses vec_mask_and_61 = loop_mask_55 & mask__29.10_58; to 
guarantee the correct result.
   
   However, targets with length control cannot perform this elegant flow; for 
RVV, we would expect:
   
   Loop:
   ...
   loop_len_55 = SELECT_VL
   ...
   mask__29.10_58 = vect__4.9_56 != { 0, ... };
   ...
   vect__10.17_68 = .COND_LEN_DIV (mask__29.10_58, vect__6.13_62, 
vect__8.16_66, vect__6.13_62, loop_len_55);
   ...

   Here we expect COND_LEN_DIV predicated by a real mask which is the outcome 
of the comparison: mask__29.10_58 = vect__4.9_56 != { 0, ... };
   and a real length produced by loop control: loop_len_55 = SELECT_VL
   
3. Conditional floating-point operations (no -ffast-math):
   
void
f (float *restrict a, float *restrict b, int32_t *restrict cond, int n)
{
  for (int i = 0; i < n; ++i)
{
  if (cond[i])
  a[i] = b[i] + a[i];
}
}
  
  ARM SVE IR:
  max_mask_70 = .WHILE_ULT (0, bnd.6_46, { 0, ... });

  ...
  # loop_mask_49 = PHI 
  ...
  mask__27.10_52 = vect__4.9_50 != { 0, ... };
  vec_mask_and_55 = loop_mask_49 & mask__27.10_52;
  ...
  vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, 
vect__6.13_56);
  ...
  next_mask_71 = .WHILE_ULT (_22, bnd.6_46, { 0, ... });
  ...
  
  For RVV, we would expect IR:
  
  ...
  loop_len_49 = SELECT_VL
  ...
  mask__27.10_52 = vect__4.9_50 != { 0, ... };
  ...
  vect__9.17_62 = .COND_LEN_ADD (mask__27.10_52, vect__6.13_56, vect__8.16_60, 
vect__6.13_56, loop_len_49);
  ...

4. Conditional unordered reduction:
   
   int32_t
   f (int32_t *restrict a, 
   int32_t *restrict cond, int n)
   {
 int32_t result = 0;
 for (int i = 0; i < n; ++i)
   {
   if (cond[i])
 result += a[i];
   }
 return result;
   }
   
   ARM SVE IR:
 
 Loop:
 # vect_result_18.7_37 = PHI 
 ...
 # loop_mask_40 = PHI 
 ...
 mask__17.11_43 = vect__4.10_41 != { 0, ... };
 vec_mask_and_46 = loop_mask_40 & mask__17.11_43;
 ...
 vect__33.16_51 = .COND_ADD (vec_mask_and_46, vect_result_18.7_37, 
vect__7.14_47, vect_result_18.7_37);
 ...
 next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
 ...
   
 Epilogue:
 _53 = .REDUC_PLUS (vect__33.16_51); [tail call]
   
   For RVV, we expect:
 
Loop:
 # vect_result_18.7_37 = PHI 
 ...
 loop_len_40 = SELECT_VL
 ...
 mask__17.11_43 = vect__4.10_41 != { 0, ... };
 ...
 vect__33.16_51 = .COND_LEN_ADD (mask__17.11_43, vect_result_18.7_37, 
vect__7.14_47, vect_result_18.7_37, loop_len_40);
 ...
 next_mask_58 = .WHILE_ULT (_15, 
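The conditional-reduction flow in case 4 can be modeled in scalar C.  This is illustrative only (not the vectorizer's code): within each length-limited chunk, only lanes with the condition set accumulate into the result, matching the COND_LEN_ADD else-value idiom where inactive lanes keep the old partial sum.

```c
#include <assert.h>

/* Scalar model of a COND_LEN_ADD reduction: process the array in
   chunks of at most VL elements; inside a chunk, masked-off lanes
   leave the running result unchanged.  */
int
cond_len_add_reduc (const int *a, const int *cond, int n, int vl)
{
  int result = 0;
  for (int i = 0; i < n; i += vl)         /* one "vector" step per chunk */
    for (int j = i; j < i + vl && j < n; j++)
      if (cond[j])
        result += a[j];                   /* active lane accumulates */
  return result;
}
```

For a = {1, 2, 3, 4, 5} with cond = {1, 0, 1, 0, 1} and VL = 2, the result is 1 + 3 + 5 = 9, regardless of the chunk size.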

Re: [PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)

2023-07-06 Thread Hao Liu OS via Gcc-patches
Hi Jeff,

Thanks for your help.

Actually I have write access, as I was added to the "contributor list".  Anyway, 
it's very kind of you to help commit the patch.

Thanks,
-Hao

From: Jeff Law 
Sent: Friday, July 7, 2023 0:06
To: Richard Biener; Hao Liu OS
Cc: GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] Vect: use a small step to calculate induction for the 
unrolled loop (PR tree-optimization/110449)



On 7/6/23 06:44, Richard Biener via Gcc-patches wrote:
> On Wed, Jul 5, 2023 at 8:44 AM Hao Liu OS via Gcc-patches
>  wrote:
>>
>> Hi,
>>
>> If a loop is unrolled n times during vectorization, two steps are used to
>> calculate the induction variable:
>>- The small step for the unrolled ith-copy: vec_1 = vec_iv + (VF/n * Step)
>>- The large step for the whole loop: vec_loop = vec_iv + (VF * Step)
>>
>> This patch calculates an extra vec_n to replace vec_loop:
>>vec_n = vec_prev + (VF/n * S) = vec_iv + (VF/n * S) * n = vec_loop.
>>
>> So that we can save the large step register and related operations.
>
> OK.  It would be nice to avoid the dead stmts created earlier though.
>
> Thanks,
> Richard.
>
>> gcc/ChangeLog:
>>
>>  PR tree-optimization/110449
>>  * tree-vect-loop.cc (vectorizable_induction): use vec_n to replace
>>  vec_loop for the unrolled loop.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/aarch64/pr110449.c: New testcase.
I didn't see Hao Liu in the MAINTAINERS file, so he probably doesn't have
write access.  Therefore I went ahead and pushed this for Hao.

jeff


Re: [PATCH v4] rs6000: Update the vsx-vector-6.* tests.

2023-07-06 Thread Kewen.Lin via Gcc-patches
Hi Carl,

on 2023/7/6 23:33, Carl Love wrote:
> GCC maintainers:
> 
> Ver 4. Fixed a few typos.  Redid the tests to create separate run and
> compile tests.

Thanks!  This new version looks good, except that we need vsx_hw
for the run tests, plus two nits, see below.

> 
> Ver 3.  Added __attribute__ ((noipa)) to the test files.  Changed some
> of the scan-assembler-times checks to cover multiple similar
> instructions.  Change the function check macro to a macro to generate a
> function to do the test and check the results.  Retested on the various
> processor types and BE/LE versions.
> 
> Ver 2.  Switched to using code macros to generate the call to the
> builtin and test the results.  Added in instruction counts for the key
> instruction for the builtin.  Moved the tests into an additional
> function call to ensure the compiler doesn't replace the builtin call
> code with the statically computed results.  The compiler was doing this
> for a few of the simpler tests.  
> 
> The following patch takes the tests in vsx-vector-6-p7.h,  vsx-vector-
> 6-p8.h, vsx-vector-6-p9.h and reorganizes them into a series of smaller
> test files by functionality rather than processor version.
> 
> Tested the patch on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE with
> no regressions.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
>Carl
> 
> 
> 
> -
> rs6000: Update the vsx-vector-6.* tests.
> 
> The vsx-vector-6.h file is included into the processor specific test files
> vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
> contains a large number of vsx vector builtin tests.  The processor
> specific files contain the number of instructions that the tests are
> expected to generate for that processor.  The tests are compile only.
> 
> This patch reworks the tests into a series of files for related tests.
> The new tests consist of a runnable test to verify the builtin argument
> types and the functional correctness of each builtin.  There is also a
> compile only test that verifies the builtins generate the expected number
> of instructions for the various builtin tests.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/vsx-vector-6-func-1op.h: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-1op-run.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-1op-compile.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-2lop.h: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-2lop-run.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-2lop-compile.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-2op.h: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-2op-run.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-2op-compile.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-3op.h: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-3op-run.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-3op-compile.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-cmp-all.h: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-cmp-all-run.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-cmp-all-compile.c: New test
>   file.
>   * gcc.target/powerpc/vsx-vector-6-func-cmp.h: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-cmp-run.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6-func-cmp-compile.c: New test file.
>   * gcc.target/powerpc/vsx-vector-6.h: Remove test file.
>   * gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file.
>   * gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file.
>   * gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.
> ---
>  .../powerpc/vsx-vector-6-func-1op-compile.c   |  22 ++
>  .../powerpc/vsx-vector-6-func-1op-run.c   |  98 
>  .../powerpc/vsx-vector-6-func-1op.h   |  43 
>  .../powerpc/vsx-vector-6-func-2lop-compile.c  |  14 ++
>  .../powerpc/vsx-vector-6-func-2lop-run.c  | 177 ++
>  .../powerpc/vsx-vector-6-func-2lop.h  |  47 
>  .../powerpc/vsx-vector-6-func-2op-compile.c   |  21 ++
>  .../powerpc/vsx-vector-6-func-2op-run.c   |  96 
>  .../powerpc/vsx-vector-6-func-2op.h   |  42 
>  .../powerpc/vsx-vector-6-func-3op-compile.c   |  17 ++
>  .../powerpc/vsx-vector-6-func-3op-run.c   | 229 ++
>  .../powerpc/vsx-vector-6-func-3op.h   |  73 ++
>  .../vsx-vector-6-func-cmp-all-compile.c   |  17 ++
>  .../powerpc/vsx-vector-6-func-cmp-all-run.c   | 147 +++
>  .../powerpc/vsx-vector-6-func-cmp-all.h   |  76 ++
>  .../powerpc/vsx-vector-6-func-cmp-compile.c   |  16 ++
>  .../powerpc/vsx-vector-6-func-cmp-run.c   |  92 +++
>  .../powerpc/vsx-vector-6-func-cmp.h   |  40 +++
>  

RE: [PATCH v5] RISC-V: Fix one bug for floating-point static frm

2023-07-06 Thread Li, Pan2 via Gcc-patches
Committed, thanks Robin and Kito.

Pan

-Original Message-
From: Robin Dapp  
Sent: Thursday, July 6, 2023 11:30 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; jeffreya...@gmail.com; Wang, 
Yanzhang ; kito.ch...@gmail.com; Robin Dapp 

Subject: Re: [PATCH v5] RISC-V: Fix one bug for floating-point static frm

Hi Pan,

Thanks, I think that works for me as I'm expecting these
parts to change a bit anyway in the near future.

There is no functional change to the last revision that
Kito already OK'ed so I think you can go ahead.

Regards
 Robin


committed: Stepping down as maintainer for ARC and Epiphany

2023-07-06 Thread Joern Wolfgang Rennecke

Stepping down as maintainer for ARC and Epiphany

* MAINTAINERS (CPU Port Maintainers): Remove myself as ARC and
epiphany maintainer.
(Write After Approval): Add myself.

commit b3f20dd75e9255fc9d56d4f020972469dd671a3a
Author: Joern Rennecke 
Date:   Fri Jul 7 01:02:28 2023 +0100

Stepping down as maintainer for ARC and Epiphany

* MAINTAINERS (CPU Port Maintainers): Remove myself as ARC and
epiphany maintainer.
(Write After Approval): Add myself.

diff --git a/ChangeLog b/ChangeLog
index 140127b851d..374a0a497c8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2023-07-07  Joern Rennecke  
+
+   * MAINTAINERS (CPU Port Maintainers): Remove myself as ARC end
+   epiphany maintainer.
+   (Write After Approval): Add myself.
+
 2023-06-30  Rishi Raj  
 
* MAINTAINERS: Added myself to Write After Approval and DCO
diff --git a/MAINTAINERS b/MAINTAINERS
index 2a0eb5b52b5..95228596628 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -56,7 +56,6 @@ aarch64 port  Kyrylo Tkachov  

 alpha port Richard Henderson   
 amdgcn portJulian Brown
 amdgcn portAndrew Stubbs   
-arc port   Joern Rennecke  
 arc port   Claudiu Zissulescu  
 arm port   Nick Clifton
 arm port   Richard Earnshaw
@@ -69,7 +68,6 @@ c6x port  Bernd Schmidt   

 cris port  Hans-Peter Nilsson  
 c-sky port Xianmiao Qu 
 c-sky port Yunhai Shang
-epiphany port  Joern Rennecke  
 fr30 port  Nick Clifton
 frv port   Nick Clifton
 frv port   Alexandre Oliva 
@@ -616,6 +614,7 @@ Joe Ramsay  

 Rolf Rasmussen 
 Fritz Reese
 Volker Reichelt

+Jörn Rennecke  
 Bernhard Reutner-Fischer   
 Tom Rix
 Thomas Rodgers 


Re: [PATCH] rs6000: Don't ICE when generating vector pair load/store insns [PR110411]

2023-07-06 Thread Segher Boessenkool
On Thu, Jul 06, 2023 at 02:48:19PM -0500, Peter Bergner wrote:
> On 7/6/23 12:33 PM, Segher Boessenkool wrote:
> > On Wed, Jul 05, 2023 at 05:21:18PM +0530, P Jeevitha wrote:
> >> --- a/gcc/config/rs6000/rs6000.cc
> >> +++ b/gcc/config/rs6000/rs6000.cc
> >> @@ -9894,6 +9894,8 @@ rs6000_legitimate_address_p (machine_mode mode, rtx 
> >> x, bool reg_ok_strict)
> >>  
> >>/* Handle unaligned altivec lvx/stvx type addresses.  */
> >>if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
> >> +  && mode !=  OOmode
> >> +  && mode !=  XOmode
> >>&& GET_CODE (x) == AND
> >>&& CONST_INT_P (XEXP (x, 1))
> >>&& INTVAL (XEXP (x, 1)) == -16)
> > 
> > Why do we need this for OOmode and XOmode here, but not for the other
> > modes that are equally not allowed?  That makes no sense.
> 
> VECTOR_MEM_ALTIVEC_OR_VSX_P (mode) already filters those modes out
> (eg, SImode, DFmode, etc.), just not OOmode and XOmode, since those both
> are modes used in/with VSX registers.

It does not filter anything out, no.  That simply checks if a datum of
that mode can be loaded into vector registers or not.  For example
SImode could very well be loaded into vector registers!  (It just is not
such a great idea).

And for some reason there is VECTOR_P8_VECTOR as well, which is mixing
multiple concepts already.  Let's not add more, _please_.

> > Should you check for anything that is more than a register, for example?
> > If so, do *that*?
> 
> Well rs6000_legitimate_address_p() is only passed the MEM rtx, so we have
> no idea if this is a load or store, so we're clueless on number of regs
> needed to hold this mode.  The best we could do is something like

That is *bigger than* a register.  It's the same in Dutch, sorry, I am
tired :-(

>   GET_MODE_SIZE (mode) == GET_MODE_SIZE (V16QImode)
> 
> or some such thing.  Would you prefer something like that?

That is even worse :-(

> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr110411.c
> >> @@ -0,0 +1,21 @@
> >> +/* PR target/110411 */
> >> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -S -mblock-ops-vector-pair" } 
> >> */
> > 
> > -S in testcases is wrong.  Why do you want this?  It is *good* if this
> > is hauled through the assembler as well!  If you *really* want this you
> > use "dg-do assemble", but you shouldn't.
> 
> For test cases checking for ICEs, we don't need to assemble, so I agree,
> we just need to remove the -S option, which is implied by this being a
> dg-do compile test case (the default for this test directory).

We *do* want to assemble.  It is a general principle that we want to
test as much as possible whenever possible.  *Most* problems are found
with the help of testcases that were never designed for the problem
found!

dg-do compile *does* invoke the assembler, btw.  As it should.


Segher


Re: [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value

2023-07-06 Thread Peter Bergner via Gcc-patches
On 7/6/23 5:54 PM, Peter Bergner wrote:
> On 6/30/23 7:58 PM, Carl Love via Gcc-patches wrote:
>> +++ b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c
>> @@ -0,0 +1,153 @@
>> +/* { dg-do run { target { powerpc*-*-* } } } */
> 
> powerpc*-*-* is the default for this test directory, so you can drop that,
> but you need to disable this test for soft-float systems, so you probably 
> want:
> 
>   /* { dg-do run { target powerpc_fprs } } */

We actually want something like powerpc_fprs_hw, but that doesn't exist.

Peter




Re: [PATCH v4 4/9] MIPS: Add bitwise instructions for mips16e2

2023-07-06 Thread Jan-Benedict Glaw
Hi!

On Mon, 2023-06-19 16:29:53 +0800, Jie Mei  wrote:
> There are shortened bitwise instructions in the mips16e2 ASE,
> for instance, ANDI, ORI/XORI, EXT, INS, etc.
> 
> This patch adds these instructions with corresponding tests.

[...]

Starting with this patch, I see some new warning:

[all 2023-07-06 23:04:01] g++ -c   -g -O2   -DIN_GCC 
-DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Wconditionally-supported 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H  -DGENERATOR_FILE -I. 
-Ibuild -I../../gcc/gcc -I../../gcc/gcc/build -I../../gcc/gcc/../include  
-I../../gcc/gcc/../libcpp/include  \
[all 2023-07-06 23:04:01]  -o build/gencondmd.o build/gencondmd.cc
[all 2023-07-06 23:04:02] ../../gcc/gcc/config/mips/mips-msa.md:435:26: 
warning: 'and' of mutually exclusive equal-tests is always 0
[all 2023-07-06 23:04:02]   435 |   DONE;
[all 2023-07-06 23:04:02] ../../gcc/gcc/config/mips/mips-msa.md:435:26: 
warning: 'and' of mutually exclusive equal-tests is always 0
[all 2023-07-06 23:04:03] ../../gcc/gcc/config/mips/mips.md:822:1: warning: 
'and' of mutually exclusive equal-tests is always 0
[all 2023-07-06 23:04:03]   822 | ;; conditional-move-type condition is needed.
[all 2023-07-06 23:04:03]   | ^
[all 2023-07-06 23:04:03] g++   -g -O2   -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE   
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing 
-Wwrite-strings -Wcast-qual -Wmissing-format-attribute 
-Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long 
-Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H  
-DGENERATOR_FILE -static-libstdc++ -static-libgcc  -o build/gencondmd \
[all 2023-07-06 23:04:03] build/gencondmd.o build/errors.o 
../build-x86_64-pc-linux-gnu/libiberty/libiberty.a
[all 2023-07-06 23:04:03] build/gencondmd > tmp-cond.md


(Full build log available as eg. 
http://toolchain.lug-owl.de/laminar/jobs/gcc-mips-linux/76)

Thanks, JBG

-- 




Re: [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value

2023-07-06 Thread Peter Bergner via Gcc-patches
On 6/30/23 7:58 PM, Carl Love via Gcc-patches wrote:
> rs6000, __builtin_set_fpscr_rn add retrun value

s/retrun/return/

Maybe better written as:

rs6000: Add return value to __builtin_set_fpscr_rn


> Change the return value from void to double.  The return value consists of
> the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit positions.  Add an
> overloaded version which accepts a double argument.

You're not adding an overloaded version anymore, so I think you can just
remove the last sentence.



> The test powerpc/test_fpscr_rn_builtin.c is updated to add tests for the
> double reterun value and the new double argument.

s/reterun/return/   ...and there is no double argument anymore, so that
part can be removed.



>   * config/rs6000/rs6000.md ((rs6000_get_fpscr_fields): New
>   define_expand.

Too many '('.



>   (rs6000_set_fpscr_rn): Addedreturn argument.  Updated to use new

Looks like a tab after Added instead of a space.


>   rs6000_get_fpscr_fields and rs6000_update_fpscr_rn_field define
>_expands.

Don't split define_expand across two lines.



>   * doc/extend.texi (__builtin_set_fpscr_rn): Update description for
>   the return value and new double argument.  Add descripton for
>   __SET_FPSCR_RN_RETURNS_FPSCR__ macro.

s/descripton/description/






> +  /* Tell the user the __builtin_set_fpscr_rn now returns the FPSCR fields
> + in a double.  Originally the builtin returned void.  */

Either:
  1) s/Tell the user the __builtin_set_fpscr_rn/Tell the user __builtin_set_fpscr_rn/
  2) s/the __builtin_set_fpscr_rn now/the __builtin_set_fpscr_rn built-in now/


> +  if ((flags & OPTION_MASK_SOFT_FLOAT) == 0)
> +  rs6000_define_or_undefine_macro (define_p, 
> "__SET_FPSCR_RN_RETURNS_FPSCR__");

This doesn't look like it's indented correctly.




> +(define_expand "rs6000_get_fpscr_fields"
> + [(match_operand:DF 0 "gpc_reg_operand")]
> +  "TARGET_HARD_FLOAT"
> +{
> +  /* Extract fields bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI,
> + RN) from the FPSCR and return them.  */
> +  rtx tmp_df = gen_reg_rtx (DFmode);
> +  rtx tmp_di = gen_reg_rtx (DImode);
> +
> +  emit_insn (gen_rs6000_mffs (tmp_df));
> +  tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> +  emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0x000700FFULL)));
> +  rtx tmp_rtn = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
> +  emit_move_insn (operands[0], tmp_rtn);
> +  DONE;
> +})

This doesn't look correct.  You first set tmp_di to a new reg rtx but then
throw that away with the return value of simplify_gen_subreg().  I'm guessing
you want that tmp_di as a gen_reg_rtx for the destination of the gen_anddi3, so
you probably want a different rtx for the subreg that feeds the gen_anddi3.
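A sketch of the fix being suggested — hypothetical, untested GCC-internal code reusing the names from the patch: the subreg is only a different view of tmp_df and needs no fresh pseudo, while the destination of the AND does:

```cpp
/* Sketch only -- not a tested implementation.  */
rtx tmp_df = gen_reg_rtx (DFmode);
emit_insn (gen_rs6000_mffs (tmp_df));
/* A subreg is just another view of tmp_df; no new pseudo is needed here.  */
rtx src_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
/* Fresh pseudo used only as the destination of the AND.  */
rtx tmp_di = gen_reg_rtx (DImode);
emit_insn (gen_anddi3 (tmp_di, src_di, GEN_INT (0x000700FFULL)));
rtx tmp_rtn = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
emit_move_insn (operands[0], tmp_rtn);
```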



> +(define_expand "rs6000_update_fpscr_rn_field"
> + [(match_operand:DI 0 "gpc_reg_operand")]
> +  "TARGET_HARD_FLOAT"
> +{
> +  /* Insert the new RN value from operands[0] into FPSCR bit [62:63].  */
> +  rtx tmp_di = gen_reg_rtx (DImode);
> +  rtx tmp_df = gen_reg_rtx (DFmode);
> +
> +  emit_insn (gen_rs6000_mffs (tmp_df));
> +  tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);

Ditto.




> +The @code{__builtin_set_fpscr_rn} builtin allows changing both of the 
> floating
> +point rounding mode bits and returning the various FPSCR fields before the RN
> +field is updated.  The builtin returns a double consisting of the initial 
> value
> +of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, and RN bit positions with 
> all
> +other bits set to zero. The builtin argument is a 2-bit value for the new RN
> +field value.  The argument can either be an @code{const int} or stored in a
> +variable.  Earlier versions of @code{__builtin_set_fpscr_rn} returned void.  
> A
> +@code{__SET_FPSCR_RN_RETURNS_FPSCR__} macro has been added.  If defined, then
> +the @code{__builtin_set_fpscr_rn} builtin returns the FPSCR fields.  If not
> +defined, the @code{__builtin_set_fpscr_rn} does not return a vaule.  If the
> +@option{-msoft-float} option is used, the @code{__builtin_set_fpscr_rn} 
> builtin
> +will not return a value.

Multiple occurrences of "builtin" that should be spelled "built-in" (not in the
built-in function name itself though).



> +/* Originally the __builtin_set_fpscr_rn builtin was defined to return
> +   void.  It was later extended to return a double with the various
> +   FPSCR bits.  The extended builtin is inteded to be a drop in replacement
> +   for the original version.  This test is for the original version of the
> +   builtin and should work exactly as before.  */

Ditto.




> +++ b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_2.c
> @@ -0,0 +1,153 @@
> +/* { dg-do run { target { powerpc*-*-* } } } */

powerpc*-*-* is the default for this test directory, so you can drop that,
but you need to disable this test for soft-float systems, so you probably want:

  /* { dg-do run { target powerpc_fprs } } */

I know you didn't write it, but 

Re: PING^3 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare

2023-07-06 Thread Michael Meissner via Gcc-patches
I get the following warning which prevents gcc from bootstrapping due to
-Werror:

/home/meissner/fsf-src/work124-sfsplat/gcc/config/rs6000/rs6000-p10sfopt.cc: In 
function ‘void {anonymous}::process_chain_from_load(gimple*)’:
/home/meissner/fsf-src/work124-sfsplat/gcc/config/rs6000/rs6000-p10sfopt.cc:505:30:
 warning: zero-length gcc_dump_printf format string [-Wformat-zero-length]
  505 |   dump_printf (MSG_NOTE, "");
  |  ^~

I just commented out the dump_printf call.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH 3/3] testsuite: Require vectors of doubles for pr97428.c

2023-07-06 Thread Maciej W. Rozycki
The pr97428.c test assumes support for vectors of doubles, but some 
targets only support vectors of floats, causing this test to fail with 
such targets.  Limit this test to targets that support vectors of 
doubles then.

gcc/testsuite/
* gcc.dg/vect/pr97428.c: Limit to `vect_double' targets.
---
 gcc/testsuite/gcc.dg/vect/pr97428.c |1 +
 1 file changed, 1 insertion(+)

gcc-test-pr97428-vect-double.diff
Index: gcc/gcc/testsuite/gcc.dg/vect/pr97428.c
===
--- gcc.orig/gcc/testsuite/gcc.dg/vect/pr97428.c
+++ gcc/gcc/testsuite/gcc.dg/vect/pr97428.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
 
 typedef struct { double re, im; } dcmlx_t;
 typedef struct { double re[4], im[4]; } dcmlx4_t;


[PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c

2023-07-06 Thread Maciej W. Rozycki
The bb-slp-pr95839.c test assumes quad-single float vector support, but 
some targets only support pairs of floats, causing this test to fail 
with such targets.  Limit this test to targets that support at least 
128-bit vectors then, and add a complementing test that can be run with 
targets that have support for 64-bit vectors only.  There is no need to 
adjust bb-slp-pr95839-2.c as 128 bits are needed even for the smallest 
vector of doubles, so support is implied by the presence of vectors of 
doubles.

gcc/testsuite/
* gcc.dg/vect/bb-slp-pr95839.c: Limit to `vect128' targets.
* gcc.dg/vect/bb-slp-pr95839-v8.c: New test.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c |   14 ++
 gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c|1 +
 2 files changed, 15 insertions(+)

gcc-test-bb-slp-pr95839-vect128.diff
Index: gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect64 } */
+/* { dg-additional-options "-w -Wno-psabi" } */
+
+typedef float __attribute__((vector_size(8))) v2f32;
+
+v2f32 f(v2f32 a, v2f32 b)
+{
+  /* Check that we vectorize this CTOR without any loads.  */
+  return (v2f32){a[0] + b[0], a[1] + b[1]};
+}
+
+/* { dg-final { scan-tree-dump "optimized: basic block" "slp2" } } */
Index: gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
===
--- gcc.orig/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
+++ gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect128 } */
 /* { dg-additional-options "-w -Wno-psabi" } */
 
 typedef float __attribute__((vector_size(16))) v4f32;


[PATCH 1/3] testsuite: Add check for vectors of 128 bits being supported

2023-07-06 Thread Maciej W. Rozycki
Similarly to checks for vectors of 32 bits and 64 bits being supported 
add one for vectors of 128 bits.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect128): New 
procedure.
---
 gcc/testsuite/lib/target-supports.exp |6 ++
 1 file changed, 6 insertions(+)

gcc-test-effective-target-vect128.diff
Index: gcc/gcc/testsuite/lib/target-supports.exp
===
--- gcc.orig/gcc/testsuite/lib/target-supports.exp
+++ gcc/gcc/testsuite/lib/target-supports.exp
@@ -8599,6 +8599,12 @@ proc check_effective_target_vect_variabl
 return [expr { [lindex [available_vector_sizes] 0] == 0 }]
 }
 
+# Return 1 if the target supports vectors of 128 bits.
+
+proc check_effective_target_vect128 { } {
+return [expr { [lsearch -exact [available_vector_sizes] 128] >= 0 }]
+}
+
 # Return 1 if the target supports vectors of 64 bits.
 
 proc check_effective_target_vect64 { } {


[PATCH 0/3] testsuite: Exclude vector tests for unsupported targets

2023-07-06 Thread Maciej W. Rozycki
Hi,

 In the course of verifying an out-of-tree RISC-V target that has a vendor
extension providing hardware support for vector operations on pairs of 
single floating-point values (similar to MIPS paired-single or Power SPE 
vector types) I have come across a couple of tests that fail just because 
they expect GCC to produce code this particular hardware does not support.  
Therefore I have created this small patch series, which marks the features 
required for the test cases to be relevant, which makes them unsupported 
for the hardware concerned.  For further details see individual change 
descriptions.

 This patch series has been verified with an `x86_64-linux-gnu' native 
configuration.  I could verify it with MIPS paired-single hw sometime, but 
I'm not currently set up for it and I think the changes are obvious enough 
regardless.

 OK to apply?  As testsuite fixes I think the changes also qualify for 
backporting to active release branches.

  Maciej


Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

2023-07-06 Thread Martin Uecker via Gcc-patches
Am Donnerstag, dem 06.07.2023 um 18:56 + schrieb Qing Zhao:
> Hi, Kees,
> 
> I have updated my V1 patch with the following changes:
> A. changed the name to "counted_by"
> B. changed the argument from a string to an identifier
> C. updated the documentation and testing cases accordingly.
> 
> And then used this new gcc to test 
> https://github.com/kees/kernel-tools/blob/trunk/fortify/array-bounds.c (with 
> the following change)
> [opc@qinzhao-ol8u3-x86 Kees]$ !1091
> diff array-bounds.c array-bounds.c.org
> 32c32
> < # define __counted_by(member)   __attribute__((counted_by (member)))
> ---
> > # define __counted_by(member)   
> > __attribute__((__element_count__(#member)))
> 34c34
> < # define __counted_by(member)   __attribute__((counted_by (member)))
> ---
> > # define __counted_by(member)   /* 
> > __attribute__((__element_count__(#member))) */
> 
> Then I got the following result:
> [opc@qinzhao-ol8u3-x86 Kees]$ ./array-bounds 2>&1 | grep -v ^'#'
> TAP version 13
> 1..12
> ok 1 global.fixed_size_seen_by_bdos
> ok 2 global.fixed_size_enforced_by_sanitizer
> not ok 3 global.unknown_size_unknown_to_bdos
> not ok 4 global.unknown_size_ignored_by_sanitizer
> ok 5 global.alloc_size_seen_by_bdos
> ok 6 global.alloc_size_enforced_by_sanitizer
> not ok 7 global.element_count_seen_by_bdos
> ok 8 global.element_count_enforced_by_sanitizer
> not ok 9 global.alloc_size_with_smaller_element_count_seen_by_bdos
> not ok 10 global.alloc_size_with_smaller_element_count_enforced_by_sanitizer
> ok 11 global.alloc_size_with_bigger_element_count_seen_by_bdos
> ok 12 global.alloc_size_with_bigger_element_count_enforced_by_sanitizer
> 
> The same as your previous results. Then I took a look at all the failed 
> testing: 3, 4, 7, 9, and 10. And studied the reasons for all of them.
> 
>  in a summary, there are two major issues:
> 1.  The reason for the failed testing 7 is the same issue as I observed in 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109557
> Which is not a bug, it’s an expected behavior. 
> 
> 2. The common issue for  the failed testing 3, 4, 9, 10 is:
> 
> for the following annotated structure: 
> 
> 
> struct annotated {
> unsigned long flags;
> size_t foo;
> int array[] __attribute__((counted_by (foo)));
> };
> 
> 
> struct annotated *p;
> int index = 16;
> 
> p = malloc(sizeof(*p) + index * sizeof(*p->array));  // allocated real size 
> 
> p->foo = index + 2;  // p->foo was set by a different value than the real 
> size of p->array as in test 9 and 10
> or
> p->foo was not set to any value as in test 3 and 4
> 
> 
> 
> i.e, the value of p->foo is NOT synced with the number of elements allocated 
> for the array p->array.  
> 
> I think that this should be considered as an user error, and the 
> documentation of the attribute should include
> this requirement.  (In the LLVM’s RFC, such requirement was included in the 
> programing model: 
> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854#maintaining-correctness-of-bounds-annotations-18)
> 
> We can add a new warning option -Wcounted-by to report such user error if 
> needed.
> 
> What’s your opinion on this?


Additionally, we could also have a sanitizer that
checks this at run-time.


Personally, I am still not very happy that in the
following example the two 'n's refer to different
entities:

void f(int n)
{
struct foo {
int n;   
int (*p[])[n] [[counted_by(n)]];
};
}

But I guess it will be difficult to convince everybody
that it would be wise to use a new syntax for
disambiguation:

void f(int n)
{
struct foo {
int n;   
int (*p[])[n] [[counted_by(.n)]];
};
}

Martin


> 
> thanks.
> 
> Qing
> 
> 
> > On May 26, 2023, at 4:40 PM, Kees Cook  wrote:
> > 
> > On Thu, May 25, 2023 at 04:14:47PM +, Qing Zhao wrote:
> > > GCC will pass the number of elements info from the attached attribute to 
> > > both 
> > > __builtin_dynamic_object_size and bounds sanitizer to check the 
> > > out-of-bounds
> > > or dynamic object size issues during runtime for flexible array members.
> > > 
> > > This new feature will provide nice protection to flexible array members 
> > > (which
> > > currently are completely ignored by both __builtin_dynamic_object_size and
> > > bounds sanitizers).
> > 
> > Testing went pretty well, though I think I found some bdos issues:
> > 
> > - some things that bdos can't know the size of, and correctly returned
> >  SIZE_MAX in the past, now thinks are 0-sized.
> > - while bdos correctly knows the size of an element_count-annotated
> >  flexible array, it doesn't know the size of the containing object
> >  (i.e. it returns SIZE_MAX).
> > 
> > Also, I think I found a precedence issue:
> > 
> > - if both __alloc_size and 'element_count' are in use, the _smallest_
> >  of the two is what I would expect to be enforced by the sanitizer
> >  and reported by __bdos. As is, alloc_size appears to be used when
> >  

Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]

2023-07-06 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 06, 2023 at 03:00:28PM +0200, Richard Biener via Gcc-patches wrote:
> > +  (if (types_match (type, @1))
> > +   (bit_not (bit_and @1 (convert @0)))
> > +   (if (types_match (type, @0))
> > +(bit_not (bit_and (convert @1) @0))
> > +(convert (bit_not (bit_and @0 (convert @1)))
> 
> You can elide the types_match checks and instead always emit
> 
>   (convert (bit_not (bit_and @0 (convert @1)))
> 
> the conversions are elided when the types match.

If all types match, sure, any of the variants will be good.
But if say @1 matches type and doesn't match @0, then
(convert (bit_not (bit_and @0 (convert @1)))
will result in 2 conversions instead of just 1.
Of course, it could be alternatively solved by some other simplify
that would reduce the number of conversions.

Jakub



Re: [PATCH] rs6000: Don't ICE when generating vector pair load/store insns [PR110411]

2023-07-06 Thread Peter Bergner via Gcc-patches
On 7/6/23 12:33 PM, Segher Boessenkool wrote:
> On Wed, Jul 05, 2023 at 05:21:18PM +0530, P Jeevitha wrote:
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -9894,6 +9894,8 @@ rs6000_legitimate_address_p (machine_mode mode, rtx x, 
>> bool reg_ok_strict)
>>  
>>/* Handle unaligned altivec lvx/stvx type addresses.  */
>>if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
>> +  && mode !=  OOmode
>> +  && mode !=  XOmode
>>&& GET_CODE (x) == AND
>>&& CONST_INT_P (XEXP (x, 1))
>>&& INTVAL (XEXP (x, 1)) == -16)
> 
> Why do we need this for OOmode and XOmode here, but not for the other
> modes that are equally not allowed?  That makes no sense.

VECTOR_MEM_ALTIVEC_OR_VSX_P (mode) already filters those modes out
(eg, SImode, DFmode, etc.), just not OOmode and XOmode, since those both
are modes used in/with VSX registers.



> Should you check for anything that is more than a register, for example?
> If so, do *that*?

Well rs6000_legitimate_address_p() is only passed the MEM rtx, so we have
no idea if this is a load or store, so we're clueless on number of regs
needed to hold this mode.  The best we could do is something like

  GET_MODE_SIZE (mode) == GET_MODE_SIZE (V16QImode)

or some such thing.  Would you prefer something like that?



>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr110411.c
>> @@ -0,0 +1,21 @@
>> +/* PR target/110411 */
>> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -S -mblock-ops-vector-pair" } */
> 
> -S in testcases is wrong.  Why do you want this?  It is *good* if this
> is hauled through the assembler as well!  If you *really* want this you
> use "dg-do assemble", but you shouldn't.

For test cases checking for ICEs, we don't need to assemble, so I agree,
we just need to remove the -S option, which is implied by this being a
dg-do compile test case (the default for this test directory).


Peter




Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

2023-07-06 Thread Qing Zhao via Gcc-patches
Hi, Kees,

I have updated my V1 patch with the following changes:
A. changed the name to "counted_by"
B. changed the argument from a string to an identifier
C. updated the documentation and testing cases accordingly.

And then used this new gcc to test 
https://github.com/kees/kernel-tools/blob/trunk/fortify/array-bounds.c (with 
the following change)
[opc@qinzhao-ol8u3-x86 Kees]$ !1091
diff array-bounds.c array-bounds.c.org
32c32
< # define __counted_by(member) __attribute__((counted_by (member)))
---
> # define __counted_by(member) __attribute__((__element_count__(#member)))
34c34
< # define __counted_by(member)   __attribute__((counted_by (member)))
---
> # define __counted_by(member) /* __attribute__((__element_count__(#member))) 
> */

Then I got the following result:
[opc@qinzhao-ol8u3-x86 Kees]$ ./array-bounds 2>&1 | grep -v ^'#'
TAP version 13
1..12
ok 1 global.fixed_size_seen_by_bdos
ok 2 global.fixed_size_enforced_by_sanitizer
not ok 3 global.unknown_size_unknown_to_bdos
not ok 4 global.unknown_size_ignored_by_sanitizer
ok 5 global.alloc_size_seen_by_bdos
ok 6 global.alloc_size_enforced_by_sanitizer
not ok 7 global.element_count_seen_by_bdos
ok 8 global.element_count_enforced_by_sanitizer
not ok 9 global.alloc_size_with_smaller_element_count_seen_by_bdos
not ok 10 global.alloc_size_with_smaller_element_count_enforced_by_sanitizer
ok 11 global.alloc_size_with_bigger_element_count_seen_by_bdos
ok 12 global.alloc_size_with_bigger_element_count_enforced_by_sanitizer

The same as your previous results. Then I took a look at all the failed 
testing: 3, 4, 7, 9, and 10. And studied the reasons for all of them.

In summary, there are two major issues:
1.  The reason for the failed test 7 is the same issue I observed in 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109557
which is not a bug; it's expected behavior. 

2. The common issue for  the failed testing 3, 4, 9, 10 is:

for the following annotated structure: 


struct annotated {
unsigned long flags;
size_t foo;
int array[] __attribute__((counted_by (foo)));
};


struct annotated *p;
int index = 16;

p = malloc(sizeof(*p) + index * sizeof(*p->array));  // allocated real size 

p->foo = index + 2;  // p->foo was set by a different value than the real size 
of p->array as in test 9 and 10
or
p->foo was not set to any value as in test 3 and 4



i.e, the value of p->foo is NOT synced with the number of elements allocated 
for the array p->array.  

I think that this should be considered as a user error, and the documentation 
of the attribute should include
this requirement.  (In the LLVM’s RFC, such requirement was included in the 
programing model: 
https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854#maintaining-correctness-of-bounds-annotations-18)

We can add a new warning option -Wcounted-by to report such user error if 
needed.

What’s your opinion on this?

thanks.

Qing


> On May 26, 2023, at 4:40 PM, Kees Cook  wrote:
> 
> On Thu, May 25, 2023 at 04:14:47PM +, Qing Zhao wrote:
>> GCC will pass the number of elements info from the attached attribute to 
>> both 
>> __builtin_dynamic_object_size and bounds sanitizer to check the out-of-bounds
>> or dynamic object size issues during runtime for flexible array members.
>> 
>> This new feature will provide nice protection to flexible array members 
>> (which
>> currently are completely ignored by both __builtin_dynamic_object_size and
>> bounds sanitizers).
> 
> Testing went pretty well, though I think I found some bdos issues:
> 
> - some things that bdos can't know the size of, and correctly returned
>  SIZE_MAX in the past, now thinks are 0-sized.
> - while bdos correctly knows the size of an element_count-annotated
>  flexible array, it doesn't know the size of the containing object
>  (i.e. it returns SIZE_MAX).
> 
> Also, I think I found a precedence issue:
> 
> - if both __alloc_size and 'element_count' are in use, the _smallest_
>  of the two is what I would expect to be enforced by the sanitizer
>  and reported by __bdos. As is, alloc_size appears to be used when
>  it is available, regardless of what 'element_count' shows.
> 
> I've updated my test cases to show it more clearly, but here is the
> before/after:
> 
> 
> GCC 13 (correctly does not implement "element_count"):
> 
> $ ./array-bounds 2>&1 | grep -v ^'#'
> TAP version 13
> 1..12
> ok 1 global.fixed_size_seen_by_bdos
> ok 2 global.fixed_size_enforced_by_sanitizer
> ok 3 global.unknown_size_unknown_to_bdos
> ok 4 global.unknown_size_ignored_by_sanitizer
> ok 5 global.alloc_size_seen_by_bdos
> ok 6 global.alloc_size_enforced_by_sanitizer
> not ok 7 global.element_count_seen_by_bdos
> not ok 8 global.element_count_enforced_by_sanitizer
> not ok 9 global.alloc_size_with_smaller_element_count_seen_by_bdos
> not ok 10 global.alloc_size_with_smaller_element_count_enforced_by_sanitizer
> ok 11 global.alloc_size_with_bigger_element_count_seen_by_bdos
> ok 12 

GGC: Remove 'const char *' 'gt_ggc_mx', 'gt_pch_nx' variants (was: [PATCH] support ggc hash_map and hash_set)

2023-07-06 Thread Thomas Schwinge
Hi!

On 2014-09-01T21:56:28-0400, tsaund...@mozilla.com wrote:
> [...] this part [...]

... became commit b086d5308de0d25444243f482f2f3d1dfd3a9a62
(Subversion r214834), which added GGC support to 'hash_map', 'hash_set',
and converted to those a number of 'htab' instances.

It doesn't really interfere with my ongoing work, but I have doubts about
two functions that were added here:

> --- a/gcc/ggc.h
> +++ b/gcc/ggc.h

> +static inline void
> +gt_ggc_mx (const char *s)
> +{
> +  ggc_test_and_set_mark (const_cast<char *> (s));
> +}
> +
> +static inline void
> +gt_pch_nx (const char *)
> +{
> +}

If (in current sources) I put '__builtin_abort' calls into these
functions, those don't trigger, so the functions are (currently) unused,
at least in my configuration.  Moreover, comparing these two to other
string-related 'gt_ggc_mx' functions in (nowadays) 'gcc/ggc-page.cc', and
string-related 'gt_pch_nx' functions in (nowadays) 'gcc/stringpool.cc'
(..., which already did exist back then in 2014), we find that this
'gt_ggc_mx' doesn't call 'gt_ggc_m_S', so doesn't get the special string
handling, and this 'gt_pch_nx' doesn't call 'gt_pch_n_S' and also doesn't
'gt_pch_note_object' manually, so I wonder how that ever worked?  So
maybe these two in fact never were used?  Should we dare to put in the
attached "GGC: Remove 'const char *' 'gt_ggc_mx', 'gt_pch_nx' variants"?


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From a1341d0e75ab20ee9ba09a1a8428c9d3dd2fd54a Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 6 Jul 2023 17:44:35 +0200
Subject: [PATCH] GGC: Remove 'const char *' 'gt_ggc_mx', 'gt_pch_nx' variants

Those were added in 2014 commit b086d5308de0d25444243f482f2f3d1dfd3a9a62
(Subversion r214834) "support ggc hash_map  and hash_set".

If (in current sources) I put '__builtin_abort' calls into these functions,
those don't trigger, so the functions are (currently) unused, at least in my
configuration.  Moreover, comparing these two to other string-related
'gt_ggc_mx' functions in (nowadays) 'gcc/ggc-page.cc', and string-related
'gt_pch_nx' functions in (nowadays) 'gcc/stringpool.cc' (..., which already did
exist back then in 2014), we find that this 'gt_ggc_mx' doesn't call
'gt_ggc_m_S', so doesn't get the special string handling, and this 'gt_pch_nx'
doesn't call 'gt_pch_n_S' and also doesn't 'gt_pch_note_object' manually, so I
wonder how that ever worked?  So maybe these two in fact never were used?

	gcc/
	* ggc.h (gt_ggc_mx (const char *s), gt_pch_nx (const char *)):
	Remove.
---
 gcc/ggc.h | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/gcc/ggc.h b/gcc/ggc.h
index 78eab7eaba6..1f3d665fc57 100644
--- a/gcc/ggc.h
+++ b/gcc/ggc.h
@@ -331,17 +331,6 @@ ggc_alloc_cleared_gimple_statement_stat (size_t s CXX_MEM_STAT_INFO)
   return (gimple *) ggc_internal_cleared_alloc (s PASS_MEM_STAT);
 }
 
-inline void
-gt_ggc_mx (const char *s)
-{
-  ggc_test_and_set_mark (const_cast<char *> (s));
-}
-
-inline void
-gt_pch_nx (const char *)
-{
-}
-
 inline void gt_pch_nx (bool) { }
 inline void gt_pch_nx (char) { }
 inline void gt_pch_nx (signed char) { }
-- 
2.34.1



Re: [PATCH] rs6000: Change GPR2 to volatile & non-fixed register for function that does not use TOC [PR110320]

2023-07-06 Thread Peter Bergner via Gcc-patches
On 6/28/23 3:07 AM, Kewen.Lin wrote:
> I think the reason why we need to check common_deferred_options is that at
> this time we can't distinguish whether fixed_regs[2] comes from the
> initialization or from explicit user specification on the command line.
> But could we just update FIXED_REGISTERS without FIXED_R2 and set FIXED_R2
> when it's needed in this function instead?  Then I'd expect that when we
> find fixed_regs[2] is set at the beginning of this function, it would mean
> the user specified it explicitly, and then we don't need this option
> checking?

Correct, rs6000_conditional_register_usage() is called after the handling of the
-ffixed-* options, so looking at fixed_regs[2] cannot tell us whether the user
used the -ffixed-r2 option or not if we initialize the FIXED_REGISTERS[2] slot
to 1.  I think we went this route for two reasons:

  1) We don't have to worry about anyone in the future adding more uses of
 FIXED_REGISTERS and needing to update the value depending on ABI, options,
 etc.
  2) The options in common_deferred_options are "rare" options, so the common
 case is that common_deferred_options will be NULL and we'll never drop
 into that section.

I believe the untested patch below should also work, without having to scan
the (uncommonly used) options.  Jeevitha, can you bootstrap and regtest the
patch below?



> Besides, IMHO we need a corresponding test case to cover this -ffixed-r2 
> handling.

Good idea.  I think we can duplicate the pr110320_2.c test case, replacing the
-mno-pcrel option with -ffixed-r2.  Jeevitha, can you give that a try?




>> +/* { dg-require-effective-target power10_ok } */
>> +/* { dg-require-effective-target powerpc_pcrel } */
> 
> Do we have some environment combination which supports powerpc_pcrel but not
> power10_ok?  I'd expect that only powerpc_pcrel is enough.

I think I agree testing for powerpc_pcrel should be enough.


Peter





diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index d197c3f3289..7c356a73ac6 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10160,9 +10160,13 @@ rs6000_conditional_register_usage (void)
 for (i = 32; i < 64; i++)
   fixed_regs[i] = call_used_regs[i] = 1;
 
+  /* For non PC-relative code, GPR2 is unavailable for register allocation.  */
+  if (FIXED_R2 && !rs6000_pcrel_p ())
+fixed_regs[2] = 1;
+
   /* The TOC register is not killed across calls in a way that is
  visible to the compiler.  */
-  if (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)
+  if (fixed_regs[2] && (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2))
 call_used_regs[2] = 0;
 
   if (DEFAULT_ABI == ABI_V4 && flag_pic == 2)
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..2a24fbdf9fd 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -812,7 +812,7 @@ enum data_align { align_abi, align_opt, align_both };
 
 #define FIXED_REGISTERS  \
   {/* GPRs */ \
-   0, 1, FIXED_R2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, FIXED_R13, 0, 0, \
+   0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, FIXED_R13, 0, 0, \
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
/* FPRs */ \
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \



Re: [PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)

2023-07-06 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Wed, Jul 5, 2023 at 8:44 AM Hao Liu OS via Gcc-patches
>  wrote:
>>
>> Hi,
>>
>> If a loop is unrolled n times during vectorization, two steps are used to
>> calculate the induction variable:
>>   - The small step for the unrolled ith-copy: vec_1 = vec_iv + (VF/n * Step)
>>   - The large step for the whole loop: vec_loop = vec_iv + (VF * Step)
>>
>> This patch calculates an extra vec_n to replace vec_loop:
>>   vec_n = vec_prev + (VF/n * S) = vec_iv + (VF/n * S) * n = vec_loop.
>>
>> So that we can save the large step register and related operations.
>
> OK.  It would be nice to avoid the dead stmts created earlier though.

FWIW, I still don't think we should do this.  Part of the point of
unrolling is to shorten loop-carried dependencies, whereas this patch
is going in the opposite direction.

Richard

>
> Thanks,
> Richard.
>
>> gcc/ChangeLog:
>>
>> PR tree-optimization/110449
>> * tree-vect-loop.cc (vectorizable_induction): use vec_n to replace
>> vec_loop for the unrolled loop.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/aarch64/pr110449.c: New testcase.
>> ---
>>  gcc/testsuite/gcc.target/aarch64/pr110449.c | 40 +
>>  gcc/tree-vect-loop.cc   | 21 +--
>>  2 files changed, 58 insertions(+), 3 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110449.c
>>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/pr110449.c 
>> b/gcc/testsuite/gcc.target/aarch64/pr110449.c
>> new file mode 100644
>> index 000..bb3b6dcfe08
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/pr110449.c
>> @@ -0,0 +1,40 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-Ofast -mcpu=neoverse-n2 --param 
>> aarch64-vect-unroll-limit=2" } */
>> +/* { dg-final { scan-assembler-not "8.0e\\+0" } } */
>> +
>> +/* Calcualte the vectorized induction with smaller step for an unrolled 
>> loop.
>> +
>> +   before (suggested_unroll_factor=2):
>> + fmovs30, 8.0e+0
>> + fmovs31, 4.0e+0
>> + dup v27.4s, v30.s[0]
>> + dup v28.4s, v31.s[0]
>> + .L6:
>> + mov v30.16b, v31.16b
>> + faddv31.4s, v31.4s, v27.4s
>> + faddv29.4s, v30.4s, v28.4s
>> + stp q30, q29, [x0]
>> + add x0, x0, 32
>> + cmp x1, x0
>> + bne .L6
>> +
>> +   after:
>> + fmovs31, 4.0e+0
>> + dup v29.4s, v31.s[0]
>> + .L6:
>> + faddv30.4s, v31.4s, v29.4s
>> + stp q31, q30, [x0]
>> + add x0, x0, 32
>> + faddv31.4s, v29.4s, v30.4s
>> + cmp x0, x1
>> + bne .L6  */
>> +
>> +void
>> +foo2 (float *arr, float freq, float step)
>> +{
>> +  for (int i = 0; i < 1024; i++)
>> +{
>> +  arr[i] = freq;
>> +  freq += step;
>> +}
>> +}
>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>> index 3b46c58a8d8..706ecbffd0c 100644
>> --- a/gcc/tree-vect-loop.cc
>> +++ b/gcc/tree-vect-loop.cc
>> @@ -10114,7 +10114,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>>new_vec, step_vectype, NULL);
>>
>>vec_def = induc_def;
>> -  for (i = 1; i < ncopies; i++)
>> +  for (i = 1; i < ncopies + 1; i++)
>> {
>>   /* vec_i = vec_prev + vec_step  */
>>   gimple_seq stmts = NULL;
>> @@ -10124,8 +10124,23 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>>   vec_def = gimple_convert (&stmts, vectype, vec_def);
>>
>>   gsi_insert_seq_before (&si, stmts, GSI_SAME_STMT);
>> - new_stmt = SSA_NAME_DEF_STMT (vec_def);
>> - STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
>> + if (i < ncopies)
>> +   {
>> + new_stmt = SSA_NAME_DEF_STMT (vec_def);
>> + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
>> +   }
>> + else
>> +   {
>> + /* vec_1 = vec_iv + (VF/n * S)
>> +vec_2 = vec_1 + (VF/n * S)
>> +...
>> +vec_n = vec_prev + (VF/n * S) = vec_iv + VF * S = vec_loop
>> +
>> +vec_n is used as vec_loop to save the large step register 
>> and
>> +related operations.  */
>> + add_phi_arg (induction_phi, vec_def, loop_latch_edge (iv_loop),
>> +  UNKNOWN_LOCATION);
>> +   }
>> }
>>  }
>>
>> --
>> 2.34.1


Re: [PATCH] rs6000: Don't ICE when generating vector pair load/store insns [PR110411]

2023-07-06 Thread Segher Boessenkool
Hi!

On Wed, Jul 05, 2023 at 05:21:18PM +0530, P Jeevitha wrote:
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
> 
> while generating vector pairs of load & store instruction, the src address
> was treated as an altivec type and that type of address is invalid for 
> lxvp and stxvp insns. The solution for this is to avoid altivec type address
> for OOmode and XOmode.

The mail message you send should be what will end up in the Git commit
message.  Your lines are too long for that (and the subject is much too
long btw), and the content isn't right either.

Maybe something like

"""
rs6000: Don't allow OOmode or XOmode in AltiVec addresses (PR110411)

There are no instructions that do traditional AltiVec addresses (i.e.
with the low four bits of the address masked off) for OOmode and XOmode
objects.  Don't allow those in rs6000_legitimate_address_p.
"""

> gcc/
>   PR target/110411
>   * config/rs6000/rs6000.cc (rs6000_legitimate_address_p): Avoid altivec
>   address for OOmode and XOmde.

(XOmode, sp.)

Not "avoid", disallow.  If you avoid something you still allow it, you
just prefer to see something else.

> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -9894,6 +9894,8 @@ rs6000_legitimate_address_p (machine_mode mode, rtx x, 
> bool reg_ok_strict)
>  
>/* Handle unaligned altivec lvx/stvx type addresses.  */
>if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
> +  && mode !=  OOmode
> +  && mode !=  XOmode
>&& GET_CODE (x) == AND
>&& CONST_INT_P (XEXP (x, 1))
>&& INTVAL (XEXP (x, 1)) == -16)

Why do we need this for OOmode and XOmode here, but not for the other
modes that are equally not allowed?  That makes no sense.

Should you check for anything that is more than a register, for example?
If so, do *that*?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr110411.c
> @@ -0,0 +1,21 @@
> +/* PR target/110411 */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -S -mblock-ops-vector-pair" } */

-S in testcases is wrong.  Why do you want this?  It is *good* if this
is hauled through the assembler as well!  If you *really* want this you
use "dg-do assemble", but you shouldn't.


Segher


RE: [PATCH] arm: Fix MVE intrinsics support with LTO (PR target/110268)

2023-07-06 Thread Kyrylo Tkachov via Gcc-patches
Hi Christophe,

> -Original Message-
> From: Christophe Lyon 
> Sent: Thursday, July 6, 2023 4:21 PM
> To: Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> 
> Subject: Re: [PATCH] arm: Fix MVE intrinsics support with LTO (PR
> target/110268)
> 
> 
> 
> On Wed, 5 Jul 2023 at 19:07, Kyrylo Tkachov   > wrote:
> 
> 
>   Hi Christophe,
> 
>   > -Original Message-
>   > From: Christophe Lyon   >
>   > Sent: Monday, June 26, 2023 4:03 PM
>   > To: gcc-patches@gcc.gnu.org  ;
> Kyrylo Tkachov   >;
>   > Richard Sandiford   >
>   > Cc: Christophe Lyon   >
>   > Subject: [PATCH] arm: Fix MVE intrinsics support with LTO (PR
> target/110268)
>   >
>   > After the recent MVE intrinsics re-implementation, LTO stopped
> working
>   > because the intrinsics would no longer be defined.
>   >
>   > The main part of the patch is simple and similar to what we do for
>   > AArch64:
>   > - call handle_arm_mve_h() from arm_init_mve_builtins to declare
> the
>   >   intrinsics when the compiler is in LTO mode
>   > - actually implement arm_builtin_decl for MVE.
>   >
>   > It was just a bit tricky to handle
> __ARM_MVE_PRESERVE_USER_NAMESPACE:
>   > its value in the user code cannot be guessed at LTO time, so we
> always
>   > have to assume that it was not defined.  This led to a few fixes in the
>   > way we register MVE builtins as placeholders or not.  Without this
>   > patch, we would just omit some versions of the intrinsics when
>   > __ARM_MVE_PRESERVE_USER_NAMESPACE is true. In fact, like for
> the C/C++
>   > placeholders, we need to always keep entries for all of them to
> ensure
>   > that we have a consistent numbering scheme.
>   >
>   >   2023-06-26  Christophe Lyon >
>   >
>   >   PR target/110268
>   >   gcc/
>   >   * config/arm/arm-builtins.cc (arm_init_mve_builtins): Handle
> LTO.
>   >   (arm_builtin_decl): Handle MVE builtins.
>   >   * config/arm/arm-mve-builtins.cc (builtin_decl): New function.
>   >   (add_unique_function): Fix handling of
>   >   __ARM_MVE_PRESERVE_USER_NAMESPACE.
>   >   (add_overloaded_function): Likewise.
>   >   * config/arm/arm-protos.h (builtin_decl): New declaration.
>   >
>   >   gcc/testsuite/
>   >   * gcc.target/arm/pr110268-1.c: New test.
>   >   * gcc.target/arm/pr110268-2.c: New test.
>   > ---
>   >  gcc/config/arm/arm-builtins.cc| 11 +++-
>   >  gcc/config/arm/arm-mve-builtins.cc| 61 --
> -
>   >  gcc/config/arm/arm-protos.h   |  1 +
>   >  gcc/testsuite/gcc.target/arm/pr110268-1.c | 11 
>   >  gcc/testsuite/gcc.target/arm/pr110268-2.c | 22 
>   >  5 files changed, 76 insertions(+), 30 deletions(-)
>   >  create mode 100644 gcc/testsuite/gcc.target/arm/pr110268-1.c
>   >  create mode 100644 gcc/testsuite/gcc.target/arm/pr110268-2.c
>   >
>   > diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-
> builtins.cc
>   > index 36365e40a5b..fca7dcaf565 100644
>   > --- a/gcc/config/arm/arm-builtins.cc
>   > +++ b/gcc/config/arm/arm-builtins.cc
>   > @@ -1918,6 +1918,15 @@ arm_init_mve_builtins (void)
>   >arm_builtin_datum *d = _builtin_data[i];
>   >arm_init_builtin (fcode, d, "__builtin_mve");
>   >  }
>   > +
>   > +  if (in_lto_p)
>   > +{
>   > +  arm_mve::handle_arm_mve_types_h ();
>   > +  /* Under LTO, we cannot know whether
>   > +  __ARM_MVE_PRESERVE_USER_NAMESPACE was defined, so
> assume
>   > it
>   > +  was not.  */
>   > +  arm_mve::handle_arm_mve_h (false);
>   > +}
>   >  }
>   >
>   >  /* Set up all the NEON builtins, even builtins for instructions that
> are not
>   > @@ -2723,7 +2732,7 @@ arm_builtin_decl (unsigned code, bool
> initialize_p
>   > ATTRIBUTE_UNUSED)
>   >  case ARM_BUILTIN_GENERAL:
>   >return arm_general_builtin_decl (subcode);
>   >  case ARM_BUILTIN_MVE:
>   > -  return error_mark_node;
>   > +  return arm_mve::builtin_decl (subcode);
>   >  default:
>   >gcc_unreachable ();
>   >  }
>   > diff --git a/gcc/config/arm/arm-mve-builtins.cc
> b/gcc/config/arm/arm-mve-
>   > builtins.cc
>   > index 7033e41a571..e9a12f27411 100644
>   > --- a/gcc/config/arm/arm-mve-builtins.cc
>   > +++ b/gcc/config/arm/arm-mve-builtins.cc
>   > @@ -493,6 +493,16 @@ handle_arm_mve_h (bool
>   > preserve_user_namespace)
>   >  

[PATCH] vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557]

2023-07-06 Thread Xi Ruoyao via Gcc-patches
If a bit-field is signed and it's wider than the output type, we must
ensure the extracted result is sign-extended.  But this was not handled
correctly.

For example:

int x : 8;
long y : 55;
bool z : 1;

The vectorized extraction of y was:

vect__ifc__49.29_110 =
  MEM  [(struct Item *)vectp_a.27_108];
vect_patt_38.30_112 =
  vect__ifc__49.29_110 & { 9223372036854775552, 9223372036854775552 };
vect_patt_39.31_113 = vect_patt_38.30_112 >> 8;
vect_patt_40.32_114 =
  VIEW_CONVERT_EXPR(vect_patt_39.31_113);

This is obviously incorrect.  This patch implements it as:

vect__ifc__25.16_62 =
  MEM  [(struct Item *)vectp_a.14_60];
vect_patt_31.17_63 =
  VIEW_CONVERT_EXPR(vect__ifc__25.16_62);
vect_patt_32.18_64 = vect_patt_31.17_63 << 1;
vect_patt_33.19_65 = vect_patt_32.18_64 >> 9;

gcc/ChangeLog:

PR tree-optimization/110557
* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern):
Ensure the output sign-extended if necessary.

gcc/testsuite/ChangeLog:

PR tree-optimization/110557
* g++.dg/vect/pr110557.cc: New test.
---

Bootstrapped and regtested on x86_64-linux-gnu.  Ok for trunk and gcc-13
branch?

 gcc/testsuite/g++.dg/vect/pr110557.cc | 37 +
 gcc/tree-vect-patterns.cc | 58 ---
 2 files changed, 81 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr110557.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr110557.cc 
b/gcc/testsuite/g++.dg/vect/pr110557.cc
new file mode 100644
index 000..e1fbe1caac4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr110557.cc
@@ -0,0 +1,37 @@
+// { dg-additional-options "-mavx" { target { avx_runtime } } }
+
+static inline long
+min (long a, long b)
+{
+  return a < b ? a : b;
+}
+
+struct Item
+{
+  int x : 8;
+  long y : 55;
+  bool z : 1;
+};
+
+__attribute__ ((noipa)) long
+test (Item *a, int cnt)
+{
+  long size = 0;
+  for (int i = 0; i < cnt; i++)
+size = min ((long)a[i].y, size);
+  return size;
+}
+
+int
+main ()
+{
+  struct Item items[] = {
+{ 1, -1 },
+{ 2, -2 },
+{ 3, -3 },
+{ 4, -4 },
+  };
+
+  if (test (items, 4) != -4)
+__builtin_trap ();
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 1bc36b043a0..20412c27ead 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -2566,7 +2566,7 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
Widening with mask first, shift later:
container = (type_out) container;
masked = container & (((1 << bitsize) - 1) << bitpos);
-   result = patt2 >> masked;
+   result = masked >> bitpos;
 
Widening with shift first, mask last:
container = (type_out) container;
@@ -2578,6 +2578,15 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
result = masked >> bitpos;
result = (type_out) result;
 
+   If the bitfield is signed and it's wider than type_out, we need to
+   keep the result sign-extended:
+   container = (type) container;
+   masked = container << (prec - bitsize - bitpos);
+   result = (type_out) (masked >> (prec - bitsize));
+
+   Here type is the signed variant of the wider of type_out and the type
+   of container.
+
The shifting is always optional depending on whether bitpos != 0.
 
 */
@@ -2636,14 +2645,22 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, 
stmt_vec_info stmt_info,
   if (BYTES_BIG_ENDIAN)
 shift_n = prec - shift_n - mask_width;
 
+  bool sign_ext = (!TYPE_UNSIGNED (TREE_TYPE (bf_ref)) &&
+  TYPE_PRECISION (ret_type) > mask_width);
+  bool widening = ((TYPE_PRECISION (TREE_TYPE (container)) <
+   TYPE_PRECISION (ret_type))
+  && !useless_type_conversion_p (TREE_TYPE (container),
+ ret_type));
+
   /* We move the conversion earlier if the loaded type is smaller than the
  return type to enable the use of widening loads.  */
-  if (TYPE_PRECISION (TREE_TYPE (container)) < TYPE_PRECISION (ret_type)
-  && !useless_type_conversion_p (TREE_TYPE (container), ret_type))
+  if (sign_ext || widening)
 {
-  pattern_stmt
-   = gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
-  NOP_EXPR, container);
+  tree type = widening ? ret_type : container_type;
+  if (sign_ext)
+   type = gimple_signed_type (type);
+  pattern_stmt = gimple_build_assign (vect_recog_temp_ssa_var (type),
+ NOP_EXPR, container);
   container = gimple_get_lhs (pattern_stmt);
   container_type = TREE_TYPE (container);
   prec = tree_to_uhwi (TYPE_SIZE (container_type));
@@ -2671,7 +2688,7 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, 
stmt_vec_info stmt_info,
 shift_first = true;
 
   tree result;
-  if (shift_first)
+  if (shift_first && !sign_ext)
 {
   tree shifted = container;
   if (shift_n)
@@ -2694,14 +2711,27 @@ 

Re: [PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)

2023-07-06 Thread Jeff Law via Gcc-patches




On 7/6/23 06:44, Richard Biener via Gcc-patches wrote:

On Wed, Jul 5, 2023 at 8:44 AM Hao Liu OS via Gcc-patches
 wrote:


Hi,

If a loop is unrolled n times during vectorization, two steps are used to
calculate the induction variable:
   - The small step for the unrolled ith-copy: vec_1 = vec_iv + (VF/n * Step)
   - The large step for the whole loop: vec_loop = vec_iv + (VF * Step)

This patch calculates an extra vec_n to replace vec_loop:
   vec_n = vec_prev + (VF/n * S) = vec_iv + (VF/n * S) * n = vec_loop.

So that we can save the large step register and related operations.


OK.  It would be nice to avoid the dead stmts created earlier though.

Thanks,
Richard.


gcc/ChangeLog:

 PR tree-optimization/110449
 * tree-vect-loop.cc (vectorizable_induction): use vec_n to replace
 vec_loop for the unrolled loop.

gcc/testsuite/ChangeLog:

 * gcc.target/aarch64/pr110449.c: New testcase.
I didn't see Hao Liu in the MAINTAINERS file, so he probably doesn't have 
write access.  Therefore I went ahead and pushed this for Hao.


jeff


[PATCH] [og13] OpenMP: Expand "declare mapper" mappers for target {enter, exit, } data directives

2023-07-06 Thread Julian Brown
This patch allows 'declare mapper' mappers to be used on 'omp target
data', 'omp target enter data' and 'omp target exit data' directives.
For each of these, only explicit mappings are supported, unlike for
'omp target' directives where implicit uses of variables inside an
offload region might trigger mappers also.

Each of C, C++ and Fortran are supported.

The patch also adjusts 'map kind decay' to match OpenMP 5.2 semantics,
which is particularly important with regard to 'exit data' operations.

Tested with offloading to AMD GCN.  I will apply (to the og13 branch)
shortly.

2023-07-06  Julian Brown  

gcc/c-family/
* c-common.h (c_omp_region_type): Add C_ORT_EXIT_DATA,
C_ORT_OMP_EXIT_DATA.
(c_omp_instantiate_mappers): Add region type parameter.
* c-omp.cc (omp_split_map_kind, omp_join_map_kind,
omp_map_decayed_kind): New functions.
(omp_instantiate_mapper): Add ORT parameter.  Implement map kind decay
for instantiated mapper clauses.
(c_omp_instantiate_mappers): Add ORT parameter, pass to
omp_instantiate_mapper.

gcc/c/
* c-parser.cc (c_parser_omp_target_data): Instantiate mappers for
'omp target data'.
(c_parser_omp_target_enter_data): Instantiate mappers for 'omp target
enter data'.
(c_parser_omp_target_exit_data): Instantiate mappers for 'omp target
exit data'.
(c_parser_omp_target): Add c_omp_region_type argument to
c_omp_instantiate_mappers call.
* c-tree.h (c_omp_instantiate_mappers): Remove spurious prototype.

gcc/cp/
* parser.cc (cp_parser_omp_target_data): Instantiate mappers for 'omp
target data'.
(cp_parser_omp_target_enter_data): Instantiate mappers for 'omp target
enter data'.
(cp_parser_omp_target_exit_data): Instantiate mappers for 'omp target
exit data'.
(cp_parser_omp_target): Add c_omp_region_type argument to
c_omp_instantiate_mappers call.
* pt.cc (tsubst_omp_clauses): Instantiate mappers for OMP regions other
than just C_ORT_OMP_TARGET.
(tsubst_expr): Update call to tsubst_omp_clauses for OMP_TARGET_UPDATE,
OMP_TARGET_ENTER_DATA, OMP_TARGET_EXIT_DATA stanza.
* semantics.cc (cxx_omp_map_array_section): Avoid calling
build_array_ref for non-array/non-pointer bases (error reported
already).

gcc/fortran/
* trans-openmp.cc (omp_split_map_op, omp_join_map_op,
omp_map_decayed_kind): New functions.
(gfc_trans_omp_instantiate_mapper): Add CD parameter.  Implement map
kind decay.
(gfc_trans_omp_instantiate_mappers): Add CD parameter.  Pass to above
function.
(gfc_trans_omp_target_data): Instantiate mappers for 'omp target data'.
(gfc_trans_omp_target_enter_data): Instantiate mappers for 'omp target
enter data'.
(gfc_trans_omp_target_exit_data): Instantiate mappers for 'omp target
exit data'.

gcc/testsuite/
* c-c++-common/gomp/declare-mapper-15.c: New test.
* c-c++-common/gomp/declare-mapper-16.c: New test.
* g++.dg/gomp/declare-mapper-1.C: Adjust expected scan output.
* gfortran.dg/gomp/declare-mapper-22.f90: New test.
* gfortran.dg/gomp/declare-mapper-23.f90: New test.
---
 gcc/c-family/c-common.h   |   4 +-
 gcc/c-family/c-omp.cc | 193 +++-
 gcc/c/c-parser.cc |  14 +-
 gcc/c/c-tree.h|   1 -
 gcc/cp/parser.cc  |  19 +-
 gcc/cp/pt.cc  |   8 +-
 gcc/cp/semantics.cc   |   5 +-
 gcc/fortran/trans-openmp.cc   | 209 --
 .../c-c++-common/gomp/declare-mapper-15.c |  59 +
 .../c-c++-common/gomp/declare-mapper-16.c |  39 
 gcc/testsuite/g++.dg/gomp/declare-mapper-1.C  |   2 +-
 .../gfortran.dg/gomp/declare-mapper-22.f90|  60 +
 .../gfortran.dg/gomp/declare-mapper-23.f90|  25 +++
 13 files changed, 600 insertions(+), 38 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/declare-mapper-15.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/declare-mapper-16.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-mapper-22.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-mapper-23.f90

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index ea6c479cd62..c805c8b2f7e 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1270,8 +1270,10 @@ enum c_omp_region_type
   C_ORT_ACC= 1 << 1,
   C_ORT_DECLARE_SIMD   = 1 << 2,
   C_ORT_TARGET = 1 << 3,
+  C_ORT_EXIT_DATA  = 1 << 4,
   C_ORT_OMP_DECLARE_SIMD   = C_ORT_OMP | C_ORT_DECLARE_SIMD,
   C_ORT_OMP_TARGET = C_ORT_OMP | C_ORT_TARGET,
+  C_ORT_OMP_EXIT_DATA  = C_ORT_OMP | 

Re: [PATCH] Break false dependence for vpternlog by inserting vpxor.

2023-07-06 Thread simonaytes.yan--- via Gcc-patches

+; False dependency happens on destination register which is not really
+; used when moving all ones to vector register
+(define_split
+  [(set (match_operand:VMOVE 0 "register_operand")
+   (match_operand:VMOVE 1 "int_float_vector_all_ones_operand"))]
+  "TARGET_AVX512F && reload_completed
+  && (<MODE_SIZE> == 64 || EXT_REX_SSE_REG_P (operands[0]))"
+  [(set (match_dup 0) (match_dup 2))
+   (parallel
+ [(set (match_dup 0) (match_dup 1))
+  (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)])]
+  "operands[2] = CONST0_RTX (<MODE>mode);")


I think we shouldn't emit PXOR when optimizing for size.  So we should
change the define_split:

define_split
  [(set (match_operand:VMOVE 0 "register_operand")
(match_operand:VMOVE 1 "int_float_vector_all_ones_operand"))]
  "TARGET_AVX512F && reload_completed
  && (<MODE_SIZE> == 64 || EXT_REX_SSE_REG_P (operands[0]))
  && optimize_insn_for_speed_p ()"
  [(set (match_dup 0) (match_dup 2))
   (parallel
 [(set (match_dup 0) (match_dup 1))
  (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)])]
  "operands[2] = CONST0_RTX (<MODE>mode);")


[committed] libstdc++: Document --enable-cstdio=stdio_pure [PR110574]

2023-07-06 Thread Jonathan Wakely via Gcc-patches
Pushed to trunk. Backports to 11, 12 and 13 will follow.

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/110574
* doc/xml/manual/configure.xml: Describe stdio_pure argument to
--enable-cstdio.
* doc/html/manual/configure.html: Regenerate.
---
 libstdc++-v3/doc/html/manual/configure.html | 11 ---
 libstdc++-v3/doc/xml/manual/configure.xml   | 11 ---
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/doc/xml/manual/configure.xml 
b/libstdc++-v3/doc/xml/manual/configure.xml
index 7ff07aea886..1b8c37ce2a9 100644
--- a/libstdc++-v3/doc/xml/manual/configure.xml
+++ b/libstdc++-v3/doc/xml/manual/configure.xml
@@ -74,9 +74,14 @@
  
 
  --enable-cstdio=OPTION
- Select a target-specific I/O package. At the moment, the only
-   choice is to use 'stdio', a generic "C" abstraction.
-   The default is 'stdio'. This option can change the library ABI.
+ Select a target-specific I/O package. The choices are 'stdio'
+   which is a generic abstraction using POSIX file I/O APIs
+   (read, write,
+   lseek, etc.), and 'stdio_pure' which is similar
+   but only uses standard C file I/O APIs (fread,
+   fwrite, fseek, etc.).
+   The 'stdio_posix' choice is a synonym for 'stdio'.
+   The default is 'stdio'. This option can change the library ABI.
  
  
 
-- 
2.41.0



[PATCH v4] rs6000: Update the vsx-vector-6.* tests.

2023-07-06 Thread Carl Love via Gcc-patches
GCC maintainers:

Ver 4. Fixed a few typos.  Redid the tests to create separate run and
compile tests.

Ver 3.  Added __attribute__ ((noipa)) to the test files.  Changed some
of the scan-assembler-times checks to cover multiple similar
instructions.  Change the function check macro to a macro to generate a
function to do the test and check the results.  Retested on the various
processor types and BE/LE versions.

Ver 2.  Switched to using code macros to generate the call to the
builtin and test the results.  Added in instruction counts for the key
instruction for the builtin.  Moved the tests into an additional
function call to ensure the compile doesn't replace the builtin call
code with the statically computed results.  The compiler was doing this
for a few of the simpler tests.  

The following patch takes the tests in vsx-vector-6-p7.h, vsx-vector-6-p8.h,
vsx-vector-6-p9.h and reorganizes them into a series of smaller test files by
functionality rather than processor version.

Tested the patch on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE with
no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl



-
rs6000: Update the vsx-vector-6.* tests.

The vsx-vector-6.h file is included into the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
contains a large number of vsx vector builtin tests.  The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor.  The tests are compile only.

This patch reworks the tests into a series of files for related tests.
The new tests consist of a runnable test to verify the builtin argument
types and the functional correctness of each builtin.  There is also a
compile only test that verifies the builtins generate the expected number
of instructions for the various builtin tests.

gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-func-1op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all-compile.c: New test
file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.
---
 .../powerpc/vsx-vector-6-func-1op-compile.c   |  22 ++
 .../powerpc/vsx-vector-6-func-1op-run.c   |  98 
 .../powerpc/vsx-vector-6-func-1op.h   |  43 
 .../powerpc/vsx-vector-6-func-2lop-compile.c  |  14 ++
 .../powerpc/vsx-vector-6-func-2lop-run.c  | 177 ++
 .../powerpc/vsx-vector-6-func-2lop.h  |  47 
 .../powerpc/vsx-vector-6-func-2op-compile.c   |  21 ++
 .../powerpc/vsx-vector-6-func-2op-run.c   |  96 
 .../powerpc/vsx-vector-6-func-2op.h   |  42 
 .../powerpc/vsx-vector-6-func-3op-compile.c   |  17 ++
 .../powerpc/vsx-vector-6-func-3op-run.c   | 229 ++
 .../powerpc/vsx-vector-6-func-3op.h   |  73 ++
 .../vsx-vector-6-func-cmp-all-compile.c   |  17 ++
 .../powerpc/vsx-vector-6-func-cmp-all-run.c   | 147 +++
 .../powerpc/vsx-vector-6-func-cmp-all.h   |  76 ++
 .../powerpc/vsx-vector-6-func-cmp-compile.c   |  16 ++
 .../powerpc/vsx-vector-6-func-cmp-run.c   |  92 +++
 .../powerpc/vsx-vector-6-func-cmp.h   |  40 +++
 .../gcc.target/powerpc/vsx-vector-6.h | 154 
 .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 
 .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 
 .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 
 22 files changed, 1267 insertions(+), 282 deletions(-)
 create mode 100644 

Re: [PATCH ver 3] rs6000: Update the vsx-vector-6.* tests.

2023-07-06 Thread Carl Love via Gcc-patches
Kewen:

On Tue, 2023-07-04 at 10:49 +0800, Kewen.Lin wrote:
> 



> > 
> > The tests are broken up into a seriers of files for related
> > tests.  The
> 
> s/seriers/series/

Fixed

> 
> > new tests are runnable tests to verify the builtin argument types
> > and the
> > functional correctness of each test rather then verifying the type
> > and
> > number of instructions generated.
> > 
> > gcc/testsuite/
> > * gcc.target/powerpc/vsx-vector-6-1op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-2lop.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-2op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-3op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-cmp-all.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-cmp.c: New test file.
> 
> Missing "func-" in the names ...

Fixed.

> 
> > * gcc.target/powerpc/vsx-vector-6.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p7.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p8.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p9.h: Remove test file.
> 
> should be vsx-vector-6-p{7,8,9}.c, "git gcc-verify" should catch
> these.

Fixed, ran git gcc-verify which found a couple more little file name
typos.
> 
> > ---
> >  .../powerpc/vsx-vector-6-func-1op.c   | 141 ++
> >  .../powerpc/vsx-vector-6-func-2lop.c  | 217
> > +++
> >  .../powerpc/vsx-vector-6-func-2op.c   | 133 +
> >  .../powerpc/vsx-vector-6-func-3op.c   | 257
> > ++
> >  .../powerpc/vsx-vector-6-func-cmp-all.c   | 211 ++
> >  .../powerpc/vsx-vector-6-func-cmp.c   | 121 +
> >  .../gcc.target/powerpc/vsx-vector-6.h | 154 ---
> >  .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 ---
> >  .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 ---
> >  .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 ---
> >  10 files changed, 1080 insertions(+), 282 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-1op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-2lop.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-2op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-3op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-cmp-all.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-cmp.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-
> > 6.p7.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-
> > 6.p8.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-
> > 6.p9.c
> > 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-
> > 1op.c b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
> > new file mode 100644
> > index 000..52c7ae3e983
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
> > @@ -0,0 +1,141 @@
> > +/* { dg-do run { target lp64 } } */
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> > +/* { dg-options "-O2 -save-temps" } */
> 
> I just noticed that we missed an effective target check here to
> ensure the
> support of those bifs during the test run, and since it's a runnable
> test
> case, also need to ensure the generated hw insn supported, it's
> "vsx_hw"
> like:
> 
> /* { dg-require-effective-target vsx_hw } */
> 
> And adding "-mvsx" to the dg-options.

Add the effective-target and -mvsx to all of the tests.

> 
> This is also applied for the other test cases.
> 
> But as the discussion on xxlor and the different effective target
> requirements
> on compilation part and run part, I think we can separate each of
> these cases into
> two files, one for compilation and the other for run, for example,
> for this
> case, update FLOAT_TEST by adding one more global variable like
> 
> #define FLOAT_TEST(NAME)
>   vector float f_##NAME##_result; \
>   void ... \
>   f_##NAME##_result = vec_##NAME(f_src);\
>   }
>   // moving the checking code to its main.
> 
> move #include , FLOAT_TEST(NAME), DOUBLE_TEST(NAME)
> defines
> and their uses into vsx-vector-6-func-1op.h.
> 
> 
> **For compilation file vsx-vector-6-func-1op.c**:
> 
> Include this header file into vsx-vector-6-func-1op.c, which has the
> 
> /* { dg-do compile { target lp64 } } */
> /* { dg-require-effective-target powerpc_vsx_ok } */
> /* { dg-options "-O2 -mvsx" } */
> 
> #include "vsx-vector-6-func-1op.h"
> 
> Then put the expected insn check here, like 
> 
> /* { dg-final { scan-assembler-times {\mxvabssp\M} 1 } } */
> ...
> 
> By organizing it like this, these scan-assembler-times would only
> focus on what
> are generated for bifs (excluding possible noises from main function
> for running).
> 
> 
> **For runnable file 

Re: [PATCH v5] RISC-V: Fix one bug for floating-point static frm

2023-07-06 Thread Robin Dapp via Gcc-patches
Hi Pan,

thanks,  I think that works for me as I'm expecting these
parts to change a bit anyway in the near future.

There is no functional change to the last revision that
Kito already OK'ed so I think you can go ahead.

Regards
 Robin


Re: [PATCH 10/11] riscv: thead: Add support for the XTheadMemIdx ISA extension

2023-07-06 Thread Jeff Law via Gcc-patches




On 7/6/23 00:48, Christoph Müllner wrote:



Thanks for this!
Of course I was "lucky" and ran into the issue that the patterns did not match,
because of unexpected MULT insns where ASHIFTs were expected.
But after reading enough of combiner.cc I understood that this is on purpose
(for addresses) and I have to adjust my INSNs accordingly.
Yea, it's a wart that the same operation has two different canonical
forms depending on the context where it shows up :(




I've changed the patches for XTheadMemIdx and XTheadFMemIdx and will
send out a new series.

Sounds good.

Jeff


Re: [PATCH][RFC] c-family: Implement __has_feature and __has_extension [PR60512]

2023-07-06 Thread Iain Sandoe
Hi Alex,

> On 6 Jul 2023, at 15:01, Alex Coplan  wrote:
> 
> On 20/06/2023 15:08, Iain Sandoe wrote:

>> again, thanks for working on this and for fixing the SDK blocker.
>> 
>>> On 20 Jun 2023, at 13:30, Alex Coplan  wrote:
>>> 
>> 
>>> The patch can now survive bootstrap on Darwin (it looks like we'll need
>>> to adjust some Objective-C++ tests in light of the new pedwarn, but that
>>> looks to be straightforward).
>> 
>> Yes, I’ll deal with that soon (I was trying to decide whether to fix the
>> header we have copied from GNUStep, or whether to mark it as a system
>> header).
>> 
 (one reason to allow target opt-in/out of specific features)
 
> with the following omissions:
 
> - Objective-C-specific features.
 
 I can clearly append the objective-c(++) cases to the end of the respective
 lists, but then we need to make them conditional on language, version and
 dialect (some will not be appropriate to GNU runtime).
 
 this is why I think we need more flexible predicates on declaring features
 and extensions.
>>> 
>>> Would it help mitigate these concerns if I implemented some Objective-C
>>> features as part of this patch (say, those implemented by your WIP
>>> patch)?
>>> 
>>> My feeling is that the vast majority of extensions / features have
>>> similar logic, so we should exploit that redundancy to keep things terse
>>> in the encoding for the general case. Where we need more flexible
>>> predicates (e.g. for objc_nonfragile_abi in your WIP patch), those can
>>> be handled on a case-by-case basis by adding a new enumerator and logic
>>> to handle that specially.
>>> 
>>> What do you think, does that sound OK to you?
>> 
>> Sketching out what you have in mind using one or two examples would be
>> helpful.  Again, the fact that some of the answers are target-dependent, is
>> what makes me think of needing a little more generality.
> 
> FWIW I've implemented some Objective-C features (those from your WIP patch)
> in a v2 patch here:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623057.html
> 
> I also tweaked the design to be closer to your patch in that we now have a
> hash table which allows for registering features dynamically. Hopefully it's clear
> that it should be easier to handle target-specific features in that version.
> 
> Any thoughts on the new version?

Yes, I’ve tried it (together with some of my pending patches) on a few systems
and it LGTM - agreed we can probably implement a target hook if/when that
becomes necessary to register target-specific cases.

The Objective-C parts are OK (when the rest is approved)

thanks again for working on this.
Iain

> 
> Thanks,
> Alex
> 
>> 
 What about things like this:
 
 attribute_availability_tvos, 
 attribute_availability_watchos, 
 attribute_availability_driverkit, 
>>> 
>>> FWIW, clang looks to define these unconditionally, so restricting these
>>> to a given target would be deviating from its precedent.
>> 
>> Hmm.. i did not check that although (for the sake of keeping target-specific
>> code localised) my current availabilty attribute implementation is Darwin-
>> specific.
>> 
>> Having said that, interoperability with clang is also a very useful goal - 
>> for
>> Darwin, the SDK headers have only been (fully) tested with clang up to
>> now and I am sure we will find more gotchas as we expand what we can
>> parse.
>> 
>>> However, I don't think it would be hard to extend the implementation in
>>> this patch to support target-specific features if required. I think
>>> perhaps a langhook that targets can call to add their own features would
>>> be a reasonable approach.
>> 
>> Indeed, that could work if the result is needed later than pre-processing.
>> 
>> In my patch, IIRC, I added another entry to the libcpp callbacks to handle
>> target-specific __has_ queries.
>> 
>> cheers
>> Iain



Re: [PATCH] arm: Fix MVE intrinsics support with LTO (PR target/110268)

2023-07-06 Thread Christophe Lyon via Gcc-patches
On Wed, 5 Jul 2023 at 19:07, Kyrylo Tkachov  wrote:

> Hi Christophe,
>
> > -Original Message-
> > From: Christophe Lyon 
> > Sent: Monday, June 26, 2023 4:03 PM
> > To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> > Richard Sandiford 
> > Cc: Christophe Lyon 
> > Subject: [PATCH] arm: Fix MVE intrinsics support with LTO (PR
> target/110268)
> >
> > After the recent MVE intrinsics re-implementation, LTO stopped working
> > because the intrinsics would no longer be defined.
> >
> > The main part of the patch is simple and similar to what we do for
> > AArch64:
> > - call handle_arm_mve_h() from arm_init_mve_builtins to declare the
> >   intrinsics when the compiler is in LTO mode
> > - actually implement arm_builtin_decl for MVE.
> >
> > It was just a bit tricky to handle __ARM_MVE_PRESERVE_USER_NAMESPACE:
> > its value in the user code cannot be guessed at LTO time, so we always
> > have to assume that it was not defined.  This led to a few fixes in the
> > way we register MVE builtins as placeholders or not.  Without this
> > patch, we would just omit some versions of the intrinsics when
> > __ARM_MVE_PRESERVE_USER_NAMESPACE is true. In fact, like for the C/C++
> > placeholders, we need to always keep entries for all of them to ensure
> > that we have a consistent numbering scheme.
> >
> >   2023-06-26  Christophe Lyon   
> >
> >   PR target/110268
> >   gcc/
> >   * config/arm/arm-builtins.cc (arm_init_mve_builtins): Handle LTO.
> >   (arm_builtin_decl): Handle MVE builtins.
> >   * config/arm/arm-mve-builtins.cc (builtin_decl): New function.
> >   (add_unique_function): Fix handling of
> >   __ARM_MVE_PRESERVE_USER_NAMESPACE.
> >   (add_overloaded_function): Likewise.
> >   * config/arm/arm-protos.h (builtin_decl): New declaration.
> >
> >   gcc/testsuite/
> >   * gcc.target/arm/pr110268-1.c: New test.
> >   * gcc.target/arm/pr110268-2.c: New test.
> > ---
> >  gcc/config/arm/arm-builtins.cc| 11 +++-
> >  gcc/config/arm/arm-mve-builtins.cc| 61 ---
> >  gcc/config/arm/arm-protos.h   |  1 +
> >  gcc/testsuite/gcc.target/arm/pr110268-1.c | 11 
> >  gcc/testsuite/gcc.target/arm/pr110268-2.c | 22 
> >  5 files changed, 76 insertions(+), 30 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/arm/pr110268-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/arm/pr110268-2.c
> >
> > diff --git a/gcc/config/arm/arm-builtins.cc
> b/gcc/config/arm/arm-builtins.cc
> > index 36365e40a5b..fca7dcaf565 100644
> > --- a/gcc/config/arm/arm-builtins.cc
> > +++ b/gcc/config/arm/arm-builtins.cc
> > @@ -1918,6 +1918,15 @@ arm_init_mve_builtins (void)
> >arm_builtin_datum *d = &arm_builtin_data[i];
> >arm_init_builtin (fcode, d, "__builtin_mve");
> >  }
> > +
> > +  if (in_lto_p)
> > +{
> > +  arm_mve::handle_arm_mve_types_h ();
> > +  /* Under LTO, we cannot know whether
> > +  __ARM_MVE_PRESERVE_USER_NAMESPACE was defined, so assume
> > it
> > +  was not.  */
> > +  arm_mve::handle_arm_mve_h (false);
> > +}
> >  }
> >
> >  /* Set up all the NEON builtins, even builtins for instructions that
> are not
> > @@ -2723,7 +2732,7 @@ arm_builtin_decl (unsigned code, bool initialize_p
> > ATTRIBUTE_UNUSED)
> >  case ARM_BUILTIN_GENERAL:
> >return arm_general_builtin_decl (subcode);
> >  case ARM_BUILTIN_MVE:
> > -  return error_mark_node;
> > +  return arm_mve::builtin_decl (subcode);
> >  default:
> >gcc_unreachable ();
> >  }
> > diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-
> > builtins.cc
> > index 7033e41a571..e9a12f27411 100644
> > --- a/gcc/config/arm/arm-mve-builtins.cc
> > +++ b/gcc/config/arm/arm-mve-builtins.cc
> > @@ -493,6 +493,16 @@ handle_arm_mve_h (bool
> > preserve_user_namespace)
> >preserve_user_namespace);
> >  }
> >
> > +/* Return the function decl with SVE function subcode CODE, or
> > error_mark_node
> > +   if no such function exists.  */
> > +tree
> > +builtin_decl (unsigned int code)
> > +{
> > +  if (code >= vec_safe_length (registered_functions))
> > +return error_mark_node;
> > +  return (*registered_functions)[code]->decl;
> > +}
> > +
> >  /* Return true if CANDIDATE is equivalent to MODEL_TYPE for overloading
> > purposes.  */
> >  static bool
> > @@ -849,7 +859,6 @@ function_builder::add_function (const
> > function_instance ,
> >  ? integer_zero_node
> >  : simulate_builtin_function_decl (input_location, name, fntype,
> > code, NULL, attrs);
> > -
> >registered_function &rfn = *ggc_alloc <registered_function> ();
> >rfn.instance = instance;
> >rfn.decl = decl;
> > @@ -889,15 +898,12 @@ function_builder::add_unique_function (const
> > function_instance ,
> >gcc_assert (!*rfn_slot);
> >*rfn_slot = &rfn;
> >
> > -  /* Also add the non-prefixed non-overloaded function, if the user
> > 

Fix profile update after loop-ch and cunroll

2023-07-06 Thread Jan Hubicka via Gcc-patches
Hi,
this patch makes loop-ch and loop unrolling fix the profile in case the loop is
known to not iterate at all (or to iterate only a few times) while the profile
claims it iterates more.  While this is kind of a symptomatic fix, it is the
best we can do in case the profile was originally estimated incorrectly.

In the testcase the problematic loop is produced by the vectorizer, and I think
the vectorizer should know and account in its costs that the vectorized loop
and/or epilogue is not going to loop after the transformation.  So it would be
nice to fix it on that side, too.

The patch avoids about half of profile mismatches caused by cunroll.

Pass dump id and name|static mismatch|dynamic mismatch
                     |in count       |in count
107t cunrolli|  3+3|17251   +17251
115t threadfull  |  3  |14376-2875
116t vrp |  5+2|30908   +16532
117t dse |  5  |30908
118t dce |  3-2|17251   -13657
127t ch  | 13   +10|17251
131t dom | 39   +26|17251
133t isolate-paths   | 47+8|17251
134t reassoc | 49+2|17251
136t forwprop| 53+4|   202501  +185250
159t cddce   | 61+8|   216211   +13710
161t ldist   | 62+1|   216211
172t ifcvt   | 66+4|   373711  +157500
173t vect|143   +77|  9802097 +9428386
176t cunroll |221   +78| 15639591 +5837494
183t loopdone|218-3| 15577640   -61951
195t fre |214-4| 15577640
197t dom |213-1| 16671606 +1093966
199t threadfull  |215+2| 16879581  +207975
200t vrp |217+2| 17077750  +198169
204t dce |215-2| 17004486   -73264
206t sink|213-2| 17004486
211t cddce   |219+6| 17005926+1440
255t optimized   |217-2| 17005926
256r expand  |210-7| 19571573 +2565647
258r into_cfglayout  |208-2| 19571573
275r loop2_unroll|212+4| 22992432 +3420859
291r ce2 |210-2| 23011838
312r pro_and_epilogue|230   +20| 23073776   +61938
315r jump2   |236+6| 27110534 +4036758
323r bbro|229-7| 21826835 -5283699


W/o the patch cunroll does:

176t cunroll |294  +151|126548439   +116746342

and we end up with 291 mismatches at bbro.

Bootstrapped/regtested x86_64-linux. Plan to commit it after the 
scale_loop_frequency patch.

gcc/ChangeLog:

PR middle-end/25623
* tree-ssa-loop-ch.cc (ch_base::copy_headers): Scale loop frequency to
maximal number of iterations determined.
* tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely): Likewise.

gcc/testsuite/ChangeLog:

PR middle-end/25623
* gfortran.dg/pr25623-2.f90: New test.

diff --git a/gcc/testsuite/gfortran.dg/pr25623-2.f90 
b/gcc/testsuite/gfortran.dg/pr25623-2.f90
new file mode 100644
index 000..57679e0d6ed
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr25623-2.f90
@@ -0,0 +1,19 @@
+! { dg-do compile }
+! { dg-options "-fdump-tree-optimized-blocks -O3" }
+
+SUBROUTINE S42(a,b,c,N)
+ IMPLICIT NONE
+ integer :: N
+ real*8  :: a(N),b(N),c(N),tmp,tmp2,tmp4
+ real*8, parameter :: p=1.0D0/3.0D0
+ integer :: i
+ c=0.0D0
+ DO i=1,N
+   tmp=a(i)**p ! could even be done with a cube root
+   tmp2=tmp*tmp
+   tmp4=tmp2*tmp2
+   b(i)=b(i)+tmp4
+   c(i)=c(i)+tmp2
+ ENDDO
+END SUBROUTINE
+! { dg-final { scan-tree-dump-not "Invalid sum" "optimized" } }
diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc
index 291f2dbcab9..72792cec21f 100644
--- a/gcc/tree-ssa-loop-ch.cc
+++ b/gcc/tree-ssa-loop-ch.cc
@@ -422,6 +422,7 @@ ch_base::copy_headers (function *fun)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "Loop %d never loops.\n", loop->num);
+ scale_loop_profile (loop, profile_probability::always (), 0);
  loops_to_unloop.safe_push (loop);
  loops_to_unloop_nunroll.safe_push (0);
  continue;
@@ -666,6 +667,7 @@ ch_base::copy_headers (function *fun)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "Loop %d no longer loops.\n", loop->num);
+ scale_loop_profile (loop, profile_probability::always (), 0);
  loops_to_unloop.safe_push (loop);
  

Re: [PATCH] analyzer: Add support of placement new and improved operator new [PR105948]

2023-07-06 Thread Benjamin Priour via Gcc-patches
As per David's suggestion.
- Improved leading comment of "is_placement_new_p"
- "kf_operator_new::matches_call_types_p" now checks that arg 0 is of
  integral type and that arg 1, if any, is of pointer type.
- Changed ambiguous "int" to "int8_t" and "int64_t" in placement-new-size.C
  to trigger a target-independent out-of-bounds warning.
  Other OOB tests were not based on the size of types, but on the number
  of elements, so their use of "int" didn't lead to any ambiguity.

contrib/check_GNU_style.sh still complains about a space before square
brackets in string "operator new []", but as before, this one space is
mandatory for a correct recognition of the function.

Changes successfully regstrapped on x86_64-linux-gnu against trunk
3c776fdf1a8.

Is it OK for trunk ?
Thanks again,
Benjamin.

---

Fixed spurious possibly-NULL warning always tagging along throwing
operator new despite it never returning NULL.
Now, operator new is correctly recognized as possibly returning
NULL if and only if it is non-throwing or exceptions have been disabled.
Different standard signatures of operator new are now properly recognized.

Added support of placement new, so that it is now properly recognized,
and a 'heap_allocated' region is no longer created for it.
Placement new size is also checked and a 'Wanalyzer-allocation-size'
is emitted when relevant, as well as always a 'Wanalyzer-out-of-bounds'.

gcc/analyzer/ChangeLog:

PR analyzer/105948
* analyzer.h (is_placement_new_p): New declaration.
* call-details.cc
(call_details::maybe_get_arg_region): New function.
Returns the region of the argument at given index if possible.
* call-details.h: Declaration of above function.
* kf-lang-cp.cc (is_placement_new_p): Returns true if the
gcall is recognized as a placement new.
* region-model.cc (region_model::eval_condition):
Now recursively call itself if one of the operands is wrapped in a cast.
* sm-malloc.cc (malloc_state_machine::on_stmt): Added
recognition of placement new.

gcc/testsuite/ChangeLog:

PR analyzer/105948
* g++.dg/analyzer/out-of-bounds-placement-new.C: Added a directive.
* g++.dg/analyzer/placement-new.C: Added tests.
* g++.dg/analyzer/new-2.C: New test.
* g++.dg/analyzer/noexcept-new.C: New test.
* g++.dg/analyzer/placement-new-size.C: New test.

Signed-off-by: benjamin priour 
---
 gcc/analyzer/analyzer.h   |   1 +
 gcc/analyzer/call-details.cc  |  11 ++
 gcc/analyzer/call-details.h   |   1 +
 gcc/analyzer/kf-lang-cp.cc| 105 +-
 gcc/analyzer/region-model.cc  |  21 
 gcc/analyzer/sm-malloc.cc |  17 ++-
 gcc/testsuite/g++.dg/analyzer/new-2.C |  50 +
 gcc/testsuite/g++.dg/analyzer/noexcept-new.C  |  48 
 .../analyzer/out-of-bounds-placement-new.C|   2 +-
 .../g++.dg/analyzer/placement-new-size.C  |  27 +
 gcc/testsuite/g++.dg/analyzer/placement-new.C |  63 ++-
 11 files changed, 332 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/analyzer/new-2.C
 create mode 100644 gcc/testsuite/g++.dg/analyzer/noexcept-new.C
 create mode 100644 gcc/testsuite/g++.dg/analyzer/placement-new-size.C

diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
index 579517c23e6..b86e5cac74d 100644
--- a/gcc/analyzer/analyzer.h
+++ b/gcc/analyzer/analyzer.h
@@ -391,6 +391,7 @@ extern bool is_std_named_call_p (const_tree fndecl, const 
char *funcname,
 const gcall *call, unsigned int num_args);
 extern bool is_setjmp_call_p (const gcall *call);
 extern bool is_longjmp_call_p (const gcall *call);
+extern bool is_placement_new_p (const gcall *call);
 
 extern const char *get_user_facing_name (const gcall *call);
 
diff --git a/gcc/analyzer/call-details.cc b/gcc/analyzer/call-details.cc
index 17edaf26276..01f061d774e 100644
--- a/gcc/analyzer/call-details.cc
+++ b/gcc/analyzer/call-details.cc
@@ -152,6 +152,17 @@ call_details::get_arg_svalue (unsigned idx) const
   return m_model->get_rvalue (arg, m_ctxt);
 }
 
+/* If argument IDX's svalue at the callsite is a region_svalue,
+   return the region it points to.
+   Otherwise return NULL.  */
+
+const region *
+call_details::maybe_get_arg_region (unsigned idx) const
+{
+  const svalue *sval = get_arg_svalue (idx);
+  return sval->maybe_get_region ();
+}
+
 /* Attempt to get the string literal for argument IDX, or return NULL
otherwise.
For use when implementing "__analyzer_*" functions that take
diff --git a/gcc/analyzer/call-details.h b/gcc/analyzer/call-details.h
index 14a206ff5d6..aac2b7d33d8 100644
--- a/gcc/analyzer/call-details.h
+++ b/gcc/analyzer/call-details.h
@@ -55,6 +55,7 @@ public:
   tree get_arg_tree (unsigned idx) const;
   tree get_arg_type (unsigned idx) const;
   const svalue *get_arg_svalue (unsigned 

Re: [PATCH v1 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-07-06 Thread Chenghui Pan

No, vld/vst can't be guaranteed to be atomic in this condition.  Seems we
can't implement this on LoongArch for now.

On 2023/7/5 20:57, Xi Ruoyao wrote:

A question: is vld/vst guaranteed to be atomic if the accessed address
is aligned?  If true we can use them to implement lock-free 128-bit
atomic load and store.  See https://gcc.gnu.org/bugzilla/PR104688 for
the background, and some people really hate using a lock for atomics.

On Fri, 2023-06-30 at 10:16 +0800, Chenghui Pan wrote:

These patches add the Loongson SX/ASX instruction support to the
LoongArch
target, and can be utilized by using the new "-mlsx" and
"-mlasx" option.

Patches are bootstrapped and tested on loongarch64-linux-gnu target.

Lulu Cheng (6):
   LoongArch: Added Loongson SX vector directive compilation framework.
   LoongArch: Added Loongson SX base instruction support.
   LoongArch: Added Loongson SX directive builtin function support.
   LoongArch: Added Loongson ASX vector directive compilation
framework.
   LoongArch: Added Loongson ASX base instruction support.
   LoongArch: Added Loongson ASX directive builtin function support.

  gcc/config.gcc    |    2 +-
  gcc/config/loongarch/constraints.md   |  128 +-
  .../loongarch/genopts/loongarch-strings   |    4 +
  gcc/config/loongarch/genopts/loongarch.opt.in |   16 +-
  gcc/config/loongarch/lasx.md  | 5147 
  gcc/config/loongarch/lasxintrin.h | 5342
+
  gcc/config/loongarch/loongarch-builtins.cc    | 2686 -
  gcc/config/loongarch/loongarch-c.cc   |   18 +
  gcc/config/loongarch/loongarch-def.c  |    6 +
  gcc/config/loongarch/loongarch-def.h  |    9 +-
  gcc/config/loongarch/loongarch-driver.cc  |   10 +
  gcc/config/loongarch/loongarch-driver.h   |    2 +
  gcc/config/loongarch/loongarch-ftypes.def |  666 +-
  gcc/config/loongarch/loongarch-modes.def  |   39 +
  gcc/config/loongarch/loongarch-opts.cc    |   89 +-
  gcc/config/loongarch/loongarch-opts.h |    3 +
  gcc/config/loongarch/loongarch-protos.h   |   35 +
  gcc/config/loongarch/loongarch-str.h  |    3 +
  gcc/config/loongarch/loongarch.cc | 4615 +-
  gcc/config/loongarch/loongarch.h  |  117 +-
  gcc/config/loongarch/loongarch.md |   56 +-
  gcc/config/loongarch/loongarch.opt    |   16 +-
  gcc/config/loongarch/lsx.md   | 4490 ++
  gcc/config/loongarch/lsxintrin.h  | 5181 
  gcc/config/loongarch/predicates.md    |  333 +-
  25 files changed, 28723 insertions(+), 290 deletions(-)
  create mode 100644 gcc/config/loongarch/lasx.md
  create mode 100644 gcc/config/loongarch/lasxintrin.h
  create mode 100644 gcc/config/loongarch/lsx.md
  create mode 100644 gcc/config/loongarch/lsxintrin.h





Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-06 Thread Jan Hubicka via Gcc-patches
Hi,
the original scale_loop_profile was implemented to handle only the very simple
loops produced by the vectorizer at that time (basically loops with only one
exit and no subloops).  It also had not been updated to the new profile-count
API very carefully.
Since I want to use it from loop peeling and unlooping, I need the
function to at least not get profile worse on general loops.

The function does two things:
 1) scales down the loop profile by a given probability.
This is useful, for example, to scale down profile after peeling when loop
body is executed less often than before
 2) after scaling is done and if profile indicates too large iteration
count update profile to cap iteration count by ITERATION_BOUND parameter.

Step 1 is easy and unchanged.

I changed ITERATION_BOUND to be the actual bound on the number of iterations as
used elsewhere (i.e. the number of executions of the latch edge) rather than
the number of iterations + 1 as it was before.

To do 2) one needs to do the following:
  a) scale the loop's own profile so the frequency of the header is at most
     the sum of in-edge counts * (iteration_bound + 1)
  b) update loop exit probabilities so their count is the same
     as before scaling.
  c) reduce frequencies of basic blocks after the loop exit

The old code did b) by setting the probability to 1 / iteration_bound, which is
correct only if the basic block containing the exit executes precisely once per
iteration (i.e. it is not inside another conditional or an inner loop).  This is
fixed now by using set_edge_probability_and_rescale_others.

Also, c) was implemented only for the special case when the exit was just
before the latch basic block.  I now use dominance info to get some of the
additional cases right.

I still did not try to do anything for multiple-exit loops, though the
implementation could be generalized.

Bootstrapped/regtested x86_64-linux.  Plan to commit it tonight if there
are no complaints.

gcc/ChangeLog:

* cfgloopmanip.cc (scale_loop_profile): Rewrite exit edge
probability update to be safe on loops with subloops.
Make bound parameter to be iteration bound.
* tree-ssa-loop-ivcanon.cc (try_peel_loop): Update call
of scale_loop_profile.
* tree-vect-loop-manip.cc (vect_do_peeling): Likewise.

diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index 6e09dcbb0b1..524b979a546 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -499,7 +499,7 @@ scale_loop_frequencies (class loop *loop, 
profile_probability p)
 }
 
 /* Scale profile in LOOP by P.
-   If ITERATION_BOUND is non-zero, scale even further if loop is predicted
+   If ITERATION_BOUND is not -1, scale even further if loop is predicted
to iterate too many times.
Before caling this function, preheader block profile should be already
scaled to final count.  This is necessary because loop iterations are
@@ -510,106 +510,123 @@ void
 scale_loop_profile (class loop *loop, profile_probability p,
gcov_type iteration_bound)
 {
-  edge e, preheader_e;
-  edge_iterator ei;
-
-  if (dump_file && (dump_flags & TDF_DETAILS))
+  if (!(p == profile_probability::always ()))
 {
-  fprintf (dump_file, ";; Scaling loop %i with scale ",
-  loop->num);
-  p.dump (dump_file);
-  fprintf (dump_file, " bounding iterations to %i\n",
-  (int)iteration_bound);
-}
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, ";; Scaling loop %i with scale ",
+  loop->num);
+ p.dump (dump_file);
+ fprintf (dump_file, "\n");
+   }
 
-  /* Scale the probabilities.  */
-  scale_loop_frequencies (loop, p);
+  /* Scale the probabilities.  */
+  scale_loop_frequencies (loop, p);
+}
 
-  if (iteration_bound == 0)
+  if (iteration_bound == -1)
 return;
 
   gcov_type iterations = expected_loop_iterations_unbounded (loop, NULL, true);
+  if (iterations == -1)
+return;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
-  fprintf (dump_file, ";; guessed iterations after scaling %i\n",
-  (int)iterations);
+  fprintf (dump_file,
+  ";; guessed iterations of loop %i:%i new upper bound %i:\n",
+  loop->num,
+  (int)iterations,
+  (int)iteration_bound);
 }
 
   /* See if loop is predicted to iterate too many times.  */
   if (iterations <= iteration_bound)
 return;
 
-  preheader_e = loop_preheader_edge (loop);
-
-  /* We could handle also loops without preheaders, but bounding is
- currently used only by optimizers that have preheaders constructed.  */
-  gcc_checking_assert (preheader_e);
-  profile_count count_in = preheader_e->count ();
+  /* Compute number of invocations of the loop.  */
+  profile_count count_in = profile_count::zero ();
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, loop->header->preds)
+count_in += e->count ();
 
-  if (count_in > profile_count::zero ()
-  && loop->header->count.initialized_p 

Re: [x86_64 PATCH] Improve __int128 argument passing (in ix86_expand_move).

2023-07-06 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 6, 2023 at 3:48 PM Roger Sayle  wrote:
>
> > On Thu, Jul 6, 2023 at 2:04 PM Roger Sayle 
> > wrote:
> > >
> > >
> > > Passing 128-bit integer (TImode) parameters on x86_64 can sometimes
> > > result in surprising code.  Consider the example below (from PR 43644):
> > >
> > > __uint128 foo(__uint128 x, unsigned long long y) {
> > >   return x+y;
> > > }
> > >
> > > which currently results in 6 consecutive movq instructions:
> > >
> > > foo:movq%rsi, %rax
> > > movq%rdi, %rsi
> > > movq%rdx, %rcx
> > > movq%rax, %rdi
> > > movq%rsi, %rax
> > > movq%rdi, %rdx
> > > addq%rcx, %rax
> > > adcq$0, %rdx
> > > ret
> > >
> > > The underlying issue is that during RTL expansion, we generate the
> > > following initial RTL for the x argument:
> > >
> > > (insn 4 3 5 2 (set (reg:TI 85)
> > > (subreg:TI (reg:DI 86) 0)) "pr43644-2.c":5:1 -1
> > >  (nil))
> > > (insn 5 4 6 2 (set (subreg:DI (reg:TI 85) 8)
> > > (reg:DI 87)) "pr43644-2.c":5:1 -1
> > >  (nil))
> > > (insn 6 5 7 2 (set (reg/v:TI 84 [ x ])
> > > (reg:TI 85)) "pr43644-2.c":5:1 -1
> > >  (nil))
> > >
> > > which by combine/reload becomes
> > >
> > > (insn 25 3 22 2 (set (reg/v:TI 84 [ x ])
> > > (const_int 0 [0])) "pr43644-2.c":5:1 -1
> > >  (nil))
> > > (insn 22 25 23 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 0)
> > > (reg:DI 93)) "pr43644-2.c":5:1 90 {*movdi_internal}
> > >  (expr_list:REG_DEAD (reg:DI 93)
> > > (nil)))
> > > (insn 23 22 28 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 8)
> > > (reg:DI 94)) "pr43644-2.c":5:1 90 {*movdi_internal}
> > >  (expr_list:REG_DEAD (reg:DI 94)
> > > (nil)))
> > >
> > > where the heavy use of SUBREG SET_DESTs creates challenges for both
> > > combine and register allocation.
> > >
> > > The improvement proposed here is to avoid these problematic SUBREGs by
> > > adding (two) special cases to ix86_expand_move.  For insn 4, which
> > > sets a TImode destination from a paradoxical SUBREG, to assign the
> > > lowpart, we can use an explicit zero extension (zero_extendditi2 was
> > > added in July 2022), and for insn 5, which sets the highpart of a
> > > TImode register we can use the *insvti_highpart_1 instruction (that
> > > was added in May 2023, after being approved for stage1 in January).
> > > This allows combine to work its magic, merging these insns into a
> > > *concatditi3 and from there into other optimized forms.
> >
> > How about we introduce *insvti_lowpart_1, similar to *insvti_highpart_1, in 
> > the
> > hope that combine is smart enough to also combine these two instructions? 
> > IMO,
> > faking insert to lowpart of the register with zero_extend is a bit 
> > overkill, and could
> > hinder some other optimization opportunities (as perhaps hinted by failing
> > testcases).
>
> The use of ZERO_EXTEND serves two purposes, both the setting of the lowpart
> and of informing the RTL passes that the highpart is dead.  Notice in the 
> original
> RTL stream, i.e. current GCC, insn 25 is inserted by the .286r.init-regs 
> pass, clearing
> the entirety of the TImode register (like a clobber), and preventing TI:84 
> from
> occupying the same registers as DI:93 and DI:94.
>
> If the middle-end had asked the backend to generate a SET to STRICT_LOWPART
> then our hands would be tied, but a paradoxical SUBREG allows us the freedom
> to set the highpart bits to a defined value (we could have used sign 
> extension if
> that was cheap), which then simplifies data-flow and liveness analysis.  
> Allowing the
> highpart to contain undefined or untouched data is exactly the sort of 
> security
> side-channel leakage that the clear regs pass attempts to address.
>
> I can investigate an *insvti_lowpart_1, but I don't think it will help with 
> this
> issue, i.e. it won't prevent init-regs from clobbering/clearing TImode 
> parameters.

Thanks for the explanation, the patch is OK then.

Thanks,
Uros.

>
> > > So for the test case above, we now generate only a single movq:
> > >
> > > foo:movq%rdx, %rax
> > > xorl%edx, %edx
> > > addq%rdi, %rax
> > > adcq%rsi, %rdx
> > > ret
> > >
> > > But there is a little bad news.  This patch causes two (minor) missed
> > > optimization regressions on x86_64; gcc.target/i386/pr82580.c and
> > > gcc.target/i386/pr91681-1.c.  As shown in the test case above, we're
> > > no longer generating adcq $0, but instead using xorl.  For the other
> > > FAIL, register allocation now has more freedom and is (arbitrarily)
> > > choosing a register assignment that doesn't match what the test is
> > > expecting.  These issues are easier to explain and fix once this patch
> > > is in the tree.
> > >
> > > The good news is that this approach fixes a number of long standing
> > > issues, that need to be checked in bugzilla, including PR target/110533
> > > which was just 

update_bb_profile_for_threading TLC

2023-07-06 Thread Jan Hubicka via Gcc-patches
Hi,
this patch applies some TLC to update_bb_profile_for_threading.  The function
rescales probabilities by:
   FOR_EACH_EDGE (c, ei, bb->succs)
c->probability /= prob;
which is correct, but in case prob is 0 (the newly constructed path took all
execution counts), this leads to undefined results which do not sum to 100%.

In several other places we need to change the probability of one edge and
rescale the remaining ones to sum to 100%, so I decided to break this off into
a helper function, set_edge_probability_and_rescale_others.

For jump threading the probability of the edge is always reduced, so division
is the right update; however, in the general case we may also want to increase
the probability of the edge, which needs different scaling.  This is a bit
hard to do while staying with probabilities in the range 0...1 for all
temporaries.

For this reason I decided to add profile_probability::apply_scale, which is
symmetric to what we already have in profile_count::apply_scale and does the
right thing in both directions.

Finally, I added a few early exits so we do not produce confusing dumps when
the profile is missing, and special-cased the common situation where there are
precisely two edges out of the BB.  In this case we can set the other edge to
the inverted probability and not try to scale (scaling would drop probability
quality from PRECISE to ADJUSTED).
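The helper's rescaling arithmetic can be sketched with plain doubles (a
simplified model of the non-degenerate case; the real code works on
profile_probability values and also handles fake edges and the unconditional
case):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

/* Toy model: set probs[k] to NEW_PROB and scale the remaining out-edge
   probabilities by (1 - new_prob) / (1 - probs[k]) so they still sum to 1.
   When the old remainder is zero, distribute the new remainder evenly,
   mirroring the "guessed" fallback in the patch.  */
void
set_prob_and_rescale (std::vector<double> &probs, size_t k, double new_prob)
{
  double old_rest = 1.0 - probs[k];
  double new_rest = 1.0 - new_prob;
  for (size_t i = 0; i < probs.size (); ++i)
    if (i != k)
      probs[i] = old_rest > 0 ? probs[i] * (new_rest / old_rest)
			      : new_rest / (probs.size () - 1);
  probs[k] = new_prob;
}
```

Note that this formulation works whether the chosen edge's probability is
being reduced or increased, which is exactly why apply_scale is preferable
to a plain division by the old probability.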

Bootstrapped/regtested x86_64-linux. The patch has no effect on in count 
mismatches
in tramp3d build and improves out-count.  Will commit it shortly.

gcc/ChangeLog:

* cfg.cc (set_edge_probability_and_rescale_others): New function.
(update_bb_profile_for_threading): Use it; simplify the rest.
* cfg.h (set_edge_probability_and_rescale_others): Declare.
* profile-count.h (profile_probability::apply_scale): New.

diff --git a/gcc/cfg.cc b/gcc/cfg.cc
index 57b40110960..740d4f3581d 100644
--- a/gcc/cfg.cc
+++ b/gcc/cfg.cc
@@ -901,6 +901,67 @@ brief_dump_cfg (FILE *file, dump_flags_t flags)
 }
 }
 
+/* Set probability of E to NEW_PROB and rescale other edges
+   from E->src so their sum remains the same.  */
+
+void
+set_edge_probability_and_rescale_others (edge e, profile_probability new_prob)
+{
+  edge e2;
+  edge_iterator ei;
+  if (e->probability == new_prob)
+return;
+  /* If we made E unconditional, drop other frequencies to 0.  */
+  if (new_prob == profile_probability::always ())
+{
+  FOR_EACH_EDGE (e2, ei, e->src->succs)
+   if (e2 != e)
+ e2->probability = profile_probability::never ();
+}
+  else
+{
+  int n = 0;
+  edge other_e = NULL;
+
+  /* See how many other edges are leaving exit_edge->src.  */
+  FOR_EACH_EDGE (e2, ei, e->src->succs)
+   if (e2 != e && !(e2->flags & EDGE_FAKE))
+ {
+   other_e = e2;
+   n++;
+ }
+  /* If there is only one other edge with non-zero probability we do not
+need to scale which drops quality of profile from precise
+to adjusted.  */
+  if (n == 1)
+   other_e->probability = new_prob.invert ();
+  /* Nothing to do if there are no other edges.  */
+  else if (!n)
+   ;
+  /* Do scaling if possible.  */
+  else if (e->probability.invert ().nonzero_p ())
+   {
+ profile_probability num = new_prob.invert (),
+ den = e->probability.invert ();
+ FOR_EACH_EDGE (e2, ei, e->src->succs)
+   if (e2 != e && !(e2->flags & EDGE_FAKE))
+ e2->probability = e2->probability.apply_scale (num, den);
+   }
+  else
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+";; probability of edge %i->%i set reduced from 1."
+" The remaining edges are left inconsistent.\n",
+e->src->index, e->dest->index);
+ FOR_EACH_EDGE (e2, ei, e->src->succs)
+   if (e2 != e && !(e2->flags & EDGE_FAKE))
+ e2->probability = new_prob.invert ().guessed () / n;
+   }
+}
+  e->probability = new_prob;
+}
+
 /* An edge originally destinating BB of COUNT has been proved to
leave the block by TAKEN_EDGE.  Update profile of BB such that edge E can be
redirected to destination of TAKEN_EDGE.
@@ -912,62 +973,57 @@ void
 update_bb_profile_for_threading (basic_block bb, 
 profile_count count, edge taken_edge)
 {
-  edge c;
-  profile_probability prob;
-  edge_iterator ei;
+  gcc_assert (bb == taken_edge->src);
+
+  /* If there is no profile or the threaded path is never executed
+ we don't need to update.  */
+  if (!bb->count.initialized_p ()
+  || count == profile_count::zero ())
+return;
 
   if (bb->count < count)
 {
   if (dump_file)
fprintf (dump_file, "bb %i count became negative after threading",
 bb->index);
+  /* If probabilities looks very off, scale down and reduce to guesses
+to avoid dropping the other path close to zero.  */
+  if 

Re: [PATCH][RFC] c-family: Implement __has_feature and __has_extension [PR60512]

2023-07-06 Thread Alex Coplan via Gcc-patches
Hi Iain,

On 20/06/2023 15:08, Iain Sandoe wrote:
> Hi Alex
> 
> again, thanks for working on this and for fixing the SDK blocker.
> 
> > On 20 Jun 2023, at 13:30, Alex Coplan  wrote:
> > 
> 
> > The patch can now survive bootstrap on Darwin (it looks like we'll need
> > to adjust some Objective-C++ tests in light of the new pedwarn, but that
> > looks to be straightforward).
> 
> Yes, I’ll deal with that soon (I was trying to decide whether to fix the
> header we have copied from GNUStep, or whether to mark it as a system
> header).
> 
> >> (one reason to allow target opt-in/out of specific features)
> >> 
> >>> with the following omissions:
> >> 
> >>> - Objective-C-specific features.
> >> 
> >> I can clearly append the objective-c(++) cases to the end of the respective
> >> lists, but then we need to make them conditional on language, version and
> >> dialect (some will not be appropriate to GNU runtime).
> >> 
> >> this is why I think we need more flexible predicates on declaring features
> >> and extensions.
> > 
> > Would it help mitigate these concerns if I implemented some Objective-C
> > features as part of this patch (say, those implemented by your WIP
> > patch)?
> > 
> > My feeling is that the vast majority of extensions / features have
> > similar logic, so we should exploit that redundancy to keep things terse
> > in the encoding for the general case. Where we need more flexible
> > predicates (e.g. for objc_nonfragile_abi in your WIP patch), those can
> > be handled on a case-by-case basis by adding a new enumerator and logic
> > to handle that specially.
> > 
> > What do you think, does that sound OK to you?
> 
> Sketching out what you have in mind using one or two examples would be
> helpful.  Again, the fact that some of the answers are target-dependent, is
> what makes me think of needing a little more generality.

FWIW I've implemented some Objective-C features (those from your WIP patch)
in a v2 patch here:

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623057.html

I also tweaked the design to be closer to your patch in that we now have a hash
table which allows for registering features dynamically. Hopefully it's clear
that it should be easier to handle target-specific features in that version.
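A dynamic feature registry of the sort described might look schematically
like this (purely illustrative; the names and shape are assumptions, not
the actual API from the v2 patch):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

/* Toy model of a dynamically registered feature table: the front end seeds
   the common entries, and a target hook could add its own later.  An entry
   registered as extension-only answers __has_extension but not
   __has_feature.  */
struct feature_registry
{
  std::unordered_map<std::string, bool> entries;  // name -> extension-only

  void add (const std::string &name, bool extension_only)
  { entries[name] = extension_only; }

  bool has_feature (const std::string &name) const
  {
    auto it = entries.find (name);
    return it != entries.end () && !it->second;
  }

  bool has_extension (const std::string &name) const
  { return entries.count (name) != 0; }
};
```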

Any thoughts on the new version?

Thanks,
Alex

> 
> >> What about things like this:
> >> 
> >> attribute_availability_tvos, 
> >> attribute_availability_watchos, 
> >> attribute_availability_driverkit, 
> > 
> > FWIW, clang looks to define these unconditionally, so restricting these
> > to a given target would be deviating from its precedent.
> 
> Hmm.. i did not check that although (for the sake of keeping target-specific
> code localised) my current availabilty attribute implementation is Darwin-
> specific.
> 
> Having said that, interoperability with clang is also a very useful goal - for
> Darwin, the SDK headers have only been (fully) tested with clang up to
> now and I am sure we will find more gotchas as we expand what we can
> parse.
> 
> > However, I don't think it would be hard to extend the implementation in
> > this patch to support target-specific features if required. I think
> > perhaps a langhook that targets can call to add their own features would
> > be a reasonable approach.
> 
> Indeed, that could work if the result is needed later than pre-processing.
> 
> In my patch, IIRC, I added another entry to the libcpp callbacks to handle
> target-specific __has_ queries.
> 
> cheers
> Iain
> 
> 


Re: [PATCH][RFC] c-family: Implement __has_feature and __has_extension [PR60512]

2023-07-06 Thread Alex Coplan via Gcc-patches
Hi Jason,

On 11/05/2023 16:25, Jason Merrill wrote:
> On 5/9/23 08:07, Alex Coplan wrote:
> > This patch implements clang's __has_feature and __has_extension in GCC.
> 
> Thanks!

Thanks a lot for the review, I posted a v2 patch incorporating your
feedback here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623057.html

do the C++ parts of the patch look OK?

Thanks,
Alex

> 
> > Currently the patch aims to implement all documented features (and some
> > undocumented ones) following the documentation at
> > https://clang.llvm.org/docs/LanguageExtensions.html with the following
> > omissions:
> >   - C++ type traits.
> >   - Objective-C-specific features.
> > 
> > C++ type traits aren't currently implemented since, as the clang
> > documentation notes, __has_builtin is the correct "modern" way to query
> > for these (which GCC already implements). Of course there's an argument
> > that we should recognize the legacy set of C++ type traits that can be
> > queried through __has_feature for backwards compatibility with older
> > code. I'm happy to do this if reviewers think that's a good idea.
> 
> That seems unnecessary unless there's a specific motivation.
> 
> > There are some comments in the patch marked with XXX, I'm looking for
> > review comments from C/C++ maintainers on those areas in particular.
> > 
> > Bootstrapped/regtested on aarch64-linux-gnu. Any comments?
> 
> All the has_*_feature_p functions need to check flag_pedantic_errors, for
> compatibility with the Clang documented behavior "If the -pedantic-errors
> option is given, __has_extension is equivalent to __has_feature."
> 
> > +static const cp_feature_info cp_feature_table[] =
> > +{
> > +  { "cxx_exceptions", _exceptions },
> > +  { "cxx_rtti", _rtti },
> > +  { "cxx_access_control_sfinae", { cxx11, cxx98 } },
> > +  { "cxx_alias_templates", cxx11 },
> > +  { "cxx_alignas", cxx11 },
> > +  { "cxx_alignof", cxx11 },
> > +  { "cxx_attributes", cxx11 },
> > +  { "cxx_constexpr", cxx11 },
> > +  { "cxx_constexpr_string_builtins", cxx11 },
> > +  { "cxx_decltype", cxx11 },
> > +  { "cxx_decltype_incomplete_return_types", cxx11 },
> > +  { "cxx_default_function_template_args", cxx11 },
> > +  { "cxx_defaulted_functions", cxx11 }, /* XXX: extension in c++98?  */
> 
> I'm not sure I see the benefit of advertising a lot of these as C++98
> extensions, even if we do accept them with a pedwarn by default.  The ones
> that indicate DRs like cxx_access_control_sfinae, yes, but I'm inclined to
> be conservative if it isn't an extension that libstdc++ relies on, like
> variadic templates or inline namespaces.  My concern is that important
> implementation is limited to C++11 mode even if we don't immediately give an
> error.  For instance,
> 
> struct A
> {
>   int i = 42;
>   A() = default;
> };
> 
> breaks in C++98 mode; even though we only warn for the two C++11 features,
> trying to actually combine them fails.
> 
> So if there's a question, let's say no.
> 
> > +  { "cxx_delegating_constructors", { cxx11, cxx98 } },
> > +  { "cxx_deleted_functions", cxx11 },
> > +  { "cxx_explicit_conversions", { cxx11, cxx98 } },
> > +  { "cxx_generalized_initializers", cxx11 },
> > +  { "cxx_implicit_moves", cxx11 },
> > +  { "cxx_inheriting_constructors", cxx11 }, /* XXX: extension in c++98?  */
> > +  { "cxx_inline_namespaces", { cxx11, cxx98 } },
> > +  { "cxx_lambdas", cxx11 }, /* XXX: extension in c++98?  */
> > +  { "cxx_local_type_template_args", cxx11 },
> > +  { "cxx_noexcept", cxx11 },
> > +  { "cxx_nonstatic_member_init", { cxx11, cxx98 } },
> > +  { "cxx_nullptr", cxx11 },
> > +  { "cxx_override_control", { cxx11, cxx98 } },
> > +  { "cxx_reference_qualified_functions", cxx11 },
> > +  { "cxx_range_for", cxx11 },
> > +  { "cxx_raw_string_literals", cxx11 },
> > +  { "cxx_rvalue_references", cxx11 },
> > +  { "cxx_static_assert", cxx11 },
> > +  { "cxx_thread_local", cxx11 },
> > +  { "cxx_auto_type", cxx11 },
> > +  { "cxx_strong_enums", cxx11 },
> > +  { "cxx_trailing_return", cxx11 },
> > +  { "cxx_unicode_literals", cxx11 },
> > +  { "cxx_unrestricted_unions", cxx11 },
> > +  { "cxx_user_literals", cxx11 },
> > +  { "cxx_variadic_templates", { cxx11, cxx98 } },
> > +  { "cxx_binary_literals", { cxx14, cxx98 } },
> > +  { "cxx_contextual_conversions", { cxx14, cxx98 } },
> > +  { "cxx_decltype_auto", cxx14 },
> > +  { "cxx_aggregate_nsdmi", cxx14 },
> > +  { "cxx_init_captures", { cxx14, cxx11 } },
> > +  { "cxx_generic_lambdas", cxx14 },
> > +  { "cxx_relaxed_constexpr", cxx14 },
> > +  { "cxx_return_type_deduction", cxx14 },
> > +  { "cxx_variable_templates", { cxx14, cxx98 } },
> > +  { "modules", _modules },
> 
> 
> 


[committed] arc: Update builtin documentation

2023-07-06 Thread Claudiu Zissulescu via Gcc-patches
gcc/ChangeLog:
* doc/extend.texi (ARC Built-in Functions): Update documentation
with missing builtins.
---
 gcc/doc/extend.texi | 55 +
 1 file changed, 55 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d701b4d1d41..bfbc1d6cc9f 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -15260,6 +15260,23 @@ __builtin_arc_sr()
 __builtin_arc_swi()
 @end example
 
+The following built-in functions are available for the ARCv2 family of
+processors.
+
+@example
+int __builtin_arc_clri ();
+void __builtin_arc_kflag (unsigned);
+void __builtin_arc_seti (int);
+@end example
+
+The following built-in functions are available for the ARCv2 family
+and require @option{-mnorm}.
+
+@example
+int __builtin_arc_ffs (int);
+int __builtin_arc_fls (int);
+@end example
+
 @node ARC SIMD Built-in Functions
 @subsection ARC SIMD Built-in Functions
 
@@ -15486,6 +15503,44 @@ void __builtin_arc_vst16_n (__v8hi, const int, const 
int, const int);
 void __builtin_arc_vst32_n (__v8hi, const int, const int, const int);
 @end example
 
+The following built-in functions are available on systems that use
+@option{-mmpy-option=6} or higher.
+
+@example
+__v2hi __builtin_arc_dmach (__v2hi, __v2hi);
+__v2hi __builtin_arc_dmachu (__v2hi, __v2hi);
+__v2hi __builtin_arc_dmpyh (__v2hi, __v2hi);
+__v2hi __builtin_arc_dmpyhu (__v2hi, __v2hi);
+__v2hi __builtin_arc_vaddsub2h (__v2hi, __v2hi);
+__v2hi __builtin_arc_vsubadd2h (__v2hi, __v2hi);
+@end example
+
+The following built-in functions are available on systems that use
+@option{-mmpy-option=7} or higher.
+
+@example
+__v2si __builtin_arc_vmac2h (__v2hi, __v2hi);
+__v2si __builtin_arc_vmac2hu (__v2hi, __v2hi);
+__v2si __builtin_arc_vmpy2h (__v2hi, __v2hi);
+__v2si __builtin_arc_vmpy2hu (__v2hi, __v2hi);
+@end example
+
+The following built-in functions are available on systems that use
+@option{-mmpy-option=8} or higher.
+
+@example
+long long __builtin_arc_qmach (__v4hi, __v4hi);
+long long __builtin_arc_qmachu (__v4hi, __v4hi);
+long long __builtin_arc_qmpyh (__v4hi, __v4hi);
+long long __builtin_arc_qmpyhu (__v4hi, __v4hi);
+long long __builtin_arc_dmacwh (__v2si, __v2hi);
+long long __builtin_arc_dmacwhu (__v2si, __v2hi);
+__v2si __builtin_arc_vaddsub (__v2si, __v2si);
+__v2si __builtin_arc_vsubadd (__v2si, __v2si);
+__v4hi __builtin_arc_vaddsub4h (__v4hi, __v4hi);
+__v4hi __builtin_arc_vsubadd4h (__v4hi, __v4hi);
+@end example
+
 @node ARM iWMMXt Built-in Functions
 @subsection ARM iWMMXt Built-in Functions
 
-- 
2.30.2



RE: [x86_64 PATCH] Improve __int128 argument passing (in ix86_expand_move).

2023-07-06 Thread Roger Sayle
> On Thu, Jul 6, 2023 at 2:04 PM Roger Sayle 
> wrote:
> >
> >
> > Passing 128-bit integer (TImode) parameters on x86_64 can sometimes
> > result in surprising code.  Consider the example below (from PR 43644):
> >
> > __uint128 foo(__uint128 x, unsigned long long y) {
> >   return x+y;
> > }
> >
> > which currently results in 6 consecutive movq instructions:
> >
> > foo:movq%rsi, %rax
> > movq%rdi, %rsi
> > movq%rdx, %rcx
> > movq%rax, %rdi
> > movq%rsi, %rax
> > movq%rdi, %rdx
> > addq%rcx, %rax
> > adcq$0, %rdx
> > ret
> >
> > The underlying issue is that during RTL expansion, we generate the
> > following initial RTL for the x argument:
> >
> > (insn 4 3 5 2 (set (reg:TI 85)
> > (subreg:TI (reg:DI 86) 0)) "pr43644-2.c":5:1 -1
> >  (nil))
> > (insn 5 4 6 2 (set (subreg:DI (reg:TI 85) 8)
> > (reg:DI 87)) "pr43644-2.c":5:1 -1
> >  (nil))
> > (insn 6 5 7 2 (set (reg/v:TI 84 [ x ])
> > (reg:TI 85)) "pr43644-2.c":5:1 -1
> >  (nil))
> >
> > which by combine/reload becomes
> >
> > (insn 25 3 22 2 (set (reg/v:TI 84 [ x ])
> > (const_int 0 [0])) "pr43644-2.c":5:1 -1
> >  (nil))
> > (insn 22 25 23 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 0)
> > (reg:DI 93)) "pr43644-2.c":5:1 90 {*movdi_internal}
> >  (expr_list:REG_DEAD (reg:DI 93)
> > (nil)))
> > (insn 23 22 28 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 8)
> > (reg:DI 94)) "pr43644-2.c":5:1 90 {*movdi_internal}
> >  (expr_list:REG_DEAD (reg:DI 94)
> > (nil)))
> >
> > where the heavy use of SUBREG SET_DESTs creates challenges for both
> > combine and register allocation.
> >
> > The improvement proposed here is to avoid these problematic SUBREGs by
> > adding (two) special cases to ix86_expand_move.  For insn 4, which
> > sets a TImode destination from a paradoxical SUBREG, to assign the
> > lowpart, we can use an explicit zero extension (zero_extendditi2 was
> > added in July 2022), and for insn 5, which sets the highpart of a
> > TImode register we can use the *insvti_highpart_1 instruction (that
> > was added in May 2023, after being approved for stage1 in January).
> > This allows combine to work its magic, merging these insns into a
> > *concatditi3 and from there into other optimized forms.
> 
> How about we introduce *insvti_lowpart_1, similar to *insvti_highpart_1, in 
> the
> hope that combine is smart enough to also combine these two instructions? IMO,
> faking insert to lowpart of the register with zero_extend is a bit overkill, 
> and could
> hinder some other optimization opportunities (as perhaps hinted by failing
> testcases).

The use of ZERO_EXTEND serves two purposes, both the setting of the lowpart
and of informing the RTL passes that the highpart is dead.  Notice in the 
original
RTL stream, i.e. current GCC, insn 25 is inserted by the .286r.init-regs pass, 
clearing
the entirety of the TImode register (like a clobber), and preventing TI:84 from
occupying the same registers as DI:93 and DI:94.

If the middle-end had asked the backend to generate a SET to STRICT_LOWPART
then our hands would be tied, but a paradoxical SUBREG allows us the freedom
to set the highpart bits to a defined value (we could have used sign extension 
if
that was cheap), which then simplifies data-flow and liveness analysis.  
Allowing the
highpart to contain undefined or untouched data is exactly the sort of security
side-channel leakage that the clear regs pass attempts to address.

I can investigate an *insvti_lowpart_1, but I don't think it will help with this
issue, i.e. it won't prevent init-regs from clobbering/clearing TImode 
parameters.
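At the source level, the two insns under discussion correspond roughly to
assembling a 128-bit value from two 64-bit halves (an illustrative analogue
of the RTL, assuming a target with __int128; not the RTL itself):

```cpp
#include <cassert>

/* Source-level analogue: insn 4 sets the TImode lowpart, modeled here by
   the zero extension (which also tells the RTL passes the highpart is a
   defined value, namely zero), and insn 5 sets the highpart, modeled by
   the shift-and-or.  */
unsigned __int128
make_ti (unsigned long long lo, unsigned long long hi)
{
  unsigned __int128 x = lo;		// zero-extend DImode into TImode
  x |= (unsigned __int128) hi << 64;	// insert into the highpart
  return x;
}
```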

> > So for the test case above, we now generate only a single movq:
> >
> > foo:movq%rdx, %rax
> > xorl%edx, %edx
> > addq%rdi, %rax
> > adcq%rsi, %rdx
> > ret
> >
> > But there is a little bad news.  This patch causes two (minor) missed
> > optimization regressions on x86_64; gcc.target/i386/pr82580.c and
> > gcc.target/i386/pr91681-1.c.  As shown in the test case above, we're
> > no longer generating adcq $0, but instead using xorl.  For the other
> > FAIL, register allocation now has more freedom and is (arbitrarily)
> > choosing a register assignment that doesn't match what the test is
> > expecting.  These issues are easier to explain and fix once this patch
> > is in the tree.
> >
> > The good news is that this approach fixes a number of long standing
> > issues, that need to be checked in bugzilla, including PR target/110533
> > which was just opened/reported earlier this week.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with only the two new FAILs described above.  Ok for mainline?
> >
> > 2023-07-06  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/43644
> > PR 

[PATCH] tree-optimization/110556 - tail merging still pre-tuples

2023-07-06 Thread Richard Biener via Gcc-patches
The stmt comparison function for GIMPLE_ASSIGNs for tail merging
still looks like it deals with pre-tuples IL.  The following
attempts to fix this, not only comparing the first operand (sic!)
of stmts but all of them plus also compare the operation code.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
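Schematically, the corrected comparison requires the operation code and
every operand to match, not just the first one (a toy model, not the
GIMPLE API):

```cpp
#include <assert.h>
#include <stddef.h>
#include <vector>

/* Toy model of the fixed gimple_equal_p logic for assignments: two
   statements are equal only if their rhs codes match and all operands
   compare equal pairwise.  */
struct stmt { int code; std::vector<int> args; };

bool
stmts_equal (const stmt &a, const stmt &b)
{
  if (a.code != b.code || a.args.size () != b.args.size ())
    return false;
  for (size_t i = 0; i < a.args.size (); ++i)
    if (a.args[i] != b.args[i])
      return false;
  return true;
}
```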

PR tree-optimization/110556
* tree-ssa-tail-merge.cc (gimple_equal_p): Check
assign code and all operands of non-stores.

* gcc.dg/torture/pr110556.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr110556.c | 42 +
 gcc/tree-ssa-tail-merge.cc  | 22 ++---
 2 files changed, 59 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110556.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr110556.c 
b/gcc/testsuite/gcc.dg/torture/pr110556.c
new file mode 100644
index 000..bc60db885e2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110556.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-additional-options "-fno-tree-fre -fno-delete-dead-exceptions 
-fnon-call-exceptions" } */
+
+typedef __INT32_TYPE__ int32_t;
+typedef __INT64_TYPE__ int64_t;
+
+static int64_t __attribute__((noinline,noclone))
+safe_mul_func_int64_t_s_s(int64_t si1, int64_t si2)
+{
+  return ((((si1 > 0) && (si2 > 0) && (si1 > ( (9223372036854775807L) / si2)))
+	   || ((si1 > 0) && (si2 <= 0) && (si2 < ( (-9223372036854775807L -1) / si1)))
+	   || ((si1 <= 0) && (si2 > 0) && (si1 < ( (-9223372036854775807L -1) / si2)))
+	   || ((si1 <= 0) && (si2 <= 0) && (si1 != 0) && (si2 < ( (9223372036854775807L) / si1))))
+	  ? ((si1)) : si1 * si2);
+}
+
+static int32_t g_93 = 0x947A4BBFL;
+static int32_t tt = 6;
+int64_t ty, ty1;
+
+static void func_34(void)
+{
+ ty=safe_mul_func_int64_t_s_s (g_93, -1L) ;
+}
+static void func_30(void)
+{
+  ty1=safe_mul_func_int64_t_s_s(0, tt);
+}
+static void func_6(void)
+{
+ for (int g_9 = 5; (g_9 >= 0); g_9 -= 1)
+ {
+  func_34();
+  func_30 ();
+ }
+}
+
+int main ()
+{
+ func_6();
+}
diff --git a/gcc/tree-ssa-tail-merge.cc b/gcc/tree-ssa-tail-merge.cc
index 13bc8532bf2..33acb649d5d 100644
--- a/gcc/tree-ssa-tail-merge.cc
+++ b/gcc/tree-ssa-tail-merge.cc
@@ -1165,6 +1165,9 @@ gimple_equal_p (same_succ *same_succ, gimple *s1, gimple 
*s2)
   return operand_equal_p (lhs1, lhs2, 0);
 
 case GIMPLE_ASSIGN:
+  if (gimple_assign_rhs_code (s1) != gimple_assign_rhs_code (s2))
+   return false;
+
   lhs1 = gimple_get_lhs (s1);
   lhs2 = gimple_get_lhs (s2);
   if (TREE_CODE (lhs1) != SSA_NAME
@@ -1172,11 +1175,20 @@ gimple_equal_p (same_succ *same_succ, gimple *s1, 
gimple *s2)
return (operand_equal_p (lhs1, lhs2, 0)
&& gimple_operand_equal_value_p (gimple_assign_rhs1 (s1),
 gimple_assign_rhs1 (s2)));
-  else if (TREE_CODE (lhs1) == SSA_NAME
-  && TREE_CODE (lhs2) == SSA_NAME)
-   return operand_equal_p (gimple_assign_rhs1 (s1),
-   gimple_assign_rhs1 (s2), 0);
-  return false;
+
+  if (TREE_CODE (lhs1) != SSA_NAME
+ || TREE_CODE (lhs2) != SSA_NAME)
+   return false;
+
+  gcc_checking_assert (gimple_num_args (s1) == gimple_num_args (s2));
+  for (i = 0; i < gimple_num_args (s1); ++i)
+   {
+ t1 = gimple_arg (s1, i);
+ t2 = gimple_arg (s2, i);
+ if (!gimple_operand_equal_value_p (t1, t2))
+   return false;
+   }
+  return true;
 
 case GIMPLE_COND:
   t1 = gimple_cond_lhs (s1);
-- 
2.35.3


Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]

2023-07-06 Thread Richard Biener via Gcc-patches
On Wed, Jul 5, 2023 at 3:42 PM Drew Ross via Gcc-patches
 wrote:
>
> Adds a simplification for (~X | Y) ^ X to be folded into ~(X & Y).
> Tested successfully on x86_64 and x86 targets.
>
> PR middle-end/109986
>
> gcc/ChangeLog:
>
> * match.pd ((~X | Y) ^ X -> ~(X & Y)): New simplification.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/execute/pr109986.c: New test.
> * gcc.dg/tree-ssa/pr109986.c: New test.
> ---
>  gcc/match.pd  |  11 ++
>  .../gcc.c-torture/execute/pr109986.c  |  41 
>  gcc/testsuite/gcc.dg/tree-ssa/pr109986.c  | 177 ++
>  3 files changed, 229 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr109986.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index a17d6838c14..d9d7d932881 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1627,6 +1627,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>(convert (bit_and @1 (bit_not @0)
>
> +/* (~X | Y) ^ X -> ~(X & Y).  */
> +(simplify
> + (bit_xor:c (nop_convert1?
> + (bit_ior:c (nop_convert2? (bit_not (nop_convert3? @0)))
> +@1)) (nop_convert4? @0))

you want to reduce the number of nop_convert? - for example
I wonder if we can canonicalize

 (T)~X and ~(T)X

for nop-conversions.  The same might apply to binary bitwise operations
where we should push those to a direction where they are likely eliminated.
Usually we'd push them outwards.

The issue with the above pattern is that nop_convertN? expands to 2^N
separate patterns.  Together with the two :c you get 64 out of this.

I do not see that all of the combinations can happen when X has to
match unless we fail to contract some of them like if we have
(unsigned)(~(signed)X | Y) ^ X which we could rewrite like
-> (unsigned)((signed)~X | Y) ^ X -> (~X | (unsigned) Y) ^ X
with the last step being somewhat difficult unless we do
(signed)~X | Y -> (signed)(~X | (unsigned)Y).  It feels like a
propagation problem and less of a direct pattern matching one.
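Independent of the pattern-matching and conversion questions, the identity
itself can be checked exhaustively on a narrow type (a quick sanity sketch,
separate from the match.pd machinery):

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively verify (~x | y) ^ x == ~(x & y) over all 8-bit operands.
bool
identity_holds ()
{
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      {
	uint8_t lhs = ((uint8_t) ~x | (uint8_t) y) ^ (uint8_t) x;
	uint8_t rhs = ~((uint8_t) x & (uint8_t) y);
	if (lhs != rhs)
	  return false;
      }
  return true;
}
```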

> +  (if (types_match (type, @1))
> +   (bit_not (bit_and @1 (convert @0)))
> +   (if (types_match (type, @0))
> +(bit_not (bit_and (convert @1) @0))
> +(convert (bit_not (bit_and @0 (convert @1)))

You can elide the types_match checks and instead always emit

  (convert (bit_not (bit_and @0 (convert @1)))

the conversions are elided when the types match.

Richard.

> +
>  /* Convert ~X ^ ~Y to X ^ Y.  */
>  (simplify
>   (bit_xor (convert1? (bit_not @0)) (convert2? (bit_not @1)))
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr109986.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr109986.c
> new file mode 100644
> index 000..00ee9888539
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr109986.c
> @@ -0,0 +1,41 @@
> +/* PR middle-end/109986 */
> +
> +#include "../../gcc.dg/tree-ssa/pr109986.c"
> +
> +int
> +main ()
> +{
> +  if (t1 (29789, 29477) != -28678) __builtin_abort ();
> +  if (t2 (20196, -18743) != 4294965567) __builtin_abort ();
> +  if (t3 (127, 99) != -100) __builtin_abort ();
> +  if (t4 (100, 53) != 219) __builtin_abort ();
> +  if (t5 (20100, 1283) != -1025) __builtin_abort ();
> +  if (t6 (20100, 10283) != 63487) __builtin_abort ();
> +  if (t7 (2136614690L, 1136698390L) != -1128276995L) __builtin_abort ();
> +  if (t8 (1136698390L, 2136614690L) != -1128276995UL) __builtin_abort ();
> +  if (t9 (9176690219839792930LL, 3176690219839721234LL) != -3175044472123688707LL)
> +    __builtin_abort ();
> +  if (t10 (9176690219839792930LL, 3176690219839721234LL) != 15271699601585862909ULL)
> +    __builtin_abort ();
> +  if (t11 (29789, 29477) != -28678) __builtin_abort ();
> +  if (t12 (20196, -18743) != 4294965567) __builtin_abort ();
> +  if (t13 (127, 99) != -100) __builtin_abort ();
> +  if (t14 (100, 53) != 219) __builtin_abort ();
> +  if (t15 (20100, 1283) != -1025) __builtin_abort ();
> +  if (t16 (20100, 10283) != 63487) __builtin_abort ();
> +  if (t17 (2136614690, 1136698390) != -1128276995) __builtin_abort ();
> +  if (t18 (1136698390L, 2136614690L) != -1128276995UL) __builtin_abort ();
> +  if (t19 (9176690219839792930LL, 3176690219839721234LL) != -3175044472123688707LL)
> +    __builtin_abort ();
> +  if (t20 (9176690219839792930LL, 3176690219839721234LL) != 15271699601585862909ULL)
> +    __builtin_abort ();
> +  v4si a1 = {1, 2, 3, 4};
> +  v4si a2 = {6, 7, 8, 9};
> +  v4si r1 = {-1, -3, -1, -1};
> +  v4si b1 = t21 (a1, a2);
> +  v4si b2 = t22 (a1, a2);
> +  if (__builtin_memcmp (&b1, &r1, sizeof (b1)) != 0) __builtin_abort ();
> +  if (__builtin_memcmp (&b2, &r1, sizeof (b2)) != 0) __builtin_abort ();
> +  return 0;
> +}
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr109986.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
> new file mode 100644
> index 000..45f099b5656
> --- 

Re: [x86_64 PATCH] Improve __int128 argument passing (in ix86_expand_move).

2023-07-06 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 6, 2023 at 2:04 PM Roger Sayle  wrote:
>
>
> Passing 128-bit integer (TImode) parameters on x86_64 can sometimes
> result in surprising code.  Consider the example below (from PR 43644):
>
> __uint128 foo(__uint128 x, unsigned long long y) {
>   return x+y;
> }
>
> which currently results in 6 consecutive movq instructions:
>
> foo:movq%rsi, %rax
> movq%rdi, %rsi
> movq%rdx, %rcx
> movq%rax, %rdi
> movq%rsi, %rax
> movq%rdi, %rdx
> addq%rcx, %rax
> adcq$0, %rdx
> ret
>
> The underlying issue is that during RTL expansion, we generate the
> following initial RTL for the x argument:
>
> (insn 4 3 5 2 (set (reg:TI 85)
> (subreg:TI (reg:DI 86) 0)) "pr43644-2.c":5:1 -1
>  (nil))
> (insn 5 4 6 2 (set (subreg:DI (reg:TI 85) 8)
> (reg:DI 87)) "pr43644-2.c":5:1 -1
>  (nil))
> (insn 6 5 7 2 (set (reg/v:TI 84 [ x ])
> (reg:TI 85)) "pr43644-2.c":5:1 -1
>  (nil))
>
> which by combine/reload becomes
>
> (insn 25 3 22 2 (set (reg/v:TI 84 [ x ])
> (const_int 0 [0])) "pr43644-2.c":5:1 -1
>  (nil))
> (insn 22 25 23 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 0)
> (reg:DI 93)) "pr43644-2.c":5:1 90 {*movdi_internal}
>  (expr_list:REG_DEAD (reg:DI 93)
> (nil)))
> (insn 23 22 28 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 8)
> (reg:DI 94)) "pr43644-2.c":5:1 90 {*movdi_internal}
>  (expr_list:REG_DEAD (reg:DI 94)
> (nil)))
>
> where the heavy use of SUBREG SET_DESTs creates challenges for both
> combine and register allocation.
>
> The improvement proposed here is to avoid these problematic SUBREGs
> by adding (two) special cases to ix86_expand_move.  For insn 4, which
> sets a TImode destination from a paradoxical SUBREG, to assign the
> lowpart, we can use an explicit zero extension (zero_extendditi2 was
> added in July 2022), and for insn 5, which sets the highpart of a
> TImode register we can use the *insvti_highpart_1 instruction (that
> was added in May 2023, after being approved for stage1 in January).
> This allows combine to work its magic, merging these insns into a
> *concatditi3 and from there into other optimized forms.

How about we introduce *insvti_lowpart_1, similar to
*insvti_highpart_1, in the hope that combine is smart enough to also
combine these two instructions? IMO, faking insert to lowpart of the
register with zero_extend is a bit overkill, and could hinder some
other optimization opportunities (as perhaps hinted by failing
testcases).

Uros.

> So for the test case above, we now generate only a single movq:
>
> foo:movq%rdx, %rax
> xorl%edx, %edx
> addq%rdi, %rax
> adcq%rsi, %rdx
> ret
>
> But there is a little bad news.  This patch causes two (minor) missed
> optimization regressions on x86_64; gcc.target/i386/pr82580.c and
> gcc.target/i386/pr91681-1.c.  As shown in the test case above, we're
> no longer generating adcq $0, but instead using xorl.  For the other
> FAIL, register allocation now has more freedom and is (arbitrarily)
> choosing a register assignment that doesn't match what the test is
> expecting.  These issues are easier to explain and fix once this patch
> is in the tree.
>
> The good news is that this approach fixes a number of long-standing
> issues that need to be checked in Bugzilla, including PR target/110533
> which was just opened/reported earlier this week.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with only the two new FAILs described above.  Ok for mainline?
>
> 2023-07-06  Roger Sayle  
>
> gcc/ChangeLog
> PR target/43644
> PR target/110533
> * config/i386/i386-expand.cc (ix86_expand_move): Convert SETs of
> TImode destinations from paradoxical SUBREGs (setting the lowpart)
> into explicit zero extensions.  Use *insvti_highpart_1 instruction
> to set the highpart of a TImode destination.
>
> gcc/testsuite/ChangeLog
> PR target/43644
> PR target/110533
> * gcc.target/i386/pr110533.c: New test case.
> * gcc.target/i386/pr43644-2.c: Likewise.
>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)

2023-07-06 Thread Richard Biener via Gcc-patches
On Wed, Jul 5, 2023 at 8:44 AM Hao Liu OS via Gcc-patches
 wrote:
>
> Hi,
>
> If a loop is unrolled by n times during vectorization, two steps are used to
> calculate the induction variable:
>   - The small step for the unrolled ith-copy: vec_1 = vec_iv + (VF/n * Step)
>   - The large step for the whole loop: vec_loop = vec_iv + (VF * Step)
>
> This patch calculates an extra vec_n to replace vec_loop:
>   vec_n = vec_prev + (VF/n * S) = vec_iv + (VF/n * S) * n = vec_loop.
>
> So that we can save the large step register and related operations.

OK.  It would be nice to avoid the dead stmts created earlier though.

Thanks,
Richard.

> gcc/ChangeLog:
>
> PR tree-optimization/110449
> * tree-vect-loop.cc (vectorizable_induction): Use vec_n to replace
> vec_loop for the unrolled loop.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/pr110449.c: New testcase.
> ---
>  gcc/testsuite/gcc.target/aarch64/pr110449.c | 40 +
>  gcc/tree-vect-loop.cc   | 21 +--
>  2 files changed, 58 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110449.c
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr110449.c 
> b/gcc/testsuite/gcc.target/aarch64/pr110449.c
> new file mode 100644
> index 000..bb3b6dcfe08
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr110449.c
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -mcpu=neoverse-n2 --param 
> aarch64-vect-unroll-limit=2" } */
> +/* { dg-final { scan-assembler-not "8.0e\\+0" } } */
> +
> +/* Calculate the vectorized induction with smaller step for an unrolled loop.
> +
> +   before (suggested_unroll_factor=2):
> + fmovs30, 8.0e+0
> + fmovs31, 4.0e+0
> + dup v27.4s, v30.s[0]
> + dup v28.4s, v31.s[0]
> + .L6:
> + mov v30.16b, v31.16b
> + faddv31.4s, v31.4s, v27.4s
> + faddv29.4s, v30.4s, v28.4s
> + stp q30, q29, [x0]
> + add x0, x0, 32
> + cmp x1, x0
> + bne .L6
> +
> +   after:
> + fmovs31, 4.0e+0
> + dup v29.4s, v31.s[0]
> + .L6:
> + faddv30.4s, v31.4s, v29.4s
> + stp q31, q30, [x0]
> + add x0, x0, 32
> + faddv31.4s, v29.4s, v30.4s
> + cmp x0, x1
> + bne .L6  */
> +
> +void
> +foo2 (float *arr, float freq, float step)
> +{
> +  for (int i = 0; i < 1024; i++)
> +{
> +  arr[i] = freq;
> +  freq += step;
> +}
> +}
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 3b46c58a8d8..706ecbffd0c 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10114,7 +10114,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>new_vec, step_vectype, NULL);
>
>vec_def = induc_def;
> -  for (i = 1; i < ncopies; i++)
> +  for (i = 1; i < ncopies + 1; i++)
> {
>   /* vec_i = vec_prev + vec_step  */
>   gimple_seq stmts = NULL;
> @@ -10124,8 +10124,23 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>   vec_def = gimple_convert (, vectype, vec_def);
>
>   gsi_insert_seq_before (, stmts, GSI_SAME_STMT);
> - new_stmt = SSA_NAME_DEF_STMT (vec_def);
> - STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> + if (i < ncopies)
> +   {
> + new_stmt = SSA_NAME_DEF_STMT (vec_def);
> + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> +   }
> + else
> +   {
> + /* vec_1 = vec_iv + (VF/n * S)
> +vec_2 = vec_1 + (VF/n * S)
> +...
> +vec_n = vec_prev + (VF/n * S) = vec_iv + VF * S = vec_loop
> +
> +vec_n is used as vec_loop to save the large step register and
> +related operations.  */
> + add_phi_arg (induction_phi, vec_def, loop_latch_edge (iv_loop),
> +  UNKNOWN_LOCATION);
> +   }
> }
>  }
>
> --
> 2.34.1


[x86_64 PATCH] Improve __int128 argument passing (in ix86_expand_move).

2023-07-06 Thread Roger Sayle

Passing 128-bit integer (TImode) parameters on x86_64 can sometimes
result in surprising code.  Consider the example below (from PR 43644):

__uint128 foo(__uint128 x, unsigned long long y) {
  return x+y;
}

which currently results in 6 consecutive movq instructions:

foo:movq%rsi, %rax
movq%rdi, %rsi
movq%rdx, %rcx
movq%rax, %rdi
movq%rsi, %rax
movq%rdi, %rdx
addq%rcx, %rax
adcq$0, %rdx
ret

The underlying issue is that during RTL expansion, we generate the
following initial RTL for the x argument:

(insn 4 3 5 2 (set (reg:TI 85)
(subreg:TI (reg:DI 86) 0)) "pr43644-2.c":5:1 -1
 (nil))
(insn 5 4 6 2 (set (subreg:DI (reg:TI 85) 8)
(reg:DI 87)) "pr43644-2.c":5:1 -1
 (nil))
(insn 6 5 7 2 (set (reg/v:TI 84 [ x ])
(reg:TI 85)) "pr43644-2.c":5:1 -1
 (nil))

which by combine/reload becomes

(insn 25 3 22 2 (set (reg/v:TI 84 [ x ])
(const_int 0 [0])) "pr43644-2.c":5:1 -1
 (nil))
(insn 22 25 23 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 0)
(reg:DI 93)) "pr43644-2.c":5:1 90 {*movdi_internal}
 (expr_list:REG_DEAD (reg:DI 93)
(nil)))
(insn 23 22 28 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 8)
(reg:DI 94)) "pr43644-2.c":5:1 90 {*movdi_internal}
 (expr_list:REG_DEAD (reg:DI 94)
(nil)))

where the heavy use of SUBREG SET_DESTs creates challenges for both
combine and register allocation.

The improvement proposed here is to avoid these problematic SUBREGs
by adding (two) special cases to ix86_expand_move.  For insn 4, which
sets a TImode destination from a paradoxical SUBREG, to assign the
lowpart, we can use an explicit zero extension (zero_extendditi2 was
added in July 2022), and for insn 5, which sets the highpart of a
TImode register we can use the *insvti_highpart_1 instruction (that
was added in May 2023, after being approved for stage1 in January).
This allows combine to work its magic, merging these insns into a
*concatditi3 and from there into other optimized forms.

So for the test case above, we now generate only a single movq:

foo:movq%rdx, %rax
xorl%edx, %edx
addq%rdi, %rax
adcq%rsi, %rdx
ret

But there is a little bad news.  This patch causes two (minor) missed
optimization regressions on x86_64; gcc.target/i386/pr82580.c and
gcc.target/i386/pr91681-1.c.  As shown in the test case above, we're
no longer generating adcq $0, but instead using xorl.  For the other
FAIL, register allocation now has more freedom and is (arbitrarily)
choosing a register assignment that doesn't match what the test is
expecting.  These issues are easier to explain and fix once this patch
is in the tree.

The good news is that this approach fixes a number of long-standing
issues that need to be checked in Bugzilla, including PR target/110533
which was just opened/reported earlier this week.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with only the two new FAILs described above.  Ok for mainline?

2023-07-06  Roger Sayle  

gcc/ChangeLog
PR target/43644
PR target/110533
* config/i386/i386-expand.cc (ix86_expand_move): Convert SETs of
TImode destinations from paradoxical SUBREGs (setting the lowpart)
into explicit zero extensions.  Use *insvti_highpart_1 instruction
to set the highpart of a TImode destination.

gcc/testsuite/ChangeLog
PR target/43644
PR target/110533
* gcc.target/i386/pr110533.c: New test case.
* gcc.target/i386/pr43644-2.c: Likewise.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 567248d..92ffa4b 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -429,6 +429,16 @@ ix86_expand_move (machine_mode mode, rtx operands[])
 
 default:
   break;
+
+case SUBREG:
+  /* Transform TImode paradoxical SUBREG into zero_extendditi2.  */
+  if (TARGET_64BIT
+ && mode == TImode
+ && SUBREG_P (op1)
+ && GET_MODE (SUBREG_REG (op1)) == DImode
+ && SUBREG_BYTE (op1) == 0)
+   op1 = gen_rtx_ZERO_EXTEND (TImode, SUBREG_REG (op1));
+  break;
 }
 
   if ((flag_pic || MACHOPIC_INDIRECT)
@@ -532,6 +542,24 @@ ix86_expand_move (machine_mode mode, rtx operands[])
}
 }
 
+  /* Use *insvti_highpart_1 to set highpart of TImode register.  */
+  if (TARGET_64BIT
+  && mode == DImode
+  && SUBREG_P (op0)
+  && SUBREG_BYTE (op0) == 8
+  && GET_MODE (SUBREG_REG (op0)) == TImode
+  && REG_P (SUBREG_REG (op0))
+  && REG_P (op1))
+{
+  wide_int mask = wi::mask (64, false, 128);
+  rtx tmp = immed_wide_int_const (mask, TImode);
+  op0 = SUBREG_REG (op0);
+  tmp = gen_rtx_AND (TImode, copy_rtx (op0), tmp);
+  op1 = gen_rtx_ZERO_EXTEND 

[r14-2314 Regression] FAIL: gcc.target/i386/pr100711-2.c scan-assembler-times vpandn 8 on Linux/x86_64

2023-07-06 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

e007369c8b67bcabd57c4fed8cff2a6db82e78e6 is the first bad commit
commit e007369c8b67bcabd57c4fed8cff2a6db82e78e6
Author: Jan Beulich 
Date:   Wed Jul 5 09:49:16 2023 +0200

x86: yet more PR target/100711-like splitting

caused

FAIL: gcc.target/i386/pr100711-1.c scan-assembler-times pandn 2
FAIL: gcc.target/i386/pr100711-2.c scan-assembler-times vpandn 8

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2314/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr100711-1.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr100711-2.c --target_board='unix{-m32\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)


[r14-2310 Regression] FAIL: gcc.target/i386/pr53652-1.c scan-assembler-times pandn[ \\t] 2 on Linux/x86_64

2023-07-06 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

2d11c99dfca3cc603dbbfafb3afc41689a68e40f is the first bad commit
commit 2d11c99dfca3cc603dbbfafb3afc41689a68e40f
Author: Jan Beulich 
Date:   Wed Jul 5 09:41:09 2023 +0200

x86: use VPTERNLOG also for certain andnot forms

caused

FAIL: gcc.target/i386/pr53652-1.c scan-assembler-not vpternlogq[ \\t]
FAIL: gcc.target/i386/pr53652-1.c scan-assembler-times pandn[ \\t] 2

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2310/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr53652-1.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr53652-1.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)


[COMMITTED] ada: Add specification source files of runtime units

2023-07-06 Thread Marc Poulhiès via Gcc-patches
From: Claire Dross 

gcc/ada/

* gcc-interface/Make-lang.in: Add object files of specification
files.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/Make-lang.in | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/ada/gcc-interface/Make-lang.in 
b/gcc/ada/gcc-interface/Make-lang.in
index 364dea64bbf..8c9eec3e96a 100644
--- a/gcc/ada/gcc-interface/Make-lang.in
+++ b/gcc/ada/gcc-interface/Make-lang.in
@@ -550,8 +550,11 @@ GNAT_ADA_OBJS+= \
  ada/libgnat/s-trasym.o \
  ada/libgnat/s-unstyp.o\
  ada/libgnat/s-valint.o\
+ ada/libgnat/s-valspe.o\
  ada/libgnat/s-valuns.o\
  ada/libgnat/s-valuti.o\
+ ada/libgnat/s-vs_int.o \
+ ada/libgnat/s-vs_uns.o \
  ada/libgnat/s-wchcnv.o\
  ada/libgnat/s-wchcon.o\
  ada/libgnat/s-wchjis.o\
-- 
2.40.0



[COMMITTED] ada: Evaluate static expressions in Range attributes

2023-07-06 Thread Marc Poulhiès via Gcc-patches
From: Viljar Indus 

Gigi assumes that the value of range expressions is an integer literal.
Force evaluation of such expressions since static non-literal expressions
are not always evaluated to a literal form by gnat.

gcc/ada/

* sem_attr.adb (analyze_attribute.check_array_type): Replace valid
indexes with their staticly evaluated values.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.adb | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index 7a47abdb625..e00addd0152 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -2013,10 +2013,20 @@ package body Sem_Attr is
Flag_Non_Static_Expr
  ("expression for dimension must be static!", E1);
Error_Attr;
-
-elsif Expr_Value (E1) > D or else Expr_Value (E1) < 1 then
-   Error_Attr ("invalid dimension number for array type", E1);
 end if;
+
+declare
+   Value : constant Uint := Expr_Value (E1);
+begin
+
+   if Value > D or else Value < 1 then
+  Error_Attr ("invalid dimension number for array type", E1);
+   end if;
+
+   --  Replace the static value to simplify the tree for gigi
+   Fold_Uint (E1, Value, True);
+end;
+
  end if;
 
  if (Style_Check and Style_Check_Array_Attribute_Index)
-- 
2.40.0



[COMMITTED] ada: Refactor the proof of the Value and Image runtime units

2023-07-06 Thread Marc Poulhiès via Gcc-patches
From: Claire Dross 

The aim of this refactoring is to avoid unnecessary dependencies
between Image and Value units even though they share the same
specification functions. These functions are grouped inside ghost
packages which are then withed by Image and Value units.

gcc/ada/

* libgnat/s-vs_int.ads: Instance of Value_I_Spec for Integer.
* libgnat/s-vs_lli.ads: Instance of Value_I_Spec for
Long_Long_Integer.
* libgnat/s-vsllli.ads: Instance of Value_I_Spec for
Long_Long_Long_Integer.
* libgnat/s-vs_uns.ads: Instance of Value_U_Spec for Unsigned.
* libgnat/s-vs_llu.ads: Instance of Value_U_Spec for
Long_Long_Unsigned.
* libgnat/s-vslllu.ads: Instance of Value_U_Spec for
Long_Long_Long_Unsigned.
* libgnat/s-imagei.ads: Take instances of Value_*_Spec as
parameters.
* libgnat/s-imagei.adb: Idem.
* libgnat/s-imageu.ads: Idem.
* libgnat/s-imageu.adb: Idem.
* libgnat/s-valuei.ads: Idem.
* libgnat/s-valuei.adb: Idem.
* libgnat/s-valueu.ads: Idem.
* libgnat/s-valueu.adb: Idem.
* libgnat/s-imgint.ads: Adapt instance to new ghost parameters.
* libgnat/s-imglli.ads: Adapt instance to new ghost parameters.
* libgnat/s-imgllli.ads: Adapt instance to new ghost parameters.
* libgnat/s-imglllu.ads: Adapt instance to new ghost parameters.
* libgnat/s-imgllu.ads: Adapt instance to new ghost parameters.
* libgnat/s-imguns.ads: Adapt instance to new ghost parameters.
* libgnat/s-valint.ads: Adapt instance to new ghost parameters.
* libgnat/s-vallli.ads: Adapt instance to new ghost parameters.
* libgnat/s-vai.ads: Adapt instance to new ghost parameters.
* libgnat/s-vau.ads: Adapt instance to new ghost parameters.
* libgnat/s-valllu.ads: Adapt instance to new ghost parameters.
* libgnat/s-valuns.ads: Adapt instance to new ghost parameters.
* libgnat/s-vaispe.ads: Take instance of Value_U_Spec as parameter
and remove unused declaration.
* libgnat/s-vaispe.adb: Idem.
* libgnat/s-vauspe.ads: Remove unused declaration.
* libgnat/s-valspe.ads: Factor out the specification part of
Val_Util.
* libgnat/s-valspe.adb: Idem.
* libgnat/s-valuti.ads: Move specification to Val_Spec.
* libgnat/s-valuti.adb: Idem.
* libgnat/s-valboo.ads: Use Val_Spec.
* libgnat/s-valboo.adb: Idem.
* libgnat/s-imgboo.adb: Idem.
* libgnat/s-imagef.adb: Adapt instances to new ghost parameters.
* Makefile.rtl: List new files.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/Makefile.rtl  |   7 +
 gcc/ada/libgnat/s-imagef.adb  |  12 +-
 gcc/ada/libgnat/s-imagei.adb  |   4 +-
 gcc/ada/libgnat/s-imagei.ads  |  17 +-
 gcc/ada/libgnat/s-imageu.adb  |  81 ++
 gcc/ada/libgnat/s-imageu.ads  |  20 +-
 gcc/ada/libgnat/s-imgboo.adb  |   6 +-
 gcc/ada/libgnat/s-imgint.ads  |  13 +-
 gcc/ada/libgnat/s-imglli.ads  |  14 +-
 gcc/ada/libgnat/s-imgllli.ads |  14 +-
 gcc/ada/libgnat/s-imglllu.ads |  10 +-
 gcc/ada/libgnat/s-imgllu.ads  |   9 +-
 gcc/ada/libgnat/s-imguns.ads  |   9 +-
 gcc/ada/libgnat/s-vaispe.adb  |  10 +-
 gcc/ada/libgnat/s-vaispe.ads  |  42 +--
 gcc/ada/libgnat/s-valboo.adb  |   2 +-
 gcc/ada/libgnat/s-valboo.ads  |  12 +-
 gcc/ada/libgnat/s-valint.ads  |   5 +-
 gcc/ada/libgnat/s-vallli.ads  |   5 +-
 gcc/ada/libgnat/s-vai.ads |   5 +-
 gcc/ada/libgnat/s-vau.ads |   3 +-
 gcc/ada/libgnat/s-valllu.ads  |   3 +-
 gcc/ada/libgnat/s-valspe.adb  |  82 ++
 gcc/ada/libgnat/s-valspe.ads  | 211 +++
 gcc/ada/libgnat/s-valuei.adb  |   6 +-
 gcc/ada/libgnat/s-valuei.ads  |  21 +-
 gcc/ada/libgnat/s-valueu.adb  |   1 +
 gcc/ada/libgnat/s-valueu.ads  |   8 +-
 gcc/ada/libgnat/s-valuns.ads  |   3 +-
 gcc/ada/libgnat/s-valuti.adb  |  50 +---
 gcc/ada/libgnat/s-valuti.ads  | 474 ++
 gcc/ada/libgnat/s-vauspe.ads  |  53 +---
 gcc/ada/libgnat/s-vs_int.ads  |  59 +
 gcc/ada/libgnat/s-vs_lli.ads  |  60 +
 gcc/ada/libgnat/s-vs_llu.ads  |  58 +
 gcc/ada/libgnat/s-vs_uns.ads  |  57 
 gcc/ada/libgnat/s-vsllli.ads  |  60 +
 gcc/ada/libgnat/s-vslllu.ads  |  58 +
 38 files changed, 835 insertions(+), 729 deletions(-)
 create mode 100644 gcc/ada/libgnat/s-valspe.adb
 create mode 100644 gcc/ada/libgnat/s-valspe.ads
 create mode 100644 gcc/ada/libgnat/s-vs_int.ads
 create mode 100644 gcc/ada/libgnat/s-vs_lli.ads
 create mode 100644 gcc/ada/libgnat/s-vs_llu.ads
 create mode 100644 gcc/ada/libgnat/s-vs_uns.ads
 create mode 100644 gcc/ada/libgnat/s-vsllli.ads
 create mode 100644 gcc/ada/libgnat/s-vslllu.ads

diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
index ca4c528a7e0..b94caa45b10 100644
--- a/gcc/ada/Makefile.rtl
+++ b/gcc/ada/Makefile.rtl
@@ -772,6 +772,7 @@ GNATRTL_NONTASKING_OBJS= \
   s-vallli$(objext) \
 

[COMMITTED] ada: Reuse code in Is_Fully_Initialized_Type

2023-07-06 Thread Marc Poulhiès via Gcc-patches
From: Viljar Indus 

gcc/ada/

* sem_util.adb (Is_Fully_Initialized_Type): Avoid recalculating
the underlying type twice.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index 736751f5fae..821aacf1ccb 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -17333,7 +17333,7 @@ package body Sem_Util is
   declare
  Init : constant Entity_Id :=
   (Find_Optional_Prim_Op
- (Underlying_Type (Typ), Name_Initialize));
+ (Utyp, Name_Initialize));
 
   begin
  if Present (Init)
-- 
2.40.0



[COMMITTED] ada: Refer to non-Ada binding limitations in user guide

2023-07-06 Thread Marc Poulhiès via Gcc-patches
From: Viljar Indus 

The limitation of resetting the FPU mode for non-80-bit
precision was not referenced from "Creating a Stand-alone
Library to be used in a non-Ada context". Reference it the same
way it is already referenced from "Interfacing to C".

gcc/ada/

* doc/gnat_ugn/the_gnat_compilation_model.rst: Reference "Binding
with Non-Ada Main Programs" from "Creating a Stand-alone Library
to be used in a non-Ada context".
* gnat_ugn.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 .../gnat_ugn/the_gnat_compilation_model.rst   |  3 +
 gcc/ada/gnat_ugn.texi | 65 ++-
 2 files changed, 37 insertions(+), 31 deletions(-)

diff --git a/gcc/ada/doc/gnat_ugn/the_gnat_compilation_model.rst 
b/gcc/ada/doc/gnat_ugn/the_gnat_compilation_model.rst
index e4639d90eff..148d40815b8 100644
--- a/gcc/ada/doc/gnat_ugn/the_gnat_compilation_model.rst
+++ b/gcc/ada/doc/gnat_ugn/the_gnat_compilation_model.rst
@@ -2331,6 +2331,9 @@ finalization of all Ada libraries must be performed at 
the end of the program.
 No call to these libraries or to the Ada run-time library should be made
 after the finalization phase.
 
+Information on limitations of binding Ada code in non-Ada contexts can be
+found under :ref:`Binding_with_Non-Ada_Main_Programs`.
+
 Note also that special care must be taken with multi-tasks
 applications. The initialization and finalization routines are not
 protected against concurrent access. If such requirement is needed it
diff --git a/gcc/ada/gnat_ugn.texi b/gcc/ada/gnat_ugn.texi
index 104adb9b489..37d914ce0e3 100644
--- a/gcc/ada/gnat_ugn.texi
+++ b/gcc/ada/gnat_ugn.texi
@@ -19,7 +19,7 @@
 
 @copying
 @quotation
-GNAT User's Guide for Native Platforms , Jul 04, 2023
+GNAT User's Guide for Native Platforms , Jul 06, 2023
 
 AdaCore
 
@@ -3857,6 +3857,9 @@ finalization of all Ada libraries must be performed at 
the end of the program.
 No call to these libraries or to the Ada run-time library should be made
 after the finalization phase.
 
+Information on limitations of binding Ada code in non-Ada contexts can be
+found under @ref{7e,,Binding with Non-Ada Main Programs}.
+
 Note also that special care must be taken with multi-tasks
 applications. The initialization and finalization routines are not
 protected against concurrent access. If such requirement is needed it
@@ -3864,7 +3867,7 @@ must be ensured at the application level using a specific 
operating
 system services like a mutex or a critical-section.
 
 @node Restrictions in Stand-alone Libraries,,Creating a Stand-alone Library to 
be used in a non-Ada context,Stand-alone Ada Libraries
-@anchor{gnat_ugn/the_gnat_compilation_model 
id45}@anchor{7e}@anchor{gnat_ugn/the_gnat_compilation_model 
restrictions-in-stand-alone-libraries}@anchor{7f}
+@anchor{gnat_ugn/the_gnat_compilation_model 
id45}@anchor{7f}@anchor{gnat_ugn/the_gnat_compilation_model 
restrictions-in-stand-alone-libraries}@anchor{80}
 @subsubsection Restrictions in Stand-alone Libraries
 
 
@@ -3910,7 +3913,7 @@ In practice these attributes are rarely used, so this is 
unlikely
 to be a consideration.
 
 @node Rebuilding the GNAT Run-Time Library,,Stand-alone Ada Libraries,GNAT and 
Libraries
-@anchor{gnat_ugn/the_gnat_compilation_model 
id46}@anchor{80}@anchor{gnat_ugn/the_gnat_compilation_model 
rebuilding-the-gnat-run-time-library}@anchor{81}
+@anchor{gnat_ugn/the_gnat_compilation_model 
id46}@anchor{81}@anchor{gnat_ugn/the_gnat_compilation_model 
rebuilding-the-gnat-run-time-library}@anchor{82}
 @subsection Rebuilding the GNAT Run-Time Library
 
 
@@ -3946,7 +3949,7 @@ experiments or debugging, and is not supported.
 @geindex Conditional compilation
 
 @node Conditional Compilation,Mixed Language Programming,GNAT and 
Libraries,The GNAT Compilation Model
-@anchor{gnat_ugn/the_gnat_compilation_model 
conditional-compilation}@anchor{2b}@anchor{gnat_ugn/the_gnat_compilation_model 
id47}@anchor{82}
+@anchor{gnat_ugn/the_gnat_compilation_model 
conditional-compilation}@anchor{2b}@anchor{gnat_ugn/the_gnat_compilation_model 
id47}@anchor{83}
 @section Conditional Compilation
 
 
@@ -3963,7 +3966,7 @@ gnatprep preprocessor utility.
 @end menu
 
 @node Modeling Conditional Compilation in Ada,Preprocessing with 
gnatprep,,Conditional Compilation
-@anchor{gnat_ugn/the_gnat_compilation_model 
id48}@anchor{83}@anchor{gnat_ugn/the_gnat_compilation_model 
modeling-conditional-compilation-in-ada}@anchor{84}
+@anchor{gnat_ugn/the_gnat_compilation_model 
id48}@anchor{84}@anchor{gnat_ugn/the_gnat_compilation_model 
modeling-conditional-compilation-in-ada}@anchor{85}
 @subsection Modeling Conditional Compilation in Ada
 
 
@@ -4014,7 +4017,7 @@ be achieved using Ada in general, and GNAT in particular.
 @end menu
 
 @node Use of Boolean Constants,Debugging - A Special Case,,Modeling 
Conditional Compilation in Ada
-@anchor{gnat_ugn/the_gnat_compilation_model 

[COMMITTED] ada: Avoid crash in Find_Optional_Prim_Op

2023-07-06 Thread Marc Poulhiès via Gcc-patches
From: Viljar Indus 

Find_Optional_Prim_Op can crash when the Underlying_Type is Empty.
This can happen when you are dealing with a structure type with a
private part that does not have its Full_View set yet.

gcc/ada/

* exp_util.adb (Find_Optional_Prim_Op): Stop deriving primitive
operation if there is no underlying type to derive it from.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_util.adb | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
index c74921e1772..66e1acbf65e 100644
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -6291,6 +6291,11 @@ package body Exp_Util is
 
   Typ := Underlying_Type (Typ);
 
+  --  We cannot find the operation if there is no full view available.
+  if Typ = Empty then
+ return Empty;
+  end if;
+
   --  Loop through primitive operations
 
   Prim := First_Elmt (Primitive_Operations (Typ));
-- 
2.40.0



[COMMITTED] ada: Improve error message on violation of SPARK_Mode rules

2023-07-06 Thread Marc Poulhiès via Gcc-patches
From: Yannick Moy 

SPARK_Mode On can only be used on library-level entities.
Improve the error message here.

gcc/ada/

* errout.ads: Add explain code.
* sem_prag.adb (Check_Library_Level_Entity): Refine error message
and add explain code.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/errout.ads   | 1 +
 gcc/ada/sem_prag.adb | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/errout.ads b/gcc/ada/errout.ads
index 80dd7dfaead..2065d73614a 100644
--- a/gcc/ada/errout.ads
+++ b/gcc/ada/errout.ads
@@ -622,6 +622,7 @@ package Errout is
GEC_Volatile_Non_Interfering_Context : constant := 0004;
GEC_Required_Part_Of : constant := 0009;
GEC_Ownership_Moved_Object   : constant := 0010;
+   GEC_SPARK_Mode_On_Not_Library_Level  : constant := 0011;
 

-- List Pragmas Table --
diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index c5810685dc3..6de87fbaba9 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -24144,7 +24144,8 @@ package body Sem_Prag is
 
--  Local variables
 
-   Msg_1 : constant String := "incorrect placement of pragma%";
+   Msg_1 : constant String :=
+ "incorrect placement of pragma% with value ""On"" '[[]']";
Msg_2 : Name_Id;
 
 --  Start of processing for Check_Library_Level_Entity
@@ -24161,6 +24162,7 @@ package body Sem_Prag is
  and then Instantiation_Location (Sloc (N)) = No_Location
then
   Error_Msg_Name_1 := Pname;
+  Error_Msg_Code := GEC_SPARK_Mode_On_Not_Library_Level;
   Error_Msg_N (Fix_Error (Msg_1), N);
 
   Name_Len := 0;
-- 
2.40.0



[COMMITTED] ada: Finalization not performed for component of protected type

2023-07-06 Thread Marc Poulhiès via Gcc-patches
From: Steve Baird 

In some cases involving a discriminated protected type with an array
component that is subject to a discriminant-dependent index constraint,
where the element type of the array requires finalization and the array
type has not yet been frozen at the point of the declaration of the protected
type, finalization of an object of the protected type may incorrectly omit
finalization of the array component. One case where this scenario can arise
is an instantiation of Ada.Containers.Bounded_Synchronized_Queues, passing in
an Element type that requires finalization.

gcc/ada/

* exp_ch7.adb (Make_Final_Call): Add assertion that if no
finalization call is generated, then the type of the object being
finalized does not require finalization.
* freeze.adb (Freeze_Entity): If freezing an already-frozen
subtype, do not assume that nothing needs to be done. In the case
of a frozen subtype of a non-frozen type or subtype (which is
possible), freeze the non-frozen entity.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch7.adb |  2 ++
 gcc/ada/freeze.adb  | 15 ++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index 1b16839ddf3..aa16c707887 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -8387,6 +8387,8 @@ package body Exp_Ch7 is
  Param => Ref,
  Skip_Self => Skip_Self);
   else
+ pragma Assert (Serious_Errors_Detected > 0
+or else not Has_Controlled_Component (Utyp));
  return Empty;
   end if;
end Make_Final_Call;
diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
index 83ce0300871..38aeb2456ff 100644
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -6188,7 +6188,20 @@ package body Freeze is
   --  Do not freeze if already frozen since we only need one freeze node
 
   if Is_Frozen (E) then
- Result := No_List;
+
+ if Is_Itype (E)
+   and then not Is_Base_Type (E)
+   and then not Is_Frozen (Etype (E))
+ then
+--  If a frozen subtype of an unfrozen type seems impossible
+--  then see Analyze_Protected_Definition.Undelay_Itypes.
+
+Result := Freeze_Entity
+(Etype (E), N, Do_Freeze_Profile => Do_Freeze_Profile);
+ else
+Result := No_List;
+ end if;
+
  goto Leave;
 
   --  Do not freeze if we are preanalyzing without freezing
-- 
2.40.0



RE: [PATCH 6/19]middle-end: Don't enter piecewise expansion if VF is not constant.

2023-07-06 Thread Richard Biener via Gcc-patches
On Thu, 6 Jul 2023, Tamar Christina wrote:

> > On Wed, 28 Jun 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > expand_vector_piecewise does not support VLA expansion as it has a
> > > hard assert on the type not being VLA.
> > >
> > > Instead of just failing to expand (so the call is marked unsupported), we
> > > ICE.
> > > This adjusts it so we don't, and can gracefully handle the expansion in
> > > the support checks.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > 
> > Hmm, do we support _any_ VLA "generic" vectors?  That is, why do we get
> > here at all?  Doesn't that mean the vectorizer creates code that vector 
> > lowering
> > thinks is not supported by the target?
> > 
> > In any case I'd expect expand_vector_operations_1 at
> > 
> >   if (compute_type == NULL_TREE)
> > compute_type = get_compute_type (code, op, type);
> >   if (compute_type == type)
> > return;
> > 
> >  <  here
> > 
> >   new_rhs = expand_vector_operation (gsi, type, compute_type, stmt, code,
> >  dce_ssa_names);
> > 
> > to be able to assert that compute_type (or even type) isn't VLA?
> > 
> > So, why do we arrive here?
> > 
> 
> I think we used to arrive here because the patch last year didn't properly
> check the cmp.
> I don't hit it with this new patch, so I'll drop it.  I thought it was an
> actual bug, hence why I
> submitted the patch?

If it's a genuine bug then the fix at least looks wrong ;)

Anyway, dropping is fine with me of course.

Richard.


[PATCH] tree-optimization/110563 - simplify epilogue VF checks

2023-07-06 Thread Richard Biener via Gcc-patches
The following consolidates an assert that now hits for ppc64le
with an earlier check we already do, simplifying
vect_determine_partial_vectors_and_peeling and getting rid of
its now redundant argument.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110563
* tree-vectorizer.h (vect_determine_partial_vectors_and_peeling):
Remove second argument.
* tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
Remove for_epilogue_p argument.  Merge assert ...
(vect_analyze_loop_2): ... with check done before determining
partial vectors by moving it after.
* tree-vect-loop-manip.cc (vect_do_peeling): Adjust.
---
 gcc/tree-vect-loop-manip.cc |  3 +--
 gcc/tree-vect-loop.cc   | 54 -
 gcc/tree-vectorizer.h   |  3 +--
 3 files changed, 19 insertions(+), 41 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 6c452e07880..d66d4a6de69 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3461,8 +3461,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
 a multiple of the epilogue loop's vectorization factor.
 We should have rejected the loop during the analysis phase
 if this fails.  */
-  bool res = vect_determine_partial_vectors_and_peeling (epilogue_vinfo,
-true);
+  bool res = vect_determine_partial_vectors_and_peeling (epilogue_vinfo);
   gcc_assert (res);
 }
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 4d9abd035ea..36d19a55e22 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2494,16 +2494,10 @@ vect_dissolve_slp_only_groups (loop_vec_info loop_vinfo)
In this case:
 
  LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P == false
-
-   When FOR_EPILOGUE_P is true, make this determination based on the
-   assumption that LOOP_VINFO is an epilogue loop, otherwise make it
-   based on the assumption that LOOP_VINFO is the main loop.  The caller
-   has made sure that the number of iterations is set appropriately for
-   this value of FOR_EPILOGUE_P.  */
+ */
 
 opt_result
-vect_determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo,
-   bool for_epilogue_p)
+vect_determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo)
 {
   /* Determine whether there would be any scalar iterations left over.  */
   bool need_peeling_or_partial_vectors_p
@@ -2537,25 +2531,12 @@ vect_determine_partial_vectors_and_peeling 
(loop_vec_info loop_vinfo,
 }
 
   if (dump_enabled_p ())
-{
-  if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
-   dump_printf_loc (MSG_NOTE, vect_location,
-"operating on partial vectors%s.\n",
-for_epilogue_p ? " for epilogue loop" : "");
-  else
-   dump_printf_loc (MSG_NOTE, vect_location,
-"operating only on full vectors%s.\n",
-for_epilogue_p ? " for epilogue loop" : "");
-}
-
-  if (for_epilogue_p)
-{
-  loop_vec_info orig_loop_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
-  gcc_assert (orig_loop_vinfo);
-  if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
-   gcc_assert (known_lt (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
- LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)));
-}
+dump_printf_loc (MSG_NOTE, vect_location,
+"operating on %s vectors%s.\n",
+LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
+? "partial" : "full",
+LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+? " for epilogue loop" : "");
 
   LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
 = (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
@@ -3017,11 +2998,19 @@ start_over:
LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
 }
 
+  /* Decide whether this loop_vinfo should use partial vectors or peeling,
+ assuming that the loop will be used as a main loop.  We will redo
+ this analysis later if we instead decide to use the loop as an
+ epilogue loop.  */
+  ok = vect_determine_partial_vectors_and_peeling (loop_vinfo);
+  if (!ok)
+return ok;
+
   /* If we're vectorizing an epilogue loop, the vectorized loop either needs
  to be able to handle fewer than VF scalars, or needs to have a lower VF
  than the main loop.  */
   if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
-  && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+  && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
 {
   poly_uint64 unscaled_vf
= exact_div (LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo),
@@ -3032,15 +3021,6 @@ start_over:
   " epilogue loop.\n");
 }
 
-  /* Decide whether this 

RE: [PATCH 6/19]middle-end: Don't enter piecewise expansion if VF is not constant.

2023-07-06 Thread Tamar Christina via Gcc-patches
> On Wed, 28 Jun 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > expand_vector_piecewise does not support VLA expansion as it has a
> > hard assert on the type not being VLA.
> >
> > Instead of just failing to expand (so the call is marked unsupported), we ICE.
> > This adjusts it so we don't, and can gracefully handle the expansion in
> > the support checks.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> Hmm, do we support _any_ VLA "generic" vectors?  That is, why do we get
> here at all?  Doesn't that mean the vectorizer creates code that vector 
> lowering
> thinks is not supported by the target?
> 
> In any case I'd expect expand_vector_operations_1 at
> 
>   if (compute_type == NULL_TREE)
> compute_type = get_compute_type (code, op, type);
>   if (compute_type == type)
> return;
> 
>  <  here
> 
>   new_rhs = expand_vector_operation (gsi, type, compute_type, stmt, code,
>  dce_ssa_names);
> 
> to be able to assert that compute_type (or even type) isn't VLA?
> 
> So, why do we arrive here?
> 

I think we used to arrive here because the patch last year didn't properly
check the cmp.
I don't hit it with this new patch, so I'll drop it.  I thought it was an actual
bug, hence why I
submitted the patch?

Thanks,
Tamar
> Richard.
> 
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-generic.cc (expand_vector_comparison): Skip piecewise if
> not
> > constant.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc index
> >
> df04a0db68da3222f43dd938f8e7adb186cd93c9..da1fd2f40d82a9fa301e6
> ed0b2f4
> > c3c222d58a8d 100644
> > --- a/gcc/tree-vect-generic.cc
> > +++ b/gcc/tree-vect-generic.cc
> > @@ -481,7 +481,7 @@ expand_vector_comparison (gimple_stmt_iterator
> *gsi, tree type, tree op0,
> > }
> >   t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
> > }
> > -  else
> > +  else if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
> > t = expand_vector_piecewise (gsi, do_compare, type,
> >  TREE_TYPE (TREE_TYPE (op0)), op0, op1,
> >  code, false);
> >
> >
> >
> >
> >
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald,
> Boudien Moerman; HRB 36809 (AG Nuernberg)
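
For context, the kind of "generic" vector code that reaches this lowering path can be sketched as follows — a hypothetical example using the GNU vector extension, not taken from the thread. When the target has no native vector-compare pattern, tree-vect-generic falls back to piecewise expansion, which only works when the number of lanes is a compile-time constant, hence the `is_constant ()` guard in the proposed patch:

```c
/* A comparison on a GNU "generic" vector type.  Each lane of the
   result is -1 where the comparison holds and 0 where it does not.
   Without a native vector-compare pattern, vector lowering expands
   this piecewise, one element at a time.  */
typedef int v4si __attribute__((vector_size(16)));

v4si vcmp(v4si a, v4si b)
{
  return a > b;
}
```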


[Committed] Handle COPYSIGN in dwarf2out.cc'd mem_loc_descriptor

2023-07-06 Thread Roger Sayle

Many thanks to Hans-Peter Nilsson for reminding me that new RTX codes
need to be added to dwarf2out.cc's mem_loc_descriptor, and for doing
this for BITREVERSE.  This patch does the same for the recently added
COPYSIGN.  I'd been testing these on a target that doesn't use DWARF
(nvptx-none) and so didn't exhibit the issue, and my additional testing
on x86_64-pc-linux-gnu to double check that changes were safe, doesn't
(yet) trigger the problematic assert in dwarf2out.cc's mem_loc_descriptor.

Committed to mainline as obvious, after bootstrapping and regression
testing on x86_64-pc-linux-gnu.


2023-07-06  Roger Sayle  

gcc/ChangeLog
* dwarf2out.cc (mem_loc_descriptor): Handle COPYSIGN.

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index e973644..238d0a9 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -16941,6 +16941,7 @@ mem_loc_descriptor (rtx rtl, machine_mode mode,
 case SMUL_HIGHPART:
 case UMUL_HIGHPART:
 case BITREVERSE:
+case COPYSIGN:
   break;
 
 case CONST_STRING:
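
For reference, the kind of source that can materialize a COPYSIGN rtx (and so reach mem_loc_descriptor when debug info is emitted) is a plain copysign; a minimal sketch, with a hypothetical function name:

```c
/* __builtin_copysign(mag, sgn) keeps the magnitude of MAG and takes
   the sign of SGN.  On targets with a native copysign pattern GCC
   expands this to the COPYSIGN rtx that the dwarf2out change above
   now accepts.  */
double flip_magnitude(double mag, double sgn)
{
  return __builtin_copysign(mag, sgn);
}
```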


RE: [PATCH] Initial Granite Rapids D Support

2023-07-06 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Mo, Zewei 
> Sent: Thursday, July 6, 2023 2:37 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] Initial Granite Rapids D Support
> 
> Hi all,
> 
> This patch is to add initial support for Granite Rapids D for GCC.
> The link of related information is listed below:
> https://www.intel.com/content/www/us/en/develop/download/intel-
> architecture-instruction-set-extensions-programming-reference.html
> 
> Also, the patch of removing AMX-COMPLEX from Granite Rapids will be
> backported to GCC13.
Ok.
> 
> This has been tested on x86_64-pc-linux-gnu. Is this ok for trunk? Thank you.
> 
> Sincerely,
> Zewei Mo
> 
> gcc/ChangeLog:
> 
>   * common/config/i386/cpuinfo.h
>   (get_intel_cpu): Handle Granite Rapids D.
>   * common/config/i386/i386-common.cc:
>   (processor_names): Add graniterapids-d.
>   (processor_alias_table): Ditto.
>   * common/config/i386/i386-cpuinfo.h
>   (enum processor_subtypes): Add INTEL_GRANITERAPIDS_D.
>   * config.gcc: Add -march=graniterapids-d.
>   * config/i386/driver-i386.cc (host_detect_local_cpu):
>   Handle graniterapids-d.
>   * config/i386/i386-c.cc (ix86_target_macros_internal):
>   Ditto.
>   * config/i386/i386-options.cc (m_GRANITERAPIDSD): New.
>   (processor_cost_table): Add graniterapids-d.
>   * config/i386/i386.h (enum processor_type):
>   Add PROCESSOR_GRANITERAPIDS_D.
>   * doc/extend.texi: Add graniterapids-d.
>   * doc/invoke.texi: Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/i386/mv16.C: Add graniterapids-d.
>   * gcc.target/i386/funcspec-56.inc: Handle new march.
> ---
>  gcc/common/config/i386/cpuinfo.h  |  9 -
>  gcc/common/config/i386/i386-common.cc |  3 +++
>  gcc/common/config/i386/i386-cpuinfo.h |  1 +
>  gcc/config.gcc|  2 +-
>  gcc/config/i386/driver-i386.cc|  3 +++
>  gcc/config/i386/i386-c.cc |  7 +++
>  gcc/config/i386/i386-options.cc   |  4 +++-
>  gcc/config/i386/i386.h|  5 -
>  gcc/doc/extend.texi   |  3 +++
>  gcc/doc/invoke.texi   | 11 +++
>  gcc/testsuite/g++.target/i386/mv16.C  |  6 ++
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |  1 +
>  12 files changed, 51 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/common/config/i386/cpuinfo.h
> b/gcc/common/config/i386/cpuinfo.h
> index ae48bc17771..7c2565c1d93 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -565,7 +565,6 @@ get_intel_cpu (struct __processor_model
> *cpu_model,
>cpu_model->__cpu_type = INTEL_SIERRAFOREST;
>break;
>  case 0xad:
> -case 0xae:
>/* Granite Rapids.  */
>cpu = "graniterapids";
>CHECK___builtin_cpu_is ("corei7"); @@ -573,6 +572,14 @@
> get_intel_cpu (struct __processor_model *cpu_model,
>cpu_model->__cpu_type = INTEL_COREI7;
>cpu_model->__cpu_subtype = INTEL_COREI7_GRANITERAPIDS;
>break;
> +case 0xae:
> +  /* Granite Rapids D.  */
> +  cpu = "graniterapids-d";
> +  CHECK___builtin_cpu_is ("corei7");
> +  CHECK___builtin_cpu_is ("graniterapids-d");
> +  cpu_model->__cpu_type = INTEL_COREI7;
> +  cpu_model->__cpu_subtype = INTEL_COREI7_GRANITERAPIDS_D;
> +  break;
>  case 0xb6:
>/* Grand Ridge.  */
>cpu = "grandridge";
> diff --git a/gcc/common/config/i386/i386-common.cc
> b/gcc/common/config/i386/i386-common.cc
> index bf126f14073..5a337c5b8be 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -1971,6 +1971,7 @@ const char *const processor_names[] =
>"alderlake",
>"rocketlake",
>"graniterapids",
> +  "graniterapids-d",
>"intel",
>"lujiazui",
>"geode",
> @@ -2094,6 +2095,8 @@ const pta processor_alias_table[] =
>  M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
>{"graniterapids", PROCESSOR_GRANITERAPIDS, CPU_HASWELL,
> PTA_GRANITERAPIDS,
>  M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS), P_PROC_AVX512F},
> +  {"graniterapids-d", PROCESSOR_GRANITERAPIDS_D, CPU_HASWELL,
> PTA_GRANITERAPIDS_D,
> +M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS_D),
> P_PROC_AVX512F},
>{"bonnell", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
>  M_CPU_TYPE (INTEL_BONNELL), P_PROC_SSSE3},
>{"atom", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL, diff --git
> a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-
> cpuinfo.h
> index 2dafbb25a49..254dfec70e5 100644
> --- a/gcc/common/config/i386/i386-cpuinfo.h
> +++ b/gcc/common/config/i386/i386-cpuinfo.h
> @@ -98,6 +98,7 @@ enum processor_subtypes
>ZHAOXIN_FAM7H_LUJIAZUI,
>AMDFAM19H_ZNVER4,
>INTEL_COREI7_GRANITERAPIDS,
> +  INTEL_COREI7_GRANITERAPIDS_D,
>CPU_SUBTYPE_MAX
>  };

[PATCH] tree-optimization/110515 - wrong code with LIM + PRE

2023-07-06 Thread Richard Biener via Gcc-patches
In this PR we face the issue that LIM speculates a load when
hoisting it out of the loop (since it knows it cannot trap).
Unfortunately this exposes undefined behavior when the load
accesses memory with the wrong dynamic type.  This later
makes PRE use that representation instead of the original
which accesses the same memory location but using a different
dynamic type leading to a wrong disambiguation of that
original access against another and thus a wrong-code transform.
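
A distilled illustration of the speculation involved — hypothetical, not the PR's actual testcase, which follows below:

```c
/* *p is loop-invariant and known not to trap, so LIM may hoist the
   load before the loop even though the original program only reads it
   when mask[i] is nonzero.  If the underlying storage held an object
   of a different dynamic type at that point, the speculated load reads
   it with the wrong type -- the representation PRE can then pick up.  */
int sum_masked(const int *p, const unsigned char *mask, int n)
{
  int s = 0;
  for (int i = 0; i < n; ++i)
    if (mask[i])
      s += *p;  /* loop-invariant load, hoistable by LIM */
  return s;
}
```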

Fortunately there already is code in PRE dealing with a similar
situation for code hoisting but that left a small gap which
when fixed also fixes the wrong-code transform in this bug even
if it doesn't address the underlying issue of LIM speculating
that load.

The upside is this fix is trivially safe to backport and chances
of code generation regressions are very low.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110515
* tree-ssa-pre.cc (compute_avail): Make code dealing
with hoisting loads with different alias-sets more
robust.

* g++.dg/opt/pr110515.C: New testcase.
---
 gcc/testsuite/g++.dg/opt/pr110515.C | 223 
 gcc/tree-ssa-pre.cc |   1 +
 2 files changed, 224 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/opt/pr110515.C

diff --git a/gcc/testsuite/g++.dg/opt/pr110515.C 
b/gcc/testsuite/g++.dg/opt/pr110515.C
new file mode 100644
index 000..7a75cea3b4b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/pr110515.C
@@ -0,0 +1,223 @@
+// { dg-do run }
+// { dg-require-effective-target c++11 }
+// { dg-options "-O2" }
+
+typedef __UINT64_TYPE__ u64;
+
+struct SmallDenseMap {
+  static constexpr u64 EmptyKey = 0xC0FFEUL;
+  struct V { u64 v; };
+
+  bool contains(u64 Val) {
+V *TheSlot = nullptr;
+return (LookupSlotFor(Val, TheSlot) ? 1 : 0);
+  }
+
+  void try_emplace(u64 Key) {
+V *TheSlot = nullptr;
+if (LookupSlotFor(Key, TheSlot))
+  return;
+
+// Otherwise, insert the new element.
+InsertIntoSlot(TheSlot, Key);
+  }
+
+  void moveFromOldSlots(V *OldSlotsBegin, V *OldSlotsEnd) {
+Size = 0;
+
+V *B_ = u.o.Slots;
+V *E_ = B_ + u.o.Capacity;
+for (; B_ != E_; ++B_)
+  B_->v = EmptyKey;
+
+// Insert all the old elements.
+V *O = OldSlotsBegin;
+V *E = OldSlotsEnd;
+for (; O != E; ++O) {
+  if (O->v != EmptyKey) {
+// Insert the key/value into the new table.
+V * N = nullptr;
+LookupSlotFor(O->v, N);
+N->v = O->v;
+Size++;
+  }
+}
+  }
+
+  void InsertIntoSlot(V *TheSlot, u64 Key) {
+unsigned NewSize = Size + 1;
+unsigned Capacity = getCapacity();
+// Make sure we always keep at least one Empty value
+if (NewSize >= Capacity) {
+  //fprintf(stderr, "GROW: size=%u capacity=%u -> ...\n", Size, Capacity);
+  grow();
+  LookupSlotFor(Key, TheSlot);
+  Capacity = getCapacity();
+  //fprintf(stderr, "GROW: ... -> size=%u capacity=%u\n", NewSize, Capacity);
+}
+
+Size++;
+
+TheSlot->v = Key;
+  }
+
+  bool LookupSlotFor(u64 Val,
+   V *&FoundSlot) {
+V *SlotsPtr = getSlots();
+const unsigned Capacity = getCapacity();
+
+for (unsigned i = 0; i < Capacity; ++i) {
+  V *ThisSlot = SlotsPtr + i;
+  if (Val == ThisSlot->v) {
+FoundSlot = ThisSlot;
+return true;
+  }
+
+  if (ThisSlot->v == EmptyKey) {
+FoundSlot = ThisSlot;
+return false;
+  }
+}
+// Guarantee that within the array there is a match
+// or an Empty value where to insert a new value.
+__builtin_trap();
+  }
+
+  // Needs to be at least 1 to hold one empty value
+  static constexpr unsigned InlineSlots = 2;
+
+  bool Small;
+  unsigned Size;
+
+  struct LargeRep {
+V *Slots;
+unsigned Capacity;
+  };
+
+  union {
+  V i[InlineSlots]; // Small = true
+  LargeRep o;   // Small = false
+  } u;
+
+  explicit SmallDenseMap() : Small(true), Size(0) {
+Size = 0;
+
+V *B = u.i;
+V *E = B + InlineSlots;
+for (; B != E; ++B)
+  B->v = EmptyKey;
+  }
+
+  void grow() {
+// assert:
+if (!Small) __builtin_trap();
+
+// First move the inline Slots into a temporary storage.
+V TmpStorage[InlineSlots];
+V *TmpBegin = TmpStorage;
+V *TmpEnd = TmpBegin;
+
+// Loop over the Slots, moving non-empty, non-tombstones into the
+// temporary storage. Have the loop move the TmpEnd forward as it goes.
+V *P = u.i;
+V *E = P + InlineSlots;
+for (; P != E; ++P) {
+if (P->v != EmptyKey) {
+TmpEnd->v = P->v;
+++TmpEnd;
+}
+}
+
+Small = false;
+u.o = LargeRep{new V[128], 128};
+moveFromOldSlots(TmpBegin, TmpEnd);
+  }
+
+  V *getSlots() {
+if (Small) {
+  V * inl = u.i;
+  return inl;
+}
+else {
+  LargeRep * rep = &u.o;
+  return rep->Slots;
+}
+  }
+
+  unsigned 

Re: [PATCH] i386: Update document for inlining rules

2023-07-06 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 6, 2023 at 8:39 AM Hongyu Wang  wrote:
>
> Hi,
>
> This is a follow-up patch for
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623525.html
> that updates the documentation about x86 inlining rules.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * doc/extend.texi: Move x86 inlining rule to a new subsubsection
> and add description for inlining of functions with arch and tune
> attributes.

LGTM.

Thanks,
Uros.

> ---
>  gcc/doc/extend.texi | 19 ++-
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index d1b018ee6d6..d701b4d1d41 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -7243,11 +7243,6 @@ Prefer 256-bit vector width for instructions.
>  Prefer 512-bit vector width for instructions.
>  @end table
>
> -On the x86, the inliner does not inline a
> -function that has different target options than the caller, unless the
> -callee has a subset of the target options of the caller.  For example
> -a function declared with @code{target("sse3")} can inline a function
> -with @code{target("sse2")}, since @code{-msse3} implies @code{-msse2}.
>  @end table
>
>  @cindex @code{indirect_branch} function attribute, x86
> @@ -7361,6 +7356,20 @@ counterpart to option 
> @option{-mno-direct-extern-access}.
>
>  @end table
>
> +@subsubsection Inlining rules
> +On the x86, the inliner does not inline a
> +function that has different target options than the caller, unless the
> +callee has a subset of the target options of the caller.  For example
> +a function declared with @code{target("sse3")} can inline a function
> +with @code{target("sse2")}, since @code{-msse3} implies @code{-msse2}.
> +
> +Besides the basic rule, when a function specifies
> +@code{target("arch=@var{ARCH}")} or @code{target("tune=@var{TUNE}")}
> +attribute, the inlining rule will be different. It allows inlining of
> +a function with default @option{-march=x86-64} and
> +@option{-mtune=generic} specified, or a function that has a subset
> +of ISA features and marked with always_inline.
> +
>  @node Xstormy16 Function Attributes
>  @subsection Xstormy16 Function Attributes
>
> --
> 2.31.1
>
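
The basic subset rule described in the patch can be sketched with a hedged example (hypothetical function names; the x86 target attributes are guarded so the sketch also builds elsewhere):

```c
#if defined(__x86_64__) || defined(__i386__)
/* The callee requires only SSE2; the caller's "sse3" implies "sse2",
   so the callee's target options are a subset of the caller's and the
   inliner is allowed to inline dot2 into caller.  */
__attribute__((target("sse2")))
static inline int dot2(const int *a, const int *b)
{
  return a[0] * b[0] + a[1] * b[1];
}

__attribute__((target("sse3")))
int caller(const int *a, const int *b)
{
  return dot2(a, b);
}
#else
/* Non-x86 fallback so the sketch stays portable.  */
static inline int dot2(const int *a, const int *b)
{
  return a[0] * b[0] + a[1] * b[1];
}

int caller(const int *a, const int *b)
{
  return dot2(a, b);
}
#endif
```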


RE: [PATCH V2] VECT: Fix ICE of variable stride on strided load/store with SELECT_VL loop control.

2023-07-06 Thread Li, Pan2 via Gcc-patches
Committed, thanks Richard.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Richard Biener via Gcc-patches
Sent: Thursday, July 6, 2023 3:09 PM
To: Ju-Zhe Zhong 
Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com
Subject: Re: [PATCH V2] VECT: Fix ICE of variable stride on strided load/store 
with SELECT_VL loop control.

On Thu, 6 Jul 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Hi, Richi.
> 
> Sorry for making a mistake on LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE
> with SELECT_VL loop control.

OK.

> Consider the following case:
> #define TEST_LOOP(DATA_TYPE, BITS)
>  \
>   void __attribute__ ((noinline, noclone))
>  \
>   f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,  
>  \
> INDEX##BITS stride, INDEX##BITS n)   \
>   {   
>  \
> for (INDEX##BITS i = 0; i < n; ++i)   
>  \
>   dest[i] += src[i * stride]; 
>  \
>   }
> 
> When "stride" is a constant, current flow works fine.
> However, when "stride" is a variable. It causes an ICE:
> # vectp_src.67_85 = PHI 
> ...
> _96 = .SELECT_VL (ivtmp_94, 4);
> ...
> ivtmp_78 = ((sizetype) _39 * (sizetype) _96) * 4;
> vect__11.69_87 = .LEN_MASK_GATHER_LOAD (vectp_src.67_85, _84, 4, { 0, 0, 0, 0 
> }, { -1, -1, -1, -1 }, _96, 0);
> ...
> vectp_src.67_86 = vectp_src.67_85 + ivtmp_78;
> 
> Because the IR is: ivtmp_78 = ((sizetype) _39 * (sizetype) _96) * 4;
> 
> Instead, I split the IR into:
> 
> step_stride = _39
> step = step_stride * 4
> ivtmp_78 = step * _96
> 
> Thanks.
> 
> gcc/ChangeLog:
> 
> * tree-vect-stmts.cc (vect_get_strided_load_store_ops): Fix ICE.
> 
> ---
>  gcc/tree-vect-stmts.cc | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index c10a4be60eb..10e71178ce7 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3176,10 +3176,8 @@ vect_get_strided_load_store_ops (stmt_vec_info 
> stmt_info,
>   = fold_build2 (MULT_EXPR, sizetype,
>  fold_convert (sizetype, unshare_expr (DR_STEP (dr))),
>  loop_len);
> -  tree bump = make_temp_ssa_name (sizetype, NULL, "ivtmp");
> -  gassign *assign = gimple_build_assign (bump, tmp);
> -  gsi_insert_before (gsi, assign, GSI_SAME_STMT);
> -  *dataref_bump = bump;
> +  *dataref_bump = force_gimple_operand_gsi (gsi, tmp, true, NULL_TREE, 
> true,
> + GSI_SAME_STMT);
>  }
>else
>  {
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
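
For concreteness, the TEST_LOOP macro quoted above, instantiated with DATA_TYPE = int32_t and BITS = 32, expands to roughly the following (hand-expanded sketch; the INDEX32 typedef is assumed to come from the test harness):

```c
#include <stdint.h>

typedef int32_t INDEX32;  /* assumed: the INDEX##BITS typedef from the harness */

/* dest[i] += src[i * stride] is a strided gather when "stride" is a
   runtime value -- the case that ICEd with SELECT_VL loop control.  */
void f_int32_t_32(int32_t *restrict dest, int32_t *restrict src,
                  INDEX32 stride, INDEX32 n)
{
  for (INDEX32 i = 0; i < n; ++i)
    dest[i] += src[i * stride];
}
```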


Re: [PATCH V2] VECT: Fix ICE of variable stride on strided load/store with SELECT_VL loop control.

2023-07-06 Thread Richard Biener via Gcc-patches
On Thu, 6 Jul 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Hi, Richi.
> 
> Sorry for making a mistake on LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE
> with SELECT_VL loop control.

OK.

> Consider the following case:
> #define TEST_LOOP(DATA_TYPE, BITS)
>  \
>   void __attribute__ ((noinline, noclone))
>  \
>   f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,  
>  \
> INDEX##BITS stride, INDEX##BITS n)   \
>   {   
>  \
> for (INDEX##BITS i = 0; i < n; ++i)   
>  \
>   dest[i] += src[i * stride]; 
>  \
>   }
> 
> When "stride" is a constant, current flow works fine.
> However, when "stride" is a variable. It causes an ICE:
> # vectp_src.67_85 = PHI 
> ...
> _96 = .SELECT_VL (ivtmp_94, 4);
> ...
> ivtmp_78 = ((sizetype) _39 * (sizetype) _96) * 4;
> vect__11.69_87 = .LEN_MASK_GATHER_LOAD (vectp_src.67_85, _84, 4, { 0, 0, 0, 0 
> }, { -1, -1, -1, -1 }, _96, 0);
> ...
> vectp_src.67_86 = vectp_src.67_85 + ivtmp_78;
> 
> Because the IR is: ivtmp_78 = ((sizetype) _39 * (sizetype) _96) * 4;
> 
> Instead, I split the IR into:
> 
> step_stride = _39
> step = step_stride * 4
> ivtmp_78 = step * _96
> 
> Thanks.
> 
> gcc/ChangeLog:
> 
> * tree-vect-stmts.cc (vect_get_strided_load_store_ops): Fix ICE.
> 
> ---
>  gcc/tree-vect-stmts.cc | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index c10a4be60eb..10e71178ce7 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3176,10 +3176,8 @@ vect_get_strided_load_store_ops (stmt_vec_info 
> stmt_info,
>   = fold_build2 (MULT_EXPR, sizetype,
>  fold_convert (sizetype, unshare_expr (DR_STEP (dr))),
>  loop_len);
> -  tree bump = make_temp_ssa_name (sizetype, NULL, "ivtmp");
> -  gassign *assign = gimple_build_assign (bump, tmp);
> -  gsi_insert_before (gsi, assign, GSI_SAME_STMT);
> -  *dataref_bump = bump;
> +  *dataref_bump = force_gimple_operand_gsi (gsi, tmp, true, NULL_TREE, 
> true,
> + GSI_SAME_STMT);
>  }
>else
>  {
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[PATCH] Fix expectation on gcc.dg/vect/pr71264.c

2023-07-06 Thread Richard Biener via Gcc-patches
With the recent change to more reliably not vectorize code already
using vector types we run into FAILs of gcc.dg/vect/pr71264.c
The testcase was added for fixing an ICE; possible (re-)vectorization
of the code isn't really supported, and I suspect it might even go
wrong for non-bitops.

The following leaves the testcase as just testing for an ICE.

Pushed.

PR tree-optimization/110544
* gcc.dg/vect/pr71264.c: Remove scan for vectorization.
---
 gcc/testsuite/gcc.dg/vect/pr71264.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr71264.c 
b/gcc/testsuite/gcc.dg/vect/pr71264.c
index 1381e0ed132..b372c00832a 100644
--- a/gcc/testsuite/gcc.dg/vect/pr71264.c
+++ b/gcc/testsuite/gcc.dg/vect/pr71264.c
@@ -1,5 +1,4 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target vect_int } */
 
 typedef unsigned char uint8_t;
 typedef uint8_t footype __attribute__((vector_size(4)));
@@ -18,5 +17,3 @@ void test(uint8_t *ptr, uint8_t *mask)
   __builtin_memcpy(&ptr[i], &temp, sizeof(temp));
 }
 }
-
-/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail 
{ { s390*-*-* sparc*-*-* } || vect32 } } } } */
-- 
2.35.3


Re: Re: [PATCH] VECT: Fix ICE of variable stride on strided load/store with SELECT_VL loop control.

2023-07-06 Thread juzhe.zh...@rivai.ai
Thank you so much.

I have sent V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623734.html 

which is working fine for both stride = constant and variable.

Could you take a look at it?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-07-06 14:43
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH] VECT: Fix ICE of variable stride on strided load/store 
with SELECT_VL loop control.
On Thu, 6 Jul 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> Hi, Richi.
> 
> Sorry for making a mistake on LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE
> with SELECT_VL loop control.
> 
> Consider the following case:
> #define TEST_LOOP(DATA_TYPE, BITS)
>  \
>   void __attribute__ ((noinline, noclone))
>  \
>   f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,  
>  \
>   INDEX##BITS stride, INDEX##BITS n)   \
>   {   
>  \
> for (INDEX##BITS i = 0; i < n; ++i)   
>  \
>   dest[i] += src[i * stride]; 
>  \
>   }
> 
> When "stride" is a constant, current flow works fine.
> However, when "stride" is a variable. It causes an ICE:
> # vectp_src.67_85 = PHI 
> ...
> _96 = .SELECT_VL (ivtmp_94, 4);
> ...
> ivtmp_78 = ((sizetype) _39 * (sizetype) _96) * 4;
> vect__11.69_87 = .LEN_MASK_GATHER_LOAD (vectp_src.67_85, _84, 4, { 0, 0, 0, 0 
> }, { -1, -1, -1, -1 }, _96, 0);
> ...
> vectp_src.67_86 = vectp_src.67_85 + ivtmp_78;
> 
> Because of the IR: ivtmp_78 = ((sizetype) _39 * (sizetype) _96) * 4;
> 
> Instead, I split the IR into:
> 
> step_stride = _39
> step = step_stride * 4
> ivtmp_78 = step * _96
> 
> I don't think this patch's code is elegant enough, could you help me refine 
> this code?
> 
> Thanks.
> 
> gcc/ChangeLog:
> 
> * tree-vect-stmts.cc (vect_get_strided_load_store_ops): Fix ICE.
> 
> ---
>  gcc/tree-vect-stmts.cc | 38 +-
>  1 file changed, 33 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index c10a4be60eb..12d1b0f1ac0 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3172,12 +3172,40 @@ vect_get_strided_load_store_ops (stmt_vec_info 
> stmt_info,
>  vectp_a.9_26 = vectp_a.9_7 + ivtmp_8;  */
>tree loop_len
>  = vect_get_loop_len (loop_vinfo, gsi, loop_lens, 1, vectype, 0, 0);
> -  tree tmp
> - = fold_build2 (MULT_EXPR, sizetype,
> -fold_convert (sizetype, unshare_expr (DR_STEP (dr))),
> -loop_len);
> +  tree tmp;
> +  gassign *assign;
> +
> +  if (TREE_CODE (DR_STEP (dr)) == INTEGER_CST)
> + tmp = fold_build2 (MULT_EXPR, sizetype,
> +fold_convert (sizetype, unshare_expr (DR_STEP (dr))),
> +loop_len);
> +  else
> + {
> +   /* If DR_STEP = (unsigned int) _37 * 4;
> +  Extract _37 and 4, explicit MULT_EXPR.  */
> +
> +   /* 1. step_stride = (unsigned int) _37.  */
> +   tree step_stride = make_ssa_name (create_tmp_var (sizetype));
> +   assign = gimple_build_assign (
> + step_stride, TREE_OPERAND (TREE_OPERAND (DR_STEP (dr), 0), 0));
> +   gsi_insert_before (gsi, assign, GSI_SAME_STMT);
> +
> +   /* 2. step = step_stride * 4.  */
> +   tree step_align = TREE_OPERAND (TREE_OPERAND (DR_STEP (dr), 0), 1);
> +   tree step = make_ssa_name (create_tmp_var (sizetype));
> +   assign
> + = gimple_build_assign (step, fold_build2 (MULT_EXPR, sizetype,
> +   step_stride, step_align));
> +   gsi_insert_before (gsi, assign, GSI_SAME_STMT);
> +
> +   /* 3. tmp = step * loop_len.  */
> +   tmp = make_ssa_name (create_tmp_var (sizetype));
> +   assign = gimple_build_assign (tmp, fold_build2 (MULT_EXPR, sizetype,
> +   step, loop_len));
> +   gsi_insert_before (gsi, assign, GSI_SAME_STMT);
> + }
>tree bump = make_temp_ssa_name (sizetype, NULL, "ivtmp");
> -  gassign *assign = gimple_build_assign (bump, tmp);
 
instead of
 
  tree bump = make_temp_ssa_name (sizetype, NULL, "ivtmp");
  gassign *assign = gimple_build_assign (bump, tmp);
 
you can simply do
 
  tree bump = force_gimple_operand_gsi (gsi, tmp, true, NULL_TREE,
true, GSI_SAME_STMT);  
 
That's all that is needed.
 
Richard.
 
 
> +  assign = gimple_build_assign (bump, tmp);
>gsi_insert_before (gsi, assign, GSI_SAME_STMT);
>*dataref_bump = bump;
>  }
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
 


[PATCH V2] VECT: Fix ICE of variable stride on strided load/store with SELECT_VL loop control.

2023-07-06 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richi.

Sorry for making a mistake on LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE
with SELECT_VL loop control.

Consider this following case:
#define TEST_LOOP(DATA_TYPE, BITS) \
  void __attribute__ ((noinline, noclone)) \
  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
  INDEX##BITS stride, INDEX##BITS n)   \
  {\
for (INDEX##BITS i = 0; i < n; ++i)\
  dest[i] += src[i * stride];  \
  }

When "stride" is a constant, current flow works fine.
However, when "stride" is a variable, it causes an ICE:
# vectp_src.67_85 = PHI 
...
_96 = .SELECT_VL (ivtmp_94, 4);
...
ivtmp_78 = ((sizetype) _39 * (sizetype) _96) * 4;
vect__11.69_87 = .LEN_MASK_GATHER_LOAD (vectp_src.67_85, _84, 4, { 0, 0, 0, 0 
}, { -1, -1, -1, -1 }, _96, 0);
...
vectp_src.67_86 = vectp_src.67_85 + ivtmp_78;

Because of the IR: ivtmp_78 = ((sizetype) _39 * (sizetype) _96) * 4;

Instead, I split the IR into:

step_stride = _39
step = step_stride * 4
ivtmp_78 = step * _96

Thanks.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_get_strided_load_store_ops): Fix ICE.

---
 gcc/tree-vect-stmts.cc | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c10a4be60eb..10e71178ce7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3176,10 +3176,8 @@ vect_get_strided_load_store_ops (stmt_vec_info stmt_info,
= fold_build2 (MULT_EXPR, sizetype,
   fold_convert (sizetype, unshare_expr (DR_STEP (dr))),
   loop_len);
-  tree bump = make_temp_ssa_name (sizetype, NULL, "ivtmp");
-  gassign *assign = gimple_build_assign (bump, tmp);
-  gsi_insert_before (gsi, assign, GSI_SAME_STMT);
-  *dataref_bump = bump;
+  *dataref_bump = force_gimple_operand_gsi (gsi, tmp, true, NULL_TREE, 
true,
+   GSI_SAME_STMT);
 }
   else
 {
-- 
2.36.3



Re: [PATCH 10/11] riscv: thead: Add support for the XTheadMemIdx ISA extension

2023-07-06 Thread Christoph Müllner
On Thu, Jun 29, 2023 at 4:09 PM Jeff Law  wrote:
>
>
>
> On 6/29/23 01:39, Christoph Müllner wrote:
> > On Wed, Jun 28, 2023 at 8:23 PM Jeff Law  wrote:
> >>
> >>
> >>
> >> On 6/28/23 06:39, Christoph Müllner wrote:
> >>
> > +;; XTheadMemIdx overview:
> > +;; All peephole passes attempt to improve the operand utilization of
> > +;; XTheadMemIdx instructions, where one sign or zero extended
> > +;; register-index-operand can be shifted left by a 2-bit immediate.
> > +;;
> > +;; The basic idea is the following optimization:
> > +;; (set (reg 0) (op (reg 1) (imm 2)))
> > +;; (set (reg 3) (mem (plus (reg 0) (reg 4)))
> > +;; ==>
> > +;; (set (reg 3) (mem (plus (reg 4) (op2 (reg 1) (imm 2
> > +;; This optimization only valid if (reg 0) has no further uses.
>  Couldn't this be done by combine if you created define_insn patterns
>  rather than define_peephole2 patterns?  Similarly for the other cases
>  handled here.
> >>>
> >>> I was inspired by XTheadMemPair, which merges two memory accesses
> >>> into a mem-pair instruction (and which got inspiration from
> >>> gcc/config/aarch64/aarch64-ldpstp.md).
> >> Right.  I'm pretty familiar with those.  They cover a different case,
> >> specifically the two insns being optimized don't have a true data
> >> dependency between them.  ie, the first instruction does not produce a
> >> result used in the second insn.
> >>
> >>
> >> In the case above there is a data dependency on reg0.  ie, the first
> >> instruction generates a result used in the second instruction.  combine
> >> is usually the best place to handle the data dependency case.
> >
> > Ok, understood.
> >
> > It is a bit of a special case here, because the peephole is restricted
> > to those cases, where reg0 is not used elsewhere (peep2_reg_dead_p()).
> > I have not seen how to do this for combiner optimizations.
> If the value is used elsewhere, then the combiner will generate a
> parallel with two sets.  If the value dies, then the combiner generates
> the one set.  ie given
>
> (set (t) (op0 (a) (b)))
> (set (r) (op1 (c) (t)))
>
> If "t" is dead, then combine will present you with:
>
> (set (r) (op1 (c) (op0 (a) (b
>
> If "t" is used elsewhere, then combine will present you with:
>
> (parallel
>[(set (r) (op1 (c) (op0 (a) (b
> (set (t) (op0 (a) (b)))])
>
> Which makes perfect sense if you think about it for a while.  If you
> still need "t", then the first sequence simply isn't valid as it doesn't
> preserve that side effect.  Hence it tries to produce a sequence with
> the combined operation, but with the side effect of the first statement
> included as well.
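
The two combine outcomes quoted above can be sketched at the C level (a toy
illustration only; the function names are hypothetical and the RTL in the
comments is informal):

```c
#include <assert.h>

/* t is dead after the second statement, so combine can present the single
   combined set (set (r) (op1 (c) (op0 (a) (b)))).  */
int
fold_dead (int a, int b, int c)
{
  int t = a + b;   /* (set (t) (plus (a) (b))) */
  return c * t;    /* (set (r) (mult (c) (t))) -> folded into one set */
}

/* t is still used elsewhere (stored through OUT), so combine must present
   a parallel that preserves the side effect of the first set as well.  */
int
fold_live (int a, int b, int c, int *out)
{
  int t = a + b;
  *out = t;        /* extra use keeps t live */
  return c * t;
}
```

Either way the computed value is the same; only the RTL shape that combine
offers to the backend patterns differs.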

Thanks for this!
Of course I was "lucky" and ran into the issue that the patterns did not match,
because of unexpected MULT insns where ASHIFTs were expected.
But after reading enough of combine.cc I understood that this is on purpose
(for addresses) and I have to adjust my INSNs accordingly.

I've changed the patches for XTheadMemIdx and XTheadFMemIdx and will
send out a new series.

Thanks,
Christoph


Re: [PATCH] VECT: Fix ICE of variable stride on strided load/store with SELECT_VL loop control.

2023-07-06 Thread Richard Biener via Gcc-patches
On Thu, 6 Jul 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Hi, Richi.
> 
> Sorry for making a mistake on LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE
> with SELECT_VL loop control.
> 
> Consider this following case:
> #define TEST_LOOP(DATA_TYPE, BITS)
>  \
>   void __attribute__ ((noinline, noclone))
>  \
>   f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,  
>  \
> INDEX##BITS stride, INDEX##BITS n)   \
>   {   
>  \
> for (INDEX##BITS i = 0; i < n; ++i)   
>  \
>   dest[i] += src[i * stride]; 
>  \
>   }
> 
> When "stride" is a constant, current flow works fine.
> However, when "stride" is a variable, it causes an ICE:
> # vectp_src.67_85 = PHI 
> ...
> _96 = .SELECT_VL (ivtmp_94, 4);
> ...
> ivtmp_78 = ((sizetype) _39 * (sizetype) _96) * 4;
> vect__11.69_87 = .LEN_MASK_GATHER_LOAD (vectp_src.67_85, _84, 4, { 0, 0, 0, 0 
> }, { -1, -1, -1, -1 }, _96, 0);
> ...
> vectp_src.67_86 = vectp_src.67_85 + ivtmp_78;
> 
> Because of the IR: ivtmp_78 = ((sizetype) _39 * (sizetype) _96) * 4;
> 
> Instead, I split the IR into:
> 
> step_stride = _39
> step = step_stride * 4
> ivtmp_78 = step * _96
> 
> I don't think this patch's code is elegant enough, could you help me refine 
> this code?
> 
> Thanks.
> 
> gcc/ChangeLog:
> 
> * tree-vect-stmts.cc (vect_get_strided_load_store_ops): Fix ICE.
> 
> ---
>  gcc/tree-vect-stmts.cc | 38 +-
>  1 file changed, 33 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index c10a4be60eb..12d1b0f1ac0 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3172,12 +3172,40 @@ vect_get_strided_load_store_ops (stmt_vec_info 
> stmt_info,
>vectp_a.9_26 = vectp_a.9_7 + ivtmp_8;  */
>tree loop_len
>   = vect_get_loop_len (loop_vinfo, gsi, loop_lens, 1, vectype, 0, 0);
> -  tree tmp
> - = fold_build2 (MULT_EXPR, sizetype,
> -fold_convert (sizetype, unshare_expr (DR_STEP (dr))),
> -loop_len);
> +  tree tmp;
> +  gassign *assign;
> +
> +  if (TREE_CODE (DR_STEP (dr)) == INTEGER_CST)
> + tmp = fold_build2 (MULT_EXPR, sizetype,
> +fold_convert (sizetype, unshare_expr (DR_STEP (dr))),
> +loop_len);
> +  else
> + {
> +   /* If DR_STEP = (unsigned int) _37 * 4;
> +  Extract _37 and 4, explicit MULT_EXPR.  */
> +
> +   /* 1. step_stride = (unsigned int) _37.  */
> +   tree step_stride = make_ssa_name (create_tmp_var (sizetype));
> +   assign = gimple_build_assign (
> + step_stride, TREE_OPERAND (TREE_OPERAND (DR_STEP (dr), 0), 0));
> +   gsi_insert_before (gsi, assign, GSI_SAME_STMT);
> +
> +   /* 2. step = step_stride * 4.  */
> +   tree step_align = TREE_OPERAND (TREE_OPERAND (DR_STEP (dr), 0), 1);
> +   tree step = make_ssa_name (create_tmp_var (sizetype));
> +   assign
> + = gimple_build_assign (step, fold_build2 (MULT_EXPR, sizetype,
> +   step_stride, step_align));
> +   gsi_insert_before (gsi, assign, GSI_SAME_STMT);
> +
> +   /* 3. tmp = step * loop_len.  */
> +   tmp = make_ssa_name (create_tmp_var (sizetype));
> +   assign = gimple_build_assign (tmp, fold_build2 (MULT_EXPR, sizetype,
> +   step, loop_len));
> +   gsi_insert_before (gsi, assign, GSI_SAME_STMT);
> + }
>tree bump = make_temp_ssa_name (sizetype, NULL, "ivtmp");
> -  gassign *assign = gimple_build_assign (bump, tmp);

instead of

  tree bump = make_temp_ssa_name (sizetype, NULL, "ivtmp");
  gassign *assign = gimple_build_assign (bump, tmp);

you can simply do

  tree bump = force_gimple_operand_gsi (gsi, tmp, true, NULL_TREE,
true, GSI_SAME_STMT);  

That's all that is needed.

Richard.


> +  assign = gimple_build_assign (bump, tmp);
>gsi_insert_before (gsi, assign, GSI_SAME_STMT);
>*dataref_bump = bump;
>  }
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[PATCH] Initial Granite Rapids D Support

2023-07-06 Thread Mo, Zewei via Gcc-patches
Hi all,

This patch is to add initial support for Granite Rapids D for GCC.
The link of related information is listed below:
https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Also, the patch of removing AMX-COMPLEX from Granite Rapids will be backported
to GCC13.

This has been tested on x86_64-pc-linux-gnu. Is this ok for trunk? Thank you.

Sincerely,
Zewei Mo

gcc/ChangeLog:

* common/config/i386/cpuinfo.h
(get_intel_cpu): Handle Granite Rapids D.
* common/config/i386/i386-common.cc:
(processor_names): Add graniterapids-d.
(processor_alias_table): Ditto.
* common/config/i386/i386-cpuinfo.h
(enum processor_subtypes): Add INTEL_GRANITERAPIDS_D.
* config.gcc: Add -march=graniterapids-d.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Handle graniterapids-d.
* config/i386/i386-c.cc (ix86_target_macros_internal):
Ditto.
* config/i386/i386-options.cc (m_GRANITERAPIDSD): New.
(processor_cost_table): Add graniterapids-d.
* config/i386/i386.h (enum processor_type):
Add PROCESSOR_GRANITERAPIDS_D.
* doc/extend.texi: Add graniterapids-d.
* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv16.C: Add graniterapids-d.
* gcc.target/i386/funcspec-56.inc: Handle new march.
---
 gcc/common/config/i386/cpuinfo.h  |  9 -
 gcc/common/config/i386/i386-common.cc |  3 +++
 gcc/common/config/i386/i386-cpuinfo.h |  1 +
 gcc/config.gcc|  2 +-
 gcc/config/i386/driver-i386.cc|  3 +++
 gcc/config/i386/i386-c.cc |  7 +++
 gcc/config/i386/i386-options.cc   |  4 +++-
 gcc/config/i386/i386.h|  5 -
 gcc/doc/extend.texi   |  3 +++
 gcc/doc/invoke.texi   | 11 +++
 gcc/testsuite/g++.target/i386/mv16.C  |  6 ++
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |  1 +
 12 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index ae48bc17771..7c2565c1d93 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -565,7 +565,6 @@ get_intel_cpu (struct __processor_model *cpu_model,
   cpu_model->__cpu_type = INTEL_SIERRAFOREST;
   break;
 case 0xad:
-case 0xae:
   /* Granite Rapids.  */
   cpu = "graniterapids";
   CHECK___builtin_cpu_is ("corei7");
@@ -573,6 +572,14 @@ get_intel_cpu (struct __processor_model *cpu_model,
   cpu_model->__cpu_type = INTEL_COREI7;
   cpu_model->__cpu_subtype = INTEL_COREI7_GRANITERAPIDS;
   break;
+case 0xae:
+  /* Granite Rapids D.  */
+  cpu = "graniterapids-d";
+  CHECK___builtin_cpu_is ("corei7");
+  CHECK___builtin_cpu_is ("graniterapids-d");
+  cpu_model->__cpu_type = INTEL_COREI7;
+  cpu_model->__cpu_subtype = INTEL_COREI7_GRANITERAPIDS_D;
+  break;
 case 0xb6:
   /* Grand Ridge.  */
   cpu = "grandridge";
diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index bf126f14073..5a337c5b8be 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -1971,6 +1971,7 @@ const char *const processor_names[] =
   "alderlake",
   "rocketlake",
   "graniterapids",
+  "graniterapids-d",
   "intel",
   "lujiazui",
   "geode",
@@ -2094,6 +2095,8 @@ const pta processor_alias_table[] =
 M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
   {"graniterapids", PROCESSOR_GRANITERAPIDS, CPU_HASWELL, PTA_GRANITERAPIDS,
 M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS), P_PROC_AVX512F},
+  {"graniterapids-d", PROCESSOR_GRANITERAPIDS_D, CPU_HASWELL, 
PTA_GRANITERAPIDS_D,
+M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS_D), P_PROC_AVX512F},
   {"bonnell", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
 M_CPU_TYPE (INTEL_BONNELL), P_PROC_SSSE3},
   {"atom", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
diff --git a/gcc/common/config/i386/i386-cpuinfo.h 
b/gcc/common/config/i386/i386-cpuinfo.h
index 2dafbb25a49..254dfec70e5 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -98,6 +98,7 @@ enum processor_subtypes
   ZHAOXIN_FAM7H_LUJIAZUI,
   AMDFAM19H_ZNVER4,
   INTEL_COREI7_GRANITERAPIDS,
+  INTEL_COREI7_GRANITERAPIDS_D,
   CPU_SUBTYPE_MAX
 };
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index d88071773c9..1446eb2b3ca 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -682,7 +682,7 @@ silvermont knl knm skylake-avx512 cannonlake icelake-client 
icelake-server \
 skylake goldmont goldmont-plus tremont cascadelake tigerlake cooperlake \
 sapphirerapids alderlake rocketlake eden-x2 nano nano-1000 nano-2000 nano-3000 
\
 nano-x2 eden-x4 nano-x4 lujiazui 

[PATCH] i386: Update document for inlining rules

2023-07-06 Thread Hongyu Wang via Gcc-patches
Hi,

This is a follow-up patch for
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623525.html
that updates document about x86 inlining rules.

Ok for trunk?

gcc/ChangeLog:

* doc/extend.texi: Move x86 inlining rule to a new subsubsection
and add a description for inlining of functions with arch and tune
attributes.
---
 gcc/doc/extend.texi | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d1b018ee6d6..d701b4d1d41 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7243,11 +7243,6 @@ Prefer 256-bit vector width for instructions.
 Prefer 512-bit vector width for instructions.
 @end table
 
-On the x86, the inliner does not inline a
-function that has different target options than the caller, unless the
-callee has a subset of the target options of the caller.  For example
-a function declared with @code{target("sse3")} can inline a function
-with @code{target("sse2")}, since @code{-msse3} implies @code{-msse2}.
 @end table
 
 @cindex @code{indirect_branch} function attribute, x86
@@ -7361,6 +7356,20 @@ counterpart to option @option{-mno-direct-extern-access}.
 
 @end table
 
+@subsubsection Inlining rules
+On the x86, the inliner does not inline a
+function that has different target options than the caller, unless the
+callee has a subset of the target options of the caller.  For example
+a function declared with @code{target("sse3")} can inline a function
+with @code{target("sse2")}, since @code{-msse3} implies @code{-msse2}.
+
+Besides the basic rule, when a function specifies
+@code{target("arch=@var{ARCH}")} or @code{target("tune=@var{TUNE}")}
+attribute, the inlining rule will be different. It allows inlining of
+a function with default @option{-march=x86-64} and
+@option{-mtune=generic} specified, or a function that has a subset
+of ISA features and marked with always_inline.
+
 @node Xstormy16 Function Attributes
 @subsection Xstormy16 Function Attributes
 
-- 
2.31.1



Re: [PATCH] x86: Properly find the maximum stack slot alignment

2023-07-06 Thread Richard Biener via Gcc-patches
On Thu, Jul 6, 2023 at 1:28 AM H.J. Lu via Gcc-patches
 wrote:
>
> Don't assume that stack slots can only be accessed by stack or frame
> registers.  Also check memory accesses from registers defined by
> stack or frame registers.
>
> gcc/
>
> PR target/109780
> * config/i386/i386.cc (ix86_set_with_register_source): New.
> (ix86_find_all_stack_access): Likewise.
> (ix86_find_max_used_stack_alignment): Also check memory accesses
> from registers defined by stack or frame registers.
>
> gcc/testsuite/
>
> PR target/109780
> * g++.target/i386/pr109780-1.C: New test.
> * gcc.target/i386/pr109780-1.c: Likewise.
> * gcc.target/i386/pr109780-2.c: Likewise.
> ---
>  gcc/config/i386/i386.cc| 145 ++---
>  gcc/testsuite/g++.target/i386/pr109780-1.C |  72 ++
>  gcc/testsuite/gcc.target/i386/pr109780-1.c |  14 ++
>  gcc/testsuite/gcc.target/i386/pr109780-2.c |  21 +++
>  4 files changed, 233 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr109780-1.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-2.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index caca74d6dec..85dd8cb0581 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -8084,6 +8084,72 @@ output_probe_stack_range (rtx reg, rtx end)
>return "";
>  }
>
> +/* Check if PAT is a SET with register source.  */
> +
> +static void
> +ix86_set_with_register_source (rtx, const_rtx pat, void *data)
> +{
> +  if (GET_CODE (pat) != SET)
> +return;
> +
> +  rtx src = SET_SRC (pat);
> +  if (MEM_P (src) || CONST_INT_P (src))
> +return;
> +
> +  bool *may_use_register = (bool *) data;
> +  *may_use_register = true;
> +}
> +
> +/* Find all register access registers.  */
> +
> +static bool
> +ix86_find_all_stack_access (HARD_REG_SET _slot_access)
> +{
> +  bool repeat = false;
> +
> +  for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> +if (GENERAL_REGNO_P (i)
> +   && !TEST_HARD_REG_BIT (stack_slot_access, i))
> +  for (df_ref def = DF_REG_DEF_CHAIN (i);
> +  def != NULL;
> +  def = DF_REF_NEXT_REG (def))
> +   {
> + if (DF_REF_IS_ARTIFICIAL (def))
> +   continue;
> +
> + rtx_insn *insn = DF_REF_INSN (def);
> +
> + bool may_use_register = false;
> + note_stores (insn, ix86_set_with_register_source,
> +  _use_register);
> +
> + if (!may_use_register)
> +   continue;
> +
> + df_ref use;
> + FOR_EACH_INSN_USE (use, insn)
> +   {
> + rtx reg = DF_REF_REG (use);
> +
> + if (!REG_P (reg))
> +   continue;
> +
> + /* Skip if stack slot access register isn't used.  */
> + if (!TEST_HARD_REG_BIT (stack_slot_access,
> + REGNO (reg)))
> +   continue;
> +
> + /* Add this register to stack_slot_access.  */
> + add_to_hard_reg_set (_slot_access, Pmode, i);

So you are looking for uses of stack regs and then their defs, in the
end looking for memory accesses of them.  But you are doing this
weridly backwards?  I would have expected you start marking
values dependend on STACK_POINTER_REGNUM by walking
DF_REF_USE_CHAIN of it, queueing the use insn defs in a worklist
and in those insns also looking with note_stores?

Isn't the above way prone to needing more iterations and why is
a single worklist and thus visiting each regs uses at most once
enough?
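
A worklist formulation in the style being suggested visits each register's
uses at most once, because a register is enqueued only on its first
transition into the marked set. The following toy model (not GCC code; the
data structures and names are hypothetical simplifications of the DF chains)
shows why no outer "repeat" loop is needed:

```c
#include <assert.h>
#include <stdbool.h>

#define NREGS 8

/* derived_from[i][j]: some insn defines reg j from a value using reg i.  */
bool derived_from[NREGS][NREGS];
bool stack_access[NREGS];

/* Mark every register transitively derived from SP.  Each register enters
   the worklist at most once, so a single pass over the worklist suffices.  */
void
mark_stack_access (int sp)
{
  int worklist[NREGS];
  int n = 0;

  stack_access[sp] = true;
  worklist[n++] = sp;
  while (n > 0)
    {
      int r = worklist[--n];
      for (int j = 0; j < NREGS; j++)
	if (derived_from[r][j] && !stack_access[j])
	  {
	    stack_access[j] = true;	/* first and only time j is queued */
	    worklist[n++] = j;
	  }
    }
}
```

Since a register is marked before it is queued, it can never be queued a
second time, bounding the total work by the number of def-use edges.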

> +
> + /* Repeat if a register is added to stack_slot_access.  */
> + repeat = true;
> +   }
> +   }
> +
> +  return repeat;
> +}
> +
>  /* Set stack_frame_required to false if stack frame isn't required.
> Update STACK_ALIGNMENT to the largest alignment, in bits, of stack
> slot used if stack frame is required and CHECK_STACK_SLOT is true.  */
> @@ -8092,15 +8158,23 @@ static void
>  ix86_find_max_used_stack_alignment (unsigned int _alignment,
> bool check_stack_slot)
>  {
> -  HARD_REG_SET set_up_by_prologue, prologue_used;
> +  HARD_REG_SET set_up_by_prologue, prologue_used, stack_slot_access;
>basic_block bb;
>
>CLEAR_HARD_REG_SET (prologue_used);
>CLEAR_HARD_REG_SET (set_up_by_prologue);
> +  CLEAR_HARD_REG_SET (stack_slot_access);
>add_to_hard_reg_set (_up_by_prologue, Pmode, STACK_POINTER_REGNUM);
>add_to_hard_reg_set (_up_by_prologue, Pmode, ARG_POINTER_REGNUM);
>add_to_hard_reg_set (_up_by_prologue, Pmode,
>HARD_FRAME_POINTER_REGNUM);
> +  /* Stack slot can be accessed by stack pointer, frame pointer or
> + registers defined by stack pointer or frame pointer.  */
> +  add_to_hard_reg_set (_slot_access, Pmode,
> +  STACK_POINTER_REGNUM);
> +  if 

Re: [PATCH] RISC-V: Handle rounding mode correctly on zfinx

2023-07-06 Thread Kito Cheng via Gcc-patches
Committed to trunk, and plan to back port to GCC 13 branch 1 week later :)


On Wed, Jul 5, 2023 at 10:15 PM Jeff Law  wrote:
>
>
>
> On 7/5/23 02:11, Kito Cheng wrote:
> > Zfinx provides fcsr just like F does, so the rounding mode should use fcsr
> > instead of the `soft` fenv.
> >
> > libgcc/ChangeLog:
> >
> >   * config/riscv/sfp-machine.h (FP_INIT_ROUNDMODE): Check zfinx.
> >   (FP_HANDLE_EXCEPTIONS): Ditto.
> OK
> jeff


Re: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap

2023-07-06 Thread Richard Biener via Gcc-patches
On Wed, Jul 5, 2023 at 11:15 PM Eugene Rozenfeld
 wrote:
>
> There is no warning and perf /uk succeeds when kptr_restrict is set to 1 and 
> perf_event_paranoid set to 2. However, create_gcov may fail since it won't be 
> able to understand kernel addresses and it requires at least 95% of events to 
> be successfully mapped.

OK, so I guess the patch is OK then given it can improve the situation
in the right circumstances
and doesn't hurt otherwise.

Thanks,
Richard.

> If I set both kptr_restrict and perf_event_paranoid to 1, then I do get 
> warnings from perf (but it still succeeds and exits with a 0 code). And, of 
> course create_gcov will also fail to map some events since it won't 
> understand kernel addresses.
>
> WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
> check /proc/sys/kernel/kptr_restrict and /proc/sys/kernel/perf_event_paranoid.
>
> Samples in kernel functions may not be resolved if a suitable vmlinux
> file is not found in the buildid cache or in the vmlinux path.
>
> Samples in kernel modules won't be resolved at all.
>
> If some relocation was applied (e.g. kexec) symbols may be misresolved
> even with a suitable vmlinux or kallsyms file.
>
> Couldn't record kernel reference relocation symbol
> Symbol resolution may be skewed if relocation was used (e.g. kexec).
> Check /proc/kallsyms permission or run as root.
> [ perf record: Woken up 2 times to write data ]
> [ perf record: Captured and wrote 0.037 MB 
> /home/erozen/gcc1_objdir/gcc/testsuite/gcc/indir-call-prof.perf.data (86 
> samples) ]
>
> Eugene
>
> -Original Message-
> From: Richard Biener 
> Sent: Monday, July 3, 2023 12:47 AM
> To: Eugene Rozenfeld 
> Cc: Sam James ; gcc-patches@gcc.gnu.org
> Subject: Re: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for 
> autofdo tests and autoprofiledbootstrap
>
> On Sat, Jul 1, 2023 at 12:05 AM Eugene Rozenfeld 
>  wrote:
> >
> > I also set /proc/sys/kernel/perf_event_paranoid to 1 instead of the default 
> > 2.
>
> Does the perf attempt fail when the privileges are not adjusted and you 
> specify --all?  I see it adds /uk as flags, when I do
>
> > perf record -e instructions//uk ./a.out
>
> it doesn't complain in any way with
>
> > cat /proc/sys/kernel/kptr_restrict
> 1
> > cat /proc/sys/kernel/perf_event_paranoid
> 2
>
> so in case the 'kernel' side is simply ignored when profiling there isn't 
> permitted/possible then I guess the patch is OK?
>
> Can you confirm?
>
> Thanks,
> Richard.
>
> > -Original Message-
> > From: Gcc-patches
> >  On Behalf Of
> > Eugene Rozenfeld via Gcc-patches
> > Sent: Friday, June 30, 2023 2:44 PM
> > To: Sam James ; Richard Biener
> > 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: RE: [EXTERNAL] Re: [PATCH] Collect both user and kernel
> > events for autofdo tests and autoprofiledbootstrap
> >
> > I don't run this with elevated privileges but I set 
> > /proc/sys/kernel/kptr_restrict to 0. Setting that does require elevated 
> > privileges.
> >
> > If that's not acceptable, the only fix I can think of is to make that event 
> > mapping threshold percentage a parameter to create_gcov and pass something 
> > low enough. 80% instead of the current threshold of 95% should work, 
> > although it's a bit fragile.
> >
> > Eugene
> >
> > -Original Message-
> > From: Sam James 
> > Sent: Friday, June 30, 2023 1:59 AM
> > To: Richard Biener 
> > Cc: Eugene Rozenfeld ;
> > gcc-patches@gcc.gnu.org
> > Subject: [EXTERNAL] Re: [PATCH] Collect both user and kernel events
> > for autofdo tests and autoprofiledbootstrap
> >
> > [You don't often get email from s...@gentoo.org. Learn why this is
> > important at https://aka.ms/LearnAboutSenderIdentification ]
> >
> > Richard Biener via Gcc-patches  writes:
> >
> > > On Fri, Jun 30, 2023 at 7:28 AM Eugene Rozenfeld via Gcc-patches
> > >  wrote:
> > >>
> > >> When we collect just user events for autofdo with lbr we get some
> > >> events where branch sources are kernel addresses and branch targets
> > >> are user addresses. Without kernel MMAP events create_gcov can't
> > >> make sense of kernel addresses. Currently create_gcov fails if it
> > >> can't map at least 95% of events. We sometimes get below this threshold 
> > >> with just user events. The change is to collect both user events and 
> > >> kernel events.
> > >
> > > Does this require elevated privileges?  Can we instead "fix" create_gcov 
> > > here?
> >
> > Right, requiring privileges for this is going to be a no-go for a lot of 
> > builders. In a distro context, for example, it means we can't consider 
> > autofdo at all.


Re: [PATCH 1/2] [x86] Add pre_reload splitter to detect fp min/max pattern.

2023-07-06 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 6, 2023 at 3:20 AM liuhongt  wrote:
>
> We have ix86_expand_sse_fp_minmax to detect min/max semantics, but
> it requires rtx_equal_p for cmp_op0/cmp_op1 and if_true/if_false, for
> the testcase in the PR, there's an extra move from cmp_op0 to if_true,
> and it failed ix86_expand_sse_fp_minmax.
>
> This patch adds pre_reload splitter to detect the min/max pattern.
>
> Operand order in MINSS matters for signed zeros and NaNs, since the
> instruction always returns the second operand when either operand is a NaN
> or both operands are zero.
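
A portable C model of that MINSS/MINSD rule (a sketch of the instruction
semantics only, not of the patch's code) makes the asymmetry visible:

```c
#include <assert.h>
#include <math.h>

/* Model of MINSS/MINSD: the hardware computes
     dest = (src1 < src2) ? src1 : src2;
   so the SECOND source is returned whenever the compare is false --
   i.e. for any NaN operand and for +0.0 vs -0.0, which compare equal.  */
double
sse_min (double src1, double src2)
{
  return src1 < src2 ? src1 : src2;
}
```

Swapping the operands therefore changes the result for NaNs and signed
zeros, which is why a splitter emitting such instructions must preserve the
operand order implied by the original compare.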
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/110170
> * config/i386/i386.md (*ieee_minmax3_1): New pre_reload
> splitter to detect fp min/max pattern.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/pr110170.C: New test.
> * gcc.target/i386/pr110170.c: New test.
> ---
>  gcc/config/i386/i386.md  | 30 +
>  gcc/testsuite/g++.target/i386/pr110170.C | 78 
>  gcc/testsuite/gcc.target/i386/pr110170.c | 18 ++
>  3 files changed, 126 insertions(+)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr110170.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr110170.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index e6ebc461e52..353bb21993d 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -22483,6 +22483,36 @@ (define_insn "*ieee_s3"
> (set_attr "type" "sseadd")
> (set_attr "mode" "")])
>
> +;; Operands order in min/max instruction matters for signed zero and NANs.
> +(define_insn_and_split "*ieee_minmax3_1"
> +  [(set (match_operand:MODEF 0 "register_operand")
> +   (unspec:MODEF
> + [(match_operand:MODEF 1 "register_operand")
> +  (match_operand:MODEF 2 "register_operand")
> +  (lt:MODEF
> +(match_operand:MODEF 3 "register_operand")
> +(match_operand:MODEF 4 "register_operand"))]
> + UNSPEC_BLENDV))]
> +  "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
> +  && ((rtx_equal_p (operands[1], operands[3])
> +   && rtx_equal_p (operands[2], operands[4]))
> +  || (rtx_equal_p (operands[1], operands[4])
> + && rtx_equal_p (operands[2], operands[3])))
> +  && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +{
> +  int u = (rtx_equal_p (operands[1], operands[3])
> +  && rtx_equal_p (operands[2], operands[4]))
> +  ? UNSPEC_IEEE_MAX : UNSPEC_IEEE_MIN;
> +  emit_move_insn (operands[0],
> + gen_rtx_UNSPEC (mode,
> + gen_rtvec (2, operands[2], operands[1]),
> + u));
> +  DONE;
> +})

Please split the above pattern into two, one emitting UNSPEC_IEEE_MAX
and the other emitting UNSPEC_IEEE_MIN.

> +
>  ;; Make two stack loads independent:
>  ;;   fld aa  fld aa
>  ;;   fld %st(0) ->   fld bb
> diff --git a/gcc/testsuite/g++.target/i386/pr110170.C 
> b/gcc/testsuite/g++.target/i386/pr110170.C
> new file mode 100644
> index 000..1e9a781ca74
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr110170.C
> @@ -0,0 +1,78 @@
> +/* { dg-do run } */
> +/* { dg-options " -O2 -march=x86-64 -mfpmath=sse -std=gnu++20" } */

The test involves the blendv instruction, which is SSE4.1, so it is
pointless to test it without -msse4.1. Please add -msse4.1 instead of
-march=x86-64 and use the sse4_runtime target selector, as is the case
with gcc.target/i386/pr90358.c.

> +#include <math.h>
> +
> +void
> +__attribute__((noinline))
> +__cond_swap(double* __x, double* __y) {
> +  bool __r = (*__x < *__y);
> +  auto __tmp = __r ? *__x : *__y;
> +  *__y = __r ? *__y : *__x;
> +  *__x = __tmp;
> +}
> +
> +auto test1() {
> +double nan = -0.0;
> +double x = 0.0;
> +__cond_swap(&nan, &x);
> +return x == -0.0 && nan == 0.0;
> +}
> +
> +auto test1r() {
> +double nan = NAN;
> +double x = 1.0;
> +__cond_swap(&nan, &x);
> +return isnan(x) && signbit(x) == 0 && nan == 1.0;
> +}
> +
> +auto test2() {
> +double nan = NAN;
> +double x = -1.0;
> +__cond_swap(&nan, &x);
> +return isnan(x) && signbit(x) == 0 && nan == -1.0;
> +}
> +
> +auto test2r() {
> +double nan = NAN;
> +double x = -1.0;
> +__cond_swap(&nan, &x);
> +return isnan(x) && signbit(x) == 0 && nan == -1.0;
> +}
> +
> +auto test3() {
> +double nan = -NAN;
> +double x = 1.0;
> +__cond_swap(&nan, &x);
> +return isnan(x) && signbit(x) == 1 && nan == 1.0;
> +}
> +
> +auto test3r() {
> +double nan = -NAN;
> +double x = 1.0;
> +__cond_swap(&nan, &x);
> +return isnan(x) && signbit(x) == 1 && nan == 1.0;
> +}
> +
> +auto test4() {
> +double nan = -NAN;
> +double x = -1.0;
> +__cond_swap(&nan, &x);
> +return isnan(x) && signbit(x) == 1 && nan == -1.0;
> +}
> +
> +auto test4r() {
> +double nan = -NAN;
> +double x = -1.0;
> +__cond_swap(&nan, &x);
> +return isnan(x) && signbit(x) == 1 && nan == -1.0;
> +}

[PATCH] VECT: Fix ICE of variable stride on strided load/store with SELECT_VL loop control.

2023-07-06 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richi.

Sorry for making a mistake on LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE
with SELECT_VL loop control.

Consider this following case:
#define TEST_LOOP(DATA_TYPE, BITS) \
  void __attribute__ ((noinline, noclone)) \
  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,   \
  INDEX##BITS stride, INDEX##BITS n)   \
  {\
for (INDEX##BITS i = 0; i < n; ++i)\
  dest[i] += src[i * stride];  \
  }

When "stride" is a constant, current flow works fine.
However, when "stride" is a variable, it causes an ICE:
# vectp_src.67_85 = PHI 
...
_96 = .SELECT_VL (ivtmp_94, 4);
...
ivtmp_78 = ((sizetype) _39 * (sizetype) _96) * 4;
vect__11.69_87 = .LEN_MASK_GATHER_LOAD (vectp_src.67_85, _84, 4, { 0, 0, 0, 0 
}, { -1, -1, -1, -1 }, _96, 0);
...
vectp_src.67_86 = vectp_src.67_85 + ivtmp_78;

Because of the IR: ivtmp_78 = ((sizetype) _39 * (sizetype) _96) * 4;

Instead, I split the IR into:

step_stride = _39
step = step_stride * 4
ivtmp_78 = step * _96

I don't think this patch's code is elegant enough; could you help me refine
this code?

Thanks.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_get_strided_load_store_ops): Fix ICE.

---
 gcc/tree-vect-stmts.cc | 38 +-
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c10a4be60eb..12d1b0f1ac0 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3172,12 +3172,40 @@ vect_get_strided_load_store_ops (stmt_vec_info 
stmt_info,
 vectp_a.9_26 = vectp_a.9_7 + ivtmp_8;  */
   tree loop_len
= vect_get_loop_len (loop_vinfo, gsi, loop_lens, 1, vectype, 0, 0);
-  tree tmp
-   = fold_build2 (MULT_EXPR, sizetype,
-  fold_convert (sizetype, unshare_expr (DR_STEP (dr))),
-  loop_len);
+  tree tmp;
+  gassign *assign;
+
+  if (TREE_CODE (DR_STEP (dr)) == INTEGER_CST)
+   tmp = fold_build2 (MULT_EXPR, sizetype,
+  fold_convert (sizetype, unshare_expr (DR_STEP (dr))),
+  loop_len);
+  else
+   {
+ /* If DR_STEP = (unsigned int) _37 * 4;
+Extract _37 and 4, explicit MULT_EXPR.  */
+
+ /* 1. step_stride = (unsigned int) _37.  */
+ tree step_stride = make_ssa_name (create_tmp_var (sizetype));
+ assign = gimple_build_assign (
+   step_stride, TREE_OPERAND (TREE_OPERAND (DR_STEP (dr), 0), 0));
+ gsi_insert_before (gsi, assign, GSI_SAME_STMT);
+
+ /* 2. step = step_stride * 4.  */
+ tree step_align = TREE_OPERAND (TREE_OPERAND (DR_STEP (dr), 0), 1);
+ tree step = make_ssa_name (create_tmp_var (sizetype));
+ assign
+   = gimple_build_assign (step, fold_build2 (MULT_EXPR, sizetype,
+ step_stride, step_align));
+ gsi_insert_before (gsi, assign, GSI_SAME_STMT);
+
+ /* 3. tmp = step * loop_len.  */
+ tmp = make_ssa_name (create_tmp_var (sizetype));
+ assign = gimple_build_assign (tmp, fold_build2 (MULT_EXPR, sizetype,
+ step, loop_len));
+ gsi_insert_before (gsi, assign, GSI_SAME_STMT);
+   }
   tree bump = make_temp_ssa_name (sizetype, NULL, "ivtmp");
-  gassign *assign = gimple_build_assign (bump, tmp);
+  assign = gimple_build_assign (bump, tmp);
   gsi_insert_before (gsi, assign, GSI_SAME_STMT);
   *dataref_bump = bump;
 }
-- 
2.36.3



Re: [PATCH] Fix PR 110554: vec lowering introduces scalar signed-boolean:32 comparisons

2023-07-06 Thread Richard Biener via Gcc-patches
On Wed, Jul 5, 2023 at 7:02 PM Andrew Pinski via Gcc-patches
 wrote:
>
> So the problem is vector generic decided to do comparisons in 
> signed-boolean:32
> types but the rest of the middle-end was not ready for that. Since we are 
> building
> the comparison which will feed into a cond_expr here, using boolean_type_node 
> is
> better and also correct. The rest of the compiler thinks the range for
> comparisons is always [0,1] too.
>
> Note this code does not currently lower bigger vector sizes into smaller
> vector sizes so using boolean_type_node here is better.
>
> OK? bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> gcc/ChangeLog:
>
> PR middle-end/110554
> * tree-vect-generic.cc (expand_vector_condition): For comparisons,
> just build using boolean_type_node instead of the cond_type.
> For non-comparisons/non-scalar-bitmask, build a ` != 0` gimple
> that will feed into the COND_EXPR.
> ---
>  gcc/tree-vect-generic.cc | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
> index df04a0db68d..a7e6cb87a5e 100644
> --- a/gcc/tree-vect-generic.cc
> +++ b/gcc/tree-vect-generic.cc
> @@ -1121,7 +1121,7 @@ expand_vector_condition (gimple_stmt_iterator *gsi, 
> bitmap dce_ssa_names)
>comp_width, comp_index);
>   tree aa2 = tree_vec_extract (gsi, comp_inner_type, a2,
>comp_width, comp_index);
> - aa = gimplify_build2 (gsi, code, cond_type, aa1, aa2);
> + aa = gimplify_build2 (gsi, code, boolean_type_node, aa1, aa2);
> }
>else if (a_is_scalar_bitmask)
> {
> @@ -1132,7 +1132,11 @@ expand_vector_condition (gimple_stmt_iterator *gsi, 
> bitmap dce_ssa_names)
> build_zero_cst (TREE_TYPE (a)));
> }
>else
> -   aa = tree_vec_extract (gsi, cond_type, a, comp_width, comp_index);
> +   {
> + result = tree_vec_extract (gsi, cond_type, a, comp_width, 
> comp_index);
> + aa = gimplify_build2 (gsi, NE_EXPR, boolean_type_node, result,
> +   build_zero_cst (cond_type));
> +   }
>result = gimplify_build3 (gsi, COND_EXPR, inner_type, aa, bb, cc);
>if (!CONSTANT_CLASS_P (result))
> constant_p = false;
> --
> 2.31.1
>


Re: GGC, GTY: No pointer walking for 'atomic' in PCH 'gt_pch_note_object' (was: Patch: New GTY ((atomic)) option)

2023-07-06 Thread Richard Biener via Gcc-patches
On Wed, Jul 5, 2023 at 6:25 PM Thomas Schwinge  wrote:
>
> Hi!
>
> My original motivation for the following exercise what that, for example,
> for: 'const unsigned char * GTY((atomic)) mode_table', we currently run
> into 'const' mismatches, 'error: invalid conversion':
>
> [...]
> gtype-desc.cc: In function 'void gt_pch_nx_lto_file_decl_data(void*)':
> gtype-desc.cc:6531:34: error: invalid conversion from 'const void*' to 
> 'void*' [-fpermissive]
>  gt_pch_note_object ((*x).mode_table, x, 
> gt_pch_p_18lto_file_decl_data);
>   ^
> In file included from [...]/source-gcc/gcc/hash-table.h:247:0,
>  from [...]/source-gcc/gcc/coretypes.h:486,
>  from gtype-desc.cc:23:
> [...]/source-gcc/gcc/ggc.h:47:12: note:   initializing argument 1 of 'int 
> gt_pch_note_object(void*, void*, gt_note_pointers, size_t)'
>  extern int gt_pch_note_object (void *, void *, gt_note_pointers,
> ^
> make[2]: *** [Makefile:1180: gtype-desc.o] Error 1
> [...]
>
> ..., as I had reported as "'GTY' issues: (1) 'const' build error" in
> 
> 'Adjust LTO mode tables for "Machine_Mode: Extend machine_mode from 8 to 16 
> bits"'.
>
> That said:
>
> On 2011-05-16T02:13:56+0200, "Nicola Pero"  
> wrote:
> > This patch adds a new GTY option, "atomic", which is similar to the 
> > identical option you have with Boehm GC
> > and which can be used with pointers to inform the GC/PCH machinery that 
> > they point to an area of memory that
> > contains no pointers (and hence needs no scanning).
> >
> > [...]
>
> On top of that, OK to push the attached
> "GGC, GTY: No pointer walking for 'atomic' in PCH 'gt_pch_note_object'"?
> Appreciate review from a GGC, GTY-savvy person.

OK.  Thanks for the detailed explanations, that helps even a non-GGC/GTY-savvy
person review this ;)

Thanks,
Richard.

> This depends on
> 
> "GGC, GTY: Tighten up a few things re 'reorder' option and strings".
>
>
> Grüße
>  Thomas
>
>
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955


Re: GGC, GTY: Tighten up a few things re 'reorder' option and strings

2023-07-06 Thread Richard Biener via Gcc-patches
On Wed, Jul 5, 2023 at 6:16 PM Thomas Schwinge  wrote:
>
> Hi!
>
> OK to push the attached
> "GGC, GTY: Tighten up a few things re 'reorder' option and strings"?

OK.

>
> Grüße
>  Thomas
>
>


Re: [v2] GTY: Clean up obsolete parametrized structs remnants (was: [PATCH 3/3] remove gengtype support for param_is use_param, if_marked and splay tree allocators)

2023-07-06 Thread Richard Biener via Gcc-patches
On Wed, Jul 5, 2023 at 6:13 PM Thomas Schwinge  wrote:
>
> Hi!
>
> On 2023-07-05T10:16:09+0200, I wrote:
> > On 2014-11-23T23:11:36-0500, tsaund...@mozilla.com wrote:
> >> gcc/
> >>
> >>   * plugin.c, plugin.def, ggc.h, ggc-common.c, gengtype.h, gengtype.c,
> >   gengtype-state.c, gengtype-parse.c, gengtype-lex.l, gcc-plugin.h,
> >>   doc/plugins.texi, doc/gty.texi: Remove support for if_marked and
> >>   param_is.
> >
> >> --- a/gcc/gengtype.h
> >> +++ b/gcc/gengtype.h
> >
> >> @@ -153,11 +152,6 @@ enum typekind {
> >>TYPE_LANG_STRUCT, /* GCC front-end language specific structs.
> >> Various languages may have homonymous but
> >> different structs.  */
> >> -  TYPE_PARAM_STRUCT,/* Type for parametrized structs, e.g. hash_t
> >> -   hash-tables, ...  See (param_is, use_param,
> >> -   param1_is, param2_is,... use_param1,
> >> -   use_param_2,... use_params) GTY
> >> -   options.  */
> >>TYPE_USER_STRUCT   /* User defined type.  Walkers and markers for
> >>  this type are assumed to be provided by the
> >>  user.  */
> >
> > OK to push the attached
> > "GTY: Clean up obsolete parametrized structs remnants"?
>
> Updated per
> 
> "GTY: Repair 'enum gty_token', 'token_names' desynchronization", OK to
> push the attached
> v2 "GTY: Clean up obsolete parametrized structs remnants"?

OK.

>
> Grüße
>  Thomas
>
>


Re: GTY: Repair 'enum gty_token', 'token_names' desynchronization (was: [cxx-conversion] Support garbage-collected C++ templates)

2023-07-06 Thread Richard Biener via Gcc-patches
On Wed, Jul 5, 2023 at 12:21 PM Thomas Schwinge  wrote:
>
> Hi!
>
> On 2012-08-10T11:06:46-0400, Diego Novillo  wrote:
> >  * gengtype-lex.l (USER_GTY): Add pattern for "user".
> >  * gengtype-parse.c (option): Handle USER_GTY.
> >  (opts_have): New.
> >  (type): Call it.
> >  If the keyword 'user' is used, do not walk the fields
> >  of the structure.
> >  * gengtype.h (USER_GTY): Add.
>
> These changes got incorporated in
> commit 0823efedd0fb8669b7e840954bc54c3b2cf08d67 (Subversion r190402).
>
> > --- a/gcc/gengtype-lex.l
> > +++ b/gcc/gengtype-lex.l
> > @@ -108,6 +108,7 @@ EOID  [^[:alnum:]_]
> >   "enum"/{EOID}   { return ENUM; }
> >   "ptr_alias"/{EOID}  { return PTR_ALIAS; }
> >   "nested_ptr"/{EOID} { return NESTED_PTR; }
> > +"user"/{EOID}{ return USER_GTY; }
> >   [0-9]+  { return NUM; }
> >   "param"[0-9]*"_is"/{EOID}   {
> > *yylval = XDUPVAR (const char, yytext, yyleng, yyleng+1);
>
> > --- a/gcc/gengtype-parse.c
> > +++ b/gcc/gengtype-parse.c
> > @@ -499,6 +499,10 @@ option (options_p prev)
> > [...]
>
> > --- a/gcc/gengtype.h
> > +++ b/gcc/gengtype.h
> > @@ -463,6 +463,7 @@ enum
> >   ELLIPSIS,
> >   PTR_ALIAS,
> >   NESTED_PTR,
> > +USER_GTY,
> >   PARAM_IS,
> >   NUM,
> >   SCALAR,
>
> This did add 'USER_GTY' to what nowadays is known as 'enum gty_token',
> but didn't accordingly update 'gcc/gengtype-parse.c:token_names', leaving
> those out of sync.  Updating 'gcc/gengtype-parse.c:token_value_format'
> wasn't necessary, as:
>
> /* print_token assumes that any token >= FIRST_TOKEN_WITH_VALUE may have
>a meaningful value to be printed.  */
> FIRST_TOKEN_WITH_VALUE = PARAM_IS
>
> This, in turn, got further confused -- or "fixed" -- by later changes:
> 2014 commit 63f5d5b818319129217e41bcb23db53f99ff11b0 (Subversion r218558)
> "remove gengtype support for param_is use_param, if_marked and splay tree 
> allocators",
> which reciprocally missed corresponding clean-up.
>
> OK to push the attached
> "GTY: Repair 'enum gty_token', 'token_names' desynchronization"?

OK.

>
> On top of that, I'll then re-submit an adjusted
> 
> "GTY: Clean up obsolete parametrized structs remnants".
>
>
> Grüße
>  Thomas
>
>