date:20230607

Re: [Patch, fortran] PR87477 - (associate) - [meta-bug] [F03] issues concerning the ASSOCIATE statement

2023-06-07 Thread Paul Richard Thomas via Gcc-patches

Hi Harald,

In answer to your question:
void
gfc_replace_expr (gfc_expr *dest, gfc_expr *src)
{
  free_expr0 (dest);
  *dest = *src;
  free (src);
}
So it does indeed do the job.

I should perhaps have remarked that, following the divide error,
gfc_simplify_expr was returning a mutilated version of the expression
and this was somehow connected with successfully simplifying the
parentheses. Copying and replacing on no errors deals with the
problem.
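
For readers following the thread, here is a minimal sketch of the
"simplify a copy, replace on success" pattern described above. The Expr
type and both functions are hypothetical stand-ins invented for
illustration; they are not gfortran's gfc_expr, gfc_simplify_expr or
gfc_replace_expr.

```cpp
#include <cassert>

// Hypothetical expression node; not gfortran's gfc_expr.
struct Expr {
  int numerator;
  int denominator;
  bool touched;  // set once the simplifier has started rewriting the node
};

// Hypothetical simplifier: it rewrites its argument in place and reports
// failure on division by zero, leaving the node partially rewritten.
bool simplify(Expr &e) {
  e.touched = true;              // partial work happens before the check
  if (e.denominator == 0)
    return false;                // error: e has already been modified
  e.numerator /= e.denominator;
  e.denominator = 1;
  return true;
}

// Simplify a copy and commit it only on success, so a failed attempt
// never leaks into the caller's expression.
bool simplify_safely(Expr &e) {
  Expr copy = e;
  if (!simplify(copy))
    return false;                // the copy is dropped; e is untouched
  e = copy;
  return true;
}
```

With this shape, a failing simplification leaves the original expression
exactly as it was, which is the property the fix relies on.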

Thanks

Paul

On Wed, 7 Jun 2023 at 19:38, Harald Anlauf wrote:
>
> Hi Paul!
>
> On 6/7/23 18:10, Paul Richard Thomas via Gcc-patches wrote:
> > Hi All,
> >
> > Three more fixes for PR87477. Please note that PR99350 was a blocker
> > but, as pointed out in comment #5 of the PR, this has nothing to do
> > with the associate construct.
> >
> > All three fixes are straightforward and the .diff + ChangeLog suffice
> > to explain them. 'rankguessed' was made redundant by the last PR87477
> > fix.
> >
> > Regtests on x86_64 - good for mainline?
> >
> > Paul
> >
> > Fortran: Fix some more blockers in associate meta-bug [PR87477]
> >
> > 2023-06-07  Paul Thomas  
> >
> > gcc/fortran
> > PR fortran/99350
> > * decl.cc (char_len_param_value): Simplify a copy of the expr
> > and replace the original if there is no error.
>
> This seems to lack a gfc_free_expr (p) in case the gfc_replace_expr
> is not executed, leading to a possible memleak.  Can you check?
>
> @@ -1081,10 +1082,10 @@ char_len_param_value (gfc_expr **expr, bool *deferred)
> if (!gfc_expr_check_typed (*expr, gfc_current_ns, false))
>   return MATCH_ERROR;
>
> -  /* If gfortran gets an EXPR_OP, try to simplify it.  This catches things
> - like CHARACTER(([1])).   */
> -  if ((*expr)->expr_type == EXPR_OP)
> -gfc_simplify_expr (*expr, 1);
> +  /* Try to simplify the expression to catch things like CHARACTER(([1])).   */
> +  p = gfc_copy_expr (*expr);
> +  if (gfc_is_constant_expr (p) && gfc_simplify_expr (p, 1))
> +gfc_replace_expr (*expr, p);
> else
>   gfc_free_expr (p);
>
> > * gfortran.h : Remove the redundant field 'rankguessed' from
> > 'gfc_association_list'.
> > * resolve.cc (resolve_assoc_var): Remove refs to 'rankguessed'.
> >
> > PR fortran/107281
> > * resolve.cc (resolve_variable): Associate names with constant
> > or structure constructor targets cannot have array refs.
> >
> > PR fortran/109451
> > * trans-array.cc (gfc_conv_expr_descriptor): Guard expression
> > character length backend decl before using it. Suppress the
> > assignment if lhs equals rhs.
> > * trans-io.cc (gfc_trans_transfer): Scalarize transfer of
> > associate variables pointing to a variable. Add comment.
> > * trans-stmt.cc (trans_associate_var): Remove requirement that
> > the character length be deferred before assigning the value
> > returned by gfc_conv_expr_descriptor. Also, guard the backend
> > decl before testing with VAR_P.
> >
> > gcc/testsuite/
> > PR fortran/99350
> > * gfortran.dg/pr99350.f90 : New test.
> >
> > PR fortran/107281
> > * gfortran.dg/associate_5.f03 : Changed error message.
> > * gfortran.dg/pr107281.f90 : New test.
> >
> > PR fortran/109451
> > * gfortran.dg/associate_61.f90 : New test
>
> Otherwise LGTM.
>
> Thanks for the patch!
>
> Harald
>
>


-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein

Re: [PATCH] optabs: Implement double-word ctz and ffs expansion

2023-06-07 Thread Richard Biener via Gcc-patches




> On 07.06.2023 at 18:59, Jakub Jelinek via Gcc-patches wrote:
> 
> Hi!
> 
> We have had expand_doubleword_clz for a couple of years, where we emit
> double-word CLZ as if (high_word == 0) return CLZ (low_word) + word_size;
> else return CLZ (high_word);
> We can do something similar for CTZ and FFS IMHO, just with the 2
> words swapped.  So if (low_word == 0) return CTZ (high_word) + word_size;
> else return CTZ (low_word); for CTZ and
> if (low_word == 0) return high_word ? FFS (high_word) + word_size : 0;
> else return FFS (low_word); for FFS.
> 
> The following patch implements that.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
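
A minimal sketch of the scheme described above, modelled in plain C++ on
a 64-bit value split into two 32-bit words. The helper names and the
32-bit word size are assumptions made for illustration; the real patch
emits RTL through the optabs machinery rather than calling functions
like these.

```cpp
#include <cassert>
#include <cstdint>

const int kWordBits = 32;  // assumed word size for this sketch

// Portable single-word helpers.  clz_word and ctz_word require w != 0.
int clz_word(std::uint32_t w) {
  assert(w != 0);
  int n = 0;
  while (!(w & 0x80000000u)) { w <<= 1; ++n; }
  return n;
}

int ctz_word(std::uint32_t w) {
  assert(w != 0);
  int n = 0;
  while (!(w & 1u)) { w >>= 1; ++n; }
  return n;
}

int ffs_word(std::uint32_t w) { return w ? ctz_word(w) + 1 : 0; }

// Double-word CLZ: test the high word first.  Requires x != 0.
int clz_double(std::uint64_t x) {
  std::uint32_t hi = static_cast<std::uint32_t>(x >> kWordBits);
  std::uint32_t lo = static_cast<std::uint32_t>(x);
  return hi == 0 ? clz_word(lo) + kWordBits : clz_word(hi);
}

// Double-word CTZ: same shape with the two words swapped.  Requires x != 0.
int ctz_double(std::uint64_t x) {
  std::uint32_t hi = static_cast<std::uint32_t>(x >> kWordBits);
  std::uint32_t lo = static_cast<std::uint32_t>(x);
  return lo == 0 ? ctz_word(hi) + kWordBits : ctz_word(lo);
}

// Double-word FFS: like CTZ, except that an all-zero input yields 0.
int ffs_double(std::uint64_t x) {
  std::uint32_t hi = static_cast<std::uint32_t>(x >> kWordBits);
  std::uint32_t lo = static_cast<std::uint32_t>(x);
  if (lo == 0)
    return hi ? ffs_word(hi) + kWordBits : 0;
  return ffs_word(lo);
}
```

The extra zero test in ffs_double is the one case where FFS differs from
CTZ plus one: when both words are zero the result must be 0.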

Ok

Richard 
> Note, on some targets which implement both word_mode ctz and ffs patterns,
> it might be better to incrementally implement those double-word ffs expansion
> patterns in md files, because we aren't able to optimize it correctly;
> nothing can detect we have just made sure that argument is not 0 and so
> don't need to bother with handling that case.  So, on ia32 just using
> CTZ patterns would be better there, but I think we can even do better and
> instead of doing the comparisons of the operands against 0 do the CTZ
> expansion followed by testing of flags.
> 
> 2023-06-07  Jakub Jelinek  
> 
>* optabs.cc (expand_ffs): Add forward declaration.
>(expand_doubleword_clz): Rename to ...
>(expand_doubleword_clz_ctz_ffs): ... this.  Add UNOPTAB argument,
>handle also doubleword CTZ and FFS in addition to CLZ.
>(expand_unop): Adjust caller.  Also call it for doubleword
>ctz_optab and ffs_optab.
> 
>* gcc.target/i386/ctzll-1.c: New test.
>* gcc.target/i386/ffsll-1.c: New test.
> 
> --- gcc/optabs.cc.jj	2023-06-07 09:42:14.701130305 +0200
> +++ gcc/optabs.cc	2023-06-07 14:35:04.909879272 +0200
> @@ -2697,10 +2697,14 @@ expand_clrsb_using_clz (scalar_int_mode
>   return temp;
> }
> 
> -/* Try calculating clz of a double-word quantity as two clz's of word-sized
> -   quantities, choosing which based on whether the high word is nonzero.  */
> +static rtx expand_ffs (scalar_int_mode, rtx, rtx);
> +
> +/* Try calculating clz, ctz or ffs of a double-word quantity as two clz, ctz or
> +   ffs operations on word-sized quantities, choosing which based on whether the
> +   high (for clz) or low (for ctz and ffs) word is nonzero.  */
> static rtx
> -expand_doubleword_clz (scalar_int_mode mode, rtx op0, rtx target)
> +expand_doubleword_clz_ctz_ffs (scalar_int_mode mode, rtx op0, rtx target,
> +   optab unoptab)
> {
>   rtx xop0 = force_reg (mode, op0);
>   rtx subhi = gen_highpart (word_mode, xop0);
> @@ -2709,6 +2713,7 @@ expand_doubleword_clz (scalar_int_mode m
>   rtx_code_label *after_label = gen_label_rtx ();
>   rtx_insn *seq;
>   rtx temp, result;
> +  int addend = 0;
> 
>   /* If we were not given a target, use a word_mode register, not a
>  'mode' register.  The result will fit, and nobody is expecting
> @@ -2721,6 +2726,9 @@ expand_doubleword_clz (scalar_int_mode m
>  'target' to tag a REG_EQUAL note on.  */
>   result = gen_reg_rtx (word_mode);
> 
> +  if (unoptab != clz_optab)
> +std::swap (subhi, sublo);
> +
>   start_sequence ();
> 
>   /* If the high word is not equal to zero,
> @@ -2728,7 +2736,13 @@ expand_doubleword_clz (scalar_int_mode m
>   emit_cmp_and_jump_insns (subhi, CONST0_RTX (word_mode), EQ, 0,
>   word_mode, true, hi0_label);
> 
> -  temp = expand_unop_direct (word_mode, clz_optab, subhi, result, true);
> +  if (optab_handler (unoptab, word_mode) != CODE_FOR_nothing)
> +temp = expand_unop_direct (word_mode, unoptab, subhi, result, true);
> +  else
> +{
> +  gcc_assert (unoptab == ffs_optab);
> +  temp = expand_ffs (word_mode, subhi, result);
> +}
>   if (!temp)
> goto fail;
> 
> @@ -2739,14 +2753,32 @@ expand_doubleword_clz (scalar_int_mode m
>   emit_barrier ();
> 
>   /* Else clz of the full value is clz of the low word plus the number
> - of bits in the high word.  */
> + of bits in the high word.  Similarly for ctz/ffs of the high word,
> + except that ffs should be 0 when both words are zero.  */
>   emit_label (hi0_label);
> 
> -  temp = expand_unop_direct (word_mode, clz_optab, sublo, 0, true);
> +  if (unoptab == ffs_optab)
> +{
> +  convert_move (result, const0_rtx, true);
> +  emit_cmp_and_jump_insns (sublo, CONST0_RTX (word_mode), EQ, 0,
> +   word_mode, true, after_label);
> +}
> +
> +  if (optab_handler (unoptab, word_mode) != CODE_FOR_nothing)
> +temp = expand_unop_direct (word_mode, unoptab, sublo, NULL_RTX, true);
> +  else
> +{
> +  gcc_assert (unoptab == ffs_optab);
> +  temp = expand_unop_direct (word_mode, ctz_optab, sublo, NULL_RTX, true);
> +  addend = 1;
> +}
> +
>   if (!temp)
> goto fail;
> +
>   temp = expand_binop (word_mode, add_optab, temp,
> -   gen_int_mode (GET_MODE_BITSIZE (word_mode), word_mode),
> +

Re: [PATCH] i386: Fix endless recursion in ix86_expand_vector_init_general with MMX [PR110152]

2023-06-07 Thread Richard Biener via Gcc-patches




> On 07.06.2023 at 18:52, Jakub Jelinek via Gcc-patches wrote:
> 
> Hi!
> 
> I'm getting
> +FAIL: gcc.target/i386/3dnow-1.c (internal compiler error: Segmentation fault signal terminated program cc1)
> +FAIL: gcc.target/i386/3dnow-1.c (test for excess errors)
> +FAIL: gcc.target/i386/3dnow-2.c (internal compiler error: Segmentation fault signal terminated program cc1)
> +FAIL: gcc.target/i386/3dnow-2.c (test for excess errors)
> +FAIL: gcc.target/i386/mmx-1.c (internal compiler error: Segmentation fault signal terminated program cc1)
> +FAIL: gcc.target/i386/mmx-1.c (test for excess errors)
> +FAIL: gcc.target/i386/mmx-2.c (internal compiler error: Segmentation fault signal terminated program cc1)
> +FAIL: gcc.target/i386/mmx-2.c (test for excess errors)
> regressions on i686-linux since r14-1166.  The problem is when
> ix86_expand_vector_init_general is called with mmx_ok = true and
> mode = V4HImode, it newly recurses with mmx_ok = false and mode = V2SImode,
> but as mmx_ok is false and !TARGET_SSE, we recurse again with the same
> arguments (ok, fresh new tmp and vals) infinitely.
> The following patch fixes that by passing mmx_ok to that recursive call.
> For n_words == 4 it isn't needed, because we only care about mmx_ok for
> V2SImode or V2SFmode and no other modes.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Richard 

> 2023-06-07  Jakub Jelinek  
> 
>PR target/110152
>* config/i386/i386-expand.cc (ix86_expand_vector_init_general): For
>n_words == 2 recurse with mmx_ok as first argument rather than false.
> 
> --- gcc/config/i386/i386-expand.cc.jj	2023-06-03 15:32:04.489410367 +0200
> +++ gcc/config/i386/i386-expand.cc	2023-06-07 10:31:34.715981752 +0200
> @@ -16371,7 +16371,7 @@ quarter:
>  machine_mode concat_mode = tmp_mode == DImode ? V2DImode : V2SImode;
>  rtx tmp = gen_reg_rtx (concat_mode);
>  vals = gen_rtx_PARALLEL (concat_mode, gen_rtvec_v (2, words));
> -  ix86_expand_vector_init_general (false, concat_mode, tmp, vals);
> +  ix86_expand_vector_init_general (mmx_ok, concat_mode, tmp, vals);
>  emit_move_insn (target, gen_lowpart (mode, tmp));
>}
>   else if (n_words == 4)
> 
>Jakub
>

Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-06-07 Thread Jeff Law via Gcc-patches





On 5/25/23 06:35, Manolis Tsamis wrote:

Implementation of the new RISC-V optimization pass for memory offset
calculations, documentation and testcases.

gcc/ChangeLog:

* config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
pass.
* config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
* config/riscv/riscv.opt: New options.
* config/riscv/t-riscv: New build rule.
* doc/invoke.texi: Document new option.
* config/riscv/riscv-fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.

So not going into the guts of the patch yet.

From a benchmark standpoint the only two that get out of the +-0.05% 
range are mcf and deepsjeng (from a dynamic instruction standpoint).  So 
from an evaluation standpoint we can probably focus our efforts there. 
And as we know, mcf is actually memory bound, so while improving its 
dynamic instruction count is good, the end performance improvement may 
be marginal.


As I mentioned to Philipp many months ago this reminds me a lot of a 
problem I've seen before.  Basically register elimination emits code 
that can be terrible in some circumstances.  So I went and poked at this 
again.


I think the key difference between now and what I was dealing with 
before is for the cases that really matter for rv64 we have a shNadd 
insn in the sequence.  That private port I was working on before did not 
have shNadd (don't ask, I probably can't tell).  Our target also had 
reg+reg addressing modes.  What I can't remember was if we were trying 
harder to fold the constant terms into the memory reference or if we 
were more focused on the reg+reg.  Ultimately it's probably not that 
important to remember -- the key is there are very significant 
differences in the target's capabilities which impact how we should be 
generating code in this case.  Those differences affect the code we 
generate *and* the places where we can potentially get control and do 
some address rewriting.


A key sequence in mcf looks something like this in IRA; others have a
similar structure:



(insn 237 234 239 26 (set (reg:DI 377)
(plus:DI (ashift:DI (reg:DI 200 [ _173 ])
(const_int 3 [0x3]))
(reg/f:DI 65 frame))) "pbeampp.c":139:15 333 {*shNadd}
 (nil))
(insn 239 237 235 26 (set (reg/f:DI 380)
(plus:DI (reg:DI 513)
(reg:DI 377))) "pbeampp.c":139:15 5 {adddi3}
 (expr_list:REG_DEAD (reg:DI 377)
(expr_list:REG_EQUAL (plus:DI (reg:DI 377)
(const_int -32768 [0xffffffffffff8000]))
(nil))))

[ ... ]

(insn 240 235 255 26 (set (reg/f:DI 204 [ _177 ])
(mem/f:DI (plus:DI (reg/f:DI 380)
(const_int 280 [0x118])) [7 *_176+0 S8 A64])) "pbeampp.c":139:15 179 {*movdi_64bit}
 (expr_list:REG_DEAD (reg/f:DI 380)
(nil)))



The key here is insn 237.  It's generally going to be bad to have FP 
show up in a shadd insn because it's going to be eliminated into 
sp+offset.  That'll generate an input reload before insn 237 and we 
can't do any combination with the constant in insn 239.


After LRA it looks like this:


(insn 1540 234 1541 26 (set (reg:DI 11 a1 [750])
(const_int 32768 [0x8000])) "pbeampp.c":139:15 179 {*movdi_64bit}
 (nil))
(insn 1541 1540 1611 26 (set (reg:DI 12 a2 [749])
(plus:DI (reg:DI 11 a1 [750])
(const_int -272 [0xfffffffffffffef0]))) "pbeampp.c":139:15 5 {adddi3}
 (expr_list:REG_EQUAL (const_int 32496 [0x7ef0])
(nil)))
(insn 1611 1541 1542 26 (set (reg:DI 29 t4 [795])
(plus:DI (reg/f:DI 2 sp)
(const_int 64 [0x40]))) "pbeampp.c":139:15 5 {adddi3}
 (nil))
(insn 1542 1611 237 26 (set (reg:DI 12 a2 [749])
(plus:DI (reg:DI 12 a2 [749])
(reg:DI 29 t4 [795]))) "pbeampp.c":139:15 5 {adddi3}
 (nil))
(insn 237 1542 239 26 (set (reg:DI 12 a2 [377])
(plus:DI (ashift:DI (reg:DI 14 a4 [orig:200 _173 ] [200])
(const_int 3 [0x3]))
(reg:DI 12 a2 [749]))) "pbeampp.c":139:15 333 {*shNadd}
 (nil))
(insn 239 237 235 26 (set (reg/f:DI 12 a2 [380])
(plus:DI (reg:DI 10 a0 [513])
(reg:DI 12 a2 [377]))) "pbeampp.c":139:15 5 {adddi3}
 (expr_list:REG_EQUAL (plus:DI (reg:DI 12 a2 [377])
(const_int -32768 [0xffffffffffff8000]))
(nil))) 

[ ... ]

(insn 240 235 255 26 (set (reg/f:DI 14 a4 [orig:204 _177 ] [204])
(mem/f:DI (plus:DI (reg/f:DI 12 a2 [380])
(const_int 280 [0x118])) [7 *_176+0 S8 A64])) "pbeampp.c":139:15 179 {*movdi_64bit}
 (nil))



Reload/LRA made an absolute mess of that code.

But before we add a new pass (target specific or generic), I think it 
may be in our best

[PATCH v6] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.

2023-06-07 Thread Pan Li via Gcc-patches

From: Pan Li 

This patch refactors the requirements of both ZVFH and ZVFHMIN. By
default, ZVFHMIN enables FP16 for all the RVV iterators, and ZVFH
then uses one function as the gate for whether FP16 is supported.

Please note that ZVFH covers the ZVFHMIN instructions. This patch
adds one test for this.

Signed-off-by: Pan Li 
Co-Authored by: Juzhe-Zhong 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (float_point_mode_supported_p):
New function to check whether float point is supported by extension.
* config/riscv/riscv-v.cc (float_point_mode_supported_p):
Ditto.
* config/riscv/vector-iterators.md: Fix V_WHOLE and V_FRACT.
* config/riscv/vector.md: Add condition to FP define insn.
---
 gcc/config/riscv/riscv-protos.h  |   1 +
 gcc/config/riscv/riscv-v.cc  |  12 +++
 gcc/config/riscv/vector-iterators.md |  23 +++--
 gcc/config/riscv/vector.md   | 144 +++
 4 files changed, 105 insertions(+), 75 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index ebbaac255f9..e4881786b53 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -177,6 +177,7 @@ rtx expand_builtin (unsigned int, tree, rtx);
 bool check_builtin_call (location_t, vec<location_t>, unsigned int,
   tree, unsigned int, tree *);
 bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
+bool float_point_mode_supported_p (machine_mode mode);
 bool legitimize_move (rtx, rtx);
 void emit_vlmax_vsetvl (machine_mode, rtx);
 void emit_hard_vlmax_vsetvl (machine_mode, rtx);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 49752cd8899..1cc157f1858 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -418,6 +418,18 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
  && IN_RANGE (INTVAL (elt), minval, maxval));
 }
 
+/* Return true if the inner of mode is HFmode when ZVFH enabled, or other
+   float point machine mode.  */
+bool
+float_point_mode_supported_p (machine_mode mode)
+{
+  machine_mode inner_mode = GET_MODE_INNER (mode);
+
+  gcc_assert (FLOAT_MODE_P (inner_mode));
+
+  return inner_mode == HFmode ? TARGET_ZVFH : true;
+}
+
 /* Return true if VEC is a constant in which every element is in the range
[MINVAL, MAXVAL].  The elements do not need to have the same value.
 
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index f4946d84449..234b712bc9d 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -453,9 +453,8 @@ (define_mode_iterator V_WHOLE [
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
   (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
 
-  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
-  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN == 32")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN == 64")
   (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
@@ -477,7 +476,11 @@ (define_mode_iterator V_WHOLE [
 (define_mode_iterator V_FRACT [
   (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI (VNx4QI "TARGET_MIN_VLEN > 32") 
(VNx8QI "TARGET_MIN_VLEN >= 128")
   (VNx1HI "TARGET_MIN_VLEN < 128") (VNx2HI "TARGET_MIN_VLEN > 32") (VNx4HI 
"TARGET_MIN_VLEN >= 128")
-  (VNx1HF "TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_MIN_VLEN > 32") (VNx4HF 
"TARGET_MIN_VLEN >= 128")
+
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
+
   (VNx1SI "TARGET_MIN_VLEN > 32 && TARGET_MIN_VLEN < 128") (VNx2SI 
"TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32 && TARGET_MIN_VLEN 
< 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
@@ -497,12 +500,12 @@ (define_mode_iterator VWEXTI [
 ])
 
 (define_mode_iterator VWEXTF [
-  (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
-  (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
-  (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
-  (VNx8SF "TARGET_VECTOR_ELEN_FP_32")
-  (VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
-  (VNx32SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
+  (VNx1SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32 && 
TARGET_MIN_VLEN < 128")
+  (VNx2SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32")
+  (VNx4SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32")
+  (VNx8SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32")
+  (VNx16SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_VECTOR_ELEN_FP_32 && 
TARGET_MIN_VLEN >

Targetting p0847 for GCC14 (explicit object parameter)

2023-06-07 Thread waffl3x via Gcc

I would like to boldly suggest that implementing P0847 be targeted at
GCC14. In my anecdotal experience, this feature is very important to
people, and very important to me; I believe it should be a priority.

I am not suggesting this without offering to contribute, however
because of my inexperience with compiler hacking I am concerned I would
hinder efforts. With that said, if no one is interested in starting
work on it, but there is consensus that the feature is important
enough, then I will do my best to take up that job.

If this was already the understood plan for GCC14, I apologize for my
ignorance on the matter. I searched around and couldn't find any
information regarding it, and the mailing list didn't seem to have any
results either. If it's there and I missed it, please do point it out. I am
also wondering if there is a public document with information about the
feature roadmap? I can understand why there isn't one if that isn't the
case, I imagine it would just cause a nuisance for the developers. I
had read the GCC Development Plan document a few months ago and given
the information in it I decided to wait for development to move on to
GCC14 before getting in touch. In hindsight that might have been a
mistake, oops!

I apologize if I overlooked something obvious, please don't hesitate to
correct any misconceptions I'm having here as it's not my intention to
step on any toes here, I just really really value this feature and want
to see it sooner rather than later.

I look forward to hearing everyone's input,
-Alex

[Bug libstdc++/110167] New: excessive compile time when optimizing std::to_array

2023-06-07 Thread nightstrike at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110167

Bug ID: 110167
   Summary: excessive compile time when optimizing std::to_array
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nightstrike at gmail dot com
  Target Milestone: ---

#include <array>
int f[262144]; auto g(void) { return std::to_array(f); }

(Thanks to Andrew for help reducing!)

Baseline run:

$ time g++ test.cc -std=gnu++20 -O0 -c
real    0m17.274s
user    0m16.806s
sys     0m0.119s


At -O1, -O2, -O3, or -Ofast, it takes too long to ever finish.  I killed it
after 15 minutes.

For the curious, that array size is 512*512.  I didn't bisect to see where it
starts to blow up.

[PATCH v2] LoongArch: Modify the register constraints for template "jumptable" and "indirect_jump" from "r" to "e" [PR110136]

2023-06-07 Thread Lulu Cheng

The micro-architecture unconditionally treats a "jr $ra" as "return from
subroutine", hence doing "jr $ra" would interfere with both subroutine return
prediction and the more general indirect branch prediction.

Therefore, a problem like PR110136 can cause a significant increase in the
branch misprediction rate and affect performance. The same problem exists with
"indirect_jump".

gcc/ChangeLog:

* config/loongarch/loongarch.md: Modify the register constraints for
template "jumptable" and "indirect_jump" from "r" to "e".

Co-authored-by: Andrew Pinski 
---
v1 -> v2:
  1. Modify the description
  2. Modify the register constraints of the template "indirect_jump".
---
 gcc/config/loongarch/loongarch.md | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 816a943d155..43a2ecc8957 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2895,6 +2895,10 @@ (define_insn "*jump_pic"
 }
   [(set_attr "type" "branch")])
 
+;; Micro-architecture unconditionally treats a "jr $ra" as "return from subroutine",
+;; hence doing "jr $ra" would interfere with both subroutine return prediction and
+;; the more general indirect branch prediction.
+
 (define_expand "indirect_jump"
   [(set (pc) (match_operand 0 "register_operand"))]
   ""
@@ -2905,7 +2909,7 @@ (define_expand "indirect_jump"
 })
 
 (define_insn "@indirect_jump"
-  [(set (pc) (match_operand:P 0 "register_operand" "r"))]
+  [(set (pc) (match_operand:P 0 "register_operand" "e"))]
   ""
   "jr\t%0"
   [(set_attr "type" "jump")
@@ -2928,7 +2932,7 @@ (define_expand "tablejump"
 
 (define_insn "@tablejump"
   [(set (pc)
-   (match_operand:P 0 "register_operand" "r"))
+   (match_operand:P 0 "register_operand" "e"))
(use (label_ref (match_operand 1 "" "")))]
   ""
   "jr\t%0"
-- 
2.31.1

[Bug tree-optimization/110166] [14 Regression] wrong code with signed 1-bit integers sometimes since r14-868-gb06cfb62229f

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110166

--- Comment #2 from Andrew Pinski  ---
```
/* Max -> bool0 | bool1
   Min -> bool0 & bool1 .   */
(for op(max min)
 logic (bit_ior bit_and)
 (simplify
  (op zero_one_valued_p@0 zero_one_valued_p@1)
  (if (TYPE_PRECISION (type) != 1
   || TYPE_UNSIGNED (type))
   (logic @0 @1))))
```
Should fix it.
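
To see why the extra guard is needed, here is a small self-contained
check, written as ordinary C++ rather than match.pd. It assumes that a
signed 1-bit integer can hold only 0 and -1, and the function name is
made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>

// Returns true when rewriting max as OR and min as AND gives the same
// results as the real max/min for the operand pair (a, b).
bool fold_matches(int a, int b) {
  return std::max(a, b) == (a | b) && std::min(a, b) == (a & b);
}
```

For operands drawn from {0, 1} the rewrite always matches. For operands
drawn from {0, -1}, the value set of a signed 1-bit type, it does not:
max (0, -1) is 0 while 0 | -1 is -1, and min (0, -1) is -1 while
0 & -1 is 0.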

[Bug c++/110162] redundant move in initialization

2023-06-07 Thread jincikang at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110162

--- Comment #7 from jinci kang  ---
(In reply to Jonathan Wakely from comment #5)
> (In reply to Andrew Pinski from comment #4)
> > See https://gcc.gnu.org/gcc-13/porting_to.html also.
> 
> I don't think this is related to the new rules.
> 
> The std::move here is redundant because request is const, so request.body()
> calls the const overload which returns const std::string* and so
> std::move(*request.body()) produces a const std::string&& which cannot be
> moved. It can only be copied. So the move is redundant.

I see.
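
A minimal sketch of the point made in comment #5, using a hypothetical
Tracked type (not taken from the bug report) that records which
constructor created it: std::move applied to a const object produces a
const rvalue, and overload resolution then selects the copy constructor.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical type that records how it was constructed.
struct Tracked {
  bool copied = false;
  bool moved = false;
  Tracked() = default;
  Tracked(const Tracked &) : copied(true) {}
  Tracked(Tracked &&) noexcept : moved(true) {}
};

// std::move on a const lvalue yields const Tracked&&, which cannot bind
// to the Tracked&& parameter of the move constructor.
static_assert(
    std::is_same<decltype(std::move(std::declval<const Tracked &>())),
                 const Tracked &&>::value,
    "std::move keeps the const qualifier");

// The source is const, so the "move" falls back to a copy.
bool copies_from_const(const Tracked &src) {
  Tracked dst(std::move(src));
  return dst.copied && !dst.moved;
}

// The source is non-const, so the move constructor is selected.
bool moves_from_mutable(Tracked &src) {
  Tracked dst(std::move(src));
  return dst.moved && !dst.copied;
}
```

Both functions return true, which is why the std::move in the reported
code has no effect when the request object is const.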

[PATCH V5] VECT: Add SELECT_VL support

2023-06-07 Thread juzhe . zhong

From: Ju-Zhe Zhong 

Co-authored-by: Richard Sandiford
Co-authored-by: Richard Biener 

This patch addresses comments from Richard and Richi and rebases to trunk.

This patch adds SELECT_VL middle-end support, which
allows targets to apply target-dependent optimization to the
length calculation.

This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750

SELECT_VL has the same behavior as LLVM "get_vector_length", with
the following properties:

1. Only apply on single-rgroup.
2. non SLP.
3. adjust loop control IV.
4. adjust data reference IV.
5. allow non-vf elements processing in non-final iteration
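
The properties above can be sketched with a scalar model of the loop.
Everything here is an assumption made for illustration: select_vl is a
made-up target policy, not the GCC internal function, and the inner loop
merely stands in for length-controlled vector loads, adds and stores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical target policy, not GCC code.  The result never exceeds
// MIN (remaining, max_vf), but it may be smaller: when the tail is between
// one and two vectors long, it is split into two roughly equal parts.
std::size_t select_vl(std::size_t remaining, std::size_t max_vf) {
  assert(max_vf > 0);
  if (remaining <= max_vf || remaining >= 2 * max_vf)
    return std::min(remaining, max_vf);
  return (remaining + 1) / 2;
}

// Scalar model of the vectorized loop for z[i] = x[i] + y[i].  Returns the
// length chosen on each iteration so the variable step is visible.
std::vector<std::size_t> vadd(std::size_t n, const int *x, const int *y,
                              int *z, std::size_t max_vf) {
  std::vector<std::size_t> lens;
  std::size_t remaining = n;               // loop control IV
  while (remaining > 0) {
    std::size_t len = select_vl(remaining, max_vf);
    for (std::size_t i = 0; i < len; ++i)  // stands in for the vector body
      z[i] = x[i] + y[i];
    x += len;                              // data reference IVs step by len
    y += len;
    z += len;
    remaining -= len;                      // control IV also steps by len
    lens.push_back(len);
  }
  return lens;
}
```

With n = 10 and max_vf = 4, this policy picks lengths 4, 3 and 3, so a
non-final iteration processes fewer than max_vf elements and both the
control IV and the data reference IVs advance by a variable step.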

Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i
Co-authored-by: Richard Biener 

---
 gcc/doc/md.texi | 22 ++
 gcc/internal-fn.def |  1 +
 gcc/optabs.def  |  1 +
 gcc/tree-vect-loop-manip.cc | 32 ++
 gcc/tree-vect-loop.cc   | 72 +++
 gcc/tree-vect-stmts.cc  | 86 -
 gcc/tree-vectorizer.h   |  6 +++
 7 files changed, 201 insertions(+), 19 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 6a435eb4461..95f7fe1f802 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 @end smallexample
 
+@cindex @code{select_vl@var{m}} instruction pattern
+@item @code{select_vl@var{m}}
+Set operand 0 to the number of scalar iterations that should be handled
+by one iteration of a vector loop.  Operand 1 is the total number of
+scalar iterations that the loop needs to process and operand 2 is a
+maximum bound on the result (also known as the maximum ``vectorization
+factor'').
+
+The maximum value of operand 0 is given by:
+@smallexample
+operand0 = MIN (operand1, operand2)
+@end smallexample
+However, targets might choose a lower value than this, based on
+target-specific criteria.  Each iteration of the vector loop might
+therefore process a different number of scalar iterations, which in turn
+means that induction variables will have a variable step.  Because of
+this, it is generally not useful to define this instruction if it will
+always calculate the maximum value.
+
+This optab is only useful on targets that implement @samp{len_load_@var{m}}
+and/or @samp{len_store_@var{m}}.
+
 @cindex @code{check_raw_ptrs@var{m}} instruction pattern
 @item @samp{check_raw_ptrs@var{m}}
 Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 3ac9d82aace..5d638de6d06 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -177,6 +177,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
 DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
 
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
+DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary)
 DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
   check_raw_ptrs, check_ptrs)
 DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 6c064ff4993..f31b69c5d85 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -488,3 +488,4 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
 OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
 OPTAB_D (len_load_optab, "len_load_$a")
 OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (select_vl_optab, "select_vl$a")
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 3f735945e67..1c8100c1a1c 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -534,7 +534,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   _10 = (unsigned long) count_12(D);
   ...
   # ivtmp_9 = PHI 
-  _36 = MIN_EXPR ;
+  _36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
-&incr_gsi, insert_after, &index_before_incr,
-&index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
-   index_before_incr,
-   nitems_step));
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+   {
+ create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
+insert_after, &index_before_incr, &index_after_incr);
+ tree len =

Re: On inform diagnostics in plugins, support scripts for gdb and modeling creation of PyObjects for static analysis

2023-06-07 Thread Eric Feng via Gcc

Hi Dave,

> If that's the code, does it work if you get rid of the "if (0)"
> conditional, or change it to "if (1)"?  As written, that guard is
> false, so that call to "inform" will never be executed.

Whoops! Somehow I missed that but yes, it works now. Thanks!

>  Are you invoking gcc from an installed copy, or from the build
> directory?  I think my instructions assume the latter.

Ah gotcha, thanks! It loads as expected when invoking gcc from the
build directory.

> Don't attempt to build the struct by hand; we want to look up the
> struct from the user's headers.  There are at least two ABIs for
> PyObject, so we want to be sure we're using the correct one.
>
> IIRC, to look things up by name, that's generally a frontend thing,
> since every language has its own concept of scopes/namespaces/etc.
>
> It sounds like you want to look for a type in the global scope of the
> C/C++ FE with the name "PyObject".
>
> We currently have some hooks in the analyzer for getting constants from
> the frontends; see analyzer-language.cc, where the frontend calls
> on_finish_translation_unit, where the analyzer queries the FE for the
> named constants that will be of interest during analysis.  Maybe we can
> extend this so that we have a way to look up named types there, and
> stash the tree for later use, and thus your plugin could ask the
> frontend which tree is the PyObject RECORD_TYPE before the frontend is
> cleaned up (in on_finish_translation_unit).

Sounds good, I will look into that. Thanks for the suggestion!

Best,
Eric


On Wed, Jun 7, 2023 at 5:55 PM David Malcolm  wrote:
>
> On Wed, 2023-06-07 at 16:21 -0400, Eric Feng wrote:
> > Hi everyone,
> >
> > I am one of the GSoC participants this year — in particular, I am
> > working on a static analyzer plugin for CPython extension module
> > code.
> > I'm encountering a few challenges and would appreciate any guidance
> > on
> > the following issues:
> >
> > 1) Issue with "inform" diagnostics in the plugin:
> > I am currently unable to see any "inform" messages from my plugin
> > when
> > compiling test programs with the plugin enabled. As per the structure
> > of existing analyzer plugins, I have included the following code in
> > the plugin_init function:
> >
> > #if ENABLE_ANALYZER
> > const char *plugin_name = plugin_info->base_name;
> > if (0)
> > inform(input_location, "got here; %qs", plugin_name);
>
> If that's the code, does it work if you get rid of the "if (0)"
> conditional, or change it to "if (1)"?  As written, that guard is
> false, so that call to "inform" will never be executed.
>
> > register_callback(plugin_info->base_name,
> >   PLUGIN_ANALYZER_INIT,
> >   ana::cpython_analyzer_init_cb,
> >   NULL);
> > #else
> > sorry_no_analyzer();
> > #endif
> > return 0;
> >
> > I expected to see the "got here" message (among others in other areas
> > of the plugin) when compiling test programs but haven't observed any
> > output. I also did not observe the "sorry" diagnostic. I am compiling
> > a simple CPython extension module with the plugin loaded like so:
> >
> > gcc-dev -S -fanalyzer -fplugin=/path/to/cpython_plugin.so
> > -I/usr/include/python3.9 -lpython3.9 -x c refcount6.c
>
> Looks reasonable.
>
> >
> > Additionally, I compiled the plugin following the steps outlined in
> > the GCC documentation for plugin building
> > (https://gcc.gnu.org/onlinedocs/gccint/Plugins-building.html):
> >
> > g++-dev -shared -I/home/flappy/gcc_/gcc/gcc
> > -I/usr/local/lib/gcc/aarch64-unknown-linux-gnu/14.0.0/plugin/include
> > -fPIC -fno-rtti -O2 analyzer_cpython_plugin.c -o cpython_plugin.so
> >
> > Please let me know if I missed any steps or if there is something
> > else
> > I should consider. I have no trouble seeing inform calls when they
> > are
> > added to the core GCC.
> >
> > 2) gdb not detecting .gdbinit in build/gcc:
> > Following Dave's GCC newbies guide, I ran gcc/configure within the
> > gcc
> > subdirectory of the build directory to generate a .gdbinit file.
> > Dave's guide suggested that this file would be automatically detected
> > and run by gdb. However, it appears that GDB is not detecting this
> > .gdbinit file, even after I added the following line to my ~/.gdbinit
> > file:
> >
> > add-auto-load-safe-path /absolute/path/to/build/gcc
>
> Are you invoking gcc from an installed copy, or from the build
> directory?  I think my instructions assume the latter.
>
> >
> > 3) Modeling creation of a new PyObject:
> > Many CPython API calls involve the creation of a new PyObject. To
> > model the creation of a simple PyObject, we can allocate a new heap
> > region using get_or_create_region_for_heap_alloc. We can then create
> > field_regions using get_field_region to associate the newly allocated
> > region to represent fields such as ob_refcnt and ob_type in the
> > PyObject struct. However, one of the parameters to get_field_region
> > is a tree

[PATCH 3/4] rs6000: build constant via li/lis;rldicl/rldicr

2023-06-07 Thread Jiufu Guo via Gcc-patches

Hi,

This patch checks whether a constant can be formed by rotating a negative
value loaded by "li/lis" and then clearing its leading or trailing bits.  If
so, we can build the constant through "li/lis ; rldicl/rldicr".
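
As a rough illustration of the rotate-and-clear idea (a sketch in plain C, not code from the patch; the helper names and the chosen constants are assumptions made for this example), "rldicl" can be modelled as a left rotate followed by clearing the high MB bits, and "rldicr" as a left rotate followed by clearing the bits below position 63 - ME:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate V left by N bits; N is taken modulo 64.  */
static uint64_t
rotl64 (uint64_t v, unsigned n)
{
  n &= 63;
  return n == 0 ? v : (v << n) | (v >> (64 - n));
}

/* Value loaded by "li rt,imm": the 16-bit immediate, sign-extended.  */
static uint64_t
li_value (int16_t imm)
{
  return (uint64_t) (int64_t) imm;
}

/* Model of "rldicl ra,rs,sh,mb": rotate left, then clear the MB high bits.  */
static uint64_t
rldicl (uint64_t rs, unsigned sh, unsigned mb)
{
  return rotl64 (rs, sh) & (~UINT64_C (0) >> mb);
}

/* Model of "rldicr ra,rs,sh,me": rotate left, then keep only the bits from
   the most significant one down to IBM bit ME (ME in 0..63).  */
static uint64_t
rldicr (uint64_t rs, unsigned sh, unsigned me)
{
  return rotl64 (rs, sh) & (~UINT64_C (0) << (63 - me));
}

/* Self-check on constants chosen for this sketch.  */
static void
check_rldicl_rldicr (void)
{
  /* li -95 gives 0xffffffffffffffa1; clearing 14 leading ones.  */
  assert (rldicl (li_value (-95), 0, 14) == UINT64_C (0x0003ffffffffffa1));
  /* li -0x7acf gives 0xffffffffffff8531; rotate by 16, clear 4 low bits.  */
  assert (rldicr (li_value (-0x7acf), 16, 59)
          == UINT64_C (0xffffffff8531fff0));
}
```

Both results need only two instructions this way, whereas neither fits a single "li" or "lis".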

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): New
function.
(can_be_built_by_li_lis_and_rldicr): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_lis_and_rldicr and
can_be_built_by_li_lis_and_rldicl.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.
---
 gcc/config/rs6000/rs6000.cc   | 61 ++-
 .../gcc.target/powerpc/const-build.c  | 44 +
 2 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 03cd9d5e952..2a3fa733b45 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10332,6 +10332,61 @@ can_be_built_by_li_lis_and_rotldi (HOST_WIDE_INT c, 
int *shift,
   return false;
 }
 
+/* Check if value C can be built by 2 instructions: one is 'li or lis',
+   another is rldicl.
+
+   If so, *SHIFT is set to the shift operand of rldicl, and *MASK is set to
+   the mask operand of rldicl, and return true.
+   Return false otherwise.  */
+
+static bool
+can_be_built_by_li_lis_and_rldicl (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  /* Leading zeros may be cleaned by rldicl with a mask.  Change leading zeros
+ to ones and then recheck it.  */
+  int lz = clz_hwi (c);
+  HOST_WIDE_INT unmask_c
+= c | (HOST_WIDE_INT_M1U << (HOST_BITS_PER_WIDE_INT - lz));
+  int n;
+  if (can_be_rotated_to_negative_li (unmask_c, &n)
+  || can_be_rotated_to_negative_lis (unmask_c, &n))
+{
+  *mask = HOST_WIDE_INT_M1U >> lz;
+  *shift = n == 0 ? 0 : HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
+/* Check if value C can be built by 2 instructions: one is 'li or lis',
+   another is rldicr.
+
+   If so, *SHIFT is set to the shift operand of rldicr, and *MASK is set to
+   the mask operand of rldicr, and return true.
+   Return false otherwise.  */
+
+static bool
+can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  /* Tailing zeros may be cleaned by rldicr with a mask.  Change tailing zeros
+ to ones and then recheck it.  */
+  int tz = ctz_hwi (c);
+  HOST_WIDE_INT unmask_c = c | ((HOST_WIDE_INT_1U << tz) - 1);
+  int n;
+  if (can_be_rotated_to_negative_li (unmask_c, &n)
+  || can_be_rotated_to_negative_lis (unmask_c, &n))
+{
+  *mask = HOST_WIDE_INT_M1U << tz;
+  *shift = HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10378,7 +10433,9 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0x) << 16)));
 }
-  else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask))
+  else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask)
+  || can_be_built_by_li_lis_and_rldicl (c, &shift, &mask)
+  || can_be_built_by_li_lis_and_rldicr (c, &shift, &mask))
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
   unsigned HOST_WIDE_INT imm = (c | ~mask);
@@ -10387,6 +10444,8 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (temp, GEN_INT (imm));
   if (shift != 0)
temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
+  if (mask != HOST_WIDE_INT_M1)
+   temp = gen_rtx_AND (DImode, temp, GEN_INT (mask));
   emit_move_insn (dest, temp);
 }
   else if (ud3 == 0 && ud4 == 0)
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c 
b/gcc/testsuite/gcc.target/powerpc/const-build.c
index c38a1dd91f2..8c209921d41 100644
--- a/gcc/testsuite/gcc.target/powerpc/const-build.c
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -46,6 +46,42 @@ lis_rotldi_6 (void)
   return 0x5318LL;
 }
 
+long long NOIPA
+li_rldicl_7 (void)
+{
+  return 0x3ffa1LL;
+}
+
+long long NOIPA
+li_rldicl_8 (void)
+{
+  return 0xff8531LL;
+}
+
+long long NOIPA
+lis_rldicl_9 (void)
+{
+  return 0x00ff8531LL;
+}
+
+long long NOIPA
+li_rldicr_10 (void)
+{
+  return 0x8531fff0LL;
+}
+
+long long NOIPA
+li_rldicr_11 (void)
+{
+  return 0x21f0LL;
+}
+
+long long NOIPA
+lis_rldicr_12 (void)
+{
+  return 0x5310LL;
+}
+
 struct fun arr[] = {
   {li_rotldi_1, 0x75310LL},
   {li_rotldi_2, 0x2164LL},
@@ -53,9 +89,17 @@ struct fun arr[] = {
   {li_rotldi_4, 0x2194LL},

[PATCH 2/4] rs6000: build constant via lis;rotldi

2023-06-07 Thread Jiufu Guo via Gcc-patches

Hi,

This patch checks whether a constant can be rotated to/from a negative
value loaded by "lis".  If so, we can use "lis;rotldi" to build it.
The positive value of "lis" does not need to be analyzed, because if a
constant can be rotated from a positive value of "lis", it can also be
rotated from a positive value of "li".
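
The reasoning about positive "lis" values can be checked with a small C sketch (not code from the patch; `li_value`, `lis_value` and the sample immediates are assumptions for illustration). Every positive "lis" value equals the matching "li" value rotated left by 16, while a negative "lis" value ends in sixteen zeros that a rotated negative "li" value cannot supply:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate V left by N bits; N is taken modulo 64.  */
static uint64_t
rotl64 (uint64_t v, unsigned n)
{
  n &= 63;
  return n == 0 ? v : (v << n) | (v >> (64 - n));
}

/* Value loaded by "li rt,imm": the 16-bit immediate, sign-extended.  */
static uint64_t
li_value (int16_t imm)
{
  return (uint64_t) (int64_t) imm;
}

/* Value loaded by "lis rt,imm": the immediate shifted left by 16 bits,
   sign-extended to 64 bits.  */
static uint64_t
lis_value (int16_t imm)
{
  return (uint64_t) ((int64_t) imm * 65536);
}

/* Check all non-negative immediates: "lis" is "li" rotated left by 16.  */
static bool
positive_lis_is_rotated_li (void)
{
  for (int imm = 0; imm <= 0x7fff; imm++)
    if (lis_value ((int16_t) imm) != rotl64 (li_value ((int16_t) imm), 16))
      return false;
  return true;
}

/* Self-check on one negative immediate chosen for this sketch.  */
static void
check_negative_lis (void)
{
  assert (lis_value (-0x7acf) == UINT64_C (0xffffffff85310000));
  assert (rotl64 (li_value (-0x7acf), 16) == UINT64_C (0xffffffff8531ffff));
}
```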

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis): New
function.
(can_be_built_by_li_and_rotldi): Rename to ...
(can_be_built_by_li_lis_and_rotldi): ... this function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_lis_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.
---
 gcc/config/rs6000/rs6000.cc   | 42 ---
 .../gcc.target/powerpc/const-build.c  | 16 ++-
 2 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 1dd0072350a..03cd9d5e952 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10278,19 +10278,51 @@ can_be_rotated_to_negative_li (HOST_WIDE_INT c, int 
*rot)
   return can_be_rotated_to_lowbits (~c, 15, rot);
 }
 
-/* Check if value C can be built by 2 instructions: one is 'li', another is
-   rotldi.
+/* Check if C can be rotated to a negative value which 'lis' instruction is
+   able to load: 1..1xx0..0.  If so, set *ROT to the number by which C is
+   rotated, and return true.  Return false otherwise.  */
+
+static bool
+can_be_rotated_to_negative_lis (HOST_WIDE_INT c, int *rot)
+{
+  /* case a. 1..1xxx0..01..1: up to 15 x's, at least 16 0's.  */
+  int leading_ones = clz_hwi (~c);
+  int tailing_ones = ctz_hwi (~c);
+  int middle_zeros = ctz_hwi (c >> tailing_ones);
+  if (middle_zeros >= 16 && leading_ones + tailing_ones >= 33)
+{
+  *rot = HOST_BITS_PER_WIDE_INT - tailing_ones;
+  return true;
+}
+
+  /* case b. xx0..01..1xx: some of 15 x's (and some of 16 0's) are
+ rotated over the highest bit.  */
+  int pos_one = clz_hwi ((c << 16) >> 16);
+  middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_one));
+  int middle_ones = clz_hwi (~(c << pos_one));
+  if (middle_zeros >= 16 && middle_ones >= 33)
+{
+  *rot = pos_one;
+  return true;
+}
+
+  return false;
+}
+
+/* Check if value C can be built by 2 instructions: one is 'li or lis',
+   another is rotldi.
 
If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
is set to -1, and return true.  Return false otherwise.  */
 
 static bool
-can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
+can_be_built_by_li_lis_and_rotldi (HOST_WIDE_INT c, int *shift,
   HOST_WIDE_INT *mask)
 {
   int n;
   if (can_be_rotated_to_positive_li (c, &n)
-  || can_be_rotated_to_negative_li (c, &n))
+  || can_be_rotated_to_negative_li (c, &n)
+  || can_be_rotated_to_negative_lis (c, &n))
 {
   *mask = HOST_WIDE_INT_M1;
   *shift = HOST_BITS_PER_WIDE_INT - n;
@@ -10346,7 +10378,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0x) << 16)));
 }
-  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
+  else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask))
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
   unsigned HOST_WIDE_INT imm = (c | ~mask);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c 
b/gcc/testsuite/gcc.target/powerpc/const-build.c
index 70f095f6bf2..c38a1dd91f2 100644
--- a/gcc/testsuite/gcc.target/powerpc/const-build.c
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -34,14 +34,28 @@ li_rotldi_4 (void)
   return 0x2194LL;
 }
 
+long long NOIPA
+lis_rotldi_5 (void)
+{
+  return 0x8531LL;
+}
+
+long long NOIPA
+lis_rotldi_6 (void)
+{
+  return 0x5318LL;
+}
+
 struct fun arr[] = {
   {li_rotldi_1, 0x75310LL},
   {li_rotldi_2, 0x2164LL},
   {li_rotldi_3, 0x8531LL},
   {li_rotldi_4, 0x2194LL},
+  {lis_rotldi_5, 0x8531LL},
+  {lis_rotldi_6, 0x5318LL},
 };
 
-/* { dg-final { scan-assembler-times {\mrotldi\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mrotldi\M} 6 } } */
 
 int
 main ()
-- 
2.39.1

[PATCH 4/4] rs6000: build constant via li/lis;rldic

2023-06-07 Thread Jiufu Guo via Gcc-patches

Hi,

This patch checks whether a constant can be built by "li;rldic".
We only need to handle "negative li"; the other forms do not need to be checked.
For example, "negative lis" is just a "negative li" with an additional shift.
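
To make the "rldic" mask concrete, here is a minimal C model (a sketch only, not code from the patch; the function names and sample operands are assumptions). "rldic" rotates left by SH and keeps IBM bits MB through 63 - SH, and the mask wraps around when MB is larger than 63 - SH:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate V left by N bits; N is taken modulo 64.  */
static uint64_t
rotl64 (uint64_t v, unsigned n)
{
  n &= 63;
  return n == 0 ? v : (v << n) | (v >> (64 - n));
}

/* Value loaded by "li rt,imm": the 16-bit immediate, sign-extended.  */
static uint64_t
li_value (int16_t imm)
{
  return (uint64_t) (int64_t) imm;
}

/* Model of "rldic ra,rs,sh,mb" for SH and MB in 0..63.  */
static uint64_t
rldic (uint64_t rs, unsigned sh, unsigned mb)
{
  uint64_t from_mb = ~UINT64_C (0) >> mb;  /* IBM bits MB..63.  */
  uint64_t to_me = ~UINT64_C (0) << sh;    /* IBM bits 0..63-SH.  */
  /* A contiguous mask when MB <= 63 - SH, a wrap-around mask otherwise.  */
  uint64_t mask = mb + sh <= 63 ? (from_mb & to_me) : (from_mb | to_me);
  return rotl64 (rs, sh) & mask;
}

/* Self-check on operands chosen for this sketch.  */
static void
check_rldic (void)
{
  uint64_t neg_li = li_value (-0x7acf);  /* 0xffffffffffff8531 */
  /* 0..01..1xx0..0: low bits shifted in as zeros, high ones cleared.  */
  assert (rldic (neg_li, 8, 40) == UINT64_C (0x0000000000853100));
  /* 1..1xx0..01..1: wrap-around mask clears a run of ones in the middle.  */
  assert (rldic (neg_li, 32, 56) == UINT64_C (0xffff8531000000ff));
  /* A "negative lis" value is a "negative li" shifted left by 16.  */
  assert (rldic (neg_li, 16, 0) == UINT64_C (0xffffffff85310000));
}
```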

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_and_rldic): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rldic.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.
---
 gcc/config/rs6000/rs6000.cc   | 61 ++-
 .../gcc.target/powerpc/const-build.c  | 28 +
 2 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 2a3fa733b45..cd04b6b5c82 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10387,6 +10387,64 @@ can_be_built_by_li_lis_and_rldicr (HOST_WIDE_INT c, 
int *shift,
   return false;
 }
 
+/* Check if value C can be built by 2 instructions: one is 'li', another is
+   rldic.
+
+   If so, *SHIFT is set to the 'shift' operand of rldic; and *MASK is set
+   to the mask value about the 'mb' operand of rldic; and return true.
+   Return false otherwise.  */
+
+static bool
+can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int *shift, HOST_WIDE_INT *mask)
+{
+  /* There are 49 successive ones in the negative value of 'li'.  */
+  int ones = 49;
+
+  /* 1..1xx1..1: negative value of li --> 0..01..1xx0..0:
+ right bits are shifted as 0's, and left 1's(and x's) are cleaned.  */
+  int tz = ctz_hwi (c);
+  int lz = clz_hwi (c);
+  int middle_ones = clz_hwi (~(c << lz));
+  if (tz + lz + middle_ones >= ones)
+{
+  *mask = ((1LL << (HOST_BITS_PER_WIDE_INT - tz - lz)) - 1LL) << tz;
+  *shift = tz;
+  return true;
+}
+
+  /* 1..1xx1..1 --> 1..1xx0..01..1: some 1's(following x's) are cleaned. */
+  int leading_ones = clz_hwi (~c);
+  int tailing_ones = ctz_hwi (~c);
+  int middle_zeros = ctz_hwi (c >> tailing_ones);
+  if (leading_ones + tailing_ones + middle_zeros >= ones)
+{
+  *mask = ~(((1ULL << middle_zeros) - 1ULL) << tailing_ones);
+  *shift = tailing_ones + middle_zeros;
+  return true;
+}
+
+  /* xx1..1xx: --> xx0..01..1xx: some 1's(following x's) are cleaned. */
+  /* Get the position for the first bit of successive 1.
+ The 24th bit would be in successive 0 or 1.  */
+  HOST_WIDE_INT low_mask = (1LL << 24) - 1LL;
+  int pos_first_1 = ((c & (low_mask + 1)) == 0)
+ ? clz_hwi (c & low_mask)
+ : HOST_BITS_PER_WIDE_INT - ctz_hwi (~(c | low_mask));
+  middle_ones = clz_hwi (~c << pos_first_1);
+  middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_first_1));
+  if (pos_first_1 < HOST_BITS_PER_WIDE_INT
+  && middle_ones + middle_zeros < HOST_BITS_PER_WIDE_INT
+  && middle_ones + middle_zeros >= ones)
+{
+  *mask = ~(((1ULL << middle_zeros) - 1LL)
+   << (HOST_BITS_PER_WIDE_INT - pos_first_1));
+  *shift = HOST_BITS_PER_WIDE_INT - pos_first_1 + middle_zeros;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10435,7 +10493,8 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 }
   else if (can_be_built_by_li_lis_and_rotldi (c, &shift, &mask)
   || can_be_built_by_li_lis_and_rldicl (c, &shift, &mask)
-  || can_be_built_by_li_lis_and_rldicr (c, &shift, &mask))
+  || can_be_built_by_li_lis_and_rldicr (c, &shift, &mask)
+  || can_be_built_by_li_and_rldic (c, &shift, &mask))
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
   unsigned HOST_WIDE_INT imm = (c | ~mask);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c 
b/gcc/testsuite/gcc.target/powerpc/const-build.c
index 8c209921d41..b503ee31c7c 100644
--- a/gcc/testsuite/gcc.target/powerpc/const-build.c
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -82,6 +82,29 @@ lis_rldicr_12 (void)
   return 0x5310LL;
 }
 
+long long NOIPA
+li_rldic_13 (void)
+{
+  return 0x000f8531LL;
+}
+long long NOIPA
+li_rldic_14 (void)
+{
+  return 0x853100ffLL;
+}
+
+long long NOIPA
+li_rldic_15 (void)
+{
+  return 0x8031LL;
+}
+
+long long NOIPA
+li_rldic_16 (void)
+{
+  return 0x8f31LL;
+}
+
 struct fun arr[] = {
   {li_rotldi_1, 0x75310LL},
   {li_rotldi_2, 0x2164LL},
@@ -95,11 +118,16 @@ struct fun arr[] = {
   {li_rldicr_10, 0x8531fff0LL},
   {li_rldicr_11, 0x21f0LL},
   {lis_rldicr_12, 0x5310LL},
+  {li_rldic_13, 0x000f8531LL},
+  {li_rldic_14, 0x853100ffLL},
+  {li_rldic_15, 0x8031LL},
+  {li_rldic_16, 0x8f31LL}
 };
 
 /* { dg-final { scan-assembler-times

[PATCH 1/4] rs6000: build constant via li;rotldi

2023-06-07 Thread Jiufu Guo via Gcc-patches

Hi,

This patch checks whether a constant can be rotated to/from a positive
or negative value loaded by "li".  If so, we can use "li;rotldi" to build it.
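
The basic idea can be sketched in plain C by brute force (this is not how the patch implements it; `fits_li`, `find_li_rotation` and the sample constants are hypothetical names and values for illustration). The sketch tries all 64 rotations of a constant and checks whether one of them is a sign-extended 16-bit value that "li" could load:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate V left by N bits; N is taken modulo 64.  */
static uint64_t
rotl64 (uint64_t v, unsigned n)
{
  n &= 63;
  return n == 0 ? v : (v << n) | (v >> (64 - n));
}

/* True if V is a sign-extended 16-bit value, i.e. something "li" can load.  */
static bool
fits_li (uint64_t v)
{
  return v <= UINT64_C (0x7fff) || v >= UINT64_C (0xffffffffffff8000);
}

/* Hypothetical helper: find ROT such that rotating C left by ROT gives an
   "li" value.  C is then that value rotated left by (64 - ROT) % 64.  */
static bool
find_li_rotation (uint64_t c, unsigned *rot)
{
  for (unsigned r = 0; r < 64; r++)
    if (fits_li (rotl64 (c, r)))
      {
        *rot = r;
        return true;
      }
  return false;
}

/* Self-check on constants chosen for this sketch.  */
static void
check_li_rotation (void)
{
  unsigned rot;
  uint64_t c = UINT64_C (0x7531000000000000);
  assert (find_li_rotation (c, &rot));
  /* Rotating the "li" value back rebuilds the constant.  */
  assert (rotl64 (rotl64 (c, rot), 64 - rot) == c);
  /* No rotation of this value has 49 equal leading bits.  */
  assert (!find_li_rotation (UINT64_C (0x0123456789abcdef), &rot));
}
```

A real implementation would derive the rotation from leading/trailing bit counts instead of looping, which is what the helpers in the patch do.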

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_rotated_to_positive_li): New function.
(can_be_rotated_to_negative_li): New function.
(can_be_built_by_li_and_rotldi): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: New test.
---
 gcc/config/rs6000/rs6000.cc   | 64 +--
 .../gcc.target/powerpc/const-build.c  | 54 
 2 files changed, 112 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 42f49e4a56b..1dd0072350a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10258,6 +10258,48 @@ rs6000_emit_set_const (rtx dest, rtx source)
   return true;
 }
 
+/* Check if C can be rotated to a positive value which 'li' instruction
+   is able to load.  If so, set *ROT to the number by which C is rotated,
+   and return true.  Return false otherwise.  */
+
+static bool
+can_be_rotated_to_positive_li (HOST_WIDE_INT c, int *rot)
+{
+  /* 49 leading zeros and 15 low bits on the positive value
+ generated by 'li' instruction.  */
+  return can_be_rotated_to_lowbits (c, 15, rot);
+}
+
+/* Like can_be_rotated_to_positive_li, but check the negative value of 'li'.  
*/
+
+static bool
+can_be_rotated_to_negative_li (HOST_WIDE_INT c, int *rot)
+{
+  return can_be_rotated_to_lowbits (~c, 15, rot);
+}
+
+/* Check if value C can be built by 2 instructions: one is 'li', another is
+   rotldi.
+
+   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
+   is set to -1, and return true.  Return false otherwise.  */
+
+static bool
+can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  int n;
+  if (can_be_rotated_to_positive_li (c, &n)
+  || can_be_rotated_to_negative_li (c, &n))
+{
+  *mask = HOST_WIDE_INT_M1;
+  *shift = HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10266,15 +10308,14 @@ static void
 rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 {
   rtx temp;
+  int shift;
+  HOST_WIDE_INT mask;
   HOST_WIDE_INT ud1, ud2, ud3, ud4;
 
   ud1 = c & 0xffff;
-  c = c >> 16;
-  ud2 = c & 0xffff;
-  c = c >> 16;
-  ud3 = c & 0xffff;
-  c = c >> 16;
-  ud4 = c & 0xffff;
+  ud2 = (c >> 16) & 0xffff;
+  ud3 = (c >> 32) & 0xffff;
+  ud4 = (c >> 48) & 0xffff;
 
   if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
   || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
@@ -10305,6 +10346,17 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0x) << 16)));
 }
+  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
+{
+  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
+  unsigned HOST_WIDE_INT imm = (c | ~mask);
+  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
+
+  emit_move_insn (temp, GEN_INT (imm));
+  if (shift != 0)
+   temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
+  emit_move_insn (dest, temp);
+}
   else if (ud3 == 0 && ud4 == 0)
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c 
b/gcc/testsuite/gcc.target/powerpc/const-build.c
new file mode 100644
index 000..70f095f6bf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+#define NOIPA __attribute__ ((noipa))
+
+struct fun
+{
+  long long (*f) (void);
+  long long val;
+};
+
+long long NOIPA
+li_rotldi_1 (void)
+{
+  return 0x75310LL;
+}
+
+long long NOIPA
+li_rotldi_2 (void)
+{
+  return 0x2164LL;
+}
+
+long long NOIPA
+li_rotldi_3 (void)
+{
+  return 0x8531LL;
+}
+
+long long NOIPA
+li_rotldi_4 (void)
+{
+  return 0x2194LL;
+}
+
+struct fun arr[] = {
+  {li_rotldi_1, 0x75310LL},
+  {li_rotldi_2, 0x2164LL},
+  {li_rotldi_3, 0x8531LL},
+  {li_rotldi_4, 0x2194LL},
+};
+
+/* { dg-final { scan-assembler-times {\mrotldi\M} 4 } } */
+
+int
+main ()
+{
+  for (int i = 0; i < sizeof (arr) / sizeof (arr[0]); i++)
+if ((*arr[i].f) () != arr[i].val)
+

[PATCH V2 0/4] rs6000: build constant via li/lis;rldicX

2023-06-07 Thread Jiufu Guo via Gcc-patches

Hi,

These patches contain only minor changes based on the previous version and
its review comments:
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611286.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620489.html
The wording of the patches in this series is also updated.

For a given constant, it is profitable if we can build it with 2 insns.
This series enables more constants to be built with 2 insns: one is "li or lis",
the other is "rldicl, rldicr or rldic".
By checking and analyzing the characteristics of the insns "li/lis;rldicX",
all the possible constant values are considered by this series.

Considering functionality and size, the series is split into 4 patches as below:
1. Support the constants which can be built by "li;rotldi"
   Both positive and negative values from insn "li" are analyzed.
2. Support the constants which can be built by "lis;rotldi"
   We only need to analyze the negative value from "lis".
   And this patch uses more code to check leading 1s and tailing 0s from "lis".
3. Support the constants which can be built by "li/lis;rldicl/rldicr":
   Leveraging the APIs defined/analyzed in patches 1 and 2,
   this patch checks the characteristics of the mask of "rldicl/rldicr"
   to support more constants.
4. Support the constants which can be built by "li/lis;rldic":
   The mask of "rldic" is relatively complicated; it is analyzed in this
   patch to support more constants.

BR,
Jeff (Jiufu)

[Bug tree-optimization/110165] [13/14 Regression] wrong code with signed 1 bit integers sometimes since r13-4459-g6508d5e5a1a8

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110165

--- Comment #2 from Andrew Pinski  ---
This produces better gimple:
```
/* (zero_one == 0) ? y : z <op> y -> ((typeof(y))zero_one * z) <op> y */
(for op (bit_xor bit_ior plus)
 (simplify
  (cond (eq zero_one_valued_p@0
            integer_zerop)
        @1
        (op:c @2 @1))
  (if (INTEGRAL_TYPE_P (type)
       && TYPE_PRECISION (type) > 1
       && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
   (op (mult (bit_and (convert:type @0) { build_one_cst(type); }) @2)
       @1))))

/* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */
(for op (bit_xor bit_ior plus)
 (simplify
  (cond (ne zero_one_valued_p@0
            integer_zerop)
        (op:c @2 @1)
        @1)
  (if (INTEGRAL_TYPE_P (type)
       && TYPE_PRECISION (type) > 1
       && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
   (op (mult (bit_and (convert:type @0) { build_one_cst(type); }) @2)
       @1))))
```
But it still has extra gimple produced for unsigned 1 bit field:
```
Folded into the sequence:
_7 = (int) _1;
_8 = (int) _1;
_9 = b_5(D) * _8;
_10 = a_4(D) | _9;
statement un-sinked:
a_6 = a_4(D) | b_5(D);
Removing basic block 3
;; basic block 3, loop depth 0
;;  pred:   2
;;  succ:   4


Removing dead stmt:a_6 = a_4(D) | b_5(D);
Removing dead stmt:_7 = (int) _1;
```
Maybe that is ok.

For signed 1-bit this is produced:
```
Folded into the sequence:
_7 = (int) _1;
_8 = _7 & 1;
_9 = b_5(D) * _8;
_10 = a_4(D) | _9;
statement un-sinked:
a_6 = a_4(D) | b_5(D);
Removing basic block 3
;; basic block 3, loop depth 0
;;  pred:   2
;;  succ:   4


Removing dead stmt:a_6 = a_4(D) | b_5(D);
```

I think this is the best I am going to get it.
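
For readers wondering why the extra `& 1` matters for the signed case, here is a small standalone C illustration (a sketch written for this note, not taken from the bug report; the struct and function names are made up). A signed 1-bit field holds 0 or -1, so multiplying by the plainly converted value flips the sign of the other operand, while masking with 1 first gives the intended 0/1 multiplier:

```c
#include <assert.h>

/* A signed 1-bit field can only hold 0 and -1.  */
struct flags
{
  signed int bit : 1;
};

/* Reference form: (bit != 0) ? (a | b) : a.  */
static int
select_or (struct flags f, int a, int b)
{
  return f.bit != 0 ? (a | b) : a;
}

/* Multiply by the converted field without masking: wrong when bit is -1.  */
static int
mult_or_raw (struct flags f, int a, int b)
{
  return a | (b * (int) f.bit);
}

/* Multiply by the converted field masked with 1: matches the reference.  */
static int
mult_or_masked (struct flags f, int a, int b)
{
  return a | (b * ((int) f.bit & 1));
}

/* Self-check with values chosen for this sketch.  */
static void
check_signed_one_bit (void)
{
  struct flags set = { -1 };
  assert (select_or (set, 4, 3) == 7);
  assert (mult_or_masked (set, 4, 3) == 7);
  assert (mult_or_raw (set, 4, 3) == -3);  /* 4 | -3 */
}
```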

Re: [PATCH] In the pipeline, UNRECOG INSN is not executed in advance if it starts a live range.

2023-06-07 Thread Jin Ma via Gcc-patches

ping: https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619951.html

Ref: 
http://patchwork.ozlabs.org/project/gcc/patch/20230323080734.423-1-ji...@linux.alibaba.com/

[Bug tree-optimization/110165] [13/14 Regression] wrong code with signed 1 bit integers sometimes since r13-4459-g6508d5e5a1a8

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110165

--- Comment #1 from Andrew Pinski  ---
This will fix the issue:
/* (zero_one == 0) ? y : z <op> y -> ((typeof(y))zero_one * z) <op> y */
(for op (bit_xor bit_ior plus)
 (simplify
  (cond (eq@3 zero_one_valued_p@0
              integer_zerop)
        @1
        (op:c @2 @1))
  (if (INTEGRAL_TYPE_P (type)
       && TYPE_PRECISION (type) > 1
       && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
   (op (mult (convert:type @3) @2) @1))))

/* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */
(for op (bit_xor bit_ior plus)
 (simplify
  (cond (ne@3 zero_one_valued_p@0
              integer_zerop)
        (op:c @2 @1)
        @1)
  (if (INTEGRAL_TYPE_P (type)
       && TYPE_PRECISION (type) > 1
       && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
   (op (mult (convert:type @3) @2) @1))))

Though it could be improved further: maybe use a convert followed by & 1 and
still use @0 rather than @3. Still deciding.
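
As a sanity check of the identity these patterns rely on, the following standalone C sketch (written for illustration, with hypothetical function names; it is not part of match.pd) compares the conditional form with the multiply form for xor, ior and plus. The two agree when the condition value is 0 or 1 and disagree when it is -1:

```c
#include <assert.h>
#include <stdbool.h>

/* Conditional form: (c != 0) ? (z <op> y) : y, for the three ops.  */
static int cond_xor (int c, int z, int y) { return c != 0 ? (z ^ y) : y; }
static int cond_ior (int c, int z, int y) { return c != 0 ? (z | y) : y; }
static int cond_plus (int c, int z, int y) { return c != 0 ? (z + y) : y; }

/* Multiply form: (c * z) <op> y; only meant for c in {0, 1}.  */
static int mult_xor (int c, int z, int y) { return (c * z) ^ y; }
static int mult_ior (int c, int z, int y) { return (c * z) | y; }
static int mult_plus (int c, int z, int y) { return (c * z) + y; }

/* True if both forms agree for all three operations.  */
static bool
forms_agree (int c, int z, int y)
{
  return cond_xor (c, z, y) == mult_xor (c, z, y)
         && cond_ior (c, z, y) == mult_ior (c, z, y)
         && cond_plus (c, z, y) == mult_plus (c, z, y);
}

/* Self-check with values chosen for this sketch.  */
static void
check_forms (void)
{
  assert (forms_agree (0, 0x30, 0x05));
  assert (forms_agree (1, 0x30, 0x05));
  /* A "true" value of -1, as in a signed 1-bit field, breaks the identity.  */
  assert (!forms_agree (-1, 0x30, 0x05));
}
```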

[Bug tree-optimization/110166] [14 Regression] wrong code with signed 1-bit integers sometimes since r14-868-gb06cfb62229f

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110166

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Target Milestone|--- |14.0
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2023-06-08

--- Comment #1 from Andrew Pinski  ---
Forgot to mention this needs -O1 (-O2 compiles it correctly because of another
issue).

Mine.

[Bug tree-optimization/110166] New: [14 Regression] wrong code with signed 1-bit integers sometimes since r14-868-gb06cfb62229f

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110166

Bug ID: 110166
   Summary: [14 Regression] wrong code with signed 1-bit integers
sometimes since r14-868-gb06cfb62229f
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: pinskia at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Full testcase:
```
struct s
{
  int t : 1;
  int t1 : 1;
};

[[gnu::noipa]]
int f(struct s t)
{
   int c = t.t;
   int d = t.t1;
   if (c > d)
 t.t = d;
   else
 t.t = c;
  return t.t;
}

int main(void)
{
struct s t;
for(int i = -1;i <= 0; i++)
{
  for(int j = -1;j <= 0; j++)
  {
int r = f((struct s){i, j});
int exp = i < j ? i : j;
if (exp != r)
 __builtin_abort();
  }
}
}
```

Re: Followup on PR/109279: large constants on RISCV

2023-06-07 Thread Jeff Law via Gcc





On 6/1/23 20:38, Vineet Gupta wrote:

Hi Jeff,

I finally got around to collecting various observations on PR/109279 - 
more importantly the state of large constants in RV backend, apologies 
in advance for the long email.


It seems the various commits in area have improved the original test 
case of 0x1010101_01010101


   Before 2e886eef7f2b  |   With 2e886eef7f2b   | With 0530254413f8 | With c104ef4b5eb1
Right.  The handling of that constant shows a nice progression.  On our 
architecture the latter two versions are probably equivalent from a 
latency standpoint, but the last is obviously best as it's smaller and 
probably better on in-order architectures as well.





But same commits seem to have regressed Andrew's test from same PR 
(which is the theme of this email).

The seemingly contrived test turned out to be much more than I'd hoped for.

    long long f(void)
    {
  unsigned t = 0x101_0101;
  long long t1 = t;
  long long t2 = ((unsigned long long )t) << 32;
  asm("":"+r"(t1));
  return t1 | t2;
    }

[ ... ]
It may be more instructions, but I suspect they end up being the same 
performance for us across all three variants.  Fusion and out-of-order 
execution save the day.  But I realize there may be targets where the 
first is going to be preferred.






   Before 2e886eef7f2b  |   With 2e886eef7f2b    | With 0530254413f8
     (ideal code)       | define_insn_and_split  | "splitter relaxed new
                        |                        |  pseudos"
    li   a0,0x101       |    li   a5,0x101       |    li   a0,0x101_
    addi a0,a0,0x101    |    addi a5,a5,0x101    |    addi a0,a0,0x101
    slli a5,a0,32       |    mv   a0,a5          |    li   a5,0x101_
    or   a0,a0,a5       |    slli a5,a5,32       |    slli a0,a0,32
    ret                 |    or   a0,a0,a5       |    addi a5,a5,0x101
                        |    ret                 |    or   a0,a5,a0
                        |                        |    ret
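
For reference, the four-instruction "ideal" shape can be modelled in plain C as below. This is only a sketch: the function name is made up, and the upper immediate 0x1010000 is an assumption, since the listing shows that operand only in truncated form.

```c
#include <assert.h>
#include <stdint.h>

/* Model of building 0x0101010101010101 by reusing one 32-bit fragment:
   materialize 0x1010101 once, shift a copy up by 32, then OR the two.  */
static uint64_t
synthesize_repeated_constant (void)
{
  uint64_t a0 = UINT64_C (0x1010000);  /* li   a0,0x1010000 (assumed) */
  a0 += 0x101;                         /* addi a0,a0,0x101 */
  uint64_t a5 = a0 << 32;              /* slli a5,a0,32 */
  return a0 | a5;                      /* or   a0,a0,a5 */
}

/* Self-check: the result is the original test constant.  */
static void
check_synthesis (void)
{
  assert (synthesize_repeated_constant () == UINT64_C (0x0101010101010101));
}
```

The point of the comparison is that the low fragment is computed once and reused, which is the reuse cse1 can no longer find once the constant is hidden behind a single pattern.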

As a baseline, RTL just before cse1 (in 260r.dfinit) in all of above is:

[ ... ]
Right. Standard looking synthesis.





Prior to 2e886eef7f2b, cse1 could do its job: finding oldest equivalent 
registers for the fragments of const and reusing the reg.

Right.  That's what I would expect.

[ ... ]




With 2e886eef7f2b, define_insn_and_split "*mvconst_internal" recog() 
kicks in during cse1, eliding insns for a const_int.


    (insn 7 6 8 2 (set (reg:DI 137)
     (const_int [0x1010101])) {*mvconst_internal}
     (expr_list:REG_EQUAL (const_int [0x1010101])))
    [...]

    (insn 11 10 12 2 (set (reg:DI 140)
     (const_int [0x1010101_])) {*mvconst_internal}
     (expr_list:REG_EQUAL (const_int  [0x1010101_]) ))
Understood.  Not ideal, but we generally don't have good ways to limit 
patterns to being available at different times during the optimization 
phase.  One thing you might want to try (which I thought we used at one 
point) was make the pattern conditional on cse_not_expected.  The goal 
would be to avoid exposing the pattern until a later point in the 
optimizer pipeline.  It may have been the case that we dropped that over 
time during development.  It's all getting fuzzy at this point.




Eventually split1 breaks it up using same mvconst_internal splitter, but 
the cse opportunity has been lost.
Right.  I'd have to look at the pass definitions, but I suspect the 
splitting pass where this happens is after the last standard CSE pass. 
So we don't get a chance to CSE the constant synthesis.



*This is now a baseline for large consts handling for RV backend which 
we all need to be aware of*.
Understood.  Though it's not as bad as you might think :-)  You can 
spend an inordinate amount of time improving constant synthesis, 
generate code that looks really good, but in the end it may not make a 
bit of difference in real performance.  Been there, done that.  I'm not 
saying we give up, but we need to keep in mind that we're often better 
off trading a bit on the constant synthesis if doing so helps code where 
those constants get used.






(2) Now on to the nuances as to why things get progressively worse after 
commit 0530254413f8.


It all seems to get down to register allocation passes:

sched1 before 0530254413f8

    ;; 0--> b  0: i  22 r140=0x101    :alu
    ;; 1--> b  0: i  20 r137=0x101    :alu
    ;; 2--> b  0: i  23 r140=r140+0x101   :alu
    ;; 3--> b  0: i  21 r137=r137+0x101   :alu
    ;; 4--> b  0: i  24 r140=r140<<0x20   :alu
    ;; 5--> b  0: i  25 r136=r137 :alu
    ;; 6--> b  0: i   8 r136=asm_operands :nothing
    ;; 7--> b  0: i  17 a0=r136|r140  :alu
    ;; 8--> b  0: i  18 use a0    :nothing

sched1 with 0530254413f8

    ;; 0--> b  0: i  22 r144=0x101    :alu
    ;; 1--> b  0: i  20 r143=0x101    :alu
    ;; 2--> b  0: i  23 r145=r144+0x101   :alu
    ;; 3--> b  0: i  21

Re: Followup on PR/109279: large constants on RISCV

2023-06-07 Thread Jeff Law via Gcc-patches





On 6/1/23 20:38, Vineet Gupta wrote:

Hi Jeff,

I finally got around to collecting various observations on PR/109279 - 
more importantly the state of large constants in RV backend, apologies 
in advance for the long email.


It seems the various commits in area have improved the original test 
case of 0x1010101_01010101


   Before 2e886eef7f2b  |   With 2e886eef7f2b   | With 0530254413f8 | With c104ef4b5eb1

Right.  The handling of that constant shows a nice progression.  On our 
architecture the latter two versions are probably equivalent from a 
latency standpoint, but the last is obviously best as it's smaller and 
probably better on in-order architectures as well.





But same commits seem to have regressed Andrew's test from same PR 
(which is the theme of this email).

The seemingly contrived test turned out to be much more than I'd hoped for.

    long long f(void)
    {
      unsigned t = 0x1010101;
      long long t1 = t;
      long long t2 = ((unsigned long long )t) << 32;
      asm("":"+r"(t1));
      return t1 | t2;
    }

[ ... ]
It may be more instructions, but I suspect they end up being the same 
performance for us across all three variants.  Fusion and out-of-order 
execution save the day.  But I realize there may be targets where the 
first is going to be preferred.






   Before 2e886eef7f2b  |   With 2e886eef7f2b    | With 0530254413f8
     (ideal code)       | define_insn_and_split  | "splitter relaxed new
                        |                        |  pseudos"
    li   a0,0x101       |    li   a5,0x101       |    li   a0,0x101_
    addi a0,a0,0x101    |    addi a5,a5,0x101    |    addi a0,a0,0x101
    slli a5,a0,32       |    mv   a0,a5          |    li   a5,0x101_
    or   a0,a0,a5       |    slli a5,a5,32       |    slli a0,a0,32
    ret                 |    or   a0,a0,a5       |    addi a5,a5,0x101
                        |    ret                 |    or   a0,a5,a0
                        |                        |    ret
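
To make the comparison concrete, here is a rough C model of the first and third columns. It is only a sketch of the arithmetic, not of what the backend emits: the function names are made up, and it assumes that the "0x101_" shorthand in the table stands for the upper part 0x1010000 that `li` materializes before the `addi`.

```c
#include <stdint.h>

/* First column: build the 32-bit half once, then reuse it for the upper half.
   Assumption: the initial li loads 0x1010000 (upper part of 0x1010101).  */
uint64_t synth_reuse(void)
{
    uint64_t a0 = 0x1010000;   /* li   a0,<upper part>            */
    a0 += 0x101;               /* addi a0,a0,0x101  -> 0x1010101  */
    uint64_t a5 = a0 << 32;    /* slli a5,a0,32                   */
    return a0 | a5;            /* or   a0,a0,a5                   */
}

/* Third column: the same 32-bit half is synthesized twice, so there are
   two extra instructions but no dependency between the two halves.  */
uint64_t synth_twice(void)
{
    uint64_t a0 = 0x1010000;   /* li   a0,<upper part>  */
    a0 += 0x101;               /* addi a0,a0,0x101      */
    uint64_t a5 = 0x1010000;   /* li   a5,<upper part>  */
    a0 <<= 32;                 /* slli a0,a0,32         */
    a5 += 0x101;               /* addi a5,a5,0x101      */
    return a5 | a0;            /* or   a0,a5,a0         */
}
```

Both functions compute 0x0101010101010101; the difference under discussion is purely the instruction count and the missed reuse of the first half.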

As a baseline, RTL just before cse1 (in 260r.dfinit) in all of above is:

[ ... ]
Right. Standard looking synthesis.





Prior to 2e886eef7f2b, cse1 could do its job: finding oldest equivalent 
registers for the fragments of const and reusing the reg.

Right.  That's what I would expect.

[ ... ]




With 2e886eef7f2b, define_insn_and_split "*mvconst_internal" recog() 
kicks in during cse1, eliding insns for a const_int.


    (insn 7 6 8 2 (set (reg:DI 137)
     (const_int [0x1010101])) {*mvconst_internal}
     (expr_list:REG_EQUAL (const_int [0x1010101])))
    [...]

    (insn 11 10 12 2 (set (reg:DI 140)
     (const_int [0x1010101_])) {*mvconst_internal}
     (expr_list:REG_EQUAL (const_int  [0x1010101_]) ))
Understood.  Not ideal, but we generally don't have good ways to limit 
patterns to being available at different times during the optimization 
phase.  One thing you might want to try (which I thought we used at one 
point) was make the pattern conditional on cse_not_expected.  The goal 
would be to avoid exposing the pattern until a later point in the 
optimizer pipeline.  It may have been the case that we dropped that over 
time during development.  It's all getting fuzzy at this point.




Eventually split1 breaks it up using same mvconst_internal splitter, but 
the cse opportunity has been lost.
Right.  I'd have to look at the pass definitions, but I suspect the 
splitting pass where this happens is after the last standard CSE pass. 
So we don't get a chance to CSE the constant synthesis.



*This is now a baseline for large consts handling for RV backend which 
we all need to be aware of*.

Understood.  Though it's not as bad as you might think :-)  You can 
spend an inordinate amount of time improving constant synthesis, 
generate code that looks really good, but in the end it may not make a 
bit of difference in real performance.  Been there, done that.  I'm not 
saying we give up, but we need to keep in mind that we're often better 
off trading a bit on the constant synthesis if doing so helps code where 
those constants get used.






(2) Now on to the nuances as to why things get progressively worse after 
commit 0530254413f8.


It all seems to get down to register allocation passes:

sched1 before 0530254413f8

    ;; 0--> b  0: i  22 r140=0x101    :alu
    ;; 1--> b  0: i  20 r137=0x101    :alu
    ;; 2--> b  0: i  23 r140=r140+0x101   :alu
    ;; 3--> b  0: i  21 r137=r137+0x101   :alu
    ;; 4--> b  0: i  24 r140=r140<<0x20   :alu
    ;; 5--> b  0: i  25 r136=r137 :alu
    ;; 6--> b  0: i   8 r136=asm_operands :nothing
    ;; 7--> b  0: i  17 a0=r136|r140  :alu
    ;; 8--> b  0: i  18 use a0    :nothing

sched1 with 0530254413f8

    ;; 0--> b  0: i  22 r144=0x101    :alu
    ;; 1--> b  0: i  20 r143=0x101    :alu
    ;; 2--> b  0: i  23 r145=r144+0x101   :alu
    ;; 3--> b  0: i  21

[Bug tree-optimization/110165] [13/14 Regression] wrong code with signed 1 bit integers sometimes since r13-4459-g6508d5e5a1a8

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110165

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-06-07
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1
   Target Milestone|---                         |13.2

[Bug tree-optimization/110165] New: [13/14 Regression] wrong code with signed 1 bit integers sometimes since r13-4459-g6508d5e5a1a8

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110165

Bug ID: 110165
   Summary: [13/14 Regression] wrong code with signed 1 bit
integers sometimes since r13-4459-g6508d5e5a1a8
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: pinskia at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Full testcase:
```
struct s
{
  int t : 1;
};

[[gnu::noipa]]
int f(struct s t, int a, int b)
{
int bd = t.t;
if (bd) a|=b;
return a;
}

int main(void)
{
struct s t;
for(int i = 0;i <= 1; i++)
{
int a = 0x10;
int b = 0x0f;
int c = a | b;
int r = f((struct s){i}, a, b);
int exp = i == 1 ? a | b : a;
if (exp != r)
 __builtin_abort();
}
}
```

Found while improving these match patterns.

Re: [PATCH 2/3] Change the `zero_one ==/!= 0) ? y : z <op> y` patterns to use multiply rather than `(-zero_one) & z`

2023-06-07 Thread Andrew Pinski via Gcc-patches

On Wed, Jun 7, 2023 at 4:11 PM Jeff Law  wrote:
>
>
>
> On 6/7/23 17:05, Andrew Pinski wrote:
> > On Wed, Jun 7, 2023 at 3:57 PM Jeff Law via Gcc-patches
> >  wrote:
> >>
> >>
> >>
> >> On 6/7/23 15:32, Andrew Pinski via Gcc-patches wrote:
> >>> Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` 
> >>> already,
> >>> it is better if we don't do a secondary transformation. This reduces the 
> >>> extra
> >>> statements produced by match-and-simplify on the gimple level too.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>* match.pd (`zero_one ==/!= 0) ? y : z <op> y`): Use
> >>>multiply rather than negation/bit_and.
> >> Don't you need to check the types in a manner similar to what the A & -Y
> >> -> X * Y pattern does before you make this transformation?
> >
> > No, because the convert is in a different order than in that
> > transformation; a very subtle difference which makes it work.
> >
> > In A & -Y it was matching:
> > (bit_and  (convert? (negate
> > But here we have:
> > (bit_and (negate (convert
> > Notice the convert is in a different location. In the `A & -Y` case,
> > the convert needs to be a sign extension (or a truncation) of the
> > negated value. Here we are converting the zero_one value to the new
> > type, so we get zero_one in the new type, and then doing the negation,
> > which gives us a 0 or -1 value.
> Thanks for the clarification.  OK for the trunk.

So my transformation is correct based on what was already done in
match.pd, but that existing code was already broken for signed one-bit
integers:
```
struct s
{
  int t : 1;
};
int f(struct s t, int a, int b)
{
int bd = t.t;
if (bd) a|=b;
return a;
}
```
I am going to withdraw this patch and fix that up first.

Thanks,
Andrew
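
A small C sketch of why a signed one-bit value is awkward for this kind of rewrite. This only illustrates the arithmetic and is not a description of the match.pd logic; the function names are invented for the example. A signed one-bit bit-field can hold only 0 and -1, so a multiply that is a valid select for {0, 1} no longer is:

```c
struct s
{
  signed int t : 1;   /* representable values: 0 and -1 */
};

/* Reference semantics: OR in b only when the flag is non-zero.  */
int select_or_branchy(struct s v, int a, int b)
{
  int bd = v.t;
  if (bd)
    a |= b;
  return a;
}

/* Multiply-based select: only equivalent when bd is 0 or 1.
   With bd == -1 it ORs in -b instead of b.  */
int select_or_multiply(struct s v, int a, int b)
{
  int bd = v.t;
  return a | (bd * b);
}
```

With the flag set (stored as -1), a = 0x10 and b = 0x0f, the branchy version returns 0x1f while the multiply version does not, which is the same shape of mismatch the testcase aborts on.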

>
> jeff

[Bug target/105617] [12/13/14 Regression] Slp is maybe too aggressive in some/many cases

2023-06-07 Thread already5chosen at yahoo dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617

--- Comment #19 from Michael_S  ---
(In reply to Mason from comment #18)
> Hello Michael_S,
> 
> As far as I can see, massaging the source helps GCC generate optimal code
> (in terms of instruction count, not convinced about scheduling).
> 
> #include <immintrin.h>
> typedef unsigned long long u64;
> void add4i(u64 dst[4], const u64 A[4], const u64 B[4])
> {
>   unsigned char c = 0;
>   c = _addcarry_u64(c, A[0], B[0], dst+0);
>   c = _addcarry_u64(c, A[1], B[1], dst+1);
>   c = _addcarry_u64(c, A[2], B[2], dst+2);
>   c = _addcarry_u64(c, A[3], B[3], dst+3);
> }
> 
> 
> On godbolt, gcc-{11.4, 12.3, 13.1, trunk} -O3 -march=znver1 all generate
> the expected:
> 
> add4i:
>         movq    (%rdx), %rax
>         addq    (%rsi), %rax
>         movq    %rax, (%rdi)
>         movq    8(%rsi), %rax
>         adcq    8(%rdx), %rax
>         movq    %rax, 8(%rdi)
>         movq    16(%rsi), %rax
>         adcq    16(%rdx), %rax
>         movq    %rax, 16(%rdi)
>         movq    24(%rdx), %rax
>         adcq    24(%rsi), %rax
>         movq    %rax, 24(%rdi)
>         ret
> 
> I'll run a few benchmarks to test optimal scheduling.

That's not merely "massaging the source". That's changing semantics.
Think about what happens when dst points to the middle of A or of B.
The change of semantics effectively prevented vectorizer from doing harm.

And yes, for common non-aliasing case the scheduling is problematic, too. 
It would probably not cause slowdown on the latest and greatest cores, but
could be slow on less great cores, including your default Zen1.
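
The aliasing point can be shown with a portable sketch that does not need the intrinsics. The two helpers below are hypothetical and use a hand-written carry chain rather than `_addcarry_u64`; one stores each limb as soon as it is computed, the other computes everything first and stores at the end:

```c
#include <string.h>

typedef unsigned long long u64;

/* Stores each limb immediately, so a later load from A or B may observe
   an earlier store when dst overlaps an input.  */
void add4_store_each(u64 *dst, const u64 *a, const u64 *b)
{
  unsigned carry = 0;
  for (int i = 0; i < 4; i++)
    {
      u64 s = a[i] + carry;
      unsigned c1 = s < carry;   /* carry out of a[i] + carry */
      u64 r = s + b[i];
      unsigned c2 = r < s;       /* carry out of s + b[i] */
      dst[i] = r;
      carry = c1 | c2;
    }
}

/* Reads all inputs before writing any output.  */
void add4_store_last(u64 *dst, const u64 *a, const u64 *b)
{
  u64 tmp[4];
  add4_store_each(tmp, a, b);
  memcpy(dst, tmp, sizeof tmp);
}
```

With buf = {1, 2, 3, 4, 5}, B = {10, 10, 10, 10}, A = buf and dst = buf + 1, the first helper leaves 41 in buf[4] and the second leaves 14. The two are only interchangeable when dst does not overlap the inputs, which is why the store order is part of the semantics.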

Re: [PATCH 3/3] Add Plus to the op list of `(zero_one == 0) ? y : z <op> y` pattern

2023-06-07 Thread Jeff Law via Gcc-patches





On 6/7/23 15:32, Andrew Pinski via Gcc-patches wrote:

This adds plus to the op list of `(zero_one == 0) ? y : z <op> y` patterns
which currently has bit_ior and bit_xor.
This shows up now in GCC after the boolization work that Uroš has been doing.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/97711
PR tree-optimization/110155

gcc/ChangeLog:

* match.pd ((zero_one == 0) ? y : z <op> y): Add plus to the op.
((zero_one != 0) ? z <op> y : y): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/branchless-cond-add-2.c: New test.
* gcc.dg/tree-ssa/branchless-cond-add.c: New test.

OK
jeff
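
For reference, a sketch of the source-level identity behind this op list, assuming the condition is known to be 0 or 1. The function names are illustrative only. The rewrite works for `|`, `^` and `+` alike because 0 is the identity element of all three:

```c
/* Branchy forms: (c == 0) ? y : z <op> y, with c known to be 0 or 1.  */
unsigned sel_ior(unsigned c, unsigned y, unsigned z) { return c == 0 ? y : (z | y); }
unsigned sel_xor(unsigned c, unsigned y, unsigned z) { return c == 0 ? y : (z ^ y); }
unsigned sel_add(unsigned c, unsigned y, unsigned z) { return c == 0 ? y : (z + y); }

/* Branchless forms: (c * z) <op> y.  When c == 0 the product is 0,
   which leaves y unchanged under |, ^ and +.  */
unsigned mul_ior(unsigned c, unsigned y, unsigned z) { return (c * z) | y; }
unsigned mul_xor(unsigned c, unsigned y, unsigned z) { return (c * z) ^ y; }
unsigned mul_add(unsigned c, unsigned y, unsigned z) { return (c * z) + y; }
```

The same argument would not carry over to an operator such as `&`, where 0 is not the identity.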

Re: [PATCH 2/3] Change the `zero_one ==/!= 0) ? y : z <op> y` patterns to use multiply rather than `(-zero_one) & z`

2023-06-07 Thread Jeff Law via Gcc-patches





On 6/7/23 17:05, Andrew Pinski wrote:

On Wed, Jun 7, 2023 at 3:57 PM Jeff Law via Gcc-patches
 wrote:




On 6/7/23 15:32, Andrew Pinski via Gcc-patches wrote:

Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` 
already,
it is better if we don't do a secondary transformation. This reduces the extra
statements produced by match-and-simplify on the gimple level too.

gcc/ChangeLog:

   * match.pd (`zero_one ==/!= 0) ? y : z <op> y`): Use
   multiply rather than negation/bit_and.

Don't you need to check the types in a manner similar to what the A & -Y
-> X * Y pattern does before you make this transformation?


No, because the convert is in a different order than in that
transformation; a very subtle difference which makes it work.

In A & -Y it was matching:
(bit_and  (convert? (negate
But here we have:
(bit_and (negate (convert
Notice the convert is in a different location. In the `A & -Y` case,
the convert needs to be a sign extension (or a truncation) of the
negated value. Here we are converting the zero_one value to the new
type, so we get zero_one in the new type, and then doing the negation,
which gives us a 0 or -1 value.

Thanks for the clarification.  OK for the trunk.

jeff

[nvptx PATCH] Update nvptx's bitrev2 pattern to use BITREVERSE rtx.

2023-06-07 Thread Roger Sayle


This minor tweak to the nvptx backend switches the representation of
the brev instruction from an UNSPEC to instead use the new BITREVERSE
rtx.  This allows various RTL optimizations including evaluation (constant
folding) of integer constant arguments at compile-time.

This patch has been tested on nvptx-none with make and make -k check
with no new failures.  Ok for mainline?


2023-06-07  Roger Sayle  

gcc/ChangeLog
* config/nvptx/nvptx.md (UNSPEC_BITREV): Delete.
(bitrev2): Represent using bitreverse.


Thanks in advance,
Roger
--

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 1bb9304..7a7c994 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -34,8 +34,6 @@
UNSPEC_FPINT_CEIL
UNSPEC_FPINT_NEARBYINT
 
-   UNSPEC_BITREV
-
UNSPEC_ALLOCA
 
UNSPEC_SET_SOFTSTACK
@@ -636,8 +634,7 @@
 
 (define_insn "bitrev2"
   [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
-   (unspec:SDIM [(match_operand:SDIM 1 "nvptx_register_operand" "R")]
-UNSPEC_BITREV))]
+   (bitreverse:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R")))]
   ""
   "%.\\tbrev.b%T0\\t%0, %1;")

Re: [PATCH 2/3] Change the `zero_one ==/!= 0) ? y : z <op> y` patterns to use multiply rather than `(-zero_one) & z`

2023-06-07 Thread Andrew Pinski via Gcc-patches

On Wed, Jun 7, 2023 at 3:57 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 6/7/23 15:32, Andrew Pinski via Gcc-patches wrote:
> > Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` 
> > already,
> > it is better if we don't do a secondary transformation. This reduces the 
> > extra
> > statements produced by match-and-simplify on the gimple level too.
> >
> > gcc/ChangeLog:
> >
> >   * match.pd (`zero_one ==/!= 0) ? y : z <op> y`): Use
> >   multiply rather than negation/bit_and.
> Don't you need to check the types in a manner similar to what the A & -Y
> -> X * Y pattern does before you make this transformation?

No, because the convert is in a different order than in that
transformation; a very subtle difference which makes it work.

In A & -Y it was matching:
(bit_and  (convert? (negate
But here we have:
(bit_and (negate (convert
Notice the convert is in a different location. In the `A & -Y` case,
the convert needs to be a sign extension (or a truncation) of the
negated value. Here we are converting the zero_one value to the new
type, so we get zero_one in the new type, and then doing the negation,
which gives us a 0 or -1 value.

Thanks,
Andrew
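
The ordering difference can be seen in plain C with fixed-width types. This is a sketch with invented helper names, using an unsigned 8-bit source so that the widening is a zero extension:

```c
#include <stdint.h>

/* Negate in the narrow type, then widen.  The widening here is a zero
   extension, so for y == 1 the mask is 0x000000FF, not all ones.  */
uint32_t mask_negate_then_widen(uint8_t y)
{
  return (uint32_t)(uint8_t)(-y);
}

/* Widen first, then negate in the wide type.  For y == 1 the mask is
   0xFFFFFFFF regardless of how the widening was done.  */
uint32_t mask_widen_then_negate(uint8_t y)
{
  return -(uint32_t)y;
}
```

For a 32-bit `a`, `a & mask_widen_then_negate(y)` equals `a * y` for y in {0, 1}, whereas `a & mask_negate_then_widen(1)` keeps only the low byte. That is why the first ordering needs a check on the kind of conversion and the second does not.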

>
> jeff
>

Re: [PATCH 2/3] Change the `zero_one ==/!= 0) ? y : z <op> y` patterns to use multiply rather than `(-zero_one) & z`

2023-06-07 Thread Jeff Law via Gcc-patches





On 6/7/23 15:32, Andrew Pinski via Gcc-patches wrote:

Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` 
already,
it is better if we don't do a secondary transformation. This reduces the extra
statements produced by match-and-simplify on the gimple level too.

gcc/ChangeLog:

* match.pd (`zero_one ==/!= 0) ? y : z <op> y`): Use
multiply rather than negation/bit_and.
Don't you need to check the types in a manner similar to what the A & -Y 
-> X * Y pattern does before you make this transformation?


jeff

[Committed] Bug fix to new wi::bitreverse_large function.

2023-06-07 Thread Roger Sayle


Richard Sandiford was, of course, right to be wary of new code without
much test coverage.  Converting the nvptx backend to use the BITREVERSE
rtx infrastructure has resulted in far more exhaustive testing and
revealed a subtle bug in the new wi::bitreverse implementation.  The
code needs to use HOST_WIDE_INT_1U (instead of 1) to avoid unintended
sign extension.

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(with a minor tweak to use BITREVERSE), where it fixes regressions of
the 32-bit test vectors in gcc.target/nvptx/brev-2.c and the 64-bit
test vectors in gcc.target/nvptx/brevll-2.c.  Committed as obvious.


2023-06-07  Roger Sayle  

gcc/ChangeLog
* wide-int.cc (wi::bitreverse_large): Use HOST_WIDE_INT_1U to
avoid sign extension/undefined behaviour when setting each bit.


Thanks,
Roger
--

diff --git a/gcc/wide-int.cc b/gcc/wide-int.cc
index 24bdce2..ab92ee6 100644
--- a/gcc/wide-int.cc
+++ b/gcc/wide-int.cc
@@ -786,7 +786,7 @@ wi::bitreverse_large (HOST_WIDE_INT *val, const HOST_WIDE_INT *xval,
  unsigned int d = (precision - 1) - s;
  block = d / HOST_BITS_PER_WIDE_INT;
  offset = d & (HOST_BITS_PER_WIDE_INT - 1);
-  val[block] |= 1 << offset;
+  val[block] |= HOST_WIDE_INT_1U << offset;
}
 }
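
A stand-alone sketch of the same pitfall, using fixed-width C types instead of HOST_WIDE_INT. The helper names are made up and this is not the wide-int code. A bit mask built in a 32-bit signed int cannot reach bits 32..63, and the bit-31 mask sign-extends when widened, so the constant being shifted has to be unsigned and as wide as the destination:

```c
#include <stdint.h>

/* Reverse the bits of a 64-bit value one bit at a time.  The shifted
   constant is a 64-bit unsigned 1, so every destination bit 0..63 can
   be set without overflow or sign extension.  */
uint64_t bitreverse64(uint64_t x)
{
  uint64_t r = 0;
  for (unsigned s = 0; s < 64; s++)
    if ((x >> s) & 1)
      {
        unsigned d = 63 - s;
        r |= (uint64_t)1 << d;
      }
  return r;
}

/* What a signed 32-bit mask turns into when it is widened to 64 bits:
   a mask with bit 31 set also sets bits 32..63.  */
uint64_t widen_int_mask(int32_t m)
{
  return (uint64_t)(int64_t)m;
}
```

Here `widen_int_mask(INT32_MIN)` yields 0xFFFFFFFF80000000, which shows how a single intended bit can turn into 33 set bits once a signed mask is ORed into a wider word.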

[Bug tree-optimization/109959] `(a > 1) ? 0 : (a == 1)` is not optimized when spelled out at -O2+

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109959

--- Comment #6 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #5)
> This is basically PR 102138 .

Except it works at -O1 because the cast is pushed out of the phi by phiopt but
the cast is the same as a & 1 here :(.

For comment #0 we could just match this for unsigned type
a_2(D) > 1 ? 0 : a_2(D) == a_2(D) <= 1 ? a_2(D) : 0 -> (unsigned)(a == 1)

For comment #3 we need to pattern match this now:
  _7 = (_Bool) a_6(D);
  _9 = a_6(D) <= 1;
  _10 = _7 & _9;

Re: [PATCH 1/3] MATCH: Allow unsigned types for `X & -Y -> X * Y` pattern

2023-06-07 Thread Jeff Law via Gcc-patches





On 6/7/23 15:32, Andrew Pinski via Gcc-patches wrote:

This allows unsigned types if the inner type, where the negation is
located, has precision greater than or equal to that of the outer type.

branchless-cond.c needs to be updated since now we change it to
use a multiply rather than still having (-a) in there.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd (`X & -Y -> X * Y`): Allow for truncation
and the same type for unsigned types.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/branchless-cond.c: Update testcase.

OK.
jeff
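
A short C sketch of the truncation case described above, assuming Y is 0 or 1. The helper names are invented. When the negation happens in a type at least as wide as the outer type, truncating the all-ones mask still gives all ones in the outer type, so the AND and the multiply agree:

```c
#include <stdint.h>

/* X & (outer_type)(-Y), with the negation done in a wider unsigned type.  */
uint8_t and_neg_truncated(uint8_t x, uint32_t y)
{
  return x & (uint8_t)(-y);
}

/* X * (outer_type)Y, the form the pattern rewrites to.  */
uint8_t mul_truncated(uint8_t x, uint32_t y)
{
  return (uint8_t)(x * (uint8_t)y);
}
```

For y == 1 the wide negation is 0xFFFFFFFF, which truncates to 0xFF, so both helpers return x; for y == 0 both return 0.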

[Bug tree-optimization/109959] `(a > 1) ? 0 : (a == 1)` is not optimized when spelled out at -O2+

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109959

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102138

--- Comment #5 from Andrew Pinski  ---
This is basically PR 102138 .

gcc-10-20230607 is now available

2023-06-07 Thread GCC Administrator via Gcc

Snapshot gcc-10-20230607 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/10-20230607/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 10 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-10 revision 27d409610fa67b6c296ef29c4585e699a4e32414

You'll find:

 gcc-10-20230607.tar.xz   Complete GCC

  SHA256=06103cc3251daff66de5df74a31f41bca3ceccc7548a9bedaeddd00b86d209d7
  SHA1=69792cbf3a6daaa3d805f32d22308ca23cd654cd

Diffs from 10-20230531 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-10
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.

[Bug tree-optimization/109791] -Wstringop-overflow warning with -O3 and _GLIBCXX_USE_CXX11_ABI=0

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109791

--- Comment #17 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #16)
> (In reply to Richard Biener from comment #15)
> > Created attachment 55155 [details]
> > patch unfolding such PHIs
> > 
> > Updated PHI unfolding patch.  Tests fine besides mentioned diagnostic
> > regressions.
> 
> I was looking into doing the opposite in forwprop but maybe I can skip
> addresses.

Oh yes I see it was mentioned before in PR 102138.

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-07 Thread Jeff Law via Gcc-patches





On 5/25/23 06:35, Manolis Tsamis wrote:

Propagation of the stack pointer in cprop_hardreg is currently forbidden
in all cases, due to maybe_mode_change returning NULL. Relax this
restriction and allow propagation when no mode change is requested.

gcc/ChangeLog:

 * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.
Thanks for the clarification.  This is OK for the trunk.  It looks 
generic enough to have value going forward now rather than waiting.


jeff

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-07 Thread Jeff Law via Gcc-patches





On 5/31/23 06:15, Manolis Tsamis wrote:

On Thu, May 25, 2023 at 4:38 PM Jeff Law  wrote:




On 5/25/23 06:35, Manolis Tsamis wrote:

Propagation of the stack pointer in cprop_hardreg is currently forbidden
in all cases, due to maybe_mode_change returning NULL. Relax this
restriction and allow propagation when no mode change is requested.

gcc/ChangeLog:

  * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.

I can't see how this can be correct given the stack pointer equality
tests elsewhere in the compiler, particularly the various targets.

The problem is if you change the mode then you end up with multiple REG
expressions that reference the stack pointer.

See rev: d1446456c3fcaa7be628726c9de4a877729490ca and the thread around
the change which introduced this code.



Hi Jeff,

Isn't this fine for this case since:

   1) stack_pointer_rtx is used which won't cause issues with pointer
equalities (If I understand correctly).
   2) Propagation is guarded with `if (orig_mode == new_mode)` so only
when there is no mode change.
I must have missed #2 -- is that something that changed since the first 
iteration for Ventana many months ago?


Anyway, hoping to make meaningful progress on these two patches over the 
next couple days.


jeff

[Bug c++/58487] Missed return value optimization

2023-06-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58487

--- Comment #7 from CVS Commits  ---
The trunk branch has been updated by Jason Merrill :

https://gcc.gnu.org/g:28db36e2cfca1b7106adc8d371600fa3a325c4e2

commit r14-1624-g28db36e2cfca1b7106adc8d371600fa3a325c4e2
Author: Jason Merrill 
Date:   Wed Jun 7 05:15:02 2023 -0400

c++: allow NRV and non-NRV returns [PR58487]

Now that we support NRV from an inner block, we can also support non-NRV
returns from other blocks, since once the NRV is out of scope a later return
expression can't possibly alias it.

This fixes 58487 and half-fixes 53637: now one of the returns is elided, but
not the other.

Fixing the remaining xfails in these testcases will require a very different
approach, probably involving a full tree/block walk from finalize_nrv, and
check_return_expr only adding to a list of potential return variables.

PR c++/58487
PR c++/53637

gcc/cp/ChangeLog:

* cp-tree.h (INIT_EXPR_NRV_P): New.
* semantics.cc (finalize_nrv_r): Check it.
* name-lookup.h (decl_in_scope_p): Declare.
* name-lookup.cc (decl_in_scope_p): New.
* typeck.cc (check_return_expr): Allow non-NRV
returns if the NRV is no longer in scope.

gcc/testsuite/ChangeLog:

* g++.dg/opt/nrv26.C: New test.
* g++.dg/opt/nrv26a.C: New test.
* g++.dg/opt/nrv27.C: New test.

[Bug c++/53637] NRVO not applied where there are two different variables involved

2023-06-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53637

--- Comment #10 from CVS Commits  ---
The trunk branch has been updated by Jason Merrill :

https://gcc.gnu.org/g:28db36e2cfca1b7106adc8d371600fa3a325c4e2

commit r14-1624-g28db36e2cfca1b7106adc8d371600fa3a325c4e2
Author: Jason Merrill 
Date:   Wed Jun 7 05:15:02 2023 -0400

c++: allow NRV and non-NRV returns [PR58487]

Now that we support NRV from an inner block, we can also support non-NRV
returns from other blocks, since once the NRV is out of scope a later return
expression can't possibly alias it.

This fixes 58487 and half-fixes 53637: now one of the returns is elided, but
not the other.

Fixing the remaining xfails in these testcases will require a very different
approach, probably involving a full tree/block walk from finalize_nrv, and
check_return_expr only adding to a list of potential return variables.

PR c++/58487
PR c++/53637

gcc/cp/ChangeLog:

* cp-tree.h (INIT_EXPR_NRV_P): New.
* semantics.cc (finalize_nrv_r): Check it.
* name-lookup.h (decl_in_scope_p): Declare.
* name-lookup.cc (decl_in_scope_p): New.
* typeck.cc (check_return_expr): Allow non-NRV
returns if the NRV is no longer in scope.

gcc/testsuite/ChangeLog:

* g++.dg/opt/nrv26.C: New test.
* g++.dg/opt/nrv26a.C: New test.
* g++.dg/opt/nrv27.C: New test.

[pushed] c++: allow NRV and non-NRV returns [PR58487]

2023-06-07 Thread Jason Merrill via Gcc-patches

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Now that we support NRV from an inner block, we can also support non-NRV
returns from other blocks, since once the NRV is out of scope a later return
expression can't possibly alias it.

This fixes 58487 and half-fixes 53637: now one of the returns is elided, but
not the other.

Fixing the remaining xfails in these testcases will require a very different
approach, probably involving a full tree/block walk from finalize_nrv, and
check_return_expr only adding to a list of potential return variables.

PR c++/58487
PR c++/53637

gcc/cp/ChangeLog:

* cp-tree.h (INIT_EXPR_NRV_P): New.
* semantics.cc (finalize_nrv_r): Check it.
* name-lookup.h (decl_in_scope_p): Declare.
* name-lookup.cc (decl_in_scope_p): New.
* typeck.cc (check_return_expr): Allow non-NRV
returns if the NRV is no longer in scope.

gcc/testsuite/ChangeLog:

* g++.dg/opt/nrv26.C: New test.
* g++.dg/opt/nrv26a.C: New test.
* g++.dg/opt/nrv27.C: New test.
---
 gcc/cp/cp-tree.h  |  5 +
 gcc/cp/name-lookup.h  |  1 +
 gcc/cp/name-lookup.cc | 22 ++
 gcc/cp/semantics.cc   |  8 +++
 gcc/cp/typeck.cc  | 37 ---
 gcc/testsuite/g++.dg/opt/nrv26.C  | 19 
 gcc/testsuite/g++.dg/opt/nrv26a.C | 18 +++
 gcc/testsuite/g++.dg/opt/nrv27.C  | 23 +++
 8 files changed, 121 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/nrv26.C
 create mode 100644 gcc/testsuite/g++.dg/opt/nrv26a.C
 create mode 100644 gcc/testsuite/g++.dg/opt/nrv27.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 87572e3574d..83982233111 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -444,6 +444,7 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
   REINTERPRET_CAST_P (in NOP_EXPR)
   ALIGNOF_EXPR_STD_P (in ALIGNOF_EXPR)
   OVL_DEDUP_P (in OVERLOAD)
+  INIT_EXPR_NRV_P (in INIT_EXPR)
   ATOMIC_CONSTR_MAP_INSTANTIATED_P (in ATOMIC_CONSTR)
   contract_semantic (in ASSERTION_, PRECONDITION_, POSTCONDITION_STMT)
1: IDENTIFIER_KIND_BIT_1 (in IDENTIFIER_NODE)
@@ -4078,6 +4079,10 @@ struct GTY(()) lang_decl {
 #define DELETE_EXPR_USE_VEC(NODE) \
   TREE_LANG_FLAG_1 (DELETE_EXPR_CHECK (NODE))
 
+/* True iff this represents returning a potential named return value.  */
+#define INIT_EXPR_NRV_P(NODE) \
+  TREE_LANG_FLAG_0 (INIT_EXPR_CHECK (NODE))
+
 #define CALL_OR_AGGR_INIT_CHECK(NODE) \
   TREE_CHECK2 ((NODE), CALL_EXPR, AGGR_INIT_EXPR)
 
diff --git a/gcc/cp/name-lookup.h b/gcc/cp/name-lookup.h
index b3e708561d8..613745ba501 100644
--- a/gcc/cp/name-lookup.h
+++ b/gcc/cp/name-lookup.h
@@ -449,6 +449,7 @@ extern void resort_type_member_vec (void *, void *,
 extern vec<tree, va_gc> *set_class_bindings (tree, int extra = 0);
 extern void insert_late_enum_def_bindings (tree, tree);
 extern tree innermost_non_namespace_value (tree);
+extern bool decl_in_scope_p (tree);
 extern cxx_binding *outer_binding (tree, cxx_binding *, bool);
 extern void cp_emit_debug_info_for_using (tree, tree);
 
diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index eb5c333b5ea..b8ca7306a28 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -7451,6 +7451,28 @@ innermost_non_namespace_value (tree name)
   return binding ? binding->value : NULL_TREE;
 }
 
+/* True iff current_binding_level is within the potential scope of local
+   variable DECL. */
+
+bool
+decl_in_scope_p (tree decl)
+{
+  gcc_checking_assert (DECL_FUNCTION_SCOPE_P (decl));
+
+  tree name = DECL_NAME (decl);
+
+  for (cxx_binding *iter = NULL;
+   (iter = outer_binding (name, iter, /*class_p=*/false)); )
+{
+  if (!LOCAL_BINDING_P (iter))
+   return false;
+  if (iter->value == decl)
+   return true;
+}
+
+  return false;
+}
+
 /* Look up NAME in the current binding level and its superiors in the
namespace of variables, functions and typedefs.  Return a ..._DECL
node of some kind representing its definition if there is only one
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 1d397b6f257..a2e74a5d2c7 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -4956,7 +4956,7 @@ finalize_nrv_r (tree* tp, int* walk_subtrees, void* data)
   /* If there's a label, we might need to destroy the NRV on goto (92407).  */
   else if (TREE_CODE (*tp) == LABEL_EXPR)
 dp->simple = false;
-  /* Change all returns to just refer to the RESULT_DECL; this is a nop,
+  /* Change NRV returns to just refer to the RESULT_DECL; this is a nop,
  but differs from using NULL_TREE in that it indicates that we care
  about the value of the RESULT_DECL.  But preserve anything appended
  by check_return_expr.  */
@@ -4965,9 +4965,9 @@ finalize_nrv_r (tree* tp, int* walk_subtrees, void* data)
   tree *p = &TREE_OPERAND (*tp, 0);
   while

Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-07 Thread Joseph Myers

On Wed, 7 Jun 2023, Qing Zhao via Gcc-patches wrote:

> Are you suggesting to use identifier directly as the argument of the 
> attribute?
> I tried this in the beginning, however, the current parser for the attribute 
> argument can not identify that this identifier is a field identifier inside 
> the same structure. 
> 
> For example:
> 
> int count;
> struct trailing_array_7 {
>   int count;
>   int array_7[] __attribute ((element_count (count))); 
> };
> 
> The identifier “count” inside the attribute will refer to the variable 
> “int count” outside of the structure.

c_parser_attribute_arguments is supposed to allow an identifier as an 
attribute argument - and not look it up (the user of the attribute would 
later need to look it up in the context of the containing structure).  
Callers use attribute_takes_identifier_p to determine which attributes 
take identifiers (versus expressions) as arguments, which would need 
updating to cover the new attribute.

There is a ??? comment about the case where the identifier is declared as 
a type name.  That would simply be one of the cases carried over from the 
old Bison parser, and it would seem reasonable to remove that 
special-casing so that the attribute works even when the identifier is 
declared as a typedef name as an ordinary identifier, since it's fine for 
structure members to have the same name as a typedef name.

Certainly taking an identifier directly seems like cleaner syntax than 
taking a string that then needs reinterpreting as an identifier.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: An overview of the analyzer support of the operator new

2023-06-07 Thread David Malcolm via Gcc

On Wed, 2023-06-07 at 19:19 +0200, Benjamin Priour wrote:
> Hi,
> 
> I've been mapping where the analyzer is lacking support of the
> operator new
> different variants.
> I've written a bunch of test cases already to demonstrate it, you can
> find
> them below.
> They are not yet formatted for a patch submission, and as some of
> them may
> require new warnings, I didn't use dg-* directives either.

You can always mark the dg-directives with "xfail" and a comment if the
warning isn't implemented yet.
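
For example, a known false positive could be written down along these lines. This is only a sketch: the regexp and the comment string are assumptions rather than the analyzer's exact wording, and test_new_false_positive is a made-up name.

```cpp
/* { dg-do compile } */

struct A
{
  int x = 4;
  int y = 6;
};

void test_new_false_positive ()
{
  /* Known false positive: no warning is wanted here, so use dg-bogus and
     mark it as an expected failure until the analyzer is fixed.  */
  A *a = ::new A (); /* { dg-bogus "possible-null-argument" "throwing new cannot return null" { xfail *-*-* } } */
  ::delete a;
}
```

A missing warning would be handled the same way, with dg-warning in place of dg-bogus.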

> You will notice I included true positives and negatives as well, as I
> think
> they might spur ideas on some edge cases that may fail.
> All that to say I would greatly appreciate your comments if any test
> is
> wrong, or if you have pointers on additional test cases.

Looks great.

Note that the results might be affected by exceptions; do any results
change for -fexceptions versus -fno-exceptions?

> You can also find a godbolt  here.
> 
> The most annoying one is the recurrent noisy false positive
> -Wanalyzer-possible-null-argument on usage of a new expression.
> Although a placement new on a static buffer too short is flagged by
> the middle-end, the analyzer stays quiet.
> A placement new on a dynamic buffer too short to contain the object
> is never reported, however. See PR105948
> 

Yeah; looks like that will need some extra code in the analyzer to
implement; you can ask the region_model what the capacity of the region
is; do you have access to the required size of the region at the
placement new call?  If so then the implementation should be very
similar to -Wanalyzer-out-of-bounds (or reuse it?)
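
The comparison itself is simple. Here is a user-level sketch of it, assuming the capacity and the required size are both known in bytes. The names placement_fits and checked_placement_new are made up, this is not analyzer code, and alignment is ignored:

```cpp
#include <cstddef>
#include <new>

// True when a buffer of `capacity` bytes can hold an object that needs
// `required` bytes.
constexpr bool placement_fits (std::size_t capacity, std::size_t required)
{
  return required <= capacity;
}

// Construct a T in `buf` only when it fits; return nullptr otherwise.
template <typename T>
T *checked_placement_new (void *buf, std::size_t capacity)
{
  if (!placement_fits (capacity, sizeof (T)))
    return nullptr;   // the point where a diagnostic would be reported
  return ::new (buf) T ();
}
```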

Dave

> 
> Thanks,
> Benjamin
> 
> #include 
> 
> struct A
> {
> int x = 4;
> int y = 6;
> };
> 
> void test1()
> {
> int *x = ::new int; // true negative on -Wanalyzer-possible-null-argument
> int *arr = ::new int[3]; // true negative on -Wanalyzer-possible-null-argument
> A *a = ::new A(); // false positive -Wanalyzer-possible-null-argument (a throwing new cannot return null)
> ::delete a;
> ::delete x;
> ::delete[] arr;
> }
> 
> void test_allocators_mismatch()
> {
> int *a = ::new int;
> int *b = ::new int[3];
> 
> ::delete[] a; /* true positive -Wanalyzer-mismatching-deallocation
> flagged
> */
> ::delete b; /* true positive -Wanalyzer-mismatching-deallocation
> flagged */
> }
> 
> // From clang core.uninitialized.NewArraySize
> void test_garbage_new_array()
> {
> int n;
> int *arr = ::new int[n]; /* true positive
> -Wanalyzer-use-of-uninitialized-value reported for 'n' */
> /* however nothing is reported for 'arr', even with
> '-fno-analyzer-suppress-followups', one could expect a specific
> warning */
> ::delete[] arr; /* no warnings here either */
> }
> 
> void test_placement()
> {
> void *chunk = ::operator new(20); // true negative -Wanalyzer-possible-null-dereference
> A *a = ::new (chunk) A();
> a->~A();
> ::operator delete(chunk);
> }
> 
> void test_delete_placement()
> {
> A *a = ::new A; // false positive -Wanalyzer-possible-null-argument (throwing new)
> int *z = ::new (&a->y) int;
> a->~A(); // deconstruct properly
> ::operator delete(a);
> ::operator delete(z); // nothing from analyzer but got -Wfree-nonheap-object, even though analyzer also has -Wanalyzer-free-of-non-heap
> }
> 
> void test_write_placement_after_delete()
> {
> short *s = ::new short;
> long *lp = ::new (s) long;
> ::delete s;
> *lp = 12; // true positive -Wanalyzer-use-after-free flagged, as well as a wrong -Wanalyzer-null-dereference of lp
> }
> 
> void test_read_placement_after_delete()
> {
> short *s = ::new short;
> long *lp = ::new (s) long;
> ::delete s;
> long m = *lp; // true positive -Wanalyzer-use-after-free flagged, as well as a wrong -Wanalyzer-null-dereference of lp
> }
> 
> void test_use_placement_after_destruction()
> {
> A a;
> int *lp = ::new () int;
> a.~A();
> int m = *lp; /* true positive -Wanalyzer-use-of-uninitialized-value,
> nothing about use-after-delete though */
> }
> 
> // From clang cplusplus.PlacementNewChecker
> void test_placement_size_static()
> {
> short s;
> long *lp = ::new (&s) long; /* nothing from analyzer, but still got
> -Wplacement-new= */
> }
> 
> void test_placement_size_dynamic()
> {
> short *s = ::new short;
> long *lp = ::new (s) long; // Nothing reported here at all, would expect a -Wanalyzer-placement-new=
> ::delete s;
> }
> 
> void test_placement_null()
> {
> int *x = nullptr;
> int *p = ::new (x) int; // Placement new on NULL is undefined, yet nothing is reported.
> ::operator delete(x);
> }
> 
> void test_initialization_through_placement()
> {
> int x;
> int *p = ::new (&x) int;
> *p = 10;
> int z = x + 2; // Everything is fine, no warning emitted
> }

[Bug c++/110164] Improve diagnostic for incomplete standard library types due to missing include

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110164

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
   Last reconfirmed||2023-06-07
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=39730

--- Comment #1 from Andrew Pinski  ---
Confirmed.

Re: On inform diagnostics in plugins, support scripts for gdb and modeling creation of PyObjects for static analysis

2023-06-07 Thread David Malcolm via Gcc

On Wed, 2023-06-07 at 16:21 -0400, Eric Feng wrote:
> Hi everyone,
> 
> I am one of the GSoC participants this year — in particular, I am
> working on a static analyzer plugin for CPython extension module
> code.
> I'm encountering a few challenges and would appreciate any guidance
> on
> the following issues:
> 
> 1) Issue with "inform" diagnostics in the plugin:
> I am currently unable to see any "inform" messages from my plugin
> when
> compiling test programs with the plugin enabled. As per the structure
> of existing analyzer plugins, I have included the following code in
> the plugin_init function:
> 
> #if ENABLE_ANALYZER
>     const char *plugin_name = plugin_info->base_name;
>     if (0)
>     inform(input_location, "got here; %qs", plugin_name);

If that's the code, does it work if you get rid of the "if (0)"
conditional, or change it to "if (1)"?  As written, that guard is
false, so that call to "inform" will never be executed.

>     register_callback(plugin_info->base_name,
>   PLUGIN_ANALYZER_INIT,
>   ana::cpython_analyzer_init_cb,
>   NULL);
> #else
>     sorry_no_analyzer();
> #endif
>     return 0;
> 
> I expected to see the "got here" message (among others in other areas
> of the plugin) when compiling test programs but haven't observed any
> output. I also did not observe the "sorry" diagnostic. I am compiling
> a simple CPython extension module with the plugin loaded like so:
> 
> gcc-dev -S -fanalyzer -fplugin=/path/to/cpython_plugin.so
> -I/usr/include/python3.9 -lpython3.9 -x c refcount6.c

Looks reasonable.

> 
> Additionally, I compiled the plugin following the steps outlined in
> the GCC documentation for plugin building
> (https://gcc.gnu.org/onlinedocs/gccint/Plugins-building.html):
> 
> g++-dev -shared -I/home/flappy/gcc_/gcc/gcc
> -I/usr/local/lib/gcc/aarch64-unknown-linux-gnu/14.0.0/plugin/include
> -fPIC -fno-rtti -O2 analyzer_cpython_plugin.c -o cpython_plugin.so
> 
> Please let me know if I missed any steps or if there is something
> else
> I should consider. I have no trouble seeing inform calls when they
> are
> added to the core GCC.
> 
> 2) gdb not detecting .gdbinit in build/gcc:
> Following Dave's GCC newbies guide, I ran gcc/configure within the
> gcc
> subdirectory of the build directory to generate a .gdbinit file.
> Dave's guide suggested that this file would be automatically detected
> and run by gdb. However, it appears that GDB is not detecting this
> .gdbinit file, even after I added the following line to my ~/.gdbinit
> file:
> 
> add-auto-load-safe-path /absolute/path/to/build/gcc

Are you invoking gcc from an installed copy, or from the build
directory?  I think my instructions assume the latter.

> 
> 3) Modeling creation of a new PyObject:
> Many CPython API calls involve the creation of a new PyObject. To
> model the creation of a simple PyObject, we can allocate a new heap
> region using get_or_create_region_for_heap_alloc. We can then create
> field_regions using get_field_region to associate the newly allocated
> region to represent fields such as ob_refcnt and ob_type in the
> PyObject struct. However, one of the parameters to get_field_region
> is a tree representing the field (e.g ob_refcnt). I'm currently
> wondering how we may retrieve this information. My intuition is that
> it would be fairly easy if we can first get a tree representation of
> the PyObject struct. Since we include the relevant headers when
> compiling CPython extension modules (e.g., -I/usr/include/python3.9),
> I wonder if there is a way to "look up" the tree representation of
> PyObject from the included headers. This information may also be
> important for obtaining a svalue representing the size of the
> PyObject
> in get_or_create_region_for_heap_alloc. If there is no way to "look
> up" a tree representation of PyObject as described in the included
> Python header files, does it make sense for us to just create a tree
> representation manually for this task? Please let me know if this
> approach makes sense and, if so, where I could look to get the
> required information.

Don't attempt to build the struct by hand; we want to look up the
struct from the user's headers.  There are at least two ABIs for
PyObject, so we want to be sure we're using the correct one.

IIRC, to look things up by name, that's generally a frontend thing,
since every language has its own concept of scopes/namespaces/etc.

It sounds like you want to look for a type in the global scope of the
C/C++ FE with the name "PyObject".

We currently have some hooks in the analyzer for getting constants from
the frontends; see analyzer-language.cc, where the frontend calls
on_finish_translation_unit, where the analyzer queries the FE for the
named constants that will be of interest during analysis.  Maybe we can
extend this so that we have a way to look up named types there, and
stash the tree for later use, and thus your plugin could ask

[Bug c++/110164] New: Improve diagnostic for incomplete standard library types due to missing include

2023-06-07 Thread rs2740 at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110164

Bug ID: 110164
   Summary: Improve diagnostic for incomplete standard library
types due to missing include
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rs2740 at gmail dot com
  Target Milestone: ---

If I forget to include a header before using a sufficiently well-known standard
library type, GCC helpfully reminds me of the header:

$ echo 'std::array x;' | g++ -x c++ -

<stdin>:1:6: error: ‘array’ in namespace ‘std’ does not name a template type
<stdin>:1:1: note: ‘std::array’ is defined in header ‘<array>’; did you forget
to ‘#include <array>’?

But if I happen to have a different standard library header included that
happens to bring in a forward declaration of the type, the error message is
less helpful:

$ echo -e '#include \nstd::array x;' | g++ -x c++ -

:2:21: error: aggregate ‘std::array x’ has incomplete type and
cannot be defined

It would be nice if the latter case also has a hint about the potential missing
include.
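
The difference between the two cases can be reproduced without the standard library. In this minimal sketch (box and unwrap are made-up names), a forward declaration is enough to name the type and form pointers to it, but defining an object needs the full definition, which is what the missing header would have supplied:

```cpp
// Forward declaration only: box<T> can be named but is incomplete here.
template <typename T> struct box;

// Fine: a pointer to an incomplete type.
box<int> *pending = nullptr;

// box<int> b;   // at this point: error, 'b' has incomplete type

// The definition (what the forgotten #include would provide).
template <typename T> struct box
{
  T value;
};

// From here on, objects can be created and used.
constexpr int unwrap (const box<int> &b)
{
  return b.value;
}
```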

[Bug tree-optimization/110155] Missing if conversion

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110155

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||patch
URL||https://gcc.gnu.org/piperma
   ||il/gcc-patches/2023-June/62
   ||0985.html

--- Comment #4 from Andrew Pinski  ---
Patch posted:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620985.html

[Bug tree-optimization/97711] Failure to optimise "x & 1 ? x - 1 : x" to "x & -2"

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97711

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||patch
URL||https://gcc.gnu.org/piperma
   ||il/gcc-patches/2023-June/62
   ||0985.html

--- Comment #9 from Andrew Pinski  ---
Patch posted:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620985.html

[PATCH 3/3] Add Plus to the op list of `(zero_one == 0) ? y : z <op> y` pattern

2023-06-07 Thread Andrew Pinski via Gcc-patches

This adds plus to the op list of `(zero_one == 0) ? y : z <op> y` patterns
which currently has bit_ior and bit_xor.
This shows up now in GCC after the boolization work that Uroš has been doing.
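
As an illustration only (this is ordinary C++, not match.pd syntax, and the function names are made up), the source form and the branchless form for plus agree for every input, and the PR97711 case is the same idea with z = -1:

```cpp
// Source form matched by the pattern with plus as the op.
constexpr unsigned cond_add (unsigned x, unsigned y, unsigned z)
{
  return ((x & 1) == 0) ? y : z + y;
}

// Branchless form: zero_one * z + y.
constexpr unsigned mult_add (unsigned x, unsigned y, unsigned z)
{
  return (x & 1) * z + y;
}

// PR97711: the new test expects this to end up as x & -2.
constexpr int clear_low_bit_cond (int x) { return (x & 1) ? x - 1 : x; }
constexpr int clear_low_bit_mask (int x) { return x & -2; }
```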

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/97711
PR tree-optimization/110155

gcc/ChangeLog:

* match.pd ((zero_one == 0) ? y : z <op> y): Add plus to the op.
((zero_one != 0) ? z <op> y : y): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/branchless-cond-add-2.c: New test.
* gcc.dg/tree-ssa/branchless-cond-add.c: New test.
---
 gcc/match.pd   |  4 ++--
 .../gcc.dg/tree-ssa/branchless-cond-add-2.c|  8 
 .../gcc.dg/tree-ssa/branchless-cond-add.c  | 18 ++
 3 files changed, 28 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond-add-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond-add.c

diff --git a/gcc/match.pd b/gcc/match.pd
index c38b39fb45c..f633271f76c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3689,7 +3689,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (max @2 @1))
 
 /* (zero_one == 0) ? y : z <op> y -> ((typeof(y))zero_one * z) <op> y */
-(for op (bit_xor bit_ior)
+(for op (bit_xor bit_ior plus)
  (simplify
   (cond (eq zero_one_valued_p@0
 integer_zerop)
@@ -3701,7 +3701,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(op (mult (convert:type @0) @2) @1
 
 /* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */
-(for op (bit_xor bit_ior)
+(for op (bit_xor bit_ior plus)
  (simplify
   (cond (ne zero_one_valued_p@0
 integer_zerop)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond-add-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond-add-2.c
new file mode 100644
index 000..27607e10f88
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond-add-2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* PR tree-optimization/97711 */
+
+int f (int x) { return x & 1 ? x - 1 : x; }
+
+/* { dg-final { scan-tree-dump-times " & -2" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond-add.c 
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond-add.c
new file mode 100644
index 000..0d81c07b03a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond-add.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* PR tree-optimization/110155 */
+
+int f1(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z + y;
+}
+
+int f2(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z + y : y;
+}
+
+/* { dg-final { scan-tree-dump-times " \\\*" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\\+ " 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & " 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
-- 
2.31.1

[PATCH 2/3] Change the `(zero_one ==/!= 0) ? y : z <op> y` patterns to use multiply rather than `(-zero_one) & z`

2023-06-07 Thread Andrew Pinski via Gcc-patches

Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` 
already,
it is better if we don't do a secondary transformation. This reduces the extra
statements produced by match-and-simplify on the gimple level too.
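
For reference, a small sketch of the two expansions in plain C++ (function names are made up; this only shows that the forms compute the same value for a 0/1 operand, not how match.pd applies them):

```cpp
// Source form, with bit_ior as the op.
constexpr unsigned source_form (unsigned x, unsigned y, unsigned z)
{
  return ((x & 1) != 0) ? z | y : y;
}

// Older expansion: negate the 0/1 value to get an all-zeros or all-ones mask.
constexpr unsigned via_mask (unsigned x, unsigned y, unsigned z)
{
  return ((0u - (x & 1)) & z) | y;
}

// Expansion after this change: multiply by the 0/1 value directly.
constexpr unsigned via_mult (unsigned x, unsigned y, unsigned z)
{
  return ((x & 1) * z) | y;
}
```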

gcc/ChangeLog:

* match.pd (`(zero_one ==/!= 0) ? y : z <op> y`): Use
multiply rather than negation/bit_and.
---
 gcc/match.pd | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 7b95b63cee4..c38b39fb45c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3688,7 +3688,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
   (max @2 @1))
 
-/* (zero_one == 0) ? y : z <op> y -> (-(typeof(y))zero_one & z) <op> y */
+/* (zero_one == 0) ? y : z <op> y -> ((typeof(y))zero_one * z) <op> y */
 (for op (bit_xor bit_ior)
  (simplify
   (cond (eq zero_one_valued_p@0
@@ -3698,9 +3698,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (type)
&& TYPE_PRECISION (type) > 1
&& (INTEGRAL_TYPE_P (TREE_TYPE (@0
-   (op (bit_and (negate (convert:type @0)) @2) @1
+   (op (mult (convert:type @0) @2) @1
 
-/* (zero_one != 0) ? z <op> y : y -> (-(typeof(y))zero_one & z) <op> y */
+/* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */
 (for op (bit_xor bit_ior)
  (simplify
   (cond (ne zero_one_valued_p@0
@@ -3710,7 +3710,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (type)
&& TYPE_PRECISION (type) > 1
&& (INTEGRAL_TYPE_P (TREE_TYPE (@0
-   (op (bit_and (negate (convert:type @0)) @2) @1
+   (op (mult (convert:type @0) @2) @1
 
 /* Simplifications of shift and rotates.  */
 
-- 
2.31.1

[PATCH 1/3] MATCH: Allow unsigned types for `X & -Y -> X * Y` pattern

2023-06-07 Thread Andrew Pinski via Gcc-patches

This allows unsigned types if the inner type, where the negation is
located, has precision greater than or equal to that of the outer type.

branchless-cond.c needs to be updated since now we change it to
use a multiply rather than still having (-a) in there.
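
A small sketch of why the precision condition matters for unsigned types, written with fixed-width integers. The function names are made up, and the casts only mimic the conversion in the pattern; this is not the match.pd logic itself.

```cpp
#include <cstdint>

// Negation done in the same 32-bit type as the AND: -1 is all ones.
constexpr std::uint32_t mask_same_width (std::uint32_t x, std::uint32_t y)
{
  return x & (std::uint32_t{0} - y);
}

// Negation done in a wider unsigned type, then truncated: still all ones.
constexpr std::uint32_t mask_truncated (std::uint32_t x, std::uint64_t y)
{
  return x & static_cast<std::uint32_t> (std::uint64_t{0} - y);
}

// Negation done in a narrower unsigned type, then zero-extended: only the
// low 8 bits are set, so the result is not x * y in general.
constexpr std::uint32_t mask_zero_extended (std::uint32_t x, std::uint8_t y)
{
  return x & static_cast<std::uint32_t> (static_cast<std::uint8_t> (0u - y));
}
```

With y equal to 0 or 1, the first two functions give x * y; the third does not once x has bits above the low byte, which is the case the precision check excludes.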

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd (`X & -Y -> X * Y`): Allow for truncation
and the same type for unsigned types.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/branchless-cond.c: Update testcase.
---
 gcc/match.pd| 5 -
 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c | 6 +++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 4ad037d641a..7b95b63cee4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2058,7 +2058,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type)
   && INTEGRAL_TYPE_P (TREE_TYPE (@0))
   && TREE_CODE (TREE_TYPE (@0)) != BOOLEAN_TYPE
-  && !TYPE_UNSIGNED (TREE_TYPE (@0)))
+  /* Sign extending of the neg or a truncation of the neg
+ is needed. */
+  && (!TYPE_UNSIGNED (TREE_TYPE (@0))
+ || TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0
   (mult (convert @0) @1)))
 
 /* Narrow integer multiplication by a zero_one_valued_p operand.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c 
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
index 68087ae6568..e063dc4bb5f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -21,6 +21,6 @@ int f4(unsigned int x, unsigned int y, unsigned int z)
   return ((x & 1) != 0) ? z | y : y;
 }
 
-/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
-/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
-/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & " 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
-- 
2.31.1

Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-07 Thread Qing Zhao via Gcc-patches

> On Jun 7, 2023, at 4:53 PM, Joseph Myers  wrote:
> 
> On Wed, 7 Jun 2023, Qing Zhao via Gcc-patches wrote:
> 
>> Hi, Joseph,
>> 
>> A question here:  can an identifier in C be a wide char string? 
> 
> Identifiers and strings are different kinds of tokens; an identifier can't 
> be a string of any kind, wide or narrow.  It just so happens that the 
> proposed interface here involves interpreting the contents of a string as 
> referring to an identifier (presumably for parsing convenience compared to 
> using an identifier directly in an attribute).

Are you suggesting to use identifier directly as the argument of the attribute?
I tried this in the beginning, however, the current parser for the attribute 
argument can not identify that this identifier is a field identifier inside the 
same structure. 

For example:

int count;
struct trailing_array_7 {
  int count;
  int array_7[] __attribute ((element_count (count))); 
};

The identifier “count” inside the attribute will refer to the variable “int 
count” outside of the structure.

We need to introduce new syntax for this and also need to update the parser of 
the attribute.
Not sure at this moment whether the extra effort is necessary or not?
Any suggestions?

thanks.

Qing

> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com

[PATCH] MATCH: Fix comment for `(zero_one ==/!= 0) ? y : z <op> y` patterns

2023-06-07 Thread Andrew Pinski via Gcc-patches

The patterns match more than just `a & 1` so change the comment
for these two patterns to say that.

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd: Fix comment for the
`(zero_one ==/!= 0) ? y : z <op> y` patterns.
---
 gcc/match.pd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index dc36927cd0f..8f3d99239ce 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3688,7 +3688,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
   (max @2 @1))
 
-/* ((x & 0x1) == 0) ? y : z <op> y -> (-(typeof(y))(x & 0x1) & z) <op> y */
+/* (zero_one == 0) ? y : z <op> y -> (-(typeof(y))zero_one & z) <op> y */
 (for op (bit_xor bit_ior)
  (simplify
   (cond (eq zero_one_valued_p@0
@@ -3700,7 +3700,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& (INTEGRAL_TYPE_P (TREE_TYPE (@0
(op (bit_and (negate (convert:type @0)) @2) @1
 
-/* ((x & 0x1) == 0) ? z <op> y : y -> (-(typeof(y))(x & 0x1) & z) <op> y */
+/* (zero_one != 0) ? z <op> y : y -> (-(typeof(y))zero_one & z) <op> y */
 (for op (bit_xor bit_ior)
  (simplify
   (cond (ne zero_one_valued_p@0
-- 
2.31.1

Re: [committed] Convert H8 port to LRA

2023-06-07 Thread Jeff Law via Gcc-patches





On 6/7/23 08:06, Andrew Pinski wrote:
> On Sun, Jun 4, 2023 at 10:43 AM Jeff Law via Gcc-patches
> wrote:
>>
>> With Vlad's recent LRA fix to the elimination code, the H8 can be
>> converted to LRA.
>
> Could you update the h8300 entry on https://gcc.gnu.org/backends.html
> for this change?

Thanks for the reminder.  I also updated the state for the ports I
converted several weeks back.


jeff

Re: [PATCH] RISC-V: Add Veyron V1 pipeline description

2023-06-07 Thread Jeff Law via Gcc-patches





On 6/7/23 08:43, Jeff Law wrote:
> On 6/7/23 08:13, Kito Cheng wrote:
>> I would like vendor cpu name start with vendor name, like
>> ventana-veyron-v1 which is consistent with all other vendor cpu, and
>> llvm are using same convention too.
> Fair enough.  Better to get it right now than have this stuff be
> inconsistent.  It'll be a little more pain for our internal folks, but
> we'll deal with that :-)

I should have also noted that this seems to get a pretty consistent 1-2%
improvement across spec2017.  Not surprisingly it reduces stalls at the
retirement unit due to instructions not being completed.  We can see
impacts elsewhere like fewer stalls due to conflicting resources at the
dispatch stage.

It does make it more likely that we'll blow out the register file on
x264's key SATD routine which shows up as a single digit regression for
input #1.  The fix there is pretty simple, use register pressure
scheduling, which we'll have some hard data on relatively soon.


jeff

Re: [PATCH] c++: unsynthesized defaulted constexpr fn [PR110122]

2023-06-07 Thread Jason Merrill via Gcc-patches


On 6/6/23 14:29, Patrick Palka wrote:
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?
>
> -- >8 --
>
> In the second testcase of PR110122, during regeneration of the generic
> lambda with V=Bar{}, substitution followed by coerce_template_parms for
> A's template argument naturally yields a copy of V in terms of Bar's
> (implicitly) defaulted copy constructor.
>
> This however happens inside a template context so although we introduced
> a use of the copy constructor, mark_used didn't actually synthesize it,
> which causes subsequent constant evaluation of the template argument to
> fail with:
>
>    nontype-class58.C: In instantiation of ‘void f() [with Bar V = Bar{Foo()}]’:
>    nontype-class58.C:22:11:   required from here
>    nontype-class58.C:18:18: error: ‘constexpr Bar::Bar(const Bar&)’ used before its definition
>
> Conveniently we already make sure to instantiate eligible constexpr
> functions before such (manifestly) constant evaluation, as per P0859R0.
> So this patch fixes this by making sure to synthesize eligible defaulted
> constexpr functions beforehand as well.

We probably also want to do this in cxx_eval_call_expression, under

  /* We can't defer instantiating the function any longer.  */

Jason

[Bug ipa/109886] UBSAN error: shift exponent 64 is too large for 64-bit type when compiling gcc.c-torture/compile/pr96796.c

2023-06-07 Thread amacleod at redhat dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109886

Andrew Macleod  changed:

   What|Removed |Added

 CC||amacleod at redhat dot com

--- Comment #6 from Andrew Macleod  ---
(In reply to Martin Jambor from comment #5)
> (In reply to Aldy Hernandez from comment #4)
> > (In reply to Andrew Pinski from comment #3)

> > > That is correct. The generated code has a VIEW_CONVERT_EXPR from an integer
> > > type to a RECORD_TYPE.
> > 
> > Eeeech.  In that case, then what you suggest is reasonable.  Bail if
> > param_type is not supported by the underlying range.  Maybe the IPA experts
> > could opine?
> 
> With LTOed type mismatches or with K&R style code, IPA has to be prepared
> to deal with such cases, unfortunately.  So a check like that indeed looks
> reasonable.

The new range-op dispatch code is coming shortly.. when an unsupported type is
passed in to any ranger routine, we'll simply return false instead of trapping
like we do now.

Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-07 Thread Joseph Myers

On Wed, 7 Jun 2023, Qing Zhao via Gcc-patches wrote:

> Hi, Joseph,
> 
> A question here:  can an identifier in C be a wide char string? 

Identifiers and strings are different kinds of tokens; an identifier can't 
be a string of any kind, wide or narrow.  It just so happens that the 
proposed interface here involves interpreting the contents of a string as 
referring to an identifier (presumably for parsing convenience compared to 
using an identifier directly in an attribute).

-- 
Joseph S. Myers
jos...@codesourcery.com

[Bug c++/51571] No named return value optimization while adding a dummy scope

2023-06-07 Thread jason at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51571

--- Comment #11 from Jason Merrill  ---
(In reply to CVS Commits from comment #9)
> This implements the guaranteed copy elision specified by P2025

Or not; I just noticed that P2025 also requires a fix for PR53637.

[Bug rtl-optimization/110163] New: [14 Regression] Comparing against a constant string is inefficient on some targets

2023-06-07 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110163

Bug ID: 110163
   Summary: [14 Regression] Comparing against a constant string is
inefficient on some targets
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: law at gcc dot gnu.org
  Target Milestone: ---

Comparing against a constant string is expanded by inline_string_cmp and on
some targets the generated code can be inefficient.  This can be seen in
spec2017's omnetpp benchmark, particularly when the inline string comparison
limits are increased.

The problem is the expansion code arranges to do all the arithmetic and tests
in SImode.  On RV64 this introduces a sign extension for each test  due to how
RV64 expresses 32bit ops.

It would be better to do all the computations in word_mode, then convert the
final result to SImode, at least for RV64 and likely for other targets.
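
To make the idea concrete, here is a conceptual sketch of the two strategies in C++. It is not what inline_string_cmp actually emits; the function names are made up, and `long` merely stands in for word_mode on an LP64 target.

```cpp
// Every step is done in int (SImode on most targets).
constexpr int cmp_narrow (const char *x, const char *lit, unsigned n)
{
  for (unsigned i = 0; i < n; ++i)
    {
      int diff = static_cast<unsigned char> (x[i])
                 - static_cast<unsigned char> (lit[i]);
      if (diff != 0 || lit[i] == '\0')
        return diff;
    }
  return 0;
}

// The arithmetic and tests use a word-sized value; the result is
// narrowed to int only once, at the end.
constexpr int cmp_wide (const char *x, const char *lit, unsigned n)
{
  long diff = 0;
  for (unsigned i = 0; i < n && diff == 0; ++i)
    {
      diff = static_cast<long> (static_cast<unsigned char> (x[i]))
             - static_cast<long> (static_cast<unsigned char> (lit[i]));
      if (lit[i] == '\0')
        break;
    }
  return static_cast<int> (diff);
}
```

Both versions stop at the first differing byte or at the terminator of the constant string, so they never read past the end of either string when n is the length of the constant including its terminator.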

I experimented with starting to build out cost checks to determine what mode to
use for the internal computations.  That ran afoul of x86 where the cost of a
byte load is different than the cost of an extended byte load, even though they
use the exact same instruction.

There's also a need to cost out the computations, test & branch in the
different modes as well once the x86 hurdle is behind us.

I've set work on this aside for now.  But the discussion can be found in these
two threads:

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620601.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620577.html

#include <string.h>
int
foo (char *x)
{
   return strcmp (x, "lowerLayout");
}

Compiled with -O2 --param builtin-string-cmp-inline-length=100 on rv64 should
show the issue.

[Bug tree-optimization/94566] conversion between std::strong_ordering and int

2023-06-07 Thread amacleod at redhat dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94566

--- Comment #13 from Andrew Macleod  ---
(In reply to Andrew Pinski from comment #12)
> Aldy or Andrew, why in conv1 we don't get a range for 
>   SR.4_4 = sD.8798._M_valueD.7665;
> 
> Even though the range we have is [-1,1] according to the
> __builtin_unreachable()?
> It seems like we should get that range. Once we do get that the code works.
> E.g. If we add:
>   signed char *t = (signed char*)&s;
>   signed char tt = *t;
>   if (tt < -1 || tt > 1) __builtin_unreachable();
> 
> In the front before the other ifs, we get the code we are expecting.
> 
> conv2 has a similar issue too, though it has also a different issue of
> ordering for the comparisons.

It's because the unreachable is after the branches, and we have multiple uses of
SR.4_4 before the unreachable.

   [local count: 1073741824]:
  SR.4_4 = s._M_value;
  if (SR.4_4 == -1)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 536870913]:
  if (SR.4_4 == 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 268435456]:
  if (SR.4_4 == 1)
goto ; [100.00%]
  else
goto ; [0.00%]

   [count: 0]:
  __builtin_unreachable ();

   [local count: 1073741824]:
  # _1 = PHI <-1(2), 0(3), 1(4)>

We know when we get to bb5 that SR.4_4 is [-1, 1] for sure.
But we don't know that before we reach that spot.
If there was a call to 
  foo(SR.4_4)
in bb 3 for instance, we wouldn't be able to propagate [-1,1] to the call to
foo because it happens before we know for sure, and foo may go and do
something if it has a value of 6 and exit the compilation, thus never
returning.

So we can only provide a range of [-1, 1] AFTER the unreachable, or if there is
only a single use of it. The multiple uses are what trick it.

This has come up before.  We need some sort of backwards propagation that can
propagate discovered values earlier into the IL to a point where it is known to
be safe (i.e., it wouldn't be able to propagate it past a call to foo() for
instance).
In cases like this, we could discover it is safe to propagate that range back
to the def point, and then we could set the global.

Until we add some smarts, either to the builtin unreachable elimination code,
or elsewhere, which is aware of how to handle such side effects, we can't set
the global because we don't know if it is safe at each use before the
unreachable call.
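
A source-level sketch of the two shapes being discussed, using a plain signed char instead of std::strong_ordering. The function names are made up, and the first function is only my reading of what conv1 looks like from the dump above:

```cpp
// The unreachable comes after three separate uses of v, so the range
// [-1, 1] is only known once all the comparisons have failed.
constexpr int conv_late (signed char v)
{
  if (v == -1)
    return -1;
  if (v == 0)
    return 0;
  if (v == 1)
    return 1;
  __builtin_unreachable ();
}

// The range is asserted before any other use of v, as in the workaround
// from comment #12, so every later use can rely on it.
constexpr int conv_early (signed char v)
{
  if (v < -1 || v > 1)
    __builtin_unreachable ();
  return v;
}
```

For in-range inputs both functions return the same value; the difference is only in where the range information becomes available to the optimizers.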

On inform diagnostics in plugins, support scripts for gdb and modeling creation of PyObjects for static analysis

2023-06-07 Thread Eric Feng via Gcc

Hi everyone,

I am one of the GSoC participants this year — in particular, I am
working on a static analyzer plugin for CPython extension module code.
I'm encountering a few challenges and would appreciate any guidance on
the following issues:

1) Issue with "inform" diagnostics in the plugin:
I am currently unable to see any "inform" messages from my plugin when
compiling test programs with the plugin enabled. As per the structure
of existing analyzer plugins, I have included the following code in
the plugin_init function:

#if ENABLE_ANALYZER
  const char *plugin_name = plugin_info->base_name;
  if (0)
    inform(input_location, "got here; %qs", plugin_name);
  register_callback(plugin_info->base_name,
                    PLUGIN_ANALYZER_INIT,
                    ana::cpython_analyzer_init_cb,
                    NULL);
#else
  sorry_no_analyzer();
#endif
  return 0;

I expected to see the "got here" message (among others in other areas
of the plugin) when compiling test programs but haven't observed any
output. I also did not observe the "sorry" diagnostic. I am compiling
a simple CPython extension module with the plugin loaded like so:

gcc-dev -S -fanalyzer -fplugin=/path/to/cpython_plugin.so
-I/usr/include/python3.9 -lpython3.9 -x c refcount6.c

Additionally, I compiled the plugin following the steps outlined in
the GCC documentation for plugin building
(https://gcc.gnu.org/onlinedocs/gccint/Plugins-building.html):

g++-dev -shared -I/home/flappy/gcc_/gcc/gcc
-I/usr/local/lib/gcc/aarch64-unknown-linux-gnu/14.0.0/plugin/include
-fPIC -fno-rtti -O2 analyzer_cpython_plugin.c -o cpython_plugin.so

Please let me know if I missed any steps or if there is something else
I should consider. I have no trouble seeing inform calls when they are
added to the core GCC.

2) gdb not detecting .gdbinit in build/gcc:
Following Dave's GCC newbies guide, I ran gcc/configure within the gcc
subdirectory of the build directory to generate a .gdbinit file.
Dave's guide suggested that this file would be automatically detected
and run by gdb. However, it appears that GDB is not detecting this
.gdbinit file, even after I added the following line to my ~/.gdbinit
file:

add-auto-load-safe-path /absolute/path/to/build/gcc

3) Modeling creation of a new PyObject:
Many CPython API calls involve the creation of a new PyObject. To
model the creation of a simple PyObject, we can allocate a new heap
region using get_or_create_region_for_heap_alloc. We can then create
field_regions using get_field_region to associate the newly allocated
region to represent fields such as ob_refcnt and ob_type in the
PyObject struct. However, one of the parameters to get_field_region
is a tree representing the field (e.g., ob_refcnt). I'm currently
wondering how we may retrieve this information. My intuition is that
it would be fairly easy if we can first get a tree representation of
the PyObject struct. Since we include the relevant headers when
compiling CPython extension modules (e.g., -I/usr/include/python3.9),
I wonder if there is a way to "look up" the tree representation of
PyObject from the included headers. This information may also be
important for obtaining a svalue representing the size of the PyObject
in get_or_create_region_for_heap_alloc. If there is no way to "look
up" a tree representation of PyObject as described in the included
Python header files, does it make sense for us to just create a tree
representation manually for this task? Please let me know if this
approach makes sense and, if so, where I could look to get the
required information.

Thanks all.

Best,
Eric

Re: [PATCH] riscv: Fix scope for memory model calculation

2023-06-07 Thread Jeff Law via Gcc-patches





On 6/7/23 13:15, Dimitar Dimitrov wrote:
> On Tue, Jun 06, 2023 at 08:38:14PM -0600, Jeff Law wrote:
>>> Regression tested for riscv32-none-elf. No changes in gcc.sum and
>>> g++.sum.  I don't have setup to test riscv64.
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/riscv/riscv.cc (riscv_print_operand): Calculate
>>> memmodel only when it is valid.
>> Good to see you poking around in the RISC-V world Dimitar!  Are you still
>> poking at the PRU as well?
>
> Hi Jeff,
>
> Yes, I'm still maintaining the PRU backend.
>
> For this patch I was actually poking at the middle end, trying to
> implement a small optimization for PRU (PR 106562).  And I wanted
> to test if other targets would also benefit from it.

Ah!  Too bad, I'd love to have another engineer poking at RV stuff on a
regular basis, but I'll take any cleanups/fixes/improvements you may
have, of course!


RV32 isn't a bad test target though.  Certainly more modern than some of 
the ports you could have tested against.


Jeff

[Bug c++/99599] [11/12/13/14 Regression] Concepts requirement falsely reporting cyclic dependency, breaks tag_invoke pattern

2023-06-07 Thread danakj at orodu dot net via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99599

--- Comment #16 from danakj at orodu dot net ---
Well for anyone who hits the same issue, it appears that GCC _does_ follow
Clang and MSVC in not considering the overload and chasing through the concept
resolution if the non-concept types are templates and do not match the caller's
arguments.

So you need to do:

1) For non-GCC just use:

  template <fooable T> auto invoke_tag(bar_tag, T it);

2) For GCC non-template type bar_tag use:

  template <std::same_as<bar_tag> T, fooable U> auto invoke_tag(T, U it);

3) For GCC template type bar_tag, back to 1)

  template auto invoke_tag(bar_tag, T it);


Note also that 2) uses same_as, not convertible_to as in Comment #6, otherwise
you can get ambiguous overload resolution if multiple types convert to one,
which does not occur in Clang/MSVC with the regular type parameter. This _does_
again result in more code that will compile in Clang/MSVC than in GCC, as it
prevents conversions from types that don't have an overload.

The macros to do this get rather exciting, if that's of interest to someone in
the future:
https://github.com/chromium/subspace/pull/253/commits/719500c4d2cbfcfd238d7ee3c5b3d371f40e46c1

Re: [RFC] RISC-V: Eliminate extension after for *w instructions

2023-06-07 Thread Jeff Law via Gcc-patches

On 5/24/23 17:14, Jivan Hakobyan via Gcc-patches wrote:

Subject: [RFC] RISC-V: Eliminate extension after for *w instructions
From: Jivan Hakobyan via Gcc-patches
Date: 5/24/23, 17:14
To: gcc-patches@gcc.gnu.org

This patch tries to prevent generating unnecessary sign extension
after *w instructions like "addiw" or "divw".

The main idea of it is to add SUBREG_PROMOTED fields during expanding.

I have tested on SPEC2017 there is no regression.
Only gcc.dg/pr30957-1.c test failed.
To solve that I did some changes in loop-iv.cc, but not sure that it is
suitable.

gcc/ChangeLog:
 * config/riscv/bitmanip.md (rotrdi3): New pattern.
 (rotrsi3): Likewise.
 (rotlsi3): Likewise.
 * config/riscv/riscv-protos.h (riscv_emit_binary): New function
 declaration
 * config/riscv/riscv.cc (riscv_emit_binary): Removed static
 * config/riscv/riscv.md (addsi3): New pattern
 (subsi3): Likewise.
 (negsi2): Likewise.
 (mulsi3): Likewise.
 (<optab>si3): New pattern for any_div.
 (<optab>si3): New pattern for any_shift.
 * loop-iv.cc (get_biv_step_1): Process src of extension when it is
 PLUS.

gcc/testsuite/ChangeLog:
 * testsuite/gcc.target/riscv/shift-and-2.c: New test
 * testsuite/gcc.target/riscv/shift-shift-2.c: New test
 * testsuite/gcc.target/riscv/sign-extend.c: New test
 * testsuite/gcc.target/riscv/zbb-rol-ror-03.c: New test

-- With the best regards Jivan Hakobyan

extend.diff

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 96d31d92670b27d495dc5a9fbfc07e8767f40976..0430af7c95b1590308648dc4d5aaea78ada71760 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -304,9 +304,9 @@
[(set_attr "type" "bitmanip,load")
 (set_attr "mode" "HI")])

-(define_expand "rotr<mode>3"
-  [(set (match_operand:GPR 0 "register_operand")
-   (rotatert:GPR (match_operand:GPR 1 "register_operand")
+(define_expand "rotrdi3"
+  [(set (match_operand:DI 0 "register_operand")
+   (rotatert:DI (match_operand:DI 1 "register_operand")
 (match_operand:QI 2 "arith_operand")))]
"TARGET_ZBB || TARGET_XTHEADBB || TARGET_ZBKB"

The condition for this expander needs to be adjusted.

Previously it used the GPR iterator.  The GPR iterator is defined like this:

(define_mode_iterator GPR [SI (DI "TARGET_64BIT")])

Note how the DI case is conditional on TARGET_64BIT.

This impacts the HAVE_* macros that are generated from the MD file in 
insn-flags.h:

#define HAVE_rotrsi3 (TARGET_ZBB || TARGET_XTHEADBB || TARGET_ZBKB)
#define HAVE_rotrdi3 ((TARGET_ZBB || TARGET_XTHEADBB || TARGET_ZBKB) && (TARGET_64BIT))

Note how the rotrdi3 has the && (TARGET_64BIT) on the end.

With your change we would expose rotrdi3 independent of TARGET_64BIT 
which is not what we want.

Sorry I didn't catch that earlier.  I'll fix this minor problem.

@@ -544,7 +562,7 @@
rtx t5 = gen_reg_rtx (DImode);
rtx t6 = gen_reg_rtx (DImode);

-  emit_insn (gen_addsi3 (operands[0], operands[1], operands[2]));

+  riscv_emit_binary(PLUS, operands[0], operands[1], operands[2]);
Just a note.  In GCC we always emit a space between the function name 
and the open parenthesis for its argument list.  I fixed a few of these.

@@ -867,8 +938,8 @@

emit_insn (gen_smul3_highpart (hp, operands[1], operands[2]));

emit_insn (gen_mul3 (operands[0], operands[1], operands[2]));
-  emit_insn (gen_ashr3 (lp, operands[0],
- GEN_INT (BITS_PER_WORD - 1)));
+  riscv_emit_binary(ASHIFTRT, lp, operands[0],
+ GEN_INT (BITS_PER_WORD - 1));
Another formatting nit.  When we wrap lines for an argument list, we 
line up the arguments.  So something like this

frobit (a, b, c,
        d, e, f);

Obviously that's not a great example as it doesn't need wrapping, but it 
should clearly show how we indent things in this case.  I've fixed up 
this nit.

diff --git a/gcc/loop-iv.cc b/gcc/loop-iv.cc
index 6c40db947f7f549303f8bb4d4f38aa98b6561bcc..bec1ea7e4ccf7291bb3dba91161f948e66c7bea9 100644
--- a/gcc/loop-iv.cc
+++ b/gcc/loop-iv.cc
@@ -637,7 +637,7 @@ get_biv_step_1 (df_ref def, scalar_int_mode outer_mode, rtx reg,
  {
rtx set, rhs, op0 = NULL_RTX, op1 = NULL_RTX;
rtx next, nextr;
-  enum rtx_code code;
+  enum rtx_code code, prev_code;
So as I mentioned earlier, PREV_CODE might be used without being 
initialized.  I've initialized it to "UNKNOWN" which is a special RTX 
code which can be used for this purpose.

If we are changing a target independent file the standard is that we 
bootstrap and regression test on at least one primary platform such as 
x86_64 linux.  This would have been caught by that bootstrap process as 
it's a pretty simple uninitialized object use to analyze.

rtx_insn *insn = DF_REF_INSN (def);
df_ref next_def;
enum iv_grd_result res;

Re: [PATCH] libstdc++: Fix up 20_util/to_chars/double.cc test for excess precision [PR110145]

2023-06-07 Thread Jonathan Wakely via Gcc-patches

On Wed, 7 Jun 2023 at 18:26, Jonathan Wakely  wrote:

>
>
> On Wed, 7 Jun 2023, 18:17 Jakub Jelinek via Libstdc++, <
> libstd...@gcc.gnu.org> wrote:
>
>> Hi!
>>
>> This test apparently contains 3 problematic floating point constants,
>> 1e126, 4.91e-6 and 5.547e-6.  These constants suffer from double rounding
>> when -fexcess-precision=standard evaluates double constants in the
>> precision
>> of Intel extended 80-bit long double.
>> As written in the PR, e.g. the first one is
>> 0x1.7a2ecc414a03f7ff6ca1cb527787b130a97d51e51202365p+418
>> in the precision of GCC's internal format, 80-bit long double has
>> 63-bit precision, so the above constant rounded to long double is
>> 0x1.7a2ecc414a03f800p+418L
>> (the least significant bit in the 0 before p isn't there already).
>> 0x1.7a2ecc414a03f800p+418L rounded to IEEE double is
>> 0x1.7a2ecc414a040p+418.
>> Now, if excess precision doesn't happen and we round the GCC's internal
>> format number directly to double, it is
>> 0x1.7a2ecc414a03fp+418 and that is the number the test expects.
>> One can see it on x86-64 (where excess precision to long double doesn't
>> happen) where double(1e126L) != 1e126.
>> The other two constants suffer from the same problem.
>>
>> The following patch tweaks the testcase, such that those problematic
>> constants are used only if FLT_EVAL_METHOD is 0 or 1 (i.e. when we have
>> guarantee the constants will be evaluated in double precision),
>> plus adds corresponding tests with hexadecimal constants which don't
>> suffer from this excess precision problem, they are exact in double
>> and long double can hold all double values.
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux, additionally
>> tested on the latter with
>> make check RUNTESTFLAGS='--target_board=unix/-fexcess-precision=standard
>> conformance.exp=to_chars/double.cc'
>> Ok for trunk?
>>
>
> Yes, OK.
>
> Thanks for solving this puzzle!
>

I think this would be good for gcc-13, as that has the new
-fexcess-precision semantics for -std=c++NN too, right?
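
For anyone following along, here is a small sketch of the double rounding described in the quoted message. It is only an illustration, not part of the patch: the helper names are made up, and it uses strtod/strtold at run time rather than literals so that the result does not depend on FLT_EVAL_METHOD. The decimal string and the hexadecimal constant are the ones from the PR.

```cpp
#include <cstdio>
#include <cstdlib>

// Decimal string rounded directly to double.
double direct_1e126 ()
{
  return std::strtod ("1e126", nullptr);
}

// Decimal string rounded to long double first and then to double.
// Where long double is the Intel extended format this is the double
// rounding case; elsewhere the result may equal direct_1e126 ().
double via_long_double_1e126 ()
{
  return static_cast<double> (std::strtold ("1e126", nullptr));
}

// The hexadecimal constant the patch adds; it is exact in double.
double hex_1e126 ()
{
  return std::strtod ("0x1.7a2ecc414a03fp+418", nullptr);
}

// Print all three values in hexadecimal floating-point notation.
void report ()
{
  std::printf ("direct          %a\n", direct_1e126 ());
  std::printf ("via long double %a\n", via_long_double_1e126 ());
  std::printf ("hex constant    %a\n", hex_1e126 ());
}
```

Calling report () on a target with 80-bit long double should show the middle value differing in the last place, assuming the C library rounds correctly in both conversions.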


>
>
>
>> 2023-06-07  Jakub Jelinek  
>>
>> PR libstdc++/110145
>> * testsuite/20_util/to_chars/double.cc: Include <cfloat>.
>> (double_to_chars_test_cases,
>> double_scientific_precision_to_chars_test_cases_2,
>> double_fixed_precision_to_chars_test_cases_2): #if out 1e126,
>> 4.91e-6
>> and 5.547e-6 tests if FLT_EVAL_METHOD is negative or larger than
>> 1.
>> Add unconditional tests with corresponding double constants
>> 0x1.7a2ecc414a03fp+418, 0x1.4981285e98e79p-18 and
>> 0x1.7440bbff418b9p-18.
>>
>> --- libstdc++-v3/testsuite/20_util/to_chars/double.cc.jj
>> 2022-11-03 22:16:08.542329555 +0100
>> +++ libstdc++-v3/testsuite/20_util/to_chars/double.cc   2023-06-07
>> 15:41:44.275604870 +0200
>> @@ -40,6 +40,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include <cfloat>
>>
>>  #include 
>>
>> @@ -1968,9 +1969,19 @@ inline constexpr double_to_chars_testcas
>>  {1e125, chars_format::fixed,
>>
>>  
>> "248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
>>  "5841365553228283904"},
>> +#if FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1
>> +// When long double is Intel extended and double constants are
>> evaluated in precision of
>> +// long double, this value is initialized to double(1e126L), which
>> is 0x1.7a2ecc414a040p+418 due to
>> +// double rounding of 0x1.7a2ecc414a03f7ff6p+418L first to
>> 0x1.7a2ecc414a03f800p+418L and
>> +// then to 0x1.7a2ecc414a040p+418, while when double constants are
>> evaluated in precision of
>> +// IEEE double, this is 0x1.7a2ecc414a03fp+418 which the test
>> expects.  See PR110145.
>>  {1e126, chars_format::fixed,
>>
>>  
>> "248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
>>  "58413655532282839040"},
>> +#endif
>> +{0x1.7a2ecc414a03fp+418, chars_format::fixed,
>> +
>>  
>> "248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
>> +   "58413655532282839040"},
>>  {1e127, chars_format::fixed,
>>
>>  
>> "549291066784979473595300225087383524118479625982517885450291174622154390152298057300868772"
>>  "377386949310916067328"},
>> @@ -2816,8 +2827,12 @@ inline constexpr double_to_chars_testcas
>>  {0x1.a6c767640cd71p+879, chars_format::scientific,
>> "6.6564021122018745e+264"},
>>
>>  // Incorrectly handled by dtoa_milo() (Grisu2), which doesn't
>> achieve shortest round-trip.
>> +#if FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1
>>  {4.91e-6, chars_format::scientific, "4.91e-06"},
>>  {5.547e-6, chars_format::scientific, "5.547e-06"},
>> +#endif
>> +{0x1.4981285e98e79p-18, chars_format::scientific, "4.91e-06"},
>> +{0x1.7440bbff418b9p-18, chars_format::scientific, "5.547e-06"},
>>
>>  // Test hexfloat corner cases.
>>  {0x1.728p+0,

Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-07 Thread Qing Zhao via Gcc-patches

Hi, Joseph,

A question here:  can an identifier in C be a wide char string? 

Qing

> On May 26, 2023, at 2:15 PM, Joseph Myers  wrote:
> 
> On Fri, 26 May 2023, Qing Zhao via Gcc-patches wrote:
> 
>>> What if the string is a wide string?  I don't expect that to work (either 
>>> as a matter of interface design, or in the present code), but I think that 
>>> case should have a specific check and error.
>> 
> >> Dumb question: how to check whether the string is a wide string? -:)
> 
> By examining the element type; the only valid case for the attribute would 
> be an element type of (const) char.  (I think it's reasonable to reject 
> all of char8_t, char16_t, char32_t, wchar_t strings in this context.)
> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com

[Bug c++/110162] redundant move in initialization

2023-06-07 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110162

Jonathan Wakely changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #6 from Jonathan Wakely  ---
(In reply to jinci kang from comment #0)
> $ g++ -std=c++2a -Werror -Wall -Wextra main.cpp
> Error: redundant move in initialization [-Werror=redundant-move]
>35 | response->set_body(std::move(*request.body()));
>   |~^

You turned this warning on with -Wextra and then you turned it into an error
with -Werror.

Either stop doing that, or fix the code to avoid the warning.

[Bug c++/107198] [13/14 Regression] ICE in cp_gimplify_expr, at cp/cp-gimplify.cc:752 since r13-3175-g6ffbf87ca66f4ed9

2023-06-07 Thread tschwinge at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107198

Thomas Schwinge changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2022-10-10 00:00:00         |2023-6-7
                 CC|                            |tschwinge at gcc dot gnu.org

--- Comment #4 from Thomas Schwinge  ---
Reconfirmed.

Given native x86_64-pc-linux-gnu build of one-week-old commit
2720bbd597f56742a17119dfe80edc2ba86af255, running 'g++.dg/eh/aggregate1.C' with
'-fno-exceptions':

$ make check-gcc-c++ RUNTESTFLAGS='--target_board=unix/-fno-exceptions
dg.exp=aggregate1.C'

..., I see ICEs not for '-std=c++98', but for '-std=c++14' and higher:

UNSUPPORTED: g++.dg/eh/aggregate1.C  -std=c++98: exception handling
disabled
FAIL: g++.dg/eh/aggregate1.C  -std=c++14 (internal compiler error: in
cp_gimplify_expr, at cp/cp-gimplify.cc:782)
UNSUPPORTED: g++.dg/eh/aggregate1.C  -std=c++14: exception handling
disabled
FAIL: g++.dg/eh/aggregate1.C  -std=c++17 (internal compiler error: in
cp_gimplify_expr, at cp/cp-gimplify.cc:782)
UNSUPPORTED: g++.dg/eh/aggregate1.C  -std=c++17: exception handling
disabled
FAIL: g++.dg/eh/aggregate1.C  -std=c++20 (internal compiler error: in
cp_gimplify_expr, at cp/cp-gimplify.cc:782)
UNSUPPORTED: g++.dg/eh/aggregate1.C  -std=c++20: exception handling
disabled

[...]/g++.dg/eh/aggregate1.C: In constructor 'A::A()':
[...]/g++.dg/eh/aggregate1.C:18:47: error: exception handling disabled, use
'-fexceptions' to enable
[...]/g++.dg/eh/aggregate1.C: In function 'void try_idx(int)':
[...]/g++.dg/eh/aggregate1.C:40:25: error: 'x' was not declared in this
scope
[...]/g++.dg/eh/aggregate1.C:39:40: internal compiler error: in
cp_gimplify_expr, at cp/cp-gimplify.cc:782
0x6f7024 cp_gimplify_expr(tree_node**, gimple**, gimple**)
[...]/gcc/cp/cp-gimplify.cc:782
0x13d6cfd gimplify_expr(tree_node**, gimple**, gimple**, bool
(*)(tree_node*), int)
[...]/gcc/gimplify.cc:16331
0x13dcf9d gimplify_init_ctor_eval_range
[...]/gcc/gimplify.cc:4929
0x13dcf9d gimplify_init_ctor_eval
[...]/gcc/gimplify.cc:5008
0x13dce55 gimplify_init_ctor_eval
[...]/gcc/gimplify.cc:5033
0x13dd671 gimplify_init_constructor
[...]/gcc/gimplify.cc:5447
0x13ea18d gimplify_modify_expr
[...]/gcc/gimplify.cc:6127
0x13d76ea gimplify_expr(tree_node**, gimple**, gimple**, bool
(*)(tree_node*), int)
[...]/gcc/gimplify.cc:16422
0x13e8567 gimplify_stmt(tree_node**, gimple**)
[...]/gcc/gimplify.cc:7238
0x13e8567 gimplify_compound_expr
[...]/gcc/gimplify.cc:6431
0x13d7aae gimplify_expr(tree_node**, gimple**, gimple**, bool
(*)(tree_node*), int)
[...]/gcc/gimplify.cc:16412
0x13d80f8 gimplify_cleanup_point_expr
[...]/gcc/gimplify.cc:7238
0x13d80f8 gimplify_expr(tree_node**, gimple**, gimple**, bool
(*)(tree_node*), int)
[...]/gcc/gimplify.cc:16815
0x13da0a6 gimplify_stmt(tree_node**, gimple**)
[...]/gcc/gimplify.cc:7238
0x13d89a8 gimplify_statement_list
[...]/gcc/gimplify.cc:2019
0x13d89a8 gimplify_expr(tree_node**, gimple**, gimple**, bool
(*)(tree_node*), int)
[...]/gcc/gimplify.cc:16867
0x13da0a6 gimplify_stmt(tree_node**, gimple**)
[...]/gcc/gimplify.cc:7238
0x13d89a8 gimplify_statement_list
[...]/gcc/gimplify.cc:2019
0x13d89a8 gimplify_expr(tree_node**, gimple**, gimple**, bool
(*)(tree_node*), int)
[...]/gcc/gimplify.cc:16867
0x13d80f8 gimplify_cleanup_point_expr
[...]/gcc/gimplify.cc:7238

[Bug c++/110162] redundant move in initialization

2023-06-07 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110162

--- Comment #5 from Jonathan Wakely  ---
(In reply to Andrew Pinski from comment #4)
> See https://gcc.gnu.org/gcc-13/porting_to.html also.

I don't think this is related to the new rules.

The std::move here is redundant because request is const, so request.body()
calls the const overload which returns const std::string* and so
std::move(*request.body()) produces a const std::string&& which cannot be
moved. It can only be copied. So the move is redundant.
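
A minimal sketch of that point, using a made-up Tracker type rather than the reporter's classes: std::move on a const object produces a const rvalue reference, which cannot bind to a move constructor's non-const T&& parameter, so overload resolution falls back to the copy constructor.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical type that records which constructor ran.
struct Tracker
{
  bool copied = false;
  bool moved = false;
  Tracker () = default;
  Tracker (const Tracker &) : copied (true) {}
  Tracker (Tracker &&) noexcept : moved (true) {}
};

// std::move keeps the const qualifier of its argument.
static_assert (std::is_same<decltype (std::move (std::declval<const std::string &> ())),
                            const std::string &&>::value,
               "std::move on a const lvalue yields a const rvalue reference");

// A const Tracker&& cannot bind to Tracker&&, so this copies.
bool copies_from_const (const Tracker &src)
{
  Tracker t (std::move (src));
  return t.copied && !t.moved;
}

// With a non-const object the move constructor is selected.
bool moves_from_mutable (Tracker &src)
{
  Tracker t (std::move (src));
  return t.moved && !t.copied;
}
```

In the reported code the same thing happens with std::string: dropping the std::move, or making the request non-const, would both avoid the warning.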

Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-07 Thread Richard Sandiford via Gcc-patches

Andrew Stubbs  writes:
> On 30/05/2023 07:26, Richard Biener wrote:
>> On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs  wrote:
>>>
>>> Hi all,
>>>
>>> I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
>>> do it because the GCC middle-end models DIVMOD's return value as
>>> "complex int" type, and there are no vector equivalents of that type.
>>>
>>> Therefore, this patch adds minimal support for "complex vector int"
>>> modes.  I have not attempted to provide any means to use these modes
>>> from C, so they're really only useful for DIVMOD.  The actual libfunc
>>> implementation will pack the data into wider vector modes manually.
>>>
>>> A knock-on effect of this is that I needed to increase the range of
>>> "mode_unit_size" (several of the vector modes supported by amdgcn exceed
>>> the previous 255-byte limit).
>>>
>>> Since this change would add a large number of new, unused modes to many
>>> architectures, I have elected to *not* enable them, by default, in
>>> machmode.def (where the other complex modes are created).  The new modes
>>> are therefore inactive on all architectures but amdgcn, for now.
>>>
>>> OK for mainline?  (I've not done a full test yet, but I will.)
>> 
>> I think it makes more sense to map vector CSImode to vector SImode with
>> the double number of lanes.  In fact since divmod is a libgcc function
>> I wonder where your vector variant would reside and how GCC decides to
>> emit calls to it?  That is, there's no way to OMP simd declare this function?
>
> The divmod implementation lives in libgcc. It's not too difficult to 
> write using vector extensions and some asm tricks. I did try an OMP simd 
> declare implementation, but it didn't vectorize well, and that's a yak 
> I don't wish to shave right now.
>
> In any case, the OMP simd declare will not help us here, directly, 
> because the DIVMOD transformation happens too late in the pass pipeline, 
> long after ifcvt and vect. My implementation (not yet posted), uses a 
> libfunc and the TARGET_EXPAND_DIVMOD_LIBFUNC hook in the standard way. 
> It just needs the complex vector modes to exist.
>
> Using vectors twice the length is problematic also. If I create a new 
> V128SImode that spans across two 64-lane vector registers then that will 
> probably have the desired effect ("real" quotient in v8, "imaginary" 
> remainder in v9), but if I use V64SImode to represent two V32SImode 
> vectors then that's a one-register mode, and I'll have to use a 
> permutation (a memory operation) to extract lanes 32-63 into lanes 0-31, 
> and if we ever want to implement instructions that operate on these 
> modes (as opposed to the odd/even add/sub complex patterns we have now) 
> then the masking will be all broken and we'd need to constantly 
> disassemble the double length vectors to operate on them.

I don't know if this helps (probably not), but we have a similar
situation on AArch64: a 64-bit mode like V8QI can be doubled to a
128-bit vector or to a pair of 64-bit vectors.  We used V16QI for
the former and "V2x8QI" for the latter.  V2x8QI is forced to come
after V16QI in the mode list, and so it is only ever used through
explicit choice.  But both modes are functionally vectors of 16 QIs.

Thanks,
Richard

[Bug c++/110158] Cannot use union with std::string inside in constant expression

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110158

--- Comment #1 from Andrew Pinski  ---
Here is a slightly reduced testcase (for a slightly different issue still
dealing with unions):
```
struct str1
{
//  bool a;
  char *var;
  union {
char t[15];
int allocated;
  };
  constexpr str1() : var(new char[2]) { t[0] = 0; }
  constexpr ~str1() {if (var != t) delete[] var; }
};

typedef str1 str;
constexpr bool f1() {
str t{};
return true;
}
static_assert( f1() );

constexpr bool f() {
union U{
str s;
constexpr ~U(){ s.~str(); }
} u{};
return true;
}

static_assert( f() );

```

Re: When do I need -fnon-call-exceptions?

2023-06-07 Thread Eric Botcazou via Gcc

> On x86_64 Linux -fasynchronous-unwind-tables is the default.  That is
> probably sufficient to make your test case work.

The testcase g++.dg/torture/except-1.C you recently added to the testsuite 
does not pass at all if -fnon-call-exceptions is not specified (and does not 
pass with optimization if -fno-delete-dead-exceptions is not specified).

-- 
Eric Botcazou

[Bug target/106562] PRU: Inefficient code for zero check of 64-bit (boolean) AND result

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106562

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104296

--- Comment #5 from Andrew Pinski  ---
(In reply to Dimitar Dimitrov from comment #4)
> Thus I'm trying to implement the following conversion in
> emit_store_flag_int():
> 
>    "X != 0" -> "UMIN (X, 1)"

That is basically what I mention in PR 104296.

[Bug c++/110162] redundant move in initialization

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110162

--- Comment #4 from Andrew Pinski  ---
See https://gcc.gnu.org/gcc-13/porting_to.html also.

Re: [PATCH] riscv: Fix scope for memory model calculation

2023-06-07 Thread Dimitar Dimitrov

On Tue, Jun 06, 2023 at 08:38:14PM -0600, Jeff Law wrote:
> 
> 
> > Regression tested for riscv32-none-elf. No changes in gcc.sum and
> > g++.sum.  I don't have setup to test riscv64.
> > 
> > gcc/ChangeLog:
> > 
> > * config/riscv/riscv.cc (riscv_print_operand): Calculate
> > memmodel only when it is valid.
> Good to see you poking around in the RISC-V world Dimitar!  Are you still
> poking at the PRU as well?

Hi Jeff,

Yes, I'm still maintaining the PRU backend.

For this patch I was actually poking at the middle end, trying to
implement a small optimization for PRU (PR 106562).  And I wanted
to test if other targets would also benefit from it.

Thanks,
Dimitar

> 
> Anyway, this is fine for the trunk and for backporting to gcc-13 if the
> problem exists there as well.
> 
> jeff

[Bug c++/110162] redundant move in initialization

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110162

--- Comment #3 from Andrew Pinski  ---
See
https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/C_002b_002b-Dialect-Options.html#index-Wno-redundant-move

[Bug c++/110162] redundant move in initialization

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110162

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |diagnostic

--- Comment #2 from Andrew Pinski  ---
I think the GCC diagnostic is correct, the std::move is redundant here.

[Bug sanitizer/110157] [13/14 Regression] Address sanitizer does not like nested function trampolines any more

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110157

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-06-07
   Target Milestone|---                         |13.2
            Summary|Address sanitizer crashes   |[13/14 Regression] Address
                   |when accessing variables    |sanitizer does not like
                   |through procedure callback  |nested function trampolines
                   |                            |any more

--- Comment #2 from Andrew Pinski  ---
Reduced GNU C testcase (just compile and run with -fsanitize=address):
```
void quicksort(_Bool (*ugt)())
{
  __builtin_printf(">>> Calling ugt\n");
  _Bool t = ugt();
  __builtin_printf(">>> Done ugt\n");
}

void gfits_setsort(int key)
{
  _Bool sort_gt()
  {
return key > 0;
  }
  quicksort(sort_gt);
}

int main()
{
gfits_setsort(1);
}
```


```
AddressSanitizer:DEADLYSIGNAL
=
==1==ERROR: AddressSanitizer: SEGV on unknown address 0x7f346f900034 (pc
0x7f346f900034 bp 0x7ffe64ea8b90 sp 0x7ffe64ea8b68 T0)
==1==The signal is caused by a READ memory access.
==1==Hint: PC is at a non-executable region. Maybe a wild jump?
#0 0x7f346f900034  ()
#1 0x40134f in gfits_setsort /app/example.cpp:14
#2 0x40139f in main /app/example.cpp:19
#3 0x7f3471eb3082 in __libc_start_main
(/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId:
1878e6b475720c7c51969e69ab2d276fae6d1dee)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV () 
==1==ABORTING
```

Re: When do I need -fnon-call-exceptions?

2023-06-07 Thread Ian Lance Taylor via Gcc

On Wed, Jun 7, 2023 at 10:09 AM Helmut Zeisel via Gcc  wrote:
>
> I wrote some simple program that set a signal handler for SIGFPE, throws a 
> C++ exception in the signal handler
> and catches the exception.
> I compiled with and without -fnon-call-exceptions (on x64 Linux).
> In both cases, the result was the same: the exception was caught and the 
> destructors were called as expected.
> I also tried "-fno-non-call-exceptions -fexceptions" and got the same result.
>
> My question: when do I really need -fnon-call-exceptions?
> Is there some simple program where I can see the difference whether it is on 
> or off??

On x86_64 Linux -fasynchronous-unwind-tables is the default.  That is
probably sufficient to make your test case work.

Ian

[Bug target/106562] PRU: Inefficient code for zero check of 64-bit (boolean) AND result

2023-06-07 Thread dimitar at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106562

--- Comment #4 from Dimitar Dimitrov  ---
The ideal PRU code sequence for the snippet would be:

char test(uint64_t a, uint64_t b)
{
        return a && b;
}

        or      r14, r14, r15
        or      r16, r16, r17
        umin    r14, r14, 1
        umin    r14, r14, r16
        ret

Thus I'm trying to implement the following conversion in
emit_store_flag_int():

   "X != 0" -> "UMIN (X, 1)"
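
A small sketch of why the conversion holds for unsigned values, written as plain C++ rather than RTL. The function names are hypothetical: for unsigned X, the minimum of X and 1 is 0 when X is 0 and 1 otherwise, and taking a second minimum with the other operand gives the logical AND, as in the last two umin instructions above.

```cpp
#include <algorithm>
#include <cstdint>

// "X != 0" expressed as an unsigned minimum: UMIN (X, 1).
inline std::uint64_t nonzero_flag (std::uint64_t x)
{
  return std::min<std::uint64_t> (x, 1);
}

// "a && b" using only unsigned minimum: UMIN (UMIN (a, 1), b).
// The inner result is at most 1, so the outer minimum is 1 exactly
// when both a and b are nonzero.
inline std::uint64_t logical_and (std::uint64_t a, std::uint64_t b)
{
  return std::min (nonzero_flag (a), b);
}
```

The identity relies on the operands being unsigned; for signed values a negative X would give a negative minimum.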

[Bug c++/110162] redundant move in initialization

2023-06-07 Thread jincikang at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110162

--- Comment #1 from jinci kang  ---
# OK.
$ g++ -std=c++2a -Werror -Wall main.cpp

[Bug sanitizer/110157] Address sanitizer crashes when accessing variables through procedure callback

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110157

--- Comment #1 from Andrew Pinski  ---
If anything, what is most likely happening is that the stack is not being
recorded as executable, which is needed for nested functions.

[Bug c++/110162] New: redundant move in initialization

2023-06-07 Thread jincikang at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110162

            Bug ID: 110162
           Summary: redundant move in initialization
           Product: gcc
           Version: 13.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jincikang at gmail dot com
  Target Milestone: ---

$ cat main.cpp
```cpp
// main.cpp
#include <string>

class HttpMessage {
public:
    std::string* body() noexcept {
        return &body_;
    }

    const std::string* body() const noexcept {
        return &body_;
    }

    void set_body(std::string s) {
        body_ = std::move(s);
    }
private:
    std::string body_;
};

class HttpResponse : private HttpMessage {
public:
    using HttpMessage::body;
    using HttpMessage::set_body;
private:
};

class HttpRequest : private HttpMessage {
public:
    using HttpMessage::body;
    using HttpMessage::set_body;
};

int main() {
  [[maybe_unused]]auto post = [](const HttpRequest& request, HttpResponse* response) {
    response->set_body(std::move(*request.body()));
  };
}
```
$ g++ -std=c++2a -Werror -Wall -Wextra main.cpp
Error: redundant move in initialization [-Werror=redundant-move]
   35 | response->set_body(std::move(*request.body()));
  |~^

# OK.
$ clang++ -std=c++2a -Werror -Wall -Wextra main.cpp
# Ok
$ g++-12 -std=c++2a -Werror -Wall -Wextra main.cpp

Re: [PATCH] Fortran: add Fortran 2018 IEEE_{MIN,MAX} functions

2023-06-07 Thread Steve Kargl via Gcc-patches

On Wed, Jun 07, 2023 at 08:31:35PM +0200, Harald Anlauf via Fortran wrote:
> Hi FX,
> 
> On 6/6/23 21:11, FX Coudert via Gcc-patches wrote:
> > Hi,
> > 
> > > I cannot see if there is proper support for kind=17 in your patch;
> > > at least the libgfortran/ieee/ieee_arithmetic.F90 part does not
> > > seem to have any related code.
> > 
> > Can real(kind=17) ever be an IEEE mode? If so, something seriously wrong 
> > happened, because the IEEE modules have no kind=17 mention in them anywhere.
> > 
> > Actually, where is the kind=17 documented?
> > 
> > FX
> 
> I was hoping for Thomas to come forward with some comment, as
> he was quite involved in related work.
> 
> There are several threads on IEEE128 for Power on the fortran ML
> e.g. around November/December 2021, January 2022.
> 
> I wasn't meaning to block your work, just wondering if the Power
> platform needs more attention here.
> 

% cd gcc/gccx/libgfortran
% grep HAVE_GFC_REAL_17 ieee/*
% troutmask:sgk[219] ls ieee
% ieee_arithmetic.F90 ieee_features.F90
% ieee_exceptions.F90 ieee_helper.c

There are zero hits for REAL(17) in the IEEE code.  If REAL(17)
is intended to be an IEEE-754 type, then it seems gfortran's
support was never added for it.  If anyone has access to a
Power system, it's easy to test:

program foo
   use ieee_arithmetic
   print *, ieee_support_datatype(1._17)
end program foo
-- 
Steve

[Bug c++/110153] [modules] Static module mapper format cannot handle header unit paths with spaces

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110153

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89249

--- Comment #1 from Andrew Pinski  ---
Many build systems (make included here) have issues with spaces.

Even GCC's LTO does not handle spaces that well, see PR 89249.

[Bug c++/99599] [11/12/13/14 Regression] Concepts requirement falsely reporting cyclic dependency, breaks tag_invoke pattern

2023-06-07 Thread danakj at orodu dot net via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99599

--- Comment #15 from danakj at orodu dot net ---
The workaround listed in Comment #6 does not work for templated types,
unfortunately, making Clang and MSVC more expressive here than GCC.

https://godbolt.org/z/obhsqhrbx

```
#include 
#include 
#include 

#if defined(__GNUC__) && !defined(__clang__)
#define COMPILER_IS_GCC 1
#else
#define COMPILER_IS_GCC 0
#endif

namespace sus::string::__private {
template 
A& format_to_stream(A&, B);

template 
concept StreamCanReceiveString = requires(T& t, std::basic_string s) {
{ operator<<(t, s) };
};

/// Consumes the string `s` and streams it to the output stream `os`.
template  S>
S& format_to_stream(S& os, const std::basic_string& s) {
os << s;
return os;
}

}  // namespace sus::string::__private

namespace sus::option {
template 
class Option {};

using namespace ::sus::string::__private;
template <
class T, 
#if COMPILER_IS_GCC
std::same_as > Sus_ValueType,  // Does not deduce T.  *
#endif
StreamCanReceiveString Sus_StreamType
>
inline Sus_StreamType& operator<<(
Sus_StreamType& stream,
#if COMPILER_IS_GCC
const Sus_ValueType& value
#else
const ::sus::option::Option& value  // Does deduce T.   
#endif
) {
return format_to_stream(stream, std::string());
}

}  // namespace sus::option

int main() {
std::stringstream s;
s << sus::option::Option();
}
```

Re: [Patch, fortran] PR87477 - (associate) - [meta-bug] [F03] issues concerning the ASSOCIATE statement

2023-06-07 Thread Harald Anlauf via Gcc-patches


Hi Paul!

On 6/7/23 18:10, Paul Richard Thomas via Gcc-patches wrote:

> Hi All,
>
> Three more fixes for PR87477. Please note that PR99350 was a blocker
> but, as pointed out in comment #5 of the PR, this has nothing to do
> with the associate construct.
>
> All three fixes are straight forward and the .diff + ChangeLog suffice
> to explain them. 'rankguessed' was made redundant by the last PR87477
> fix.
>
> Regtests on x86_64 - good for mainline?
>
> Paul
>
> Fortran: Fix some more blockers in associate meta-bug [PR87477]
>
> 2023-06-07  Paul Thomas
>
> gcc/fortran
> PR fortran/99350
> * decl.cc (char_len_param_value): Simplify a copy of the expr
> and replace the original if there is no error.


This seems to lack a gfc_free_expr (p) in case the gfc_replace_expr
is not executed, leading to a possible memleak.  Can you check?

@@ -1081,10 +1082,10 @@ char_len_param_value (gfc_expr **expr, bool
*deferred)
   if (!gfc_expr_check_typed (*expr, gfc_current_ns, false))
 return MATCH_ERROR;

-  /* If gfortran gets an EXPR_OP, try to simplify it.  This catches things
- like CHARACTER(([1])).   */
-  if ((*expr)->expr_type == EXPR_OP)
-gfc_simplify_expr (*expr, 1);
+  /* Try to simplify the expression to catch things like
CHARACTER(([1])).   */
+  p = gfc_copy_expr (*expr);
+  if (gfc_is_constant_expr (p) && gfc_simplify_expr (p, 1))
+gfc_replace_expr (*expr, p);
   else
 gfc_free_expr (p);
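
The ownership rule behind this suggestion can be shown outside the compiler. The following is only a sketch under stated assumptions: `Expr`, `copy_expr`, `simplify_expr`, `replace_expr`, `free_expr` and `simplify_in_place` are hypothetical stand-ins written for illustration and are not the gfortran functions of similar name. `live_exprs` counts outstanding copies so that a leak would be observable.

```cpp
#include <cassert>

// Hypothetical stand-in for an expression node; not the real gfc_expr.
struct Expr {
    int value;
    bool simplifiable;
};

static int live_exprs = 0;  // number of heap copies not yet freed

Expr* copy_expr(const Expr* e) {
    ++live_exprs;
    return new Expr(*e);
}

void free_expr(Expr* e) {
    if (e) {
        --live_exprs;
        delete e;
    }
}

// Stand-in simplifier: on failure it leaves its argument damaged,
// which is why it is only ever run on a copy.
bool simplify_expr(Expr* e) {
    if (!e->simplifiable) {
        e->value = -1;
        return false;
    }
    e->value *= 2;
    return true;
}

// Takes ownership of src: copies its contents over *dest, then frees src.
void replace_expr(Expr* dest, Expr* src) {
    *dest = *src;
    free_expr(src);
}

// Work on a copy; commit it on success, free it on failure.
void simplify_in_place(Expr* expr) {
    Expr* p = copy_expr(expr);
    if (simplify_expr(p))
        replace_expr(expr, p);
    else
        free_expr(p);  // without this branch the copy would leak
}
```

After any call to `simplify_in_place`, `live_exprs` returns to zero, and an expression that fails to simplify keeps its original contents.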


> * gfortran.h : Remove the redundant field 'rankguessed' from
> 'gfc_association_list'.
> * resolve.cc (resolve_assoc_var): Remove refs to 'rankguessed'.
>
> PR fortran/107281
> * resolve.cc (resolve_variable): Associate names with constant
> or structure constructor targets cannot have array refs.
>
> PR fortran/109451
> * trans-array.cc (gfc_conv_expr_descriptor): Guard expression
> character length backend decl before using it. Suppress the
> assignment if lhs equals rhs.
> * trans-io.cc (gfc_trans_transfer): Scalarize transfer of
> associate variables pointing to a variable. Add comment.
> * trans-stmt.cc (trans_associate_var): Remove requirement that
> the character length be deferred before assigning the value
> returned by gfc_conv_expr_descriptor. Also, guard the backend
> decl before testing with VAR_P.
>
> gcc/testsuite/
> PR fortran/99350
> * gfortran.dg/pr99350.f90 : New test.
>
> PR fortran/107281
> * gfortran.dg/associate_5.f03 : Changed error message.
> * gfortran.dg/pr107281.f90 : New test.
>
> PR fortran/109451
> * gfortran.dg/associate_61.f90 : New test


Otherwise LGTM.

Thanks for the patch!

Harald

Re: [PATCH] Fortran: add Fortran 2018 IEEE_{MIN,MAX} functions

2023-06-07 Thread Harald Anlauf via Gcc-patches


Hi FX,

On 6/6/23 21:11, FX Coudert via Gcc-patches wrote:

> Hi,
>
> > I cannot see if there is proper support for kind=17 in your patch;
> > at least the libgfortran/ieee/ieee_arithmetic.F90 part does not
> > seem to have any related code.
>
> Can real(kind=17) ever be an IEEE mode? If so, something seriously wrong
> happened, because the IEEE modules have no kind=17 mention in them anywhere.
>
> Actually, where is the kind=17 documented?
>
> FX


I was hoping for Thomas to come forward with some comment, as
he was quite involved in related work.

There are several threads on IEEE128 for Power on the fortran ML
e.g. around November/December 2021, January 2022.

I wasn't meaning to block your work, just wondering if the Power
platform needs more attention here.

Harald

[Bug c++/110160] g++ rejects concept as cyclical with non-matching function signature

2023-06-07 Thread danakj at orodu dot net via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110160

danakj at orodu dot net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #3 from danakj at orodu dot net ---
Okay I've got a workaround based on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99599#c6. It's probably worse for
compile times, but it is what it is.

Thanks for the link.

*** This bug has been marked as a duplicate of bug 99599 ***

[Bug c++/99599] [11/12/13/14 Regression] Concepts requirement falsely reporting cyclic dependency, breaks tag_invoke pattern

2023-06-07 Thread danakj at orodu dot net via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99599

danakj at orodu dot net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |danakj at orodu dot net

--- Comment #14 from danakj at orodu dot net ---
*** Bug 110160 has been marked as a duplicate of this bug. ***

[Bug target/109725] [14 Regression] ICE: RTL check: expected code 'const_int', have 'reg' in riscv_print_operand, at config/riscv/riscv.cc:4430

2023-06-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109725

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Dimitar Dimitrov :

https://gcc.gnu.org/g:7f26e76c9848aeea9ec10ea701a6168464a4a9c2

commit r14-1621-g7f26e76c9848aeea9ec10ea701a6168464a4a9c2
Author: Dimitar Dimitrov 
Date:   Mon Jun 5 21:39:16 2023 +0300

riscv: Fix scope for memory model calculation

During libgcc configure stage for riscv32-none-elf, when
"--enable-checking=yes,rtl" has been activated, the following error
is observed:

  during RTL pass: final
  conftest.c: In function 'main':
  conftest.c:16:1: internal compiler error: RTL check: expected code 'const_int', have 'reg' in riscv_print_operand, at config/riscv/riscv.cc:4462
     16 | }
        | ^
  0x843c4d rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/rtl.cc:916
  0x8ea823 riscv_print_operand
  /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:4462
  0xde84b5 output_operand(rtx_def*, int)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3632
  0xde8ef8 output_asm_insn(char const*, rtx_def**)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3544
  0xded33b output_asm_insn(char const*, rtx_def**)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3421
  0xded33b final_scan_insn_1
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:2841
  0xded6cb final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:2887
  0xded8b7 final_1
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:1979
  0xdee518 rest_of_handle_final
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:4240
  0xdee518 execute
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:4318

Fix by moving the calculation of memmodel to the cases where it is used.

Regression tested for riscv32-none-elf. No changes in gcc.sum and
g++.sum.

PR target/109725

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Calculate
memmodel only when it is valid.

Signed-off-by: Dimitar Dimitrov

[Bug c++/110160] g++ rejects concept as cyclical with non-matching function signature

2023-06-07 Thread danakj at orodu dot net via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110160

--- Comment #2 from danakj at orodu dot net ---
Ugh, yeah, I guess it is. It means you can't redirect through a template
function that uses concepts with G++.

[Bug tree-optimization/94566] conversion between std::strong_ordering and int

2023-06-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94566

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aldyh at gcc dot gnu.org,
                   |                            |amacleod at redhat dot com

--- Comment #12 from Andrew Pinski  ---
Aldy or Andrew, why don't we get a range in conv1 for
  SR.4_4 = sD.8798._M_valueD.7665;

even though the range we have is [-1,1] according to the
__builtin_unreachable()?
It seems like we should get that range. Once we do get it, the code works.
E.g. if we add:
  signed char *t = (signed char*)&s;
  signed char tt = *t;
  if (tt < -1 || tt > 1) __builtin_unreachable();

at the front, before the other ifs, we get the code we are expecting.

conv2 has a similar issue, though it also has a different issue with the
ordering of the comparisons.

[Bug libstdc++/110145] 20_util/to_chars/double.cc fails for -m32 -fexcess-precision=standard

2023-06-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110145

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:88e7f1f7ee67462713a89104ae07e99b191d5e2c

commit r14-1619-g88e7f1f7ee67462713a89104ae07e99b191d5e2c
Author: Jakub Jelinek 
Date:   Wed Jun 7 19:27:35 2023 +0200

libstdc++: Fix up 20_util/to_chars/double.cc test for excess precision
[PR110145]

This test apparently contains 3 problematic floating point constants,
1e126, 4.91e-6 and 5.547e-6.  These constants suffer from double rounding
when -fexcess-precision=standard evaluates double constants in the precision
of Intel extended 80-bit long double.
As written in the PR, e.g. the first one is
0x1.7a2ecc414a03f7ff6ca1cb527787b130a97d51e51202365p+418
in the precision of GCC's internal format, 80-bit long double has
63-bit precision, so the above constant rounded to long double is
0x1.7a2ecc414a03f800p+418L
(the least significant bit in the 0 before p isn't there already).
0x1.7a2ecc414a03f800p+418L rounded to IEEE double is
0x1.7a2ecc414a040p+418.
Now, if excess precision doesn't happen and we round the GCC's internal
format number directly to double, it is
0x1.7a2ecc414a03fp+418 and that is the number the test expects.
One can see it on x86-64 (where excess precision to long double doesn't
happen) where double(1e126L) != 1e126.
The other two constants suffer from the same problem.

The following patch tweaks the testcase, such that those problematic
constants are used only if FLT_EVAL_METHOD is 0 or 1 (i.e. when we have a
guarantee that the constants will be evaluated in double precision),
plus adds corresponding tests with hexadecimal constants which don't
suffer from this excess precision problem, they are exact in double
and long double can hold all double values.
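
The double rounding described above can be probed with a few lines of C++. This is a sketch only, not part of the patch: the names `expected_1e126`, `double_rounded_1e126`, `constants_evaluated_as_double` and `long_double_detour_changes_value` are made up for illustration, and the two hexadecimal values are copied from the explanation above.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// The double the test expects for 1e126, written as a hexadecimal constant
// so that it is exact in double and therefore also in long double.
constexpr double expected_1e126 = 0x1.7a2ecc414a03fp+418;

// The value reported above for rounding first to 80-bit long double and
// then to double.
constexpr double double_rounded_1e126 = 0x1.7a2ecc414a040p+418;

// The condition the patch tests: FLT_EVAL_METHOD 0 or 1 means double
// constants are not evaluated in a precision wider than double.
constexpr bool constants_evaluated_as_double() {
    return FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1;
}

// Reports whether converting the long double literal to double gives a
// different value than the expected double on this target. The answer
// depends on the long double format, so no result is assumed here.
inline bool long_double_detour_changes_value() {
    const long double wide = 1e126L;
    return static_cast<double>(wide) != expected_1e126;
}

// The two candidate results are adjacent doubles, one ulp apart.
inline void check_neighbours() {
    assert(std::nextafter(expected_1e126, INFINITY) == double_rounded_1e126);
}
```

According to the explanation above, `long_double_detour_changes_value()` should return true on x86-64, where long double is the 80-bit extended format. On targets with a different long double format the result may differ.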

2023-06-07  Jakub Jelinek  

PR libstdc++/110145
* testsuite/20_util/to_chars/double.cc: Include <cfloat>.
(double_to_chars_test_cases,
double_scientific_precision_to_chars_test_cases_2,
double_fixed_precision_to_chars_test_cases_2): #if out 1e126, 4.91e-6
and 5.547e-6 tests if FLT_EVAL_METHOD is negative or larger than 1.
Add unconditional tests with corresponding double constants
0x1.7a2ecc414a03fp+418, 0x1.4981285e98e79p-18 and
0x1.7440bbff418b9p-18.

Re: [PATCH] libstdc++: Fix up 20_util/to_chars/double.cc test for excess precision [PR110145]

2023-06-07 Thread Jonathan Wakely via Gcc-patches

On Wed, 7 Jun 2023, 18:17 Jakub Jelinek via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> Hi!
>
> This test apparently contains 3 problematic floating point constants,
> 1e126, 4.91e-6 and 5.547e-6.  These constants suffer from double rounding
> when -fexcess-precision=standard evaluates double constants in the
> precision
> of Intel extended 80-bit long double.
> As written in the PR, e.g. the first one is
> 0x1.7a2ecc414a03f7ff6ca1cb527787b130a97d51e51202365p+418
> in the precision of GCC's internal format, 80-bit long double has
> 63-bit precision, so the above constant rounded to long double is
> 0x1.7a2ecc414a03f800p+418L
> (the least significant bit in the 0 before p isn't there already).
> 0x1.7a2ecc414a03f800p+418L rounded to IEEE double is
> 0x1.7a2ecc414a040p+418.
> Now, if excess precision doesn't happen and we round the GCC's internal
> format number directly to double, it is
> 0x1.7a2ecc414a03fp+418 and that is the number the test expects.
> One can see it on x86-64 (where excess precision to long double doesn't
> happen) where double(1e126L) != 1e126.
> The other two constants suffer from the same problem.
>
> The following patch tweaks the testcase, such that those problematic
> constants are used only if FLT_EVAL_METHOD is 0 or 1 (i.e. when we have
> guarantee the constants will be evaluated in double precision),
> plus adds corresponding tests with hexadecimal constants which don't
> suffer from this excess precision problem, they are exact in double
> and long double can hold all double values.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, additionally
> tested on the latter with
> make check RUNTESTFLAGS='--target_board=unix/-fexcess-precision=standard
> conformance.exp=to_chars/double.cc'
> Ok for trunk?
>

Yes, OK.

Thanks for solving this puzzle!



> 2023-06-07  Jakub Jelinek  
>
> PR libstdc++/110145
> * testsuite/20_util/to_chars/double.cc: Include <cfloat>.
> (double_to_chars_test_cases,
> double_scientific_precision_to_chars_test_cases_2,
> double_fixed_precision_to_chars_test_cases_2): #if out 1e126,
> 4.91e-6
> and 5.547e-6 tests if FLT_EVAL_METHOD is negative or larger than 1.
> Add unconditional tests with corresponding double constants
> 0x1.7a2ecc414a03fp+418, 0x1.4981285e98e79p-18 and
> 0x1.7440bbff418b9p-18.
>
> --- libstdc++-v3/testsuite/20_util/to_chars/double.cc.jj2022-11-03
> 22:16:08.542329555 +0100
> +++ libstdc++-v3/testsuite/20_util/to_chars/double.cc   2023-06-07
> 15:41:44.275604870 +0200
> @@ -40,6 +40,7 @@
>  #include 
>  #include 
>  #include 
> +#include <cfloat>
>
>  #include 
>
> @@ -1968,9 +1969,19 @@ inline constexpr double_to_chars_testcas
>  {1e125, chars_format::fixed,
>
>  
> "248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
>  "5841365553228283904"},
> +#if FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1
> +// When long double is Intel extended and double constants are
> evaluated in precision of
> +// long double, this value is initialized to double(1e126L), which is
> 0x1.7a2ecc414a040p+418 due to
> +// double rounding of 0x1.7a2ecc414a03f7ff6p+418L first to
> 0x1.7a2ecc414a03f800p+418L and
> +// then to 0x1.7a2ecc414a040p+418, while when double constants are
> evaluated in precision of
> +// IEEE double, this is 0x1.7a2ecc414a03fp+418 which the test
> expects.  See PR110145.
>  {1e126, chars_format::fixed,
>
>  
> "248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
>  "58413655532282839040"},
> +#endif
> +{0x1.7a2ecc414a03fp+418, chars_format::fixed,
> +
>  
> "248677616189928820425446708698348384614392259722252941999757930266031634937628176537515300"
> +   "58413655532282839040"},
>  {1e127, chars_format::fixed,
>
>  
> "549291066784979473595300225087383524118479625982517885450291174622154390152298057300868772"
>  "377386949310916067328"},
> @@ -2816,8 +2827,12 @@ inline constexpr double_to_chars_testcas
>  {0x1.a6c767640cd71p+879, chars_format::scientific,
> "6.6564021122018745e+264"},
>
>  // Incorrectly handled by dtoa_milo() (Grisu2), which doesn't achieve
> shortest round-trip.
> +#if FLT_EVAL_METHOD >= 0 && FLT_EVAL_METHOD <= 1
>  {4.91e-6, chars_format::scientific, "4.91e-06"},
>  {5.547e-6, chars_format::scientific, "5.547e-06"},
> +#endif
> +{0x1.4981285e98e79p-18, chars_format::scientific, "4.91e-06"},
> +{0x1.7440bbff418b9p-18, chars_format::scientific, "5.547e-06"},
>
>  // Test hexfloat corner cases.
>  {0x1.728p+0, chars_format::hex, "1.728p+0"}, // instead of "2.e5p-1"
> @@ -5537,10 +5552,16 @@ inline constexpr double_to_chars_testcas
>  "9."
>
>  
> "9992486776161899288204254467086983483846143922597222529419997579302660316349376281765375153005"
>  "841365553228283904e+124"},
> +#if

An overview of the analyzer support of the operator new

2023-06-07 Thread Benjamin Priour via Gcc

Hi,

I've been mapping where the analyzer lacks support for the different
variants of operator new.
I've already written a bunch of test cases to demonstrate it; you can find
them below.
They are not yet formatted for a patch submission, and as some of them may
require new warnings, I didn't use dg-* directives either.
You will notice I included true positives and negatives as well, as I think
they might spur ideas on some edge cases that may fail.
All that to say I would greatly appreciate your comments if any test is
wrong, or if you have pointers on additional test cases.
You can also find a godbolt  here.

The most annoying one is the recurrent noisy false positive
-Wanalyzer-possible-null-argument on usage of a new expression.
A placement new on a static buffer that is too short is flagged by the
middle-end, but the analyzer stays quiet.
A placement new on a dynamic buffer that is too short to hold the object,
however, is never reported. See PR105948


Thanks,
Benjamin

#include <new>

struct A
{
  int x = 4;
  int y = 6;
};

void test1()
{
  int *x = ::new int; // true negative on -Wanalyzer-possible-null-argument
  int *arr = ::new int[3]; // true negative on -Wanalyzer-possible-null-argument
  // false positive -Wanalyzer-possible-null-argument
  // (a throwing new cannot return null)
  A *a = ::new A();
  ::delete a;
  ::delete x;
  ::delete[] arr;
}

void test_allocators_mismatch()
{
  int *a = ::new int;
  int *b = ::new int[3];

  ::delete[] a; /* true positive -Wanalyzer-mismatching-deallocation flagged */
  ::delete b; /* true positive -Wanalyzer-mismatching-deallocation flagged */
}

// From clang core.uninitialized.NewArraySize
void test_garbage_new_array()
{
  int n;
  /* true positive -Wanalyzer-use-of-uninitialized-value reported for 'n' */
  int *arr = ::new int[n];
  /* however nothing is reported for 'arr', even with
     '-fno-analyzer-suppress-followups', one could expect a specific warning */
  ::delete[] arr; /* no warnings here either */
}

void test_placement()
{
  // true negative -Wanalyzer-possible-null-dereference
  void *chunk = ::operator new(20);
  A *a = ::new (chunk) A();
  a->~A();
  ::operator delete(chunk);
}

void test_delete_placement()
{
  // false positive -Wanalyzer-possible-null-argument (throwing new)
  A *a = ::new A;
  int *z = ::new (&a->y) int;
  a->~A(); // destruct properly
  ::operator delete(a);
  // nothing from analyzer but got -Wfree-nonheap-object, even though
  // analyzer also has -Wanalyzer-free-of-non-heap
  ::operator delete(z);
}

void test_write_placement_after_delete()
{
  short *s = ::new short;
  long *lp = ::new (s) long;
  ::delete s;
  // true positive -Wanalyzer-use-after-free flagged, as well as a
  // wrong -Wanalyzer-null-dereference of lp
  *lp = 12;
}

void test_read_placement_after_delete()
{
  short *s = ::new short;
  long *lp = ::new (s) long;
  ::delete s;
  // true positive -Wanalyzer-use-after-free flagged, as well
  // as a wrong -Wanalyzer-null-dereference of lp
  long m = *lp;
}

void test_use_placement_after_destruction()
{
  A a;
  int *lp = ::new () int;
  a.~A();
  /* true positive -Wanalyzer-use-of-uninitialized-value,
     nothing about use-after-delete though */
  int m = *lp;
}

// From clang cplusplus.PlacementNewChecker
void test_placement_size_static()
{
  short s;
  /* nothing from analyzer, but still got -Wplacement-new= */
  long *lp = ::new (&s) long;
}

void test_placement_size_dynamic()
{
  short *s = ::new short;
  // Nothing reported here at all, would expect a -Wanalyzer-placement-new=
  long *lp = ::new (s) long;
  ::delete s;
}

void test_placement_null()
{
  int *x = nullptr;
  // Placement new on NULL is undefined, yet nothing is reported.
  int *p = ::new (x) int;
  ::operator delete(x);
}

void test_initialization_through_placement()
{
  int x;
  int *p = ::new (&x) int;
  *p = 10;
  int z = x + 2; // Everything is fine, no warning emitted
}
