Re: [PATCH] Update {skylake,icelake,alderlake}_cost to add a bit preference to vector store.

2022-06-05 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 1, 2022 at 11:56 PM H.J. Lu via Gcc-patches
 wrote:
>
> On Tue, May 31, 2022 at 10:06 PM Cui,Lili  wrote:
> >
> > This patch is to update {skylake,icelake,alderlake}_cost to add a bit 
> > preference to vector store.
> > Since the integer vector construction cost has changed, we need to adjust
> > the load and store costs for Intel processors.
> >
> > With the patch applied
> > 538.imagick_r: gets ~6% improvement on ADL for multicopy.
> > 525.x264_r:    gets ~2% improvement on ADL and ICX for multicopy.
> > with no measurable changes for other benchmarks.
> >
> > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. Ok for trunk?
> >
> > Thanks,
> > Lili.
> >
> > gcc/ChangeLog
> >
> > PR target/105493
> > * config/i386/x86-tune-costs.h (skylake_cost): Raise the gpr load 
> > cost
> > from 4 to 6 and gpr store cost from 6 to 8. Change SSE loads and
> > unaligned loads cost from {6, 6, 6, 10, 20} to {8, 8, 8, 8, 16}.
> > (icelake_cost): Ditto.
> > (alderlake_cost): Raise the gpr store cost from 6 to 8 and SSE 
> > loads,
> > stores and unaligned stores cost from {6, 6, 6, 10, 15} to
> > {8, 8, 8, 10, 15}.
> >
> > gcc/testsuite/
> >
> > PR target/105493
> > * gcc.target/i386/pr91446.c: Adjust to expect vectorization
> > * gcc.target/i386/pr99881.c: XFAIL.
> > ---
> >  gcc/config/i386/x86-tune-costs.h| 26 -
> >  gcc/testsuite/gcc.target/i386/pr91446.c |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr99881.c |  2 +-
> >  3 files changed, 15 insertions(+), 15 deletions(-)
> >
> > diff --git a/gcc/config/i386/x86-tune-costs.h 
> > b/gcc/config/i386/x86-tune-costs.h
> > index ea34a939c68..6c9066c84cc 100644
> > --- a/gcc/config/i386/x86-tune-costs.h
> > +++ b/gcc/config/i386/x86-tune-costs.h
> > @@ -1897,15 +1897,15 @@ struct processor_costs skylake_cost = {
> >8,   /* "large" insn */
> >17,  /* MOVE_RATIO */
> >17,  /* CLEAR_RATIO */
> > -  {4, 4, 4},   /* cost of loading integer registers
> > +  {6, 6, 6},   /* cost of loading integer registers
> >in QImode, HImode and SImode.
> >Relative to reg-reg move (2).  */
> > -  {6, 6, 6},   /* cost of storing integer 
> > registers */
> > -  {6, 6, 6, 10, 20},   /* cost of loading SSE register
> > +  {8, 8, 8},   /* cost of storing integer 
> > registers */
> > +  {8, 8, 8, 8, 16},/* cost of loading SSE register
> >in 32bit, 64bit, 128bit, 256bit 
> > and 512bit */
> >{8, 8, 8, 8, 16},/* cost of storing SSE register
> >in 32bit, 64bit, 128bit, 256bit 
> > and 512bit */
> > -  {6, 6, 6, 10, 20},   /* cost of unaligned loads.  */
> > +  {8, 8, 8, 8, 16},/* cost of unaligned loads.  */
> >{8, 8, 8, 8, 16},/* cost of unaligned stores.  */
> >2, 2, 4, /* cost of moving XMM,YMM,ZMM 
> > register */
> >6,   /* cost of moving SSE register to 
> > integer.  */
> > @@ -2023,15 +2023,15 @@ struct processor_costs icelake_cost = {
> >8,   /* "large" insn */
> >17,  /* MOVE_RATIO */
> >17,  /* CLEAR_RATIO */
> > -  {4, 4, 4},   /* cost of loading integer registers
> > +  {6, 6, 6},   /* cost of loading integer registers
> >in QImode, HImode and SImode.
> >Relative to reg-reg move (2).  */
> > -  {6, 6, 6},   /* cost of storing integer 
> > registers */
> > -  {6, 6, 6, 10, 20},   /* cost of loading SSE register
> > +  {8, 8, 8},   /* cost of storing integer 
> > registers */
> > +  {8, 8, 8, 8, 16},/* cost of loading SSE register
> >in 32bit, 64bit, 128bit, 256bit 
> > and 512bit */
> >{8, 8, 8, 8, 16},/* cost of storing SSE register
> >in 32bit, 64bit, 128bit, 256bit 
> > and 512bit */
> > -  {6, 6, 6, 10, 20},   /* cost of unaligned loads.  */
> > +  {8, 8, 8, 8, 16},/* cost of unaligned loads.  */
> >{8, 8, 8, 8, 16},/* cost of unaligned stores.  */
> >2, 2, 4, /* cost of moving XMM,YMM,ZMM 
> > register */
> >6,   /* cost of moving SSE 

[Bug target/105854] ICE: in extract_constrain_insn, at recog.cc:2692 (insn does not satisfy its constraints: sse2_lshrv1ti3)

2022-06-05 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105854

Hongtao.liu  changed:

   What|Removed |Added

 CC||crazylht at gmail dot com

--- Comment #1 from Hongtao.liu  ---
(define_insn_and_split "ssse3_palignrdi"
  [(set (match_operand:DI 0 "register_operand" "=y,x,Yv")
        (unspec:DI [(match_operand:DI 1 "register_operand" "0,0,Yv")
                    (match_operand:DI 2 "register_mmxmem_operand" "ym,x,Yv")
                    (match_operand:SI 3 "const_0_to_255_mul_8_operand")]
                   UNSPEC_PALIGNR))]
  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"

Alternative 2 requires Yw instead of Yv since it's split to vpsrldq, which
requires AVX512VL & AVX512BW for the EVEX version.

Re: [PATCH] x86: harmonize __builtin_ia32_psadbw*() types

2022-06-05 Thread Hongtao Liu via Gcc-patches
On Mon, Jun 6, 2022 at 3:17 AM Uros Bizjak via Gcc-patches
 wrote:
>
> On Thu, Jun 2, 2022 at 5:04 PM Jan Beulich  wrote:
> >
> > The 64-bit, 128-bit, and 512-bit variants have VDI return type, in
> > line with instruction behavior. Make the 256-bit builtin match, thus
> > also making it match the insn it expands to (using VI8_AVX2_AVX512BW).
> >
> > gcc/
> >
> > * config/i386/i386-builtin.def (__builtin_ia32_psadbw256):
> > Change type.
> > * config/i386/i386-builtin-types.def: New function type
> > (V4DI, V32QI, V32QI).
> > * config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle
> > V4DI_FTYPE_V32QI_V32QI.
>
> LGTM, but please let HJ have the final approval.
I think it was just a typo and not intentional, so Ok for the trunk.
>
> Uros.
>
> >
> > --- a/gcc/config/i386/i386-builtin.def
> > +++ b/gcc/config/i386/i386-builtin.def
> > @@ -1217,7 +1217,7 @@ BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR
> >  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_mulv8si3, 
> > "__builtin_ia32_pmulld256"  , IX86_BUILTIN_PMULLD256  , UNKNOWN, (int) 
> > V8SI_FTYPE_V8SI_V8SI)
> >  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_vec_widen_umult_even_v8si, 
> > "__builtin_ia32_pmuludq256", IX86_BUILTIN_PMULUDQ256, UNKNOWN, (int) 
> > V4DI_FTYPE_V8SI_V8SI)
> >  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_iorv4di3, 
> > "__builtin_ia32_por256", IX86_BUILTIN_POR256, UNKNOWN, (int) 
> > V4DI_FTYPE_V4DI_V4DI)
> > -BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_psadbw, 
> > "__builtin_ia32_psadbw256", IX86_BUILTIN_PSADBW256, UNKNOWN, (int) 
> > V16HI_FTYPE_V32QI_V32QI)
> > +BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_psadbw, 
> > "__builtin_ia32_psadbw256", IX86_BUILTIN_PSADBW256, UNKNOWN, (int) 
> > V4DI_FTYPE_V32QI_V32QI)
> >  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_pshufbv32qi3, 
> > "__builtin_ia32_pshufb256", IX86_BUILTIN_PSHUFB256, UNKNOWN, (int) 
> > V32QI_FTYPE_V32QI_V32QI)
> >  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_pshufdv3, 
> > "__builtin_ia32_pshufd256", IX86_BUILTIN_PSHUFD256, UNKNOWN, (int) 
> > V8SI_FTYPE_V8SI_INT)
> >  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_pshufhwv3, 
> > "__builtin_ia32_pshufhw256", IX86_BUILTIN_PSHUFHW256, UNKNOWN, (int) 
> > V16HI_FTYPE_V16HI_INT)
> > --- a/gcc/config/i386/i386-builtin-types.def
> > +++ b/gcc/config/i386/i386-builtin-types.def
> > @@ -516,6 +516,7 @@ DEF_FUNCTION_TYPE (V8DI, V8DI, V2DI, INT
> >  DEF_FUNCTION_TYPE (V8DI, V8DI, V2DI, INT, V8DI, UQI)
> >  DEF_FUNCTION_TYPE (V8DI, V8DI, V4DI, INT, V8DI, UQI)
> >  DEF_FUNCTION_TYPE (V4DI, V8SI, V8SI)
> > +DEF_FUNCTION_TYPE (V4DI, V32QI, V32QI)
> >  DEF_FUNCTION_TYPE (V8DI, V64QI, V64QI)
> >  DEF_FUNCTION_TYPE (V4DI, V4DI, V2DI)
> >  DEF_FUNCTION_TYPE (V4DI, PCV4DI, V4DI)
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > @@ -10359,6 +10359,7 @@ ix86_expand_args_builtin (const struct b
> >  case V8SI_FTYPE_V16HI_V16HI:
> >  case V4DI_FTYPE_V4DI_V4DI:
> >  case V4DI_FTYPE_V8SI_V8SI:
> > +case V4DI_FTYPE_V32QI_V32QI:
> >  case V8DI_FTYPE_V64QI_V64QI:
> >if (comparison == UNKNOWN)
> > return ix86_expand_binop_builtin (icode, exp, target);
> >



-- 
BR,
Hongtao
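
Archive note: the point about the return type matching "instruction behavior"
can be illustrated with a scalar model in plain C. The sketch below is not
GCC code; the function name sad_u8_model is made up for illustration. It
assumes the usual PSADBW semantics: each group of 8 unsigned bytes produces
one sum of absolute differences, stored in a 64-bit lane. With 32 input bytes
that gives four 64-bit results, which is why V4DI is the natural type for the
256-bit builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of PSADBW (illustrative only, not GCC code).
   For every group of 8 unsigned bytes, sum the absolute differences
   and store the sum in one 64-bit lane.  NBYTES must be a multiple
   of 8 and OUT must have room for NBYTES / 8 elements.  */
static void
sad_u8_model (const uint8_t *a, const uint8_t *b, size_t nbytes,
              uint64_t *out)
{
  assert (nbytes % 8 == 0);
  for (size_t lane = 0; lane < nbytes / 8; lane++)
    {
      uint64_t sum = 0;
      for (size_t i = 0; i < 8; i++)
        {
          uint8_t x = a[lane * 8 + i];
          uint8_t y = b[lane * 8 + i];
          sum += (x > y) ? (uint64_t) (x - y) : (uint64_t) (y - x);
        }
      out[lane] = sum;
    }
}
```

Calling sad_u8_model with nbytes set to 32 models the 256-bit variant and
fills four uint64_t lanes; 16 and 64 bytes model the 128-bit and 512-bit
variants in the same way.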


[PATCH-1, rs6000] Replace shift and ior insns with one rotate and mask insn for bswap pattern [PR93453]

2022-06-05 Thread HAO CHEN GUI via Gcc-patches
Hi,
  This patch replaces shift and ior insns with one rotate and mask
insn for the split patterns which are for DI byte swap on Power6 and
before. The test case shows the optimization.

  Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.
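
Archive note: a rough C model of the transformation may help when reading
the patch. It is only a sketch under assumptions: the helper names are
invented, and insert_high_model reflects my reading of the operands passed
to gen_rotldi3_insert_3 (keep the low 32 bits of dest, insert src shifted
left by 32), not the machine description itself. It shows that two 32-bit
byte swaps followed by a single insert give the same result as the former
shift plus ior sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 32-bit byte swap, standing in for the bswapsi2 expansion.  */
static uint32_t
bswap32_model (uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical model of the insert step: keep the bits of DEST selected
   by KEEP_MASK and place SRC shifted left by SHIFT above them.  The
   asserts mirror the assumption that the mask covers exactly the low
   SHIFT bits.  */
static uint64_t
insert_high_model (uint64_t dest, uint64_t src, unsigned shift,
                   uint64_t keep_mask)
{
  assert (shift > 0 && shift < 64);
  assert (keep_mask == (UINT64_C (1) << shift) - 1);
  return (dest & keep_mask) | (src << shift);
}

/* 64-bit byte swap built from two 32-bit swaps and one insert,
   following the register byte swap split in the patch.  */
static uint64_t
bswap64_model (uint64_t src)
{
  uint64_t lo_swapped = bswap32_model ((uint32_t) src);
  uint64_t hi_swapped = bswap32_model ((uint32_t) (src >> 32));
  return insert_high_model (hi_swapped, lo_swapped, 32,
                            UINT64_C (0xffffffff));
}
```

For example, bswap64_model (0x0102030405060708) yields 0x0807060504030201:
the swapped high word ends up in the low half and the swapped low word is
inserted into the high half.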

ChangeLog
2022-06-06 Haochen Gui 

gcc/
* config/rs6000/rs6000.md (split for DI load byte swap): Merge shift
and ior insns to one rotate and mask insn.
(split for DI register byte swap): Likewise.

gcc/testsuite/
* gcc.target/powerpc/pr93453-1.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bf85baa5370..2e38195aaac 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -2828,8 +2828,8 @@ (define_split
   emit_insn (gen_bswapsi2 (dest_32, word2));
 }

-  emit_insn (gen_ashldi3 (op3, op3, GEN_INT (32)));
-  emit_insn (gen_iordi3 (dest, dest, op3));
+  emit_insn (gen_rotldi3_insert_3 (dest, op3, GEN_INT (32), dest,
+  GEN_INT ((HOST_WIDE_INT_1U << 32) - 1)));
   DONE;
 })

@@ -2914,10 +2914,10 @@ (define_split
   rtx op3_si  = simplify_gen_subreg (SImode, op3, DImode, lo_off);

   emit_insn (gen_lshrdi3 (op2, src, GEN_INT (32)));
-  emit_insn (gen_bswapsi2 (dest_si, src_si));
-  emit_insn (gen_bswapsi2 (op3_si, op2_si));
-  emit_insn (gen_ashldi3 (dest, dest, GEN_INT (32)));
-  emit_insn (gen_iordi3 (dest, dest, op3));
+  emit_insn (gen_bswapsi2 (op3_si, src_si));
+  emit_insn (gen_bswapsi2 (dest_si, op2_si));
+  emit_insn (gen_rotldi3_insert_3 (dest, op3, GEN_INT (32), dest,
+  GEN_INT ((HOST_WIDE_INT_1U << 32) - 1)));
   DONE;
 })

diff --git a/gcc/testsuite/gcc.target/powerpc/pr93453-1.c 
b/gcc/testsuite/gcc.target/powerpc/pr93453-1.c
new file mode 100644
index 000..4271886561f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr93453-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-mdejagnu-cpu=power6 -O2" } */
+
+unsigned long load_byte_reverse (unsigned long *in)
+{
+   return __builtin_bswap64 (*in);
+}
+
+unsigned long byte_reverse (unsigned long in)
+{
+   return __builtin_bswap64 (in);
+}
+
+/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */


[PATCH]: libgompd add parallel handle functions

2022-06-05 Thread Mohamed Sayed via Gcc-patches
This patch adds the parallel region handles specified in section 5.5.3.
From examining the libgomp code, I found that struct gomp_team describes the
parallel region.
The thread handle gives the address of gomp_thread, so I tried to access the
team through gomp_thread->ts->team.
The parallel handle is the team pointer in the team state.
I have a question about ompd_get_task_parallel_handle:
https://www.openmp.org/spec-html/5.0/openmpsu218.html
How can I reach gomp_team from gomp_task?
The union in gomp_task has two entries, gomp_sem_t and gomp_team.
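
Archive note: the navigation described above can be sketched with mock
structures. Everything below is hypothetical and heavily simplified: the
mock_* types only mirror the fields named in this message and in the
gompd_access additions (ts, team, prev_ts), and a real OMPD library would
read target memory through the debugger callbacks using the exported
offsets rather than dereferencing pointers directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libgomp structures named in the
   message; the real definitions have many more fields.  */
struct mock_team;

struct mock_team_state
{
  struct mock_team *team;          /* Team of the current region, or NULL.  */
};

struct mock_team
{
  struct mock_team_state prev_ts;  /* State saved when the team was entered.  */
};

struct mock_thread
{
  struct mock_team_state ts;       /* Current team state of the thread.  */
};

/* Current parallel region of a thread: the team pointer in its
   team state.  */
static struct mock_team *
curr_parallel (const struct mock_thread *thr)
{
  assert (thr != NULL);
  return thr->ts.team;
}

/* Enclosing parallel region: the team recorded in prev_ts of the
   given team, or NULL for the outermost region.  */
static struct mock_team *
enclosing_parallel (const struct mock_team *team)
{
  assert (team != NULL);
  return team->prev_ts.team;
}
```

In this model a nested region is found by calling curr_parallel on the
thread and then enclosing_parallel repeatedly until it returns NULL. The
sketch does not address the open question about reaching the team from
gomp_task.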

libgomp/ChangeLog

2022-06-06  Mohamed Sayed  


* Makefile.am (libgompd_la_SOURCES): Add ompd-parallel.c.
* Makefile.in: Regenerate.
* libgompd.map: Add ompd_get_curr_parallel_handle,
ompd_get_enclosing_parallel_handle, ompd_rel_parallel_handle
and ompd_parallel_handle_compare symbol versions.
* ompd-support.h: Add gompd_access (gomp_team_state, team) and
gompd_access (gomp_team, prev_ts).
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 6d913a93e7f..4e215450b25 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -94,7 +94,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c error.c \
priority_queue.c affinity-fmt.c teams.c allocator.c oacc-profiling.c \
oacc-target.c ompd-support.c
 
-libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c
+libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c ompd-parallel.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 40f896b5f03..ab66ad1c8f0 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -233,7 +233,8 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo 
critical.lo \
affinity-fmt.lo teams.lo allocator.lo oacc-profiling.lo \
oacc-target.lo ompd-support.lo $(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
-am_libgompd_la_OBJECTS = ompd-init.lo ompd-helper.lo ompd-icv.lo
+am_libgompd_la_OBJECTS = ompd-init.lo ompd-helper.lo ompd-icv.lo \
+   ompd-parallel.lo
 libgompd_la_OBJECTS = $(am_libgompd_la_OBJECTS)
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -583,7 +584,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c \
oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c \
oacc-target.c ompd-support.c $(am__append_7)
-libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c
+libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c ompd-parallel.c
 
 # Nvidia PTX OpenACC plugin.
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info 
$(libtool_VERSION)
@@ -800,6 +801,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-helper.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-icv.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-init.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-parallel.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-support.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parallel.Plo@am__quote@
diff --git a/libgomp/libgompd.map b/libgomp/libgompd.map
index 85bdc3695f6..1662dc56962 100644
--- a/libgomp/libgompd.map
+++ b/libgomp/libgompd.map
@@ -16,6 +16,10 @@ OMPD_5.1 {
 ompd_thread_handle_compare;
 ompd_get_thread_id;
 ompd_get_device_from_thread;
+ompd_get_curr_parallel_handle;
+ompd_get_enclosing_parallel_handle;
+ompd_rel_parallel_handle;
+ompd_parallel_handle_compare;
   local:
 *;
 };
diff --git a/libgomp/ompd-support.h b/libgomp/ompd-support.h
index 39d55161132..48a2e6133f5 100644
--- a/libgomp/ompd-support.h
+++ b/libgomp/ompd-support.h
@@ -83,12 +83,15 @@ extern __UINT64_TYPE__ gompd_state;
   gompd_access (gomp_thread_pool, threads) \
   gompd_access (gomp_thread, ts) \
   gompd_access (gomp_team_state, team_id) \
-  gompd_access (gomp_task, icv)
+  gompd_access (gomp_task, icv) \
+  gompd_access (gomp_team_state, team) \
+  gompd_access (gomp_team, prev_ts)
 
 #define GOMPD_SIZES(gompd_size) \
   gompd_size (gomp_thread) \
   gompd_size (gomp_task_icv) \
-  gompd_size (gomp_task)
+  gompd_size (gomp_task) 
+  
 
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 #pragma GCC visibility pop


gcc-13-20220605 is now available

2022-06-05 Thread GCC Administrator via Gcc
Snapshot gcc-13-20220605 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/13-20220605/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 13 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch master 
revision ad6919374beafac4ec1a2f8059620f261019b02f

You'll find:

 gcc-13-20220605.tar.xz   Complete GCC

  SHA256=50e0c0cdd96ccee3dc7343c3316f21f6fb90cd81591e2007e86cb0d567cbf395
  SHA1=8d4c29ea88f690064f5594dd58b17f63f513d824

Diffs from 13-20220529 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-13
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[Bug middle-end/105853] [13 regression] ice in pieces_addr constructor

2022-06-05 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105853

Roger Sayle  changed:

   What|Removed |Added

   Last reconfirmed||2022-06-05
   Priority|P3  |P1
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
 CC||roger at nextmovesoftware dot 
com
   Assignee|unassigned at gcc dot gnu.org  |roger at 
nextmovesoftware dot com

--- Comment #2 from Roger Sayle  ---
Doh!  Mine.  Sorry for the breakage.  My patch/solution for PR 105856 also
resolves this PR.  Calling expand_expr with a DECL_INITIAL CONSTRUCTOR from
load_register_parameters can trigger unintended pathways; instead we/I need to
call the lower level store_constructor directly.

[Bug target/105856] [13 Regression] ice in emit_move_insn, at expr.cc:4011

2022-06-05 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105856

Roger Sayle  changed:

   What|Removed |Added

Summary|ice in emit_move_insn, at   |[13 Regression] ice in
   |expr.cc:4011|emit_move_insn, at
   ||expr.cc:4011
   Assignee|unassigned at gcc dot gnu.org  |roger at 
nextmovesoftware dot com
 CC||roger at nextmovesoftware dot 
com
   Priority|P3  |P1
Version|12.0|13.0
 Status|UNCONFIRMED |ASSIGNED
   Keywords||ice-on-valid-code
   Target Milestone|--- |13.0
   Last reconfirmed||2022-06-05
 Target||arm*
 Ever confirmed|0   |1

--- Comment #2 from Roger Sayle  ---
Mine.  Sorry for the breakage.  I've a fix that avoids the ICE on ARM, and
allows GCC to generate the following code for this testcase:
g_329_3:
mov r0, #6
b   func_19
[i.e. the same code as without the #pragma pack(1)].
This is a big improvement on GCC v12 which generates
(both with and without #pragma pack(1)):

g_329_3:
  movw r3, #:lower16:.LANCHOR0
  movt r3, #:upper16:.LANCHOR0
  ldr r0, [r3]
  b func_19

I'm bootstrapping and regression testing on x86_64 now.

[Bug c++/105852] [13 Regression] ice in instantiate_decl

2022-06-05 Thread sam at gentoo dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105852

--- Comment #3 from Sam James  ---
Thanks for reporting, beat me to it. Looks like it's the same on the latest 11
(11.3.1 20220602) and 12 (12.1.1 20220604) snapshots.

[Bug c++/105852] [13 Regression] ice in instantiate_decl

2022-06-05 Thread sam at gentoo dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105852

--- Comment #2 from Sam James  ---
*** Bug 105859 has been marked as a duplicate of this bug. ***

[Bug c++/105859] ICE in instantiate_decl

2022-06-05 Thread sam at gentoo dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105859

Sam James  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #4 from Sam James  ---
.

*** This bug has been marked as a duplicate of bug 105852 ***

[Bug c++/105859] ICE in instantiate_decl

2022-06-05 Thread sam at gentoo dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105859

--- Comment #3 from Sam James  ---
Ah, it's probably a dupe of bug 105852.

Re: [PING] PR middle-end/95126: Expand small const structs as immediate constants

2022-06-05 Thread Rainer Orth
Andreas Schwab  writes:

> This breaks Ada on aarch64 in stage3, probably a miscompiled stage2
> compiler.  For example:
>
> /opt/gcc/gcc-20220605/Build/./prev-gcc/xgcc
> -B/opt/gcc/gcc-20220605/Build/./prev-gcc/ -B/usr/aarch64-suse-linux/bin/
> -B/usr/aarch64-suse-linux/bin/ -B/usr/aarch64-suse-linux/lib/ -isystem
> /usr/aarch64-suse-linux/include -isystem
> /usr/aarch64-suse-linux/sys-include -fchecking=1 -c -g -O2 -fchecking=1
> -gnatpg -gnata -W -Wall -nostdinc -I- -I. -Iada/generated -Iada
> -I../../gcc/ada -Iada/libgnat -I../../gcc/ada/libgnat -Iada/gcc-interface
> -I../../gcc/ada/gcc-interface ../../gcc/ada/spark_xrefs.adb -o
> ada/spark_xrefs.o
> +===GNAT BUG DETECTED==+
> | 13.0.0 20220605 (experimental) [master ad6919374be] (aarch64-suse-linux) |
> | Assert_Failure failed precondition from sinfo-nodes.ads:5419 |
> | Error detected at types.ads:53:28|
> | Compiling ../../gcc/ada/spark_xrefs.adb  |
> | Please submit a bug report; see https://gcc.gnu.org/bugs/ .  |
> | Use a subject line meaningful to you and us to track the bug.|
> | Include the entire contents of this bug box in the report.   |
> | Include the exact command that you entered.  |
> | Also include sources listed below.   |
> +==+

Confirmed: this also happens on i386-pc-solaris2.11,
sparc-sun-solaris2.11, and x86_64-pc-linux-gnu.


[Bug target/105506] Error building GCC 12.1.0 against MinGW-w64: fatal error: cannot execute 'cc1': CreateProcess: No such file or directory

2022-06-05 Thread martin at martin dot st via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105506

Martin Storsjö  changed:

   What|Removed |Added

 CC||martin at martin dot st

--- Comment #6 from Martin Storsjö  ---
This is an old longstanding issue that seems to have reappeared, but which has
been fixed differently recently in the very latest mingw-w64 git. But first a
brief history of the issue:

GCC uses the access() function for checking whether a binary exists and is
executable (with the X_OK flag as parameter). On Windows, there's no separate
"execute" permission bit, but the X_OK bit (which isn't a documented parameter
from Microsoft's side) used to be ignored.

In Vista, msvcrt.dll's access() function suddenly stopped ignoring the bit that
was used for X_OK (which mingw had decided to use for that purpose), and
started erroring out when this bit was set. This was dealt with in 2007 in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33281, by adding a
reimplementation of the access() function in mingw. By defining
__USE_MINGW_ACCESS, the access() function is redirected to the __mingw_access()
function. GCC set -D__USE_MINGW_ACCESS when building on mingw to include this
workaround.

After some time, it seems like Microsoft reverted this behaviour in
msvcrt.dll's access() function, because now it no longer seems like this
behaviour is present, not on modern Windows 10, but not even on "modern"
installations of Vista either. So the need for -D__USE_MINGW_ACCESS has
vanished (and bitrotted in GCC somewhat).

UCRT's access() function does have the same issue though - if passed the
undocumented, mingw-invented X_OK bit, it errors out. As GCC did try to define
__USE_MINGW_ACCESS, the workaround should have been picked up though, but as
GCC's codebase had evolved, the define wasn't being set in all the cases where
it might have been needed. This was fixed for GCC 11 in
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=89e95ad2e7679322b2f5ee9070ff2721d5ca1d6d
(and later backported to GCC 9 and 10 in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101238).

But apparently something has changed further in GCC 12, so that this define
doesn't end up set in all the places where it needs to. (It'd be interesting to
know why/where/when!) In mingw-w64, we decided to enable this workaround
unconditionally for UCRT (as a more general fix for other audiences, although
GCC is the only one I've heard of needing it) - skipping the UCRT provided
access() function and always using the mingw reimplementation, see
https://github.com/mingw-w64/mingw-w64/commit/bceadc54d8f32b3f14c69074892e2718eac08e3b.

So to successfully build GCC 12 running on UCRT, you'd need to use another GCC
install, with the very latest mingw-w64 (or an older release with that fix
cherry-picked, plus the following Makefile.in update from
https://github.com/mingw-w64/mingw-w64/commit/89bacd2be60fa92dd74d3b5f2074b06a32d8c784),
to build GCC 12. Alternatively, see if you can manually pass
-D__USE_MINGW_ACCESS to the GCC 12 build, if it'd end up in all the places
where it's needed.
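
Archive note: for readers unfamiliar with the call in question, here is a
minimal POSIX-only sketch of the kind of check described above. The helper
name is_executable is invented for illustration and this is not GCC's
actual code; it only shows access() being asked about the X_OK bit, which
is the request that the affected Windows C runtimes reject.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Return 1 if PATH exists and the caller may execute it, else 0.
   POSIX-only sketch; on mingw the behaviour of X_OK depends on which
   access() implementation ends up being used, as explained in the
   comment above.  */
static int
is_executable (const char *path)
{
  assert (path != NULL);
  return access (path, X_OK) == 0;
}
```

A driver that looks for a helper such as cc1 would call a check like this on
each candidate path. If access() fails merely because X_OK was passed, an
existing binary is reported as missing, which matches the "cannot execute
'cc1'" symptom in the bug title.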

[RFC] Support for nonzero attribute

2022-06-05 Thread Miika via Gcc
Based on Jakub's and Yair's comments I created a new attribute "inrange".
inrange takes three arguments: pos, min and max.
pos is the argument position in the function, and min and max define the
range of valid integers. Both min and max are inclusive and work with enums.
Warnings are enabled with the new flag "-Winrange".

The attribute definition would look something like this:
inrange(pos, min, max)


So for example compiling this with "gcc foo.c -Winrange":

#include <stdio.h>
void foo(int d) __attribute__ ((inrange (1, 100, 200)));
void foo(int d) {
printf("%d\n", d == 0);
}

int main() {
foo(0); // warning
foo(100); // no warning
}

Would give the following warning:

foo.c: In function 'main':
foo.c:8:9: warning: argument in position 1 not in rage of 100..200 [-Winrange]
8 | foo(0); // warning
  | ^~~


I thought about having separate minval and maxval attributes but I personally
prefer that min and max values have to be defined explicitly.

If this looks good, I could look into applying inrange to types and variables
and after that I could start looking into optimization.

Patch for adding inrange is attached below

Miika

---
diff --git a/gcc/builtin-attrs.def b/gcc/builtin-attrs.def
index 3239311b5a4..2f5732b3ed2 100644
--- a/gcc/builtin-attrs.def
+++ b/gcc/builtin-attrs.def
@@ -98,6 +98,7 @@ DEF_ATTR_IDENT (ATTR_FORMAT, "format")
 DEF_ATTR_IDENT (ATTR_FORMAT_ARG, "format_arg")
 DEF_ATTR_IDENT (ATTR_MALLOC, "malloc")
 DEF_ATTR_IDENT (ATTR_NONNULL, "nonnull")
+DEF_ATTR_IDENT (ATTR_INRANGE, "inrange")
 DEF_ATTR_IDENT (ATTR_NORETURN, "noreturn")
 DEF_ATTR_IDENT (ATTR_NOTHROW, "nothrow")
 DEF_ATTR_IDENT (ATTR_LEAF, "leaf")
diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index ac936d5..d6dc9c37723 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -119,6 +119,7 @@ static tree handle_novops_attribute (tree *, tree, tree, 
int, bool *);
 static tree handle_vector_size_attribute (tree *, tree, tree, int,
  bool *);
 static tree handle_nonnull_attribute (tree *, tree, tree, int, bool *);
+static tree handle_inrange_attribute (tree *, tree, tree, int, bool *);
 static tree handle_nonstring_attribute (tree *, tree, tree, int, bool *);
 static tree handle_nothrow_attribute (tree *, tree, tree, int, bool *);
 static tree handle_cleanup_attribute (tree *, tree, tree, int, bool *);
@@ -379,6 +380,8 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_tls_model_attribute, NULL },
   { "nonnull",0, -1, false, true, true, false,
  handle_nonnull_attribute, NULL },
+  { "inrange",3, 3, false, true, true, false,
+ handle_inrange_attribute, NULL },
   { "nonstring",  0, 0, true, false, false, false,
  handle_nonstring_attribute, NULL },
   { "nothrow",0, 0, true,  false, false, false,
@@ -3754,6 +3757,59 @@ handle_nonnull_attribute (tree *node, tree name,
   return NULL_TREE;
 }

+/* Handle the "inrange" attribute.  */
+
+static tree
+handle_inrange_attribute (tree *node, tree name,
+ tree args, int ARG_UNUSED (flags),
+ bool *no_add_attrs)
+{
+  tree type = *node;
+
+  /* Test the position argument  */
+  tree pos = TREE_VALUE (args);
+  if (!positional_argument (type, name, pos, INTEGER_TYPE, 0))
+*no_add_attrs = true;
+
+  /* Make sure that range args are INTEGRALs  */
+  bool range_err = false;
+  for (tree range = TREE_CHAIN (args); range; range = TREE_CHAIN (range))
+{
+  tree val = TREE_VALUE (range);
+  if (val && TREE_CODE (val) != IDENTIFIER_NODE
+ && TREE_CODE (val) != FUNCTION_DECL)
+   val = default_conversion (val);
+
+  if (TREE_CODE (val) != INTEGER_CST
+ || !INTEGRAL_TYPE_P (TREE_TYPE (val)))
+   {
+ warning (OPT_Wattributes,
+  "range value is not an integral constant");
+ *no_add_attrs = true;
+ range_err = true;
+   }
+}
+
+  /* Test the range arg max is not smaller than min
+ if range args are integrals  */
+  if (!range_err)
+{
+  tree range = TREE_CHAIN (args);
+  tree min = TREE_VALUE(range);
+  range = TREE_CHAIN (range);
+  tree max = TREE_VALUE(range);
+  if (!tree_int_cst_le (min, max))
+   {
+ warning (OPT_Wattributes,
+  "min range is bigger than max range");
+ *no_add_attrs = true;
+ return NULL_TREE;
+   }
+}
+
+  return NULL_TREE;
+}
+
 /* Handle the "nonstring" variable attribute.  */

 static tree
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 20258c331af..8936942fec8 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -5342,6 +5342,51 @@ check_function_nonnull (location_t loc, tree attrs, int 
nargs, tree *argarray)
   

[Bug c++/105859] ICE in instantiate_decl

2022-06-05 Thread sam at gentoo dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105859

--- Comment #2 from Sam James  ---
Created attachment 53091
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53091&action=edit
vector.ii.orig.xz

[Bug c++/105859] ICE in instantiate_decl

2022-06-05 Thread sam at gentoo dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105859

--- Comment #1 from Sam James  ---
Created attachment 53090
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53090&action=edit
minimised.ii

[Bug c++/105859] New: ICE in instantiate_decl

2022-06-05 Thread sam at gentoo dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105859

Bug ID: 105859
   Summary: ICE in instantiate_decl
   Product: gcc
   Version: 12.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sam at gentoo dot org
  Target Milestone: ---

Originally reported downstream in Gentoo (https://bugs.gentoo.org/849791) by
Toralf Förster (toralf). Noticed when building dev-games/wfmath-1.0.2.

(Ionen in the downstream bug mentions 11.3.0 is fine, but it fails for me with
11.3.1 20220602 too.)

Reproducer:
```
template  struct Vector {
  friend Vector Cross(const Vector &, const Vector &);
  Vector (const int &);
};
template <> Vector<> <>::rotate(const int &) {
  Vector __trans_tmp_8 = Cross(__trans_tmp_8, *this);
}
Vector<> Cross(const Vector<> &, const Vector<> &) {}
```

Seems to not require any specific flags:
```
$ g++ foo.ii
foo.ii:3:53: warning: friend declaration ‘Vector< > Cross(const
Vector< >&, const Vector< >&)’ declares a non-template
function [-Wnon-template-friend]
3 |   friend Vector Cross(const Vector &, const Vector &);
  | ^
foo.ii:3:53: note: (if this is not what you intended, make sure the function
template has already been declared and add ‘<>’ after the function name here)
foo.ii: In member function ‘Vector< >& Vector<
>::rotate(const int&) [with int  = 3]’:
foo.ii:8:1: warning: no return statement in function returning non-void
[-Wreturn-type]
8 | }
  | ^
foo.ii: In function ‘Vector<> Cross(const Vector<>&, const Vector<>&)’:
foo.ii:9:53: warning: no return statement in function returning non-void
[-Wreturn-type]
9 | Vector<> Cross(const Vector<> &, const Vector<> &) {}
  | ^
foo.ii: At global scope:
foo.ii:7:31: internal compiler error: Segmentation fault
7 |   Vector __trans_tmp_8 = Cross(__trans_tmp_8, *this);
  |  ~^~
0xd06f86 crash_signal
   
/usr/src/debug/sys-devel/gcc-12.1.1_p20220604/gcc-12-20220604/gcc/toplev.cc:322
0x163c55e instantiate_decl(tree_node*, bool, bool)
   
/usr/src/debug/sys-devel/gcc-12.1.1_p20220604/gcc-12-20220604/gcc/cp/pt.cc:26488
0x130a940 instantiate_pending_templates(int)
   
/usr/src/debug/sys-devel/gcc-12.1.1_p20220604/gcc-12-20220604/gcc/cp/pt.cc:26809
0x1303070 c_parse_final_cleanups()
   
/usr/src/debug/sys-devel/gcc-12.1.1_p20220604/gcc-12-20220604/gcc/cp/decl2.cc:5128
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
```

```
$ g++ --version
g++ (Gentoo Hardened 12.1.1_p20220604 p7) 12.1.1 20220604
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

Re: [PATCH] x86: harmonize __builtin_ia32_psadbw*() types

2022-06-05 Thread Uros Bizjak via Gcc-patches
On Thu, Jun 2, 2022 at 5:04 PM Jan Beulich  wrote:
>
> The 64-bit, 128-bit, and 512-bit variants have VDI return type, in
> line with instruction behavior. Make the 256-bit builtin match, thus
> also making it match the insn it expands to (using VI8_AVX2_AVX512BW).
>
> gcc/
>
> * config/i386/i386-builtin.def (__builtin_ia32_psadbw256):
> Change type.
> * config/i386/i386-builtin-types.def: New function type
> (V4DI, V32QI, V32QI).
> * config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle
> V4DI_FTYPE_V32QI_V32QI.

LGTM, but please let HJ have the final approval.

Uros.

>
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -1217,7 +1217,7 @@ BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_mulv8si3, 
> "__builtin_ia32_pmulld256"  , IX86_BUILTIN_PMULLD256  , UNKNOWN, (int) 
> V8SI_FTYPE_V8SI_V8SI)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_vec_widen_umult_even_v8si, 
> "__builtin_ia32_pmuludq256", IX86_BUILTIN_PMULUDQ256, UNKNOWN, (int) 
> V4DI_FTYPE_V8SI_V8SI)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_iorv4di3, "__builtin_ia32_por256", 
> IX86_BUILTIN_POR256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI)
> -BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_psadbw, 
> "__builtin_ia32_psadbw256", IX86_BUILTIN_PSADBW256, UNKNOWN, (int) 
> V16HI_FTYPE_V32QI_V32QI)
> +BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_psadbw, 
> "__builtin_ia32_psadbw256", IX86_BUILTIN_PSADBW256, UNKNOWN, (int) 
> V4DI_FTYPE_V32QI_V32QI)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_pshufbv32qi3, 
> "__builtin_ia32_pshufb256", IX86_BUILTIN_PSHUFB256, UNKNOWN, (int) 
> V32QI_FTYPE_V32QI_V32QI)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_pshufdv3, 
> "__builtin_ia32_pshufd256", IX86_BUILTIN_PSHUFD256, UNKNOWN, (int) 
> V8SI_FTYPE_V8SI_INT)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_pshufhwv3, 
> "__builtin_ia32_pshufhw256", IX86_BUILTIN_PSHUFHW256, UNKNOWN, (int) 
> V16HI_FTYPE_V16HI_INT)
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -516,6 +516,7 @@ DEF_FUNCTION_TYPE (V8DI, V8DI, V2DI, INT
>  DEF_FUNCTION_TYPE (V8DI, V8DI, V2DI, INT, V8DI, UQI)
>  DEF_FUNCTION_TYPE (V8DI, V8DI, V4DI, INT, V8DI, UQI)
>  DEF_FUNCTION_TYPE (V4DI, V8SI, V8SI)
> +DEF_FUNCTION_TYPE (V4DI, V32QI, V32QI)
>  DEF_FUNCTION_TYPE (V8DI, V64QI, V64QI)
>  DEF_FUNCTION_TYPE (V4DI, V4DI, V2DI)
>  DEF_FUNCTION_TYPE (V4DI, PCV4DI, V4DI)
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -10359,6 +10359,7 @@ ix86_expand_args_builtin (const struct b
>  case V8SI_FTYPE_V16HI_V16HI:
>  case V4DI_FTYPE_V4DI_V4DI:
>  case V4DI_FTYPE_V8SI_V8SI:
> +case V4DI_FTYPE_V32QI_V32QI:
>  case V8DI_FTYPE_V64QI_V64QI:
>if (comparison == UNKNOWN)
> return ix86_expand_binop_builtin (icode, exp, target);
>


[Bug c/105858] New: MinGW-w64 64-bit build with --libstdcxx-pch: fatal error: cannot write PCH file: required memory segment unavailable

2022-06-05 Thread brechtsanders at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105858

Bug ID: 105858
   Summary: MinGW-w64 64-bit build with --libstdcxx-pch: fatal
error: cannot write PCH file: required memory segment
unavailable
   Product: gcc
   Version: 12.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: brechtsanders at users dot sourceforge.net
  Target Milestone: ---

When building the Windows 64-bit version of GCC 12.1.0 against MinGW-w64 build
with --libstdcxx-pch the following error occurs:

In file included from
R:/winlibs64_stage/gcc-12.1.0/libstdc++-v3/include/precompiled/extc++.h:82:
R:/winlibs64_stage/gcc-12.1.0/build_mingw/x86_64-w64-mingw32/libstdc++-v3/include/ext/enc_filebuf.h:63:1:
fatal error: cannot write PCH file: required memory segment unavailable
   63 | } // namespace
  | ^
compilation terminated.

This error does not happen when building the 32-bit version.

Re: [x86 PATCH] Double word implementation of and; cmp to not; test optimization.

2022-06-05 Thread Uros Bizjak via Gcc-patches
On Sun, Jun 5, 2022 at 7:19 PM Roger Sayle  wrote:
>
>
> This patch extends the recent and;cmp to not;test optimization to also
> perform this transformation for TImode on TARGET_64BIT and DImode on -m32.
> One motivation for this is that it's a step to fixing the current failure
> of gcc.target/i386/pr65105-5.c on -m32.
>
> A more direct benefit for x86_64 is that the following code:
>
> int foo(__int128 x, __int128 y)
> {
>   return (x & y) == y;
> }
>
> improves (with -O2 -mbmi) from:
>
> movq%rdi, %r8
> movq%rsi, %rdi
> movq%rdx, %rsi
> andq%rcx, %rdi
> movq%r8, %rax
> andq%rdx, %rax
> movq%rdi, %rdx
> xorq%rsi, %rax
> xorq%rcx, %rdx
> orq %rdx, %rax
> sete%al
> movzbl  %al, %eax
> ret
>
> to the much better:
>
> movq%rdi, %r8
> movq%rsi, %rdi
> andn%rdx, %r8, %rax
> andn%rcx, %rdi, %rsi
> orq %rsi, %rax
> sete%al
> movzbl  %al, %eax
> ret
>
> The major theme of this patch is to generalize many of i386.md's
> *di3_doubleword patterns to become *_doubleword patterns, i.e.
> whenever there exists a "double word" optimization for DImode with -m32,
> there should be an equivalent TImode optimization on TARGET_64BIT.

No, please do not mix two different themes in one patch.

OTOH, the only TImode optimization that can be used with SSE registers
is with logic instructions and some constant shifts, but there is no
TImode arithmetic. I assume your end goal is to introduce STV for
TImode on 64-bit targets, because DImode patterns for x86_32 were
introduced to avoid early decomposition by middle end and to split
instructions that STV didn't convert to vector instructions after STV
pass. So, let's start with basic V1TImode support before optimizations
are introduced.

Uros.

> The following patch has been tested on x86_64-pc-linux-gnu with
> make bootstrap and make -k check, where on TARGET_64BIT there are
> no new failures, but paradoxically with --target_board=unix{-m32}
> the other dg-final clause in gcc.target/i386/pr65105-5.c now fails.
> Counter-intuitively, this is progress, and pr65105-5.c may now be
> fixed (without using peephole2) simply by tweaking the STV pass to
> handle andn/test (in a follow-up patch).
> OK for mainline?
>
>
> 2022-06-05  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.cc (ix86_rtx_costs) : Provide costs
> for double word comparisons and tests (comparisons against zero).
> * config/i386/i386.md (*test_not_doubleword): Split DWI
> and;cmp into andn;cmp $0 as a pre-reload splitter.
> (define_expand and3): Generalize from SWIM1248x to SWIDWI.
> (define_insn_and_split "*anddi3_doubleword"): Rename/generalize...
> (define_insn_and_split "*and3_doubleword"): ... to this.
> (define_insn "*andndi3_doubleword"): Rename and generalize...
> (define_insn "*andn3_doubleword): ... to this.
> (define_split): Split andn when TARGET_BMI for both  modes.
> (define_split): Split andn when !TARGET_BMI for both  modes.
> (define_expand 3): Generalize from SWIM1248x to
> SWIDWI.
> (define_insn_and_split "*3_doubleword): Generalize
> from DI mode to both  modes.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/testnot-3.c: New test case.
>
>
> Thanks again,
> Roger
> --
>


[Bug tree-optimization/105835] Dead Code Elimination Regression at -O1 (trunk vs. 12.1.0)

2022-06-05 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105835

Roger Sayle  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |roger at 
nextmovesoftware dot com

--- Comment #2 from Roger Sayle  ---
Patch proposed
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596200.html

Re: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more optimizations.

2022-06-05 Thread Uros Bizjak via Gcc-patches
On Sun, Jun 5, 2022 at 1:48 PM Roger Sayle  wrote:
>
>
> Hi Uros,
> Many thanks for your speedy review.  This revised patch implements
> all three of your recommended improvements; the use of
> ix86_binary_operator_ok with code UNKNOWN, the removal of
> "n" constraints from const_int_operand predicates, and the use
> of force_reg (for input operands, and REG_P for output operands)
> to ensure that it's always safe to call gen_lowpart/gen_highpart.
>
> [If we proceed with the recent proposal to split double word
> addition, subtraction and other operations before reload, then
> these new add/sub variants should be updated at the same time,
> but for now this patch keeps double word patterns consistent].
>
> This revised patch has been retested on x86_64-pc-linux-gnu with
> make bootstrap and make -k check, both with and without
> --target_board=unix{-m32} with no new failures.  Ok for mainline?

+(define_insn_and_split "*concat3_1"
+  [(set (match_operand: 0 "register_operand" "=r")
+ (any_or_plus:
+  (ashift: (match_operand: 1 "register_operand" "r")
+ (match_operand: 2 "const_int_operand"))
+  (zero_extend: (match_operand:DWIH 3 "register_operand" "r"]
+  "INTVAL (operands[2]) ==  * BITS_PER_UNIT
+   && REG_P (operands[0])
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 4) (match_dup 3))
+   (set (match_dup 5) (match_dup 6))]
+{
+  operands[1] = force_reg (mode, operands[1]);
+  operands[4] = gen_lowpart (mode, operands[0]);
+  operands[5] = gen_highpart (mode, operands[0]);
+  operands[6] = gen_lowpart (mode, operands[1]);
+})

Hm, but in this particular case (and other) you can use
split_double_mode on operands[0], instead of manually splitting REG_P
constrained operands, and it will handle everything correctly. Please
note that split_double_mode has:

split_double_mode (machine_mode mode, rtx operands[],
   int num, rtx lo_half[], rtx hi_half[])

so with some care you can use:

"split_double_mode (mode, [0],1, [4], [5]);"

followed by:

operands[6] = simplify_gen_subreg (mode, op, mode, 0);

The above line is partially what split_double_mode does.

This is the approach other pre_reload doubleword splitters take, it
looks the safest (otherwise it would break left and right with
existing patterns ...), and the most effective to me.

Please also get approval for sse.md change from Hongtao, AVX512F stuff
is in a separate playground.

Uros.


>
> 2022-06-05  Roger Sayle  
> Uroš Bizjak  
>
> gcc/ChangeLog
> PR target/91681
> * config/i386/i386.md (zero_extendditi2): New define_insn_and_split.
> (*add3_doubleword_zext): New define_insn_and_split.
> (*sub3_doubleword_zext): New define_insn_and_split.
> (*concat3_1): New define_insn_and_split replacing
> previous define_split for implementing DST = (HI<<32)|LO as
> pair of move instructions, setting lopart and hipart.
> (*concat3_2): Likewise.
> (*concat3_3): Likewise, where HI is zero_extended.
> (*concat3_4): Likewise, where HI is zero_extended.
> * config/i386/sse.md (kunpckhi): Add UNSPEC_MASKOP unspec.
> (kunpcksi): Likewise, add UNSPEC_MASKOP unspec.
> (kunpckdi): Likewise, add UNSPEC_MASKOP unspec.
> (vec_pack_trunc_qi): Update to specify required UNSPEC_MASKOP unspec.
> (vec_pack_trunc_): Likewise.
>
> gcc/testsuite/ChangeLog
> PR target/91681
> * g++.target/i386/pr91681.C: New test case (from the PR).
> * gcc.target/i386/pr91681-1.c: New int128 test case.
> * gcc.target/i386/pr91681-2.c: Likewise.
> * gcc.target/i386/pr91681-3.c: Likewise, but for ia32.
>
>
> Thanks again,
> Roger
> --
>
> > -Original Message-
> > From: Uros Bizjak 
> > Sent: 03 June 2022 11:08
> > To: Roger Sayle 
> > Cc: GCC Patches 
> > Subject: Re: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more
> > optimizations.
> >
> > On Fri, Jun 3, 2022 at 11:49 AM Roger Sayle 
> > wrote:
> > >
> > >
> > > Technically, PR target/91681 has already been resolved; we now
> > > recognize the highpart multiplication at the tree-level, we no longer
> > > use the stack, and we currently generate the same number of
> > > instructions as LLVM.  However, it is still possible to do better, the
> > > current x86_64 code to generate a double word addition of a zero extended
> > operand, looks like:
> > >
> > > xorl%r11d, %r11d
> > > addq%r10, %rax
> > > adcq%r11, %rdx
> > >
> > > when it's possible (as LLVM does) to use an immediate constant:
> > >
> > > addq%r10, %rax
> > > adcq$0, %rdx
> > >
> > > To do this, the backend required one or two simple changes, that then
> > > themselves required one or two more obscure tweaks.
> > >
> > > The simple starting point is to define a zero_extendditi2 pattern, for
> > > zero extension from DImode to TImode on TARGET_64BIT that is split
> > > after reload.  Double word (TImode) 

Re: [x86 PATCH] Recognize vpcmov in combine with -mxop.

2022-06-05 Thread Uros Bizjak via Gcc-patches
On Sat, Jun 4, 2022 at 1:03 PM Roger Sayle  wrote:
>
>
> By way of an apology for causing PR target/105791, where I'd overlooked
> the need to support V1TImode in TARGET_XOP's vpcmov instruction, this
> patch further improves support for TARGET_XOP's vpcmov instruction, by
> recognizing it in combine.
>
> Currently, the test case:
>
> typedef int v4si __attribute__ ((vector_size (16)));
> v4si foo(v4si c, v4si t, v4si f)
> {
> return (c&t)|(~c&f);
> }
>
> on x86_64 with -O2 -mxop generates:
> vpxor   %xmm2, %xmm1, %xmm1
> vpand   %xmm0, %xmm1, %xmm1
> vpxor   %xmm2, %xmm1, %xmm0
> ret
>
> but with this patch now generates:
> vpcmov  %xmm0, %xmm2, %xmm1, %xmm0
> ret
>
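
The rewrite rests on a bitwise identity: the pxor;pand;pxor sequence and a plain bitwise select compute the same value. A minimal sketch of that identity on scalar 32-bit values rather than vectors, with hypothetical function names that are not part of the patch:

```cpp
#include <cassert>
#include <cstdint>

// Bitwise select written directly: bits of t where c is set, bits of f elsewhere.
inline std::uint32_t select_direct(std::uint32_t c, std::uint32_t t, std::uint32_t f)
{
    return (c & t) | (~c & f);
}

// The same select in the xor;and;xor shape of the three-instruction sequence.
inline std::uint32_t select_xor(std::uint32_t c, std::uint32_t t, std::uint32_t f)
{
    return ((t ^ f) & c) ^ f;
}

// Every bit position is independent, so the eight all-zeros/all-ones
// combinations cover the whole truth table.
inline void check_select_forms()
{
    const std::uint32_t values[2] = {0u, ~0u};
    for (std::uint32_t c : values)
        for (std::uint32_t t : values)
            for (std::uint32_t f : values)
                assert(select_direct(c, t, f) == select_xor(c, t, f));
}
```

Calling check_select_forms() from a small driver exercises all eight cases.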
> On its own, the new combine splitter works fine on TARGET_64BIT, but
> alas with -m32 combine incorrectly thinks the replacement instruction
> is more expensive, as IF_THEN_ELSE isn't currently/correctly handled
> in ix86_rtx_costs.  So to avoid the need for a target selector in the
> new testcase, I've updated ix86_rtx_costs to report that AMD's vpcmov
> has a latency of two cycles [it's now an obsolete instruction set
> extension and there's unlikely to ever be a processor where this
> instruction has a different timing], and while there I also added
> rtx_costs for x86_64's integer conditional move instructions (which
> have single cycle latency).
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-06-04  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.cc (ix86_rtx_costs): Add a new case for
> IF_THEN_ELSE, and provide costs for TARGET_XOP's vpcmov and
> TARGET_CMOVE's (scalar integer) conditional moves.
> * config/i386/sse.md (define_split): Recognize XOP's vpcmov
> from its equivalent (canonical) pxor;pand;pxor sequence.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/xop-pcmov3.c: New test case.

OK with a nit below.

Thanks,
Uros.

+{
+  operands[5] = REGNO (operands[4]) == REGNO (operands[1]) ? operands[2]
+   : operands[1];
+})
+

Please expand this to enhance readability, it is a bit too cryptic for me ...


Re: [PING] PR middle-end/95126: Expand small const structs as immediate constants

2022-06-05 Thread Andreas Schwab
This breaks Ada on aarch64 in stage3, probably a miscompiled stage2
compiler.  For example:

/opt/gcc/gcc-20220605/Build/./prev-gcc/xgcc 
-B/opt/gcc/gcc-20220605/Build/./prev-gcc/ -B/usr/aarch64-suse-linux/bin/ 
-B/usr/aarch64-suse-linux/bin/ -B/usr/aarch64-suse-linux/lib/ -isystem 
/usr/aarch64-suse-linux/include -isystem /usr/aarch64-suse-linux/sys-include   
-fchecking=1 -c -g -O2 -fchecking=1  -gnatpg -gnata -W -Wall -nostdinc -I- -I. 
-Iada/generated -Iada -I../../gcc/ada -Iada/libgnat -I../../gcc/ada/libgnat 
-Iada/gcc-interface -I../../gcc/ada/gcc-interface ../../gcc/ada/spark_xrefs.adb 
-o ada/spark_xrefs.o
+===GNAT BUG DETECTED==+
| 13.0.0 20220605 (experimental) [master ad6919374be] (aarch64-suse-linux) |
| Assert_Failure failed precondition from sinfo-nodes.ads:5419 |
| Error detected at types.ads:53:28|
| Compiling ../../gcc/ada/spark_xrefs.adb  |
| Please submit a bug report; see https://gcc.gnu.org/bugs/ .  |
| Use a subject line meaningful to you and us to track the bug.|
| Include the entire contents of this bug box in the report.   |
| Include the exact command that you entered.  |
| Also include sources listed below.   |
+==+

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[x86 PATCH] Double word implementation of and; cmp to not; test optimization.

2022-06-05 Thread Roger Sayle

This patch extends the recent and;cmp to not;test optimization to also
perform this transformation for TImode on TARGET_64BIT and DImode on -m32.
One motivation for this is that it's a step to fixing the current failure
of gcc.target/i386/pr65105-5.c on -m32.

A more direct benefit for x86_64 is that the following code:

int foo(__int128 x, __int128 y)
{
  return (x & y) == y;
}

improves (with -O2 -mbmi) from:

movq%rdi, %r8
movq%rsi, %rdi
movq%rdx, %rsi
andq%rcx, %rdi
movq%r8, %rax
andq%rdx, %rax
movq%rdi, %rdx
xorq%rsi, %rax
xorq%rcx, %rdx
orq %rdx, %rax
sete%al
movzbl  %al, %eax
ret

to the much better:

movq%rdi, %r8
movq%rsi, %rdi
andn%rdx, %r8, %rax
andn%rcx, %rdi, %rsi
orq %rsi, %rax
sete%al
movzbl  %al, %eax
ret

The major theme of this patch is to generalize many of i386.md's
*di3_doubleword patterns to become *_doubleword patterns, i.e.
whenever there exists a "double word" optimization for DImode with -m32,
there should be an equivalent TImode optimization on TARGET_64BIT.
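
For reference, the and;cmp to andn;test rewrite relies on (x & y) == y being equivalent to (~x & y) == 0. A minimal sketch that checks the equivalence on a few 128-bit samples; it assumes a compiler that provides the __int128 extension, and the function names are hypothetical:

```cpp
#include <cassert>

// Original form: every bit set in y is also set in x.
inline bool covers_and_cmp(__int128 x, __int128 y)
{
    return (x & y) == y;
}

// Rewritten form: no bit of y falls outside x (an andn, then a test against zero).
inline bool covers_andn_test(__int128 x, __int128 y)
{
    return (~x & y) == 0;
}

// Compare both forms on samples that touch the low and the high 64-bit halves.
inline void check_covers_forms()
{
    const __int128 hi = static_cast<__int128>(1) << 100;
    const __int128 samples[] = {0, 1, 0x0F, 0xFF, hi, hi | 0xFF, -1};
    for (__int128 x : samples)
        for (__int128 y : samples)
            assert(covers_and_cmp(x, y) == covers_andn_test(x, y));
}
```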

The following patch has been tested on x86_64-pc-linux-gnu with
make bootstrap and make -k check, where on TARGET_64BIT there are
no new failures, but paradoxically with --target_board=unix{-m32}
the other dg-final clause in gcc.target/i386/pr65105-5.c now fails.
Counter-intuitively, this is progress, and pr65105-5.c may now be
fixed (without using peephole2) simply by tweaking the STV pass to
handle andn/test (in a follow-up patch).
OK for mainline?


2022-06-05  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.cc (ix86_rtx_costs) : Provide costs
for double word comparisons and tests (comparisons against zero).
* config/i386/i386.md (*test_not_doubleword): Split DWI
and;cmp into andn;cmp $0 as a pre-reload splitter.
(define_expand and3): Generalize from SWIM1248x to SWIDWI.
(define_insn_and_split "*anddi3_doubleword"): Rename/generalize...
(define_insn_and_split "*and3_doubleword"): ... to this.
(define_insn "*andndi3_doubleword"): Rename and generalize...
(define_insn "*andn3_doubleword): ... to this.
(define_split): Split andn when TARGET_BMI for both  modes.
(define_split): Split andn when !TARGET_BMI for both  modes.
(define_expand 3): Generalize from SWIM1248x to
SWIDWI.
(define_insn_and_split "*3_doubleword): Generalize
from DI mode to both  modes.

gcc/testsuite/ChangeLog
* gcc.target/i386/testnot-3.c: New test case.


Thanks again,
Roger
--

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index df5c80d..af11669 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20918,6 +20918,19 @@ ix86_rtx_costs (rtx x, machine_mode mode, int 
outer_code_i, int opno,
  return true;
}
 
+  if (SCALAR_INT_MODE_P (GET_MODE (op0))
+ && GET_MODE_SIZE (GET_MODE (op0)) > UNITS_PER_WORD)
+   {
+ if (op1 == const0_rtx)
+   *total = cost->add
++ rtx_cost (op0, GET_MODE (op0), outer_code, opno, speed);
+ else
+   *total = 3*cost->add
++ rtx_cost (op0, GET_MODE (op0), outer_code, opno, speed)
++ rtx_cost (op1, GET_MODE (op0), outer_code, opno, speed);
+ return true;
+   }
+
   /* The embedded comparison operand is completely free.  */
   if (!general_operand (op0, GET_MODE (op0)) && op1 == const0_rtx)
*total = 0;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2b1d65b..502416b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9785,9 +9785,24 @@
(set (reg:CCZ FLAGS_REG)
(compare:CCZ (and:SWI (match_dup 2) (match_dup 1))
 (const_int 0)))]
-{
-  operands[2] = gen_reg_rtx (mode);
-})
+  "operands[2] = gen_reg_rtx (mode);")
+
+;; Split and;cmp (as optimized by combine) into andn;cmp $0
+(define_insn_and_split "*test_not_doubleword"
+  [(set (reg:CCZ FLAGS_REG)
+   (compare:CCZ
+ (and:DWI
+   (not:DWI (match_operand:DWI 0 "register_operand"))
+   (match_operand:DWI 1 "nonimmediate_operand"))
+ (const_int 0)))]
+  "ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel
+  [(set (match_dup 2) (and:DWI (not:DWI (match_dup 0)) (match_dup 1)))
+   (clobber (reg:CC FLAGS_REG))])
+   (set (reg:CCZ FLAGS_REG) (compare:CCZ (match_dup 2) (const_int 0)))]
+  "operands[2] = gen_reg_rtx (mode);")
 
 ;; Convert HImode/SImode test instructions with immediate to QImode ones.
 ;; i386 does not allow to encode test with 8bit sign extended immediate, so
@@ -9846,19 +9861,21 @@
 ;; it should be done with splitters.
 
 (define_expand "and3"
-  [(set (match_operand:SWIM1248x 0 "nonimmediate_operand")
-   

[Bug libstdc++/105857] codecvt::do_length causes unexpected buffer overflow

2022-06-05 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105857

--- Comment #2 from andysem at mail dot ru ---
> outside the [s, s + max_size) range

This should be [from, from_to) range. Sorry, posted a little too soon.

[Bug libstdc++/105857] codecvt::do_length causes unexpected buffer overflow

2022-06-05 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105857

--- Comment #1 from andysem at mail dot ru ---
Created attachment 53089
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53089&action=edit
Test case to reproduce the problem.

[Bug libstdc++/105857] New: codecvt::do_length causes unexpected buffer overflow

2022-06-05 Thread andysem at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105857

Bug ID: 105857
   Summary: codecvt::do_length causes unexpected buffer overflow
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andysem at mail dot ru
  Target Milestone: ---

Consider the following test case:

#include <cstddef>
#include <locale>

const std::size_t max_size = 10u;
const char text[] = " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";

int main()
{
std::locale loc;
std::codecvt< wchar_t, char, std::mbstate_t > const& fac =
std::use_facet< std::codecvt< wchar_t, char, std::mbstate_t > >(loc);
std::mbstate_t mbs = std::mbstate_t();
const char* from = text;
const char* from_to = from + max_size;
std::size_t max = ~static_cast< std::size_t >(0u);
return static_cast< std::size_t >(fac.length(mbs, from, from_to, max));
}

$ g++ -g2 -O0 -o codecvt_length_bug codecvt_length_bug.cpp

Running this causes a crash with a buffer overflow:

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737348011840) at
./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737348011840)
at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737348011840) at
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737348011840, signo=signo@entry=6) at
./nptl/pthread_kill.c:89
#3  0x77b56476 in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#4  0x77b3c7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x77b9d6f6 in __libc_message (action=action@entry=do_abort,
fmt=fmt@entry=0x77cef943 "*** %s ***: terminated\n") at
../sysdeps/posix/libc_fatal.c:155
#6  0x77c4a76a in __GI___fortify_fail (msg=msg@entry=0x77cef8e9
"buffer overflow detected") at ./debug/fortify_fail.c:26
#7  0x77c490c6 in __GI___chk_fail () at ./debug/chk_fail.c:28
#8  0x77c4a199 in __mbsnrtowcs_chk (dst=, src=, nmc=, len=, ps=,
dstlen=) at ./debug/mbsnrtowcs_chk.c:27
#9  0x77e290d2 in std::codecvt::do_length(__mbstate_t&, char const*, char const*, unsigned long)
const () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x52d3 in std::__codecvt_abstract_base::length (this=0x77f86090, __state=..., __from=0x6040
 "
!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~",
 
__end=0x604a 
"*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~",
__max=18446744073709551615) at /usr/include/c++/11/bits/codecvt.h:219
#11 0x523d in main () at codecvt_length_bug.cpp:14

The problem appears to be that std::codecvt< wchar_t, char, std::mbstate_t
>::do_length() accesses characters outside the [s, s + max_size) range,
apparently using the ~static_cast< std::size_t >(0u) as the size limit. This is
against the do_length() definition in the C++ standard, see
[locale.codecvt.virtuals]/12-14
(http://eel.is/c++draft/locale.codecvt.virtuals#lib:codecvt,do_length):

Effects: The effect on the state argument is as if it called do_­in(state,
from, from_­end, from, to, to+max, to) for to pointing to a buffer of at least
max elements.

That is, max is only referred to as the size of the potential output buffer,
and the source buffer is specified as [from, from_end). There is no requirement
for max to be within [from, from_end) bounds. If I change max to (sizeof(text)
- 1u) then the buffer overflow does not happen.

(As to the purpose of this code, it is supposed to calculate the size, in
bytes, of the initial sequence of complete characters not larger than
max_size.)
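
Until this is resolved, one possible workaround is to cap max at the number of input bytes, in the spirit of the (sizeof(text) - 1u) observation above. A minimal sketch, assuming that each output wchar_t consumes at least one input byte so the cap does not change the result; complete_prefix_bytes is a hypothetical helper name, not a library function:

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>

// Size in bytes of the initial run of complete characters in [from, from_end).
inline std::size_t complete_prefix_bytes(const char* from, const char* from_end,
                                         const std::locale& loc = std::locale())
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& fac = std::use_facet<facet_type>(loc);
    std::mbstate_t mbs = std::mbstate_t();
    // Cap max at the input size instead of passing ~size_t(0).
    const std::size_t max = static_cast<std::size_t>(from_end - from);
    return static_cast<std::size_t>(fac.length(mbs, from, from_end, max));
}
```

With the test case above, this would be called as complete_prefix_bytes(text, text + max_size).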

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
11.2.0-19ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-11
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet

[Bug target/105506] Error building GCC 12.1.0 against MinGW-w64: fatal error: cannot execute 'cc1': CreateProcess: No such file or directory

2022-06-05 Thread brechtsanders at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105506

--- Comment #5 from Brecht Sanders  
---
Created attachment 53088
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53088=edit
Process Monitor when running `gcc -print-prog-name=cc1`


[Bug target/105506] Error building GCC 12.1.0 against MinGW-w64: fatal error: cannot execute 'cc1': CreateProcess: No such file or directory

2022-06-05 Thread brechtsanders at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105506

--- Comment #4 from Brecht Sanders  
---
I just ran `gcc -print-prog-name=cc1` and saw the output was only `cc1` while
on working versions it reports a full path to `cc1.exe` (e.g.
`d:/prog/winlibs64_stage/custombuilt/share/gcc/bin/../libexec/gcc/x86_64-w64-mingw32/12.1.0/cc1.exe`).

In this minimal case Process Monitor also shows the handle to `cc1.exe` is
successfully opened but the subsequent calls to QueryInformationVolume and
QueryAllInformationFile fail with BUFFER_OVERFLOW.

[Bug target/105856] ice in emit_move_insn, at expr.cc:4011

2022-06-05 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105856

--- Comment #1 from David Binderman  ---
The bug first appears sometime after git hash de57440858591a88.

[PATCH] PR tree-optimization/105835: Two narrowing patterns for match.pd.

2022-06-05 Thread Roger Sayle

This patch resolves PR tree-optimization/105835, which is a code quality
(dead code elimination) regression at -O1 triggered/exposed by a recent
change to canonicalize X&-Y as X*Y.  The new (shorter) form exposes some
missed optimization opportunities that can be handled by adding some
extra simplifications to match.pd.

One transformation is to simplify "(short)(x ? 65535 : 0)" into the
equivalent "x ? -1 : 0", or more accurately "x ? (short)-1 : (short)0",
as INTEGER_CSTs record their type, and integer conversions can be
pushed inside COND_EXPRs reducing the number of gimple statements.

The other transformation is to simplify (short)(X * 65535), where X is
[0,1], into the equivalent (short)X * -1 (or again (short)-1, as tree's
INTEGER_CSTs encode their type).  This is valid because multiplications
where one operand is [0,1] are guaranteed not to overflow, and hence
integer conversions can also be pushed inside these multiplications.

These narrowing conversion optimizations can be identified by range
analyses, such as EVRP, but these are only performed at -O2 and above,
which is why this regression is only visible with -O1.
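
Both equivalences can be checked at the source level. A minimal sketch, assuming a 16-bit short and GCC's modular behaviour for out-of-range integer conversions; the function names are hypothetical and only illustrate the before and after forms:

```cpp
#include <cassert>

// Conversion applied outside the conditional (the form before the patch).
inline short cond_wide(int x)
{
    return static_cast<short>(x ? 65535 : 0);
}

// Conversion pushed into both arms, leaving two short constants.
inline short cond_narrow(int x)
{
    return x ? static_cast<short>(-1) : static_cast<short>(0);
}

// Multiplication done in int, then narrowed; x01 must be 0 or 1.
inline short mult_wide(int x01)
{
    return static_cast<short>(x01 * 65535);
}

// Operands narrowed first; a multiplication by 0 or 1 cannot overflow.
inline short mult_narrow(int x01)
{
    return static_cast<short>(static_cast<short>(x01) * static_cast<short>(-1));
}

// Compare the wide and narrow forms on the inputs each one accepts.
inline void check_narrowing_forms()
{
    const int conds[] = {0, 1, 7, -3};
    for (int x : conds)
        assert(cond_wide(x) == cond_narrow(x));
    for (int x01 = 0; x01 <= 1; ++x01)
        assert(mult_wide(x01) == mult_narrow(x01));
}
```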

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2022-06-05  Roger Sayle  

gcc/ChangeLog
* match.pd (convert (mult zero_one_valued_p@1 INTEGER_CST@2)):
Narrow integer multiplication by a zero_one_valued_p operand.
(convert (cond @1 INTEGER_CST@2 INTEGER_CST@3)): Push integer
conversions inside COND_EXPR where both data operands are
integer constants.

gcc/testsuite/ChangeLog
* gcc.dg/pr105835.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/match.pd b/gcc/match.pd
index 2d3ffc4..d705947 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1800,6 +1800,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && !TYPE_UNSIGNED (TREE_TYPE (@0)))
   (mult (convert @0) @1)))
 
+/* Narrow integer multiplication by a zero_one_valued_p operand.
+   Multiplication by [0,1] is guaranteed not to overflow.  */
+(simplify
+ (convert (mult@0 zero_one_valued_p@1 INTEGER_CST@2))
+ (if (INTEGRAL_TYPE_P (type)
+  && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+  && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0)))
+  (mult (convert @1) (convert @2))))
+
 /* Convert ~ (-A) to A - 1.  */
 (simplify
  (bit_not (convert? (negate @0)))
@@ -4265,6 +4274,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 )
 #endif
 
+(simplify
+ (convert (cond@0 @1 INTEGER_CST@2 INTEGER_CST@3))
+ (if (INTEGRAL_TYPE_P (type)
+  && INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+  (cond @1 (convert @2) (convert @3))))
+
 /* Simplification moved from fold_cond_expr_with_comparison.  It may also
be extended.  */
 /* This pattern implements two kinds simplification:
diff --git a/gcc/testsuite/gcc.dg/pr105835.c b/gcc/testsuite/gcc.dg/pr105835.c
new file mode 100644
index 000..354c81c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr105835.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized" } */
+
+void foo();
+
+static int b;
+
+static short a(short c, unsigned short d) { return c - d; }
+
+int main() {
+int e = -(0 < b);
+if (a(1, e))
+b = 0;
+else
+foo();
+}
+
+/* { dg-final { scan-tree-dump-not "goto" "optimized" } } */


[Bug target/105856] New: ice in emit_move_insn, at expr.cc:4011

2022-06-05 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105856

Bug ID: 105856
   Summary: ice in emit_move_insn, at expr.cc:4011
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dcb314 at hotmail dot com
  Target Milestone: ---

This C code:

#pragma pack(1)
struct {
  unsigned f0;
} static g_251 = {6};
void g_329_3() { func_19(g_251); }

when compiled with -O2 on an arm-32 compiler natively or cross, does this:

during RTL pass: expand
bug819.c: In function ‘g_329_3’:
bug819.c:5:18: internal compiler error: in emit_move_insn, at expr.cc:4011
5 | void g_329_3() { func_19(g_251); }
  |  ^~
0x67dcba emit_move_insn(rtx_def*, rtx_def*)
/home/dcb/gcc/trunk.git/gcc/expr.cc:4011
0xa2940d load_register_parameters
/home/dcb/gcc/trunk.git/gcc/calls.cc:2192
0xa2c59b expand_call(tree_node*, rtx_def*, int)
/home/dcb/gcc/trunk.git/gcc/calls.cc:3593
0xba77d0 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
/home/dcb/gcc/trunk.git/gcc/expr.cc:11621

The bug appears sometime in the week before git hash aec868578d851576.

[Bug c++/105851] Error: "Duplicate key positions selected" when recreating cfns.h

2022-06-05 Thread qcorba at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105851

--- Comment #4 from Eric Tang  ---
(In reply to Andreas Schwab from comment #3)
> $$ is makefile quoting, you need to resolve the quoting manually if you want
> to run the command outside of make.

Is there a means to check that?

I removed cfns.h and configured with --enable-maintainer-mode, but the header
file was not recreated and the build failed with:
../.././gcc/cp/except.c:1023:18: fatal error: cfns.h: No such file or directory
compilation terminated.

RE: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more optimizations.

2022-06-05 Thread Roger Sayle

Hi Uros,
Many thanks for your speedy review.  This revised patch implements
all three of your recommended improvements: the use of
ix86_binary_operator_ok with code UNKNOWN, the removal of
"n" constraints from const_int_operand predicates, and the use
of force_reg for input operands (and REG_P for output operands)
to ensure that it's always safe to call gen_lowpart/gen_highpart.

[If we proceed with the recent proposal to split double word 
addition, subtraction and other operations before reload, then
these new add/sub variants should be updated at the same time,
but for now this patch keeps double word patterns consistent].
 
This revised patch has been retested on x86_64-pc-linux-gnu with
make bootstrap and make -k check, both with and without 
--target_board=unix{-m32} with no new failures.  Ok for mainline?
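
As a rough illustration only (this is not part of the patch), the source
shapes that the new zero-extension, add/sub and concat patterns are aimed
at might look like the C below.  The function and typedef names are made
up for this sketch, and unsigned __int128 is a GCC extension that is only
available on 64-bit targets.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only.  */
typedef unsigned __int128 u128;

/* Double word addition of a zero-extended operand: the low half is a
   plain add, and the high half only has to absorb the carry, which is
   what makes "adc $0" sufficient.  */
u128 add_zext (u128 acc, uint64_t x)
{
  return acc + x;
}

/* The subtraction counterpart, where the high half only has to absorb
   the borrow ("sbb $0").  */
u128 sub_zext (u128 acc, uint64_t x)
{
  return acc - x;
}

/* DST = (HI<<32)|LO, the shape that can be implemented as a pair of
   moves setting the low part and the high part.  */
uint64_t concat_hi_lo (uint32_t hi, uint32_t lo)
{
  return ((uint64_t) hi << 32) | lo;
}
```

Compiling something like this at -O2 and reading the assembly is probably
the quickest way to see whether the immediate form is being used on a
given compiler revision.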


2022-06-05  Roger Sayle  
Uroš Bizjak  

gcc/ChangeLog
PR target/91681
* config/i386/i386.md (zero_extendditi2): New define_insn_and_split.
(*add3_doubleword_zext): New define_insn_and_split.
(*sub3_doubleword_zext): New define_insn_and_split.
(*concat3_1): New define_insn_and_split replacing
previous define_split for implementing DST = (HI<<32)|LO as
pair of move instructions, setting lopart and hipart.
(*concat3_2): Likewise.
(*concat3_3): Likewise, where HI is zero_extended.
(*concat3_4): Likewise, where HI is zero_extended.
* config/i386/sse.md (kunpckhi): Add UNSPEC_MASKOP unspec.
(kunpcksi): Likewise, add UNSPEC_MASKOP unspec.
(kunpckdi): Likewise, add UNSPEC_MASKOP unspec.
(vec_pack_trunc_qi): Update to specify required UNSPEC_MASKOP unspec.
(vec_pack_trunc_): Likewise.

gcc/testsuite/ChangeLog
PR target/91681
* g++.target/i386/pr91681.C: New test case (from the PR).
* gcc.target/i386/pr91681-1.c: New int128 test case.
* gcc.target/i386/pr91681-2.c: Likewise.
* gcc.target/i386/pr91681-3.c: Likewise, but for ia32.


Thanks again,
Roger
--

> -----Original Message-----
> From: Uros Bizjak 
> Sent: 03 June 2022 11:08
> To: Roger Sayle 
> Cc: GCC Patches 
> Subject: Re: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more
> optimizations.
> 
> On Fri, Jun 3, 2022 at 11:49 AM Roger Sayle 
> wrote:
> >
> >
> > Technically, PR target/91681 has already been resolved; we now
> > recognize the highpart multiplication at the tree-level, we no longer
> > use the stack, and we currently generate the same number of
> > instructions as LLVM.  However, it is still possible to do better: the
> > current x86_64 code to generate a double word addition of a zero
> > extended operand looks like:
> >
> > xorl    %r11d, %r11d
> > addq    %r10, %rax
> > adcq    %r11, %rdx
> >
> > when it's possible (as LLVM does) to use an immediate constant:
> >
> > addq    %r10, %rax
> > adcq    $0, %rdx
> >
> > To do this, the backend required one or two simple changes, that then
> > themselves required one or two more obscure tweaks.
> >
> > The simple starting point is to define a zero_extendditi2 pattern, for
> > zero extension from DImode to TImode on TARGET_64BIT that is split
> > after reload.  Double word (TImode) addition/subtraction is split
> > after reload, so that constrains when things should happen.
> >
> > With zero extension now visible to combine, we add two new
> > define_insn_and_split that add/subtract a zero extended operand in
> > double word mode.  These apply to both 32-bit and 64-bit code
> > generation, to produce adc $0 and sbb $0.
> >
> > The first strange tweak is that these new patterns interfere with the
> > optimization that recognizes DW:DI = (HI:SI<<32)+LO:SI as a pair of
> > register moves, or more accurately the combine splitter no longer
> > triggers as we're now converting two instructions into two
> > instructions (not three instructions into two instructions).  This is
> > easily repaired (and extended to handle TImode) by changing from a
> > pair of define_split (that handle operand commutativity) to a set of
> > four define_insn_and_split (again to handle operand commutativity).
> >
> > The other/final strange tweak is that the above splitters now interfere
> > with AVX512's kunpckdq instruction which is defined as identical RTL,
> > DW:DI = (HI:SI<<32)|zero_extend(LO:SI).  To distinguish this, and also
> > avoid AVX512 mask registers being used by reload to perform SImode
> > scalar shifts, I've added the explicit (unspec UNSPEC_MASKOP) to the
> > unpack mask operations, which matches what sse.md does for the other
> > mask specific (logic) operations.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> >
> >
> > 2022-06-03  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/91681
> > * config/i386/i386.md 

[PATCH take #2] Fold truncations of left shifts in match.pd

2022-06-05 Thread Roger Sayle

Hi Richard,
Many thanks for taking the time to explain how vectorization is supposed
to work.  I now see that vect_recog_rotate_pattern in tree-vect-patterns.cc
is supposed to handle lowering of rotations to (vector) shifts, and
completely agree that adding support for signed types (using appropriate
casts to unsigned_type_for and casting the result back to the original
signed type) is a better approach to avoid the regression of pr98674.c.
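
To make the signed case concrete, here is a small scalar sketch of the
lowering idea (cast to the unsigned type, two shifts and an ior, cast the
result back).  It is not the vectorizer code; rotr16_signed is a
hypothetical helper written only for this mail.

```c
#include <stdint.h>

/* Hypothetical helper, not GCC code: rotate a signed 16-bit value right
   by N bits, doing all the shifting on the unsigned type.  */
int16_t rotr16_signed (int16_t v, unsigned n)
{
  uint16_t u = (uint16_t) v;    /* cast to the unsigned variant */
  n &= 15;
  /* (16 - n) & 15 keeps the left shift count in range when n is 0.  */
  uint16_t r = (uint16_t) ((u >> n) | (u << ((16 - n) & 15)));
  /* Converting back is implementation-defined before C23 for values
     above INT16_MAX; GCC documents it as reduction modulo 2^16.  */
  return (int16_t) r;
}
```

Because the shifts are done on the unsigned type, the right shift is
always logical, so the signedness of the original type should not affect
the rotated bit pattern.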

I've also implemented your suggestions of combining the proposed new
(convert (lshift @1 INTEGER_CST@2)) with the existing one, and at the
same time folding valid shifts by more than the width of the narrower
type, such as (short)(x << 20), to constant zero.  Although this optimization
is already performed during the tree-ssa passes, it's convenient to
also catch it here during constant folding.
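
For anyone who wants to convince themselves of the identities involved,
the following is a minimal sketch in plain C.  The function names are
hypothetical, and unsigned types are used throughout so that the shifts
are well defined regardless of the operand value.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only.  */

/* Shift in the wide type, then truncate.  */
uint16_t shl_wide (uint32_t x, unsigned n)
{
  return (uint16_t) (x << n);
}

/* Truncate first, then shift in the narrow type.  For n < 16 this
   yields the same low 16 bits as shl_wide.  */
uint16_t shl_narrow (uint32_t x, unsigned n)
{
  return (uint16_t) ((uint16_t) x << n);
}

/* A shift count that is valid for the wide type but exceeds the width
   of the narrow one: every bit that survives the truncation is zero.  */
uint16_t shl_past_width (uint32_t x)
{
  return (uint16_t) (x << 20);
}
```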

This revised patch has been tested on x86_64-pc-linux-gnu with
make bootstrap and make -k check, both with and without
--target_board=unix{-m32}, with no new failures.  Ok for mainline?

2022-06-05  Roger Sayle  
Richard Biener  

gcc/ChangeLog
* match.pd (convert (lshift @1 INTEGER_CST@2)): Narrow integer
left shifts by a constant when the result is truncated, and the
shift constant is well-defined.
* tree-vect-patterns.cc (vect_recog_rotate_pattern): Add
support for rotations of signed integer types, by lowering
using unsigned vector shifts.

gcc/testsuite/ChangeLog
* gcc.dg/fold-convlshift-4.c: New test case.
* gcc.dg/optimize-bswaphi-1.c: Update found bswap count.
* gcc.dg/tree-ssa/pr61839_3.c: Shift is now optimized before VRP.
* gcc.dg/vect/vect-over-widen-1-big-array.c: Remove obsolete tests.
* gcc.dg/vect/vect-over-widen-1.c: Likewise.
* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-3.c: Likewise.
* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-4.c: Likewise.


Thanks again,
Roger
--

> -----Original Message-----
> From: Richard Biener 
> Sent: 02 June 2022 12:03
> To: Roger Sayle 
> Cc: GCC Patches 
> Subject: Re: [PATCH] Fold truncations of left shifts in match.pd
> 
> On Thu, Jun 2, 2022 at 12:55 PM Roger Sayle 
> wrote:
> >
> >
> > Hi Richard,
> > > +  /* RTL expansion knows how to expand rotates using shift/or.  */
> > > + if (icode == CODE_FOR_nothing
> > > +  && (code == LROTATE_EXPR || code == RROTATE_EXPR)
> > > +  && optab_handler (ior_optab, vec_mode) != CODE_FOR_nothing
> > > +  && optab_handler (ashl_optab, vec_mode) != CODE_FOR_nothing)
> > > +icode = (int) optab_handler (lshr_optab, vec_mode);
> > >
> > > but we then get the vector costing wrong.
> >
> > The issue is that we currently get the (relative) vector costing wrong.
> > Currently for gcc.dg/vect/pr98674.c, the vectorizer thinks the scalar
> > code requires two shifts and an ior, so believes it's profitable to
> > vectorize this loop using two vector shifts and a vector ior.  But
> > once match.pd simplifies the truncate and recognizes the HImode rotate we
> > end up with:
> >
> > pr98674.c:6:16: note:   ==> examining statement: _6 = _1 r>> 8;
> > pr98674.c:6:16: note:   vect_is_simple_use: vectype vector(8) short int
> > pr98674.c:6:16: note:   vect_is_simple_use: operand 8, type of def: constant
> > pr98674.c:6:16: missed:   op not supported by target.
> > pr98674.c:8:33: missed:   not vectorized: relevant stmt not supported: _6 = _1 r>> 8;
> > pr98674.c:6:16: missed:  bad operation or unsupported loop bound.
> >
> >
> > Clearly, it's a win to vectorize HImode rotates, when the backend can
> > perform 8 (or 16) rotations at a time, but using 3 vector instructions,
> > even when a scalar rotate can be performed in a single instruction.
> > Fundamentally, vectorization may still be desirable/profitable even when
> > the backend doesn't provide an optab.
> 
> Yes, as said it's tree-vect-patterns.cc's job to handle this not natively
> supported rotate by re-writing it.  Can you check why
> vect_recog_rotate_pattern does not
> do this?  Ah, the code only handles !TYPE_UNSIGNED (type) - not sure why
> though (for rotates it should not matter and for the lowered sequence we can
> convert to desired signedness to get arithmetic/logical shifts)?
> 
> > The current situation where the i386's backend provides expanders to
> > lower rotations (or vcond) into individual instruction sequences, also
> > interferes with vector costing.   It's the vector cost function that
> > needs to be fixed, not the generated code made worse (or the backend
> > bloated performing its own RTL expansion workarounds).
> >
> > Is it instead ok to mark pr98674.c as XFAIL (a regression)?
> > The tweak to tree-vect-stmts.cc was based on the assumption that we
> > wished to continue vectorizing this loop.  Improving scalar code
> > generation really shouldn't disable vectorization like this.
> 

[Bug tree-optimization/105855] missed optimization - vectorization -fsanitize=signed-integer-overflow

2022-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105855

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #1 from Andrew Pinski  ---
You could in theory version the loop for the no overflow case.
Now the question becomes whether it is worth the cost of implementing it in
the compiler.  I doubt it.
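
For illustration, versioning the loop by hand might look roughly like the
sketch below.  This is not what the compiler does today; f_versioned and
its return convention are assumptions made up for this example.  A single
up-front test stands in for the per-iteration overflow checks.

```c
#include <limits.h>

/* Hypothetical hand-versioned form of the loop from the report.
   Returns 0 after running the check-free loop, or -1 when i + 4 would
   overflow (the case in which the sanitizer would have to report).  */
int f_versioned (int i, float *restrict a, float *restrict b)
{
  if (i > INT_MAX - 4)
    return -1;          /* i + 4 overflows; leave this to the slow path */

  /* No index computation below can overflow, so the loop body needs no
     overflow instrumentation and has a constant trip count.  */
  for (int k = 0; k < 4; ++k)
    a[i + k] = b[i + k] + 1.0f;
  return 0;
}
```

With the overflow case split off like this, the remaining loop should be
no harder to vectorize than the uninstrumented original.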

[Bug tree-optimization/105855] missed optimization - vectorization -fsanitize=signed-integer-overflow

2022-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105855

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

Re: [1/2] PR96463 - aarch64 specific changes

2022-06-05 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 1 Jun 2022 at 14:12, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Thu, 12 May 2022 at 16:15, Richard Sandiford
> >  wrote:
> >>
> >> Prathamesh Kulkarni  writes:
> >> > On Wed, 11 May 2022 at 12:44, Richard Sandiford
> >> >  wrote:
> >> >>
> >> >> Prathamesh Kulkarni  writes:
> >> >> > On Fri, 6 May 2022 at 16:00, Richard Sandiford
> >> >> >  wrote:
> >> >> >>
> >> >> >> Prathamesh Kulkarni  writes:
> >> >> >> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> >> >> >> > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> >> >> >> > index c24c0548724..1ef4ea2087b 100644
> >> >> >> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> >> >> >> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> >> >> >> > @@ -44,6 +44,14 @@
> >> >> >> >  #include "aarch64-sve-builtins-shapes.h"
> >> >> >> >  #include "aarch64-sve-builtins-base.h"
> >> >> >> >  #include "aarch64-sve-builtins-functions.h"
> >> >> >> > +#include "aarch64-builtins.h"
> >> >> >> > +#include "gimple-ssa.h"
> >> >> >> > +#include "tree-phinodes.h"
> >> >> >> > +#include "tree-ssa-operands.h"
> >> >> >> > +#include "ssa-iterators.h"
> >> >> >> > +#include "stringpool.h"
> >> >> >> > +#include "value-range.h"
> >> >> >> > +#include "tree-ssanames.h"
> >> >> >>
> >> >> >> Minor, but: I think the preferred approach is to include "ssa.h"
> >> >> >> rather than include some of these headers directly.
> >> >> >>
> >> >> >> >
> >> >> >> >  using namespace aarch64_sve;
> >> >> >> >
> >> >> >> > @@ -1207,6 +1215,56 @@ public:
> >> >> >> >  insn_code icode = code_for_aarch64_sve_ld1rq (e.vector_mode 
> >> >> >> > (0));
> >> >> >> >  return e.use_contiguous_load_insn (icode);
> >> >> >> >}
> >> >> >> > +
> >> >> >> > +  gimple *
> >> >> >> > +  fold (gimple_folder &f) const OVERRIDE
> >> >> >> > +  {
> >> >> >> > +tree arg0 = gimple_call_arg (f.call, 0);
> >> >> >> > +tree arg1 = gimple_call_arg (f.call, 1);
> >> >> >> > +
> >> >> >> > +/* Transform:
> >> >> >> > +   lhs = svld1rq ({-1, -1, ... }, arg1)
> >> >> >> > +   into:
> >> >> >> > +   tmp = mem_ref [(int * {ref-all}) arg1]
> >> >> >> > +   lhs = vec_perm_expr.
> >> >> >> > +   on little endian target.  */
> >> >> >> > +
> >> >> >> > +if (!BYTES_BIG_ENDIAN
> >> >> >> > + && integer_all_onesp (arg0))
> >> >> >> > +  {
> >> >> >> > + tree lhs = gimple_call_lhs (f.call);
> >> >> >> > + auto simd_type = aarch64_get_simd_info_for_type (Int32x4_t);
> >> >> >>
> >> >> >> Does this work for other element sizes?  I would have expected it
> >> >> >> to be the (128-bit) Advanced SIMD vector associated with the same
> >> >> >> element type as the SVE vector.
> >> >> >>
> >> >> >> The testcase should cover more than just int32x4_t -> svint32_t,
> >> >> >> just to be sure.
> >> >> > In the attached patch, it obtains corresponding advsimd type with:
> >> >> >
> >> >> > tree eltype = TREE_TYPE (lhs_type);
> >> >> > unsigned nunits = 128 / TREE_INT_CST_LOW (TYPE_SIZE (eltype));
> >> >> > tree vectype = build_vector_type (eltype, nunits);
> >> >> >
> >> >> > While this seems to work with different element sizes, I am not
> >> >> > sure if it's the correct approach?
> >> >>
> >> >> Yeah, that looks correct.  Other SVE code uses aarch64_vq_mode
> >> >> to get the vector mode associated with a .Q “element”, so an
> >> >> alternative would be:
> >> >>
> >> >> machine_mode vq_mode = aarch64_vq_mode (TYPE_MODE (eltype)).require ();
> >> >> tree vectype = build_vector_type_for_mode (eltype, vq_mode);
> >> >>
> >> >> which is more explicit about wanting an Advanced SIMD vector.
> >> >>
> >> >> >> > +
> >> >> >> > + tree elt_ptr_type
> >> >> >> > +   = build_pointer_type_for_mode (simd_type.eltype, VOIDmode, 
> >> >> >> > true);
> >> >> >> > + tree zero = build_zero_cst (elt_ptr_type);
> >> >> >> > +
> >> >> >> > + /* Use element type alignment.  */
> >> >> >> > + tree access_type
> >> >> >> > +   = build_aligned_type (simd_type.itype, TYPE_ALIGN 
> >> >> >> > (simd_type.eltype));
> >> >> >> > +
> >> >> >> > + tree tmp = make_ssa_name_fn (cfun, access_type, 0);
> >> >> >> > + gimple *mem_ref_stmt
> >> >> >> > +   = gimple_build_assign (tmp, fold_build2 (MEM_REF, 
> >> >> >> > access_type, arg1, zero));
> >> >> >>
> >> >> >> Long line.  Might be easier to format by assigning the fold_build2 
> >> >> >> result
> >> >> >> to a temporary variable.
> >> >> >>
> >> >> >> > + gsi_insert_before (f.gsi, mem_ref_stmt, GSI_SAME_STMT);
> >> >> >> > +
> >> >> >> > + tree mem_ref_lhs = gimple_get_lhs (mem_ref_stmt);
> >> >> >> > + tree vectype = TREE_TYPE (mem_ref_lhs);
> >> >> >> > + tree lhs_type = TREE_TYPE (lhs);
> >> >> >>
> >> >> >> Is this necessary?  The code above supplied the types and I wouldn't
> >> >> >> have expected them to change during the build process.
> >> >> >>
> >> >> >> > +
> >> >> >> > + int source_nelts = 

[C++ PATCH take #2] PR c++/96442: Improved error recovery in enumerations.

2022-06-05 Thread Roger Sayle

Hi Jason,
My apologies for the long delay, but I've finally got around to
implementing your suggested improvements (implied by your review):
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591504.html
of my patch for PR c++/96442:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590716.html

The "How does that happen?" is insightful and leads to a cleaner
solution, setting ENUM_UNDERLYING_TYPE to integer_type_node when
issuing an error, so that this invariant holds during the parser's
error recovery.  I've also moved the new testcase to the g++.dg/parse
subdirectory as per your feedback on my previous ICE-on-invalid fixes.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new (unexpected) failures.  Ok for mainline?


2022-06-05  Roger Sayle  

gcc/cp/ChangeLog
PR c++/96442
* decl.cc (start_enum): When emitting a "must be integral" error,
set ENUM_UNDERLYING_TYPE to integer_type_node, to avoid an ICE
downstream in build_enumerator.

gcc/testsuite/ChangeLog
PR c++/96442
* g++.dg/parse/pr96442.C: New test case.

Thanks again,
Roger
--

> -----Original Message-----
> From: Jason Merrill 
> Sent: 10 March 2022 05:06
> To: Roger Sayle ; gcc-patches@gcc.gnu.org
> Subject: Re: [C++ PATCH] PR c++/96442: Another improved error recovery in
> enumerations.
> 
> On 2/22/22 08:02, Roger Sayle wrote:
> >
> > This patch resolves PR c++/96442, another ICE-after-error regression.
> > In this case, invalid code attempts to use a non-integral type as the
> > underlying type for an enumeration (a record_type in the example given
> > in the bugzilla PR), for which the parser emits an error message but
> > allows the inappropriate type to leak to downstream code.
> 
> How does that happen?
> 
> Would it help to change dependent_type_p in start_enum to
> WILDCARD_TYPE_P?
> 
> > The minimal
> > safe fix is to double check that the enumeration's underlying type
> > EUTYPE satisfies INTEGRAL_TYPE_P before calling int_fits_type_p in
> > build_enumerator.  This is a one line fix, but correcting indentation
> > and storing a common subexpression in a variable makes the change look
> > a little bigger.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new (unexpected) failures.  Ok for mainline?
> >
> >
> > 2022-02-22  Roger Sayle  
> >
> > gcc/cp/ChangeLog
> > PR c++/96442
> > * decl.cc (build_enumerator): Check ENUM_UNDERLYING_TYPE is
> > INTEGRAL_TYPE_P before calling int_fits_type_p.
> >
> > gcc/testsuite/ChangeLog
> > PR c++/96442
> > * g++.dg/pr96442.C: New test case.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index e0d397d..ca735d3 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -16306,8 +16306,11 @@ start_enum (tree name, tree enumtype, tree underlying_type,
   else if (dependent_type_p (underlying_type))
ENUM_UNDERLYING_TYPE (enumtype) = underlying_type;
   else
-error ("underlying type %qT of %qT must be an integral type", 
-   underlying_type, enumtype);
+   {
+ error ("underlying type %qT of %qT must be an integral type", 
+underlying_type, enumtype);
+ ENUM_UNDERLYING_TYPE (enumtype) = integer_type_node;
+   }
 }
 
   /* If into a template class, the returned enum is always the first
diff --git a/gcc/testsuite/g++.dg/parse/pr96442.C b/gcc/testsuite/g++.dg/parse/pr96442.C
new file mode 100644
index 000..235bb11
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/pr96442.C
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+enum struct a : struct {};
+template  enum class a : class c{};
+enum struct a {b};
+// { dg-excess-errors "" }


[Bug tree-optimization/105855] New: missed optimization - vectorization -fsanitize=signed-integer-overflow

2022-06-05 Thread muecker at gwdg dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105855

Bug ID: 105855
   Summary: missed optimization - vectorization
-fsanitize=signed-integer-overflow
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: muecker at gwdg dot de
  Target Milestone: ---

It would be nice if -fsanitize=signed-integer-overflow
would impact optimization less, so it could be used
in production more often.


In the following example, using this flag prevents
vectorization:


void f(int i, float * restrict a, float * restrict b) 
{ 
for (int j = i; j < i + 4; ++j)
a[j] = b[j] + 1.;
}


https://godbolt.org/z/raqdd794x

[Bug target/105854] New: ICE: in extract_constrain_insn, at recog.cc:2692 (insn does not satisfy its constraints: sse2_lshrv1ti3)

2022-06-05 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105854

Bug ID: 105854
   Summary: ICE: in extract_constrain_insn, at recog.cc:2692 (insn
does not satisfy its constraints: sse2_lshrv1ti3)
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu

Created attachment 53087
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53087&action=edit
reduced testcase

Compiler output:
$ x86_64-pc-linux-gnu-gcc -O -fcaller-saves -mavx512vl testcase.c 
testcase.c: In function 'foo':
testcase.c:29:1: error: insn does not satisfy its constraints:
   29 | }
  | ^
(insn 326 325 321 2 (set (reg:V1TI 52 xmm16 [128])
(lshiftrt:V1TI (reg:V1TI 52 xmm16 [128])
(const_int 0 [0]))) "testcase.c":18:14 6365 {sse2_lshrv1ti3}
 (nil))
during RTL pass: cprop_hardreg
testcase.c:29:1: internal compiler error: in extract_constrain_insn, at
recog.cc:2692
0x775bee _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
/repo/gcc-trunk/gcc/rtl-error.cc:108
0x775c7b _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/repo/gcc-trunk/gcc/rtl-error.cc:118
0x7647b9 extract_constrain_insn(rtx_insn*)
/repo/gcc-trunk/gcc/recog.cc:2692
0x130f4d4 copyprop_hardreg_forward_1
/repo/gcc-trunk/gcc/regcprop.cc:826
0x13108b3 execute
/repo/gcc-trunk/gcc/regcprop.cc:1406
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r13-992-20220605001627-gad6919374be-checking-yes-rtl-df-extra-nobootstrap-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/13.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--disable-bootstrap --with-cloog --with-ppl --with-isl
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld
--with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r13-992-20220605001627-gad6919374be-checking-yes-rtl-df-extra-nobootstrap-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.0.0 20220605 (experimental) (GCC)

[Bug libstdc++/80331] unused const std::string not optimized away

2022-06-05 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80331

--- Comment #10 from Marc Glisse  ---
(In reply to AK from comment #9)
> can't repro this with gcc 12.1 Seems like this is fixed?

No. As stated in other comments, it still reproduces with a longer string (or
with -D_GLIBCXX_USE_CXX11_ABI=0).

[Bug middle-end/105853] [13 regression] ice in pieces_addr constructor

2022-06-05 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105853

--- Comment #1 from David Binderman  ---
Reduced C code seems to be:

struct {
  struct {
short e16[3];
  }
} const eth_addr_zero = {{}};
void compose_nd_na_ipv6_src() { packet_set_nd(eth_addr_zero); }

Note no -march= setting is required.

[Bug middle-end/105853] [13 regression] ice in pieces_addr constructor

2022-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105853

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |13.0
            Version|12.0                        |13.0
            Summary|ice in pieces_addr          |[13 regression] ice in
                   |constructor                 |pieces_addr constructor
           Keywords|                            |ice-on-valid-code
          Component|c                           |middle-end

[Bug c/105853] New: ice in pieces_addr constructor

2022-06-05 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105853

Bug ID: 105853
   Summary: ice in pieces_addr constructor
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dcb314 at hotmail dot com
  Target Milestone: ---

Created attachment 53086
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53086&action=edit
C source code

The attached C code does this:

$ ../results/bin/gcc -c -w -march=bdver2 bug818.c
during RTL pass: expand
lib/packets.c: In function ‘compose_nd_ns’:
lib/packets.c:1701:5: internal compiler error: Segmentation fault
0xdad1c9 crash_signal(int)
/home/dcb/gcc/working/gcc/../../trunk.git/gcc/toplev.cc:322
0x955a4d pieces_addr::pieces_addr(rtx_def*, bool, rtx_def* (*)(void*, void*,
long, fixed_size_mode), void*)
/home/dcb/gcc/working/gcc/../../trunk.git/gcc/expr.cc:996
0x955a4d op_by_pieces_d::op_by_pieces_d(unsigned int, rtx_def*, bool, rtx_def*, bool, rtx_def* (*)(void*, void*, long, fixed_size_mode), void*, unsigned long, unsigned int, bool, bool)
/home/dcb/gcc/working/gcc/../../trunk.git/gcc/expr.cc:1174

The code seems to break sometime between git hash 919822adc923b00e
and aec868578d851576.

I will have my usual go at reducing the code.

[Bug c++/105852] [13 Regression] ice in instantiate_decl

2022-06-05 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105852

--- Comment #1 from David Binderman  ---
Reduced C++ code seems to be:

template  struct Local { friend Local False(int *); };
Local source_map_url;
Local False(int *);
void New() { False; }
Local False(int *) {}

[Bug c++/105851] Error: "Duplicate key positions selected" when recreating cfns.h

2022-06-05 Thread schwab--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105851

Andreas Schwab  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |INVALID

--- Comment #3 from Andreas Schwab  ---
$$ is makefile quoting, you need to resolve the quoting manually if you want to
run the command outside of make.