[Bug libgcc/110955] New: SIGSEGV in libgcc_s.so.1`classify_object_over_fdes+0x140 on Solaris SPARC with GCC 13 runtime

2023-08-08 Thread sumbera at volny dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110955

Bug ID: 110955
   Summary: SIGSEGV in libgcc_s.so.1`classify_object_over_fdes+0x140 on
            Solaris SPARC with GCC 13 runtime
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgcc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sumbera at volny dot cz
  Target Milestone: ---

I see this on Solaris 11.4 SPARC with libgcc_s.so.1 from GCC 13.1.0 and 13.2.0.
But not with libgcc_s.so.1 from GCC 12.2.0. I don't see this with Solaris i386.

/usr/bin/gnome-shell terminated by SIGSEGV with the following stack:

Loading modules: [ libc.so.1 ld.so.1 ]
gnome-shell:core> $C
7ed252312fd1 libgcc_s.so.1`classify_object_over_fdes+0x140(a044b6f700?,
7ec86d0593d4?, 7ed252313950?, 0?, 0?, b3c4?)
7ed2523130a1 libgcc_s.so.1`__register_frame_info_bases+0x48(7ec76d05e010?,
a044b6f700?, a044b6f700?, 0?, 0?, a04460aa08?)
7ed252313161
libLLVM-13.so`_ZN4llvm19RTDyldMemoryManager16registerEHFramesEPhmm+0x30(a0445ef9e0?,
7ec76d05e010?, 7ec76d05e010?, 4c?, 7ec76d518944?, 0?)
7ed252313231
swrast_dri.so`_ZN26DelegatingJITMemoryManager16registerEHFramesEPhmm+0x40(a0446337a0?,
7ec76d05e010?, 7ec76d05e010?, 4c?, 7ec76dbc2a40?, 0?)
7ed2523132e1
libLLVM-13.so`_ZN4llvm14RuntimeDyldELF16registerEHFramesEv+0xac(a044611130?,
0?, f0?, a04460c3f8?, 4?, 0?)
7ed2523133a1
libLLVM-13.so`_ZN4llvm11RuntimeDyld16registerEHFramesEv+0x40(a04460f370?,
a04460f470?, 1?, 0?, a04460f310?, 1?)
7ed252313461
libLLVM-13.so`_ZN4llvm5MCJIT21finalizeLoadedModulesEv+0x488(a04460f0a0?,
a04460f370?, a04460f430?, a04460f448?, a04460f430?, a04460f430?)
7ed252313571
libLLVM-13.so`_ZN4llvm5MCJIT14finalizeObjectEv+0x290(a04460f0a0?, a04460f3c8?,
7ed252313e20?, a04460f3f0?, 7ed252313e60?, 7ed252313e60?)
7ed2523136e1 libLLVM-13.so`LLVMGetPointerToGlobal+0x38(a04460f0a0?,
a044607d88?, 291c00?, 0?, 0?, 336?)
7ed2523137a1 swrast_dri.so`llvmpipe_update_fs+0xe94(a0423cc3c0?,
a0446010b4?, a042451ba0?, a0426a6080?, a044601000?, 0?)
7ed2523140f1 swrast_dri.so`llvmpipe_update_derived+0x468(a0423cc3c0?,
43de?, 43d9?, a0426a44a0?, 4299?, 43df?)
7ed2523141a1 swrast_dri.so`llvmpipe_draw_vbo+0x3f4(a0423cc3c0?,
7ed252314b50?, 0?, 0?, 7ed252314b30?, 1?)
7ed252314261 swrast_dri.so`util_blitter_draw_rectangle+0x250(a042651fd0?,
a042654790?, 7ec763aa09c8?, 0?, 7ec7639f2248?, 7ed252314b50?)
7ed252314371 swrast_dri.so`blitter_draw_tex+0x1c4(a042651fd0?, 0?, 0?,
3c0?, 258?, a047321990?)
7ed2523144e1 swrast_dri.so`util_blitter_blit_generic+0x1760(a042651fd0?,
a047321990?, 7ed25231519c?, 0?, 2?, 0?)
7ed252314761 swrast_dri.so`util_blitter_blit+0xf4(a042651fd0?,
7ed252315190?, 0?, 1?, a043d52ba0?, a047321990?)
7ed2523148d1 swrast_dri.so`lp_blit+0x3bc(a0423cc3c0?, 0?, 0?, 0?, 0?,
4590?)
7ed252314a31 swrast_dri.so`util_gen_mipmap+0xd0(a0423cc3c0?, a0445edc60?,
4b0?, 1?, a?, 0?)
7ed252314b81 swrast_dri.so`st_generate_mipmap+0x188(a0427d47c0?, de1?,
a043d52730?, 0?, a042694130?, a?)
7ed252314c51 swrast_dri.so`_mesa_GenerateMipmap+0x174(de1?, 3?, 14d78?,
a043f6ef80?, a043d52730?, a0427d47c0?)
7ed252314d01 libglapi.so.0.0.0`shared_dispatch_stub_674+0x1c(de1?, 3?,
7ed25231566c?, a042699290?, a04281d2d8?, a042698a40?)
7ed252314db1
libmutter-cogl-9.so.0.0.0`_cogl_texture_gl_generate_mipmaps+0xc4(a043d52650?,
a042698a40?, a042698a40?, 7ed25231566c?, 7ed252315668?, a?)
7ed252314e71
libmutter-cogl-9.so.0.0.0`_cogl_texture_2d_pre_paint+0x48(a043d52650?, 1?, ff?,
a0445eda20?, 8?, a042698a40?)
7ed252314f21
libmutter-cogl-9.so.0.0.0`_cogl_texture_pre_paint+0x1c(a043d52650?, 1?,
7ed25231588c?, a0445edb20?, 0?, a0445edb00?)
7ed252314fd1
libmutter-cogl-9.so.0.0.0`_cogl_pipeline_layer_pre_paint+0x60(a0445edb20?, 0?,
a0445eda20?, 1?, a0445edb20?, a0445edb20?)
7ed252315091
libmutter-cogl-9.so.0.0.0`_cogl_rectangles_validate_layer_cb+0x14(a0445eda20?,
0?, 7ed252315b30?, 1?, 8?, 1?)
7ed252315141
libmutter-cogl-9.so.0.0.0`cogl_pipeline_foreach_layer+0x98(a0445eda20?,
7ec7648449a8?, 7ed252315b30?, 0?, 0?, a0445eda20?)
7ed252315211
libmutter-cogl-9.so.0.0.0`_cogl_framebuffer_draw_multitextured_rectangles+0x44(a043fe86a0?,
a0445eda20?, 7ed252315c68?, 1?, a0445ed9c0?, a0445eda20?)
7ed252315391
libmutter-cogl-9.so.0.0.0`cogl_framebuffer_draw_textured_rectangle+0x48(a043fe86a0?,
a0445eda20?, a043d52650?, a0445ed960?, a042699d90?, a0445edb20?)
7ed252315481
libmutter-9.so.0.0.0`meta_background_get_texture+0x820(a0439c2600?, 0?,
a042b62aa4?, 7ed252315ee0?, 4000300?, 0?)
7ed2523155b1
libmutter-9.so.0.0.0`meta_background_content_paint_content+0x88c(7ed252315ed0?,
7ed252315ee0?, a04407c8c0?, a04426af50?, 0?, a042b62a20?)
7ed2523156f1
libmutter-clutter-9.so.0.0.0`_clutter_content_paint_content+0x28(a042b62a20?,
a043212910?, a04407c8c0?, a04426af50?, 60?, a0432140e0?)
7ed252315

[Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

--- Comment #22 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:b66e613a1a8d5b8fc9d8b03f7b60260700acf833

commit r14-3095-gb66e613a1a8d5b8fc9d8b03f7b60260700acf833
Author: Richard Biener 
Date:   Tue Jul 25 15:36:30 2023 +0200

rtl-optimization/110587 - speedup find_hard_regno_for_1

The following applies a micro-optimization to find_hard_regno_for_1,
re-ordering the check so we can easily jump-thread by using an else.
This reduces the time spent in this function by 15% for the testcase
in the PR.

PR rtl-optimization/110587
* lra-assigns.cc (find_hard_regno_for_1): Re-order checks.

[Bug tree-optimization/109893] [14 Regression] Missed Dead Code Elimination when using __builtin_unreachable since r14-160-gf828503eeb79ad1f1ada6db7deccc5abcc2f3ca3

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109893

--- Comment #2 from Andrew Pinski  ---
  # j_10 = PHI <0B(4), &e(6)>
...
  _21 = j_10 == 0B;
  _22 = j_10 == &e;
  _23 = _21 | _22;


The only pass which is able to optimize the above is PRE.

That is take:
```
int e;

void g();

int f(int a, int b)
{
  int *t;
  if (a)
    t = 0;
  else
    t = &e;
  if (b)
    g();
  int t1 = t == 0;
  int t2 = t == &e;
  return t1|t2;
}
```
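
A minimal sketch of why the final expression is foldable: `t` can only be a null pointer or `&e`, so `t1 | t2` is always 1. The names `g_calls`, `f_folded` and `check_fold` are assumptions added for illustration, and `g` is given a body only so the fragment links.

```cpp
#include <cassert>

int e;
int g_calls = 0;            // hypothetical counter so that g() has an effect
void g() { ++g_calls; }

// The testcase from the comment, unchanged in shape.
int f(int a, int b)
{
  int *t;
  if (a)
    t = 0;
  else
    t = &e;
  if (b)
    g();
  int t1 = t == 0;
  int t2 = t == &e;
  return t1 | t2;
}

// What the function could fold to once PHI <0B, &e> is taken into account.
int f_folded(int, int b)
{
  if (b)
    g();
  return 1;
}

// Compare both versions for zero and nonzero values of each argument.
void check_fold()
{
  for (int a = 0; a < 2; ++a)
    for (int b = 0; b < 2; ++b)
      assert(f(a, b) == f_folded(a, b));
}
```

Calling `check_fold()` from a test driver exercises all four input combinations.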

[Bug tree-optimization/110131] [12/13/14 Regression] Missed Dead Code Elimination when using __builtin_unreachable since r12-6924-gc2b610e7c6c

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110131

--- Comment #5 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #4)
> So I have a VRP patch which gets us to:

  /* If the value range is defined by more than one pair,
 try to optimize to a singularity if either
 the first or last pair is a singleton.  */

That is if we have:
a range like: [0, 0][3, 32768][4294934529, +INF]

and we have a comparison like `_37 <= 2`.
Since 2 is a value between the first 2 pairs, we can just say this should be
the same as `_37 == 0` because that is the only value that is valid here.

The same idea happens for the last 2 pairs (and the last pair is a singleton).

Also, if we only have 2 pairs, prefer the one which is 0 (since 0 is always simpler).
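
The idea can be sketched outside of GCC as follows. This is not the VRP patch itself: `Pair`, `singleton_for_le` and `check_singularity` are hypothetical names, and the range is modelled as a sorted list of inclusive pairs for a 32-bit unsigned value.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// One [lo, hi] pair of a multi-pair range; both bounds are inclusive.
struct Pair { std::uint64_t lo, hi; };

// If exactly one value in `range` satisfies `x <= c`, return that value.
// `range` is assumed to be sorted and non-overlapping.
std::optional<std::uint64_t>
singleton_for_le(const std::vector<Pair> &range, std::uint64_t c)
{
  std::optional<std::uint64_t> only;
  for (const Pair &p : range)
    {
      if (p.lo > c)
        break;                        // later pairs are larger still
      const std::uint64_t hi = p.hi < c ? p.hi : c;
      if (hi != p.lo || only)
        return std::nullopt;          // more than one candidate value
      only = p.lo;
    }
  return only;
}

void check_singularity()
{
  // [0, 0][3, 32768][4294934529, +INF] with +INF taken as UINT32_MAX.
  const std::vector<Pair> r = {{0, 0}, {3, 32768}, {4294934529u, 4294967295u}};
  assert(singleton_for_le(r, 2) == std::optional<std::uint64_t>(0));
  assert(!singleton_for_le(r, 3));    // both 0 and 3 satisfy x <= 3
}
```

With this range, `x <= 2` has the single solution 0, which is why the comparison can be rewritten as `x == 0`. The mirrored case for the last pair would use `x >= c` in the same way.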

[Bug tree-optimization/110248] ivopts could under-cost for some addressing modes on len_{load,store}

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110248

--- Comment #9 from CVS Commits  ---
The master branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:4a8e6fa8016f8daf184dec49f371ca71b1cb0f01

commit r14-3093-g4a8e6fa8016f8daf184dec49f371ca71b1cb0f01
Author: Kewen Lin 
Date:   Wed Aug 9 00:41:52 2023 -0500

ivopts: Call valid_mem_ref_p with ifn [PR110248]

As PR110248 shows, to get the expected query results on
whether the internal functions LEN_{LOAD,STORE} are able to
adopt some addressing modes, we need to pass down the related
IFN code as well.  This patch makes IVOPTs pass down the
ifn code for USE_PTR_ADDRESS type uses, and adjusts the
related functions {strict_,}memory_address_addr_space_p
and valid_mem_ref_p as well.

PR tree-optimization/110248

gcc/ChangeLog:

* recog.cc (memory_address_addr_space_p): Add one more argument ch of
type code_helper and pass it to targetm.addr_space.legitimate_address_p
instead of ERROR_MARK.
(offsettable_address_addr_space_p): Update one function pointer with
one more argument of type code_helper as its assignees
memory_address_addr_space_p and strict_memory_address_addr_space_p
have been adjusted, and adjust some call sites with ERROR_MARK.
* recog.h (tree.h): New include header file for tree_code ERROR_MARK.
(memory_address_addr_space_p): Adjust with one more unnamed argument
of type code_helper with default ERROR_MARK.
(strict_memory_address_addr_space_p): Likewise.
* reload.cc (strict_memory_address_addr_space_p): Add one unnamed
argument of type code_helper.
* tree-ssa-address.cc (valid_mem_ref_p): Add one more argument ch of
type code_helper and pass it to memory_address_addr_space_p.
* tree-ssa-address.h (valid_mem_ref_p): Adjust the declaration with
one more unnamed argument of type code_helper with default value
ERROR_MARK.
* tree-ssa-loop-ivopts.cc (get_address_cost): Use ERROR_MARK as code
by default, change it with ifn code for USE_PTR_ADDRESS type use, and
pass it to all valid_mem_ref_p calls.

[Bug tree-optimization/110248] ivopts could under-cost for some addressing modes on len_{load,store}

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110248

--- Comment #10 from CVS Commits  ---
The master branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:0412f0e374de1f66e20c407e0b519324af3fd5b6

commit r14-3094-g0412f0e374de1f66e20c407e0b519324af3fd5b6
Author: Kewen Lin 
Date:   Wed Aug 9 01:15:46 2023 -0500

rs6000: Teach legitimate_address_p about LEN_{LOAD,STORE} [PR110248]

This patch teaches rs6000_legitimate_address_p to
handle the queried rtx constructed for LEN_{LOAD,STORE}.
Since lxvl and stxvl don't support x-form or ds-form,
consider it as not legitimate when outer code is PLUS.

PR tree-optimization/110248

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_legitimate_address_p): Check if
the given code is for ifn LEN_{LOAD,STORE}, if yes then make it not
legitimate when outer code is PLUS.

[Bug tree-optimization/110248] ivopts could under-cost for some addressing modes on len_{load,store}

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110248

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:165b1f6ad1d3969e2c23417797362d0528e65c79

commit r14-3092-g165b1f6ad1d3969e2c23417797362d0528e65c79
Author: Kewen Lin 
Date:   Wed Aug 9 00:02:26 2023 -0500

targhooks: Extend legitimate_address_p with code_helper [PR110248]

As PR110248 shows, some middle-end passes like IVOPTs can
query the target hook legitimate_address_p with some
artificially constructed rtx to determine whether some
addressing modes are supported by target for some gimple
statement.  But for now the existing legitimate_address_p
only checks the given mode, it's unable to distinguish
some special cases unfortunately, for example, for LEN_LOAD
ifn on Power port, we would expand it with lxvl hardware
insn, which only supports one register to hold the address
(the other register is holding the length), that is we
don't support base (reg) + index (reg) addressing mode for
sure.  But hook legitimate_address_p only considers the
given mode which would be some vector mode for LEN_LOAD
ifn, and we do support base + index addressing mode for
normal vector load and store insns, so the hook will return
true for the query unexpectedly.

This patch is to introduce one extra argument of type
code_helper for hook legitimate_address_p, it makes targets
able to handle some special case like what's described
above.
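
A toy model of the change, for illustration only. None of the names below are the real GCC types or hooks: `Code`, `AddrForm`, `legitimate_address_old`, `legitimate_address_new` and `check_hook` are hypothetical stand-ins that mimic the shape of the extra defaulted argument.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are not GCC's code_helper or rtx types.
enum class Code { ERROR_MARK, LEN_LOAD, LEN_STORE };
enum class AddrForm { Reg, RegPlusReg };

// Before: the hook sees only the address (and, in GCC, the mode), so
// base + index looks fine for any vector access.
bool legitimate_address_old(AddrForm) { return true; }

// After: an extra code argument, defaulting to ERROR_MARK, lets the target
// reject base + index when the query is made on behalf of LEN_LOAD/LEN_STORE.
bool legitimate_address_new(AddrForm form, Code ch = Code::ERROR_MARK)
{
  if ((ch == Code::LEN_LOAD || ch == Code::LEN_STORE)
      && form == AddrForm::RegPlusReg)
    return false;
  return true;
}

void check_hook()
{
  // Existing callers that pass no code keep the old answer.
  assert(legitimate_address_new(AddrForm::RegPlusReg)
         == legitimate_address_old(AddrForm::RegPlusReg));
  // A LEN_LOAD query with base + index is now rejected.
  assert(!legitimate_address_new(AddrForm::RegPlusReg, Code::LEN_LOAD));
}
```

The default argument is what lets every unrelated call site stay unchanged, which matches the many "Likewise" entries in the ChangeLog.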

PR tree-optimization/110248

gcc/ChangeLog:

* coretypes.h (class code_helper): Add forward declaration.
* doc/tm.texi: Regenerate.
* lra-constraints.cc (valid_address_p): Call target hook
targetm.addr_space.legitimate_address_p with an extra parameter
ERROR_MARK as its prototype changes.
* recog.cc (memory_address_addr_space_p): Likewise.
* reload.cc (strict_memory_address_addr_space_p): Likewise.
* target.def (legitimate_address_p, addr_space.legitimate_address_p):
Extend with one more argument of type code_helper, update the
documentation accordingly.
* targhooks.cc (default_legitimate_address_p): Adjust for the
new code_helper argument.
(default_addr_space_legitimate_address_p): Likewise.
* targhooks.h (default_legitimate_address_p): Likewise.
(default_addr_space_legitimate_address_p): Likewise.
* config/aarch64/aarch64.cc (aarch64_legitimate_address_hook_p): Adjust
with extra unnamed code_helper argument with default ERROR_MARK.
* config/alpha/alpha.cc (alpha_legitimate_address_p): Likewise.
* config/arc/arc.cc (arc_legitimate_address_p): Likewise.
* config/arm/arm-protos.h (arm_legitimate_address_p): Likewise.
(tree.h): New include for tree_code ERROR_MARK.
* config/arm/arm.cc (arm_legitimate_address_p): Adjust with extra
unnamed code_helper argument with default ERROR_MARK.
* config/avr/avr.cc (avr_addr_space_legitimate_address_p): Likewise.
* config/bfin/bfin.cc (bfin_legitimate_address_p): Likewise.
* config/bpf/bpf.cc (bpf_legitimate_address_p): Likewise.
* config/c6x/c6x.cc (c6x_legitimate_address_p): Likewise.
* config/cris/cris-protos.h (cris_legitimate_address_p): Likewise.
(tree.h): New include for tree_code ERROR_MARK.
* config/cris/cris.cc (cris_legitimate_address_p): Adjust with extra
unnamed code_helper argument with default ERROR_MARK.
* config/csky/csky.cc (csky_legitimate_address_p): Likewise.
* config/epiphany/epiphany.cc (epiphany_legitimate_address_p): Likewise.
* config/frv/frv.cc (frv_legitimate_address_p): Likewise.
* config/ft32/ft32.cc (ft32_addr_space_legitimate_address_p): Likewise.
* config/gcn/gcn.cc (gcn_addr_space_legitimate_address_p): Likewise.
* config/h8300/h8300.cc (h8300_legitimate_address_p): Likewise.
* config/i386/i386.cc (ix86_legitimate_address_p): Likewise.
* config/ia64/ia64.cc (ia64_legitimate_address_p): Likewise.
* config/iq2000/iq2000.cc (iq2000_legitimate_address_p): Likewise.
* config/lm32/lm32.cc (lm32_legitimate_address_p): Likewise.
* config/loongarch/loongarch.cc (loongarch_legitimate_address_p): Likewise.
* config/m32c/m32c.cc (m32c_legitimate_address_p): Likewise.
(m32c_addr_space_legitimate_address_p): Likewise.
* config/m32r/m32r.cc (m32r_legitimate_address_p): Likewise.
* config/m68k/m68k.cc (m68k_legitimate_address_p): Likewise.
* config/mcore/mcore.cc (mcore_legitimate_address_p): Likewise.
* config/microblaze/microblaze-protos.h

[Bug tree-optimization/110131] [12/13/14 Regression] Missed Dead Code Elimination when using __builtin_unreachable since r12-6924-gc2b610e7c6c

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110131

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org
   Target Milestone|12.4|14.0

--- Comment #4 from Andrew Pinski  ---
So I have a VRP patch which gets us to:
```
  # RANGE [irange] int [-32768, -1][2, 32767]
  _34 = (intD.6) _13;
  # RANGE [irange] unsigned int [2, 32767][4294934528, +INF]
  _30 = (unsigned int) _13;
  _22 = _30 == 4294967295;
  # RANGE [irange] int [-32768, 0][2, 32767]
  _35 = _22 ? _34 : 0;

```

Which is:
`(unsigned int)_13 == 4294967295 ? (intD.6) _13 : 0`

or rather `_13 == -1 ? -1 : 0`
or rather just `-(_13 == -1)`


  if (_35 != 5)

And obviously [-1,0] != 5.
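
The equivalence can be checked exhaustively with a small sketch. It assumes `_13` is a 16-bit signed value (suggested by the ranges in the dump, but not stated) and a 32-bit `unsigned int`; `original`, `simplified` and `check_cond_fold` are names made up for the example.

```cpp
#include <cassert>

// The conditional as it appears in the dump, with _13 modelled as a short.
constexpr int original(short v)
{
  return static_cast<unsigned int>(v) == 4294967295u ? static_cast<int>(v) : 0;
}

// The proposed simplified form, -(_13 == -1).
constexpr int simplified(short v) { return -(v == -1); }

// Walk every 16-bit value and compare the two forms.
void check_cond_fold()
{
  for (int v = -32768; v <= 32767; ++v)
    {
      const short s = static_cast<short>(v);
      assert(original(s) == simplified(s));
      assert(original(s) != 5);       // the result is only ever -1 or 0
    }
}
```

Since the result is always -1 or 0, the later `_35 != 5` test is always true under these assumptions.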

So maybe:
```
(simplify
 (cond
  (eq:c@3 (convert1? @0) INTEGER_CST@1)
  (convert2? @0)
  INTEGER_CST@2
 )
 (if (INTEGRAL_TYPE_P (type))
  (cond @3 (convert @1) @2)
 )
)
```
There might be a few more checks dealing with @1 though.

[Bug tree-optimization/85316] [meta-bug] VRP range propagation missed cases

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85316
Bug 85316 depends on bug 107822, which changed state.

Bug 107822 Summary: [13/14/14 Regression] Dead Code Elimination Regression at 
-Os (trunk vs. 12.2.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107822

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/107822] [13/14/14 Regression] Dead Code Elimination Regression at -Os (trunk vs. 12.2.0)

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107822

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #7 from Andrew Pinski  ---
Fixed.

[Bug libstdc++/106611] std::is_nothrow_copy_constructible returns wrong result

2023-08-08 Thread de34 at live dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106611

--- Comment #15 from Jiang An  ---
(In reply to Arthur O'Dwyer from comment #11)
> @jwakely, I propose that this issue should be recategorized as a compiler
> bug. (And I'm also voting effectively "NAD" on LWG3967.)

Hmm... IMO given the current specification seems to be ambiguous, the status
quo of GCC's __is_nothrow_* can be considered conforming even though they're
obviously buggy (inconsistent).

[Bug fortran/109684] compiling failure: complaining about a final subroutine of a type being not PURE (while it is indeed PURE)

2023-08-08 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109684

--- Comment #14 from kargl at gcc dot gnu.org ---
(In reply to Paul Thomas from comment #13)
> (In reply to Steve Kargl from comment #12)
> > On Mon, Aug 07, 2023 at 10:04:54PM +, kargl at gcc dot gnu.org wrote:
> > > 
> > > diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
> > > index 3cd470ddcca..b0bb8bc1471 100644
> > > --- a/gcc/fortran/resolve.cc
> > > +++ b/gcc/fortran/resolve.cc
> > > @@ -17966,7 +17966,9 @@ resolve_types (gfc_namespace *ns)
> > > 
> > >for (n = ns->contained; n; n = n->sibling)
> > >  {
> > > -  if (gfc_pure (ns->proc_name) && !gfc_pure (n->proc_name))
> > > +  if (gfc_pure (ns->proc_name)
> > > + && !gfc_pure (n->proc_name)
> > > + && !n->proc_name->attr.artificial)
> > > gfc_error ("Contained procedure %qs at %L of a PURE procedure 
> > > must "
> > >"also be PURE", n->proc_name->name,
> > >&n->proc_name->declared_at);
> > > 
> > > pault, does the above look correct?
> > > 
> > 
> > This patch passes a regression test with no new regressions
> > on x86_64-*-*freebsd.
> 
> Hi Steve,
> 
> That will certainly fix the bug. An alternative crosses my mind, which is to
> check the pureness of the final routines as the wrapper is being built and
> give the wrapper the pure attribute if they are all pure.
> 

Just saw that you attached a patch on 5/23/23 that is essentially the same as I
suggested.  I tried to simply set the final->attr.pure to 1, but this ran into
issues with the argument list having intent(inout) instead of just intent(in).

[Bug rtl-optimization/110939] [14 Regression] 14.0 ICE at rtl.h:2297 while bootstrapping on loongarch64

2023-08-08 Thread panchenghui at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110939

--- Comment #10 from Chenghui Pan  ---
(In reply to Stefan Schulze Frielinghaus from comment #9)
> Thanks for the reproducer and sorry for the hassle.
> 
> The normal form of a constant for a mode with fewer bits than in
> HOST_WIDE_INT is a sign extended version of the original constant.  This
> even holds for unsigned constants which I missed.  The following should fix
> this:
> 
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index e46d202d0a7..9e5bf96a09d 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -12059,7 +12059,7 @@ simplify_compare_const (enum rtx_code code,
> machine_mode mode,
>: (GET_MODE_SIZE (int_mode)
>   - GET_MODE_SIZE (narrow_mode_iter)));
>   *pop0 = adjust_address_nv (op0, narrow_mode_iter, offset);
> - *pop1 = GEN_INT (n);
> + *pop1 = gen_int_mode (n, narrow_mode_iter);
>   return adjusted_code;
> }
>  }
> 
> Can you give this a try?

Bootstrapping is successful with this, thank you!
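
The normal form described in the quoted text can be sketched as below. This is an illustration, not GCC code: `sign_extend` is a hypothetical helper that models a HOST_WIDE_INT as a 64-bit integer and takes the mode width in bits. As far as I understand, `gen_int_mode` canonicalizes the constant for the given mode in this way, whereas `GEN_INT` uses the value as passed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep the low `bits` bits of `n` and sign-extend them
// to 64 bits, which is the normal form described for narrow modes.
std::int64_t sign_extend(std::uint64_t n, unsigned bits)
{
  assert(bits >= 1 && bits <= 64);
  const std::uint64_t mask = bits == 64 ? ~0ULL : (1ULL << bits) - 1;
  const std::uint64_t sign = 1ULL << (bits - 1);
  const std::uint64_t v = n & mask;
  // (v ^ sign) - sign propagates the sign bit upwards in modular arithmetic;
  // the final conversion relies on two's complement (guaranteed since C++20).
  return static_cast<std::int64_t>((v ^ sign) - sign);
}
```

For example, an unsigned 8-bit constant 0xFF has the normal form -1, so passing 255 through unchanged would not be in normal form for an 8-bit mode.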

[Bug tree-optimization/110931] [14 Regression] Dead Code Elimination Regression since r14-2890-gcc2003cd875

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110931

Andrew Pinski  changed:

   What|Removed |Added

 Blocks||85316

--- Comment #4 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #2)
> Basically there is a missing VRP happening here:
> l.0_1 [irange] int [-INF, -65536][0, 0][65536, +INF]
> Partial equiv (b_6 pe8 l.0_1)
>  :
> b_6 = (char) l.0_1;
> ...
> Obvious that b_6 will have the range [0,0] as the other parts of l.0_1 is
> outside of that range. But for some reason VRP didn't figure that out here
> ...

Oh it looks like we don't prop NONZERO back and I missed that when I first
looked at this.

In this case we have:
l&(short)(-1) == 0

But we don't record that in the above, only a range ...
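
A small sketch of the difference between the two pieces of information, assuming a 32-bit `int` for `l` and modelling `(char) l` as the low byte. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The recorded range: [-INF, -65536][0, 0][65536, +INF].
constexpr bool in_range(std::int32_t l)
{
  return l <= -65536 || l == 0 || l >= 65536;
}

// The stronger fact from the comment: l & (short)(-1) == 0,
// i.e. the low 16 bits of l are all zero.
constexpr bool low16_zero(std::int32_t l)
{
  return (static_cast<std::uint32_t>(l) & 0xFFFFu) == 0;
}

// Low byte of l, standing in for b_6 = (char) l.
constexpr unsigned low_byte(std::int32_t l)
{
  return static_cast<std::uint32_t>(l) & 0xFFu;
}

// 65537 lies inside the range but has a nonzero low byte, so the range
// by itself does not imply b_6 == 0.
static_assert(in_range(65537) && low_byte(65537) == 1, "range alone is not enough");

// Every multiple of 65536 is in the range and has a zero low byte.
void check_mask()
{
  for (std::int64_t k = -32768; k <= 32767; ++k)
    {
      const std::int32_t l = static_cast<std::int32_t>(k * 65536);
      assert(low16_zero(l) && in_range(l) && low_byte(l) == 0);
    }
}
```

This is consistent with the observation that the nonzero-bits information has to be propagated back; the multi-pair range on its own is too weak.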


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85316
[Bug 85316] [meta-bug] VRP range propagation missed cases

[Bug tree-optimization/108360] [13/14 Regression] Dead Code Elimination Regression at -Os since r13-2048-g418b71c0d535bf

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108360

--- Comment #11 from Andrew Pinski  ---
I have a patch which gets us to:
```
  short int b.2_1;
...
  b.2_1 = b;
  _2 = b.2_1 <= 0;
  _4 = (char) _2;
  f = _4;
  f.5_5 = (unsigned char) _2;
  _6 = f.5_5 << 4;
  e = _6;
  _19 = b.2_1 <= 0;
  _7 = (short int) _19;
  _21 = _7 << 4;
  if (b.2_1 == _21)

```
Which is closer.

```
int f(short b)
{
  auto _19 = b <= 0;
  auto _7 = (short int) _19;
  auto _21 = _7 << 4;
  return b == _21;
}
```
is just `return false;` as _21's range is `[0,16]` but it is only 0 when `b > 0`

And then maybe:
(simplify
 (eq:c @0 (lshift (convert (le @0 integer_zero_p)) INTEGER_CST@1))
 (if (wi::to_wide(@1) < element_precision (TREE_TYPE (@0)) - 1)
  { false_value; }))

Is this worth this pattern, I don't know ...
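
The claim that the reduced function always returns false can be checked over every `short` value. This is a sketch with renamed temporaries (`le0`, `widened`, `shifted` stand for `_19`, `_7` and `_21`), and `check_always_false` is a hypothetical helper.

```cpp
#include <cassert>

// Same computation as the reduced testcase, with descriptive names.
constexpr int f(short b)
{
  const bool le0 = b <= 0;                        // _19
  const short widened = static_cast<short>(le0);  // _7: 0 or 1
  const int shifted = widened << 4;               // _21: 0 or 16
  return b == shifted;
}

// b <= 0 gives shifted == 16, which no non-positive b equals;
// b > 0 gives shifted == 0, which no positive b equals.
void check_always_false()
{
  for (int v = -32768; v <= 32767; ++v)
    assert(f(static_cast<short>(v)) == 0);
}
```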

[Bug tree-optimization/108397] Missed optimization with [0, 0][-1U,-1U] range arithmetics

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108397

--- Comment #6 from Andrew Pinski  ---
I have a patch to fix the testcase in comment #2.

After that patch we have:
```
  _1 = o_10(D) == 0;
  _2 = (long long int) _1;
  _3 = -_2;
  t1_11 = (long long unsigned int) _3;
  _4 = t1_11 == 18446744073709551615;
  _5 = (long long int) _4;
  _6 = -_5;
  t2_12 = (long long unsigned int) _6;
  _7 = t1_11 <= t2_12;
  _8 = (long long int) _7;
  _9 = -_8;
  t3_13 = (long long unsigned int) _9;
```

Forwprop will not change _4 to _1 though, because t1_11 is used twice.
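
The dump can be mirrored in plain C++ to see why `_4` could be replaced by `_1`. This is a sketch: the type of `o` is not shown in the dump and is assumed to be `int` here, and `Steps`, `run` and `check_chain` are made-up names.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;
using s64 = std::int64_t;

// Intermediate values of interest from the dump.
struct Steps { bool c1; bool c4; u64 t1, t2, t3; };

Steps run(int o)
{
  Steps s{};
  s.c1 = o == 0;                                      // _1
  s.t1 = static_cast<u64>(-static_cast<s64>(s.c1));   // t1_11: 0 or all ones
  s.c4 = s.t1 == 18446744073709551615ull;             // _4
  s.t2 = static_cast<u64>(-static_cast<s64>(s.c4));   // t2_12
  const bool c7 = s.t1 <= s.t2;                       // _7
  s.t3 = static_cast<u64>(-static_cast<s64>(c7));     // t3_13
  return s;
}

void check_chain()
{
  for (int o = -2; o <= 2; ++o)
    {
      const Steps s = run(o);
      assert(s.c4 == s.c1);     // _4 carries the same truth value as _1
      assert(s.t2 == s.t1);     // so t2_12 equals t1_11
    }
}
```

In this model `t1_11` is either 0 or all ones, so comparing it against all ones just recovers `_1`.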

[Bug middle-end/110954] [14 Regression] Wrong code with -O0

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110954

Andrew Pinski  changed:

   What|Removed |Added

  Component|tree-optimization   |middle-end

--- Comment #2 from Andrew Pinski  ---
Generic has different type constraints than gimple and that is what is
confusing here.
bitwise_inverted_equal_p cannot check comparisons to see if they are inverse of
each other unless the type is a boolean type ...

[Bug tree-optimization/110954] [14 Regression] Wrong code with -O0

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110954

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-08-09
 CC||pinskia at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED
   Target Milestone|--- |14.0
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org

--- Comment #1 from Andrew Pinski  ---
I think this is mine.

[Bug tree-optimization/110954] New: [14 Regression] Wrong code with -O0

2023-08-08 Thread vsevolod.livinskiy at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110954

Bug ID: 110954
   Summary: [14 Regression] Wrong code with -O0
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vsevolod.livinskiy at gmail dot com
  Target Milestone: ---

Link to the compiler explorer:
https://godbolt.org/z/WYoT4hW9v

Reproducer:
#include <stdio.h>
unsigned long long a;
void b(unsigned long long *c, int g) { *c = g; }
int d, e = -38921963;
long f;
int main() {
  d = (-1807546494482798067UL - f < (6033086967267 > 0)) & e |
  !(-1807546494482798067UL - f < (6033086967267 > 0));
  b(&a, d);
  printf("%llu\n", a);
  if (a != 1)
    __builtin_abort();
}

Error:
>$ g++ -O0 test.cpp && ./a.out 
18446744073670629653
Aborted (core dumped)
>$ /usr/bin/g++-11 -O0 test.cpp && ./a.out 
1

gcc version 14.0.0 20230808 (20659be04c2749f9f47b085f1789eee0d145fb36)
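
A short sketch of why 1 is the expected output. Here `x` stands for the comparison `-1807546494482798067UL - f < (6033086967267 > 0)` from the reproducer; `combine` and `check_reproducer` are hypothetical names, and `unsigned long long` is assumed to be 64 bits wide.

```cpp
#include <cassert>

// The shape of the expression in main: (x & e) | !x, with x a comparison result.
constexpr int combine(bool x, int e) { return (x & e) | !x; }

// Whatever the comparison yields, the result is 1 for this e:
// x == false gives 0 | 1, and x == true gives (1 & e) | 0 with e odd.
static_assert(combine(false, -38921963) == 1, "x false");
static_assert(combine(true, -38921963) == 1, "x true");

// The value printed by the miscompiled binary equals e converted to
// unsigned long long, which is consistent with d having been computed as e.
static_assert(static_cast<unsigned long long>(-38921963)
              == 18446744073670629653ull, "wrong output matches e");

void check_reproducer()
{
  assert(combine(false, -38921963) == 1);
  assert(combine(true, -38921963) == 1);
}
```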

[Bug c/109836] -Wpointer-sign should be enabled by default

2023-08-08 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109836

Eric Gallager  changed:

   What|Removed |Added

URL||https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626732.html

--- Comment #6 from Eric Gallager  ---
(In reply to Eric Gallager from comment #5)
> (In reply to Eric Gallager from comment #4)
> > How about:
> > 
> > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> > index 0d0ad0a6374..f046d91d03b 100644
> > --- a/gcc/c-family/c.opt
> > +++ b/gcc/c-family/c.opt
> > @@ -1178,7 +1178,7 @@ C ObjC C++ ObjC++ Var(warn_pointer_arith) Warning
> > LangEnabledBy(C ObjC C++ ObjC+
> >  Warn about function pointer arithmetic.
> >  
> >  Wpointer-sign
> > -C ObjC Var(warn_pointer_sign) Warning LangEnabledBy(C ObjC,Wall ||
> > Wpedantic)
> > +C ObjC Var(warn_pointer_sign) Warning LangEnabledBy(C ObjC,Wall ||
> > Wpedantic || Wextra)
> >  Warn when a pointer differs in signedness in an assignment.
> >  
> >  Wpointer-compare
> 
> I sent this to the gcc-patches mailing list:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620137.html

Updated version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626732.html

[Bug target/110953] Introducing the "wincall" Calling Convention for GCC

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110953

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug target/110953] Introducing the "wincall" Calling Convention for GCC

2023-08-08 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110953

--- Comment #2 from cqwrteur  ---
Parameters

9+     8     7     6     5     4     3     2     1

floating-point and __m128
stack  XMM7  XMM6  XMM5  XMM4  XMM3  XMM2  XMM1  XMM0

__m256
stack  YMM7  YMM6  YMM5  YMM4  YMM3  YMM2  YMM1  YMM0

__m512
stack  ZMM7  ZMM6  ZMM5  ZMM4  ZMM3  ZMM2  ZMM1  ZMM0


Sorry

[Bug target/110953] Introducing the "wincall" Calling Convention for GCC

2023-08-08 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110953

--- Comment #1 from cqwrteur  ---
TLDR:

floating-point and __m128
stack XMM8  XMM7  XMM6  XMM5  XMM4  XMM3  XMM2  XMM1  XMM0

__m256
stack YMM8  YMM7  YMM6  YMM5  YMM4  YMM3  YMM2  YMM1  YMM0

__m512
stack ZMM8  ZMM7  ZMM6  ZMM5  ZMM4  ZMM3  ZMM2  ZMM1  ZMM0

bool, integer and __uint128_t/__int128_t and std::float128_t
stack R19   R18   R17   R16   R9    R8    RDX   RCX

Aggregates (8, 16, 32, or 64 bits. 128 bits split to 2) and __m64
stack R19   R18   R17   R16   R9    R8    RDX   RCX

Other aggregates, as pointers
stack R19   R18   R17   R16   R9    R8    RDX   RCX

carry flag for exception

[Bug target/110953] New: Introducing the "wincall" Calling Convention for GCC

2023-08-08 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110953

Bug ID: 110953
   Summary: Introducing the "wincall" Calling Convention for GCC
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: unlvsur at live dot com
  Target Milestone: ---

I present a novel calling convention named "wincall" designed specifically for
GCC. This convention is accompanied by the [[__gnu__::__wincall__]] attribute
and caters to the latest Intel APX instructions on Windows systems, excluding
Linux, BSD, and similar platforms.

Motivation:

The current Windows calling convention exhibits inefficiencies and introduces
performance bottlenecks to C++ programs. This is particularly evident in
libstdc++ components such as "span" and "string_view":

Reference: std::span is not zero-cost on microsoft abi.
https://www.reddit.com/r/cpp/comments/p0pkcv/stdspan_is_not_zerocost_on_microsoft_abi

https://developercommunity.visualstudio.com/t/post/10433601

The innovative Herbception mechanism, as proposed in P0709 by Herb Sutter,
necessitates passing std::error using two registers and a carry flag. However,
the existing Windows calling convention only allows returning one register.

The current calling conventions allocate just four registers for parameter
passing. Given that Intel has extended x86_64 registers from 16 to 32 for APX,
this presents an opportune moment to introduce a new calling convention to make
optimal use of these additional registers.

Notably, Windows DLL APIs are labeled with [[__gnu__::__stdcall__]],
[[__gnu__::__cdecl__]], or [[__gnu__::__fastcall__]]. Implementing this new
convention will not disrupt code that interfaces with DLLs. Furthermore, MSVC
provides an option to toggle the default calling convention.

Eliminating the requirement for empty objects to occupy a register slot would
substantially ease the burden on C++ programmers.

The Windows ABI already follows a caller-saved approach for passing registers,
thus incorporating more registers for parameter passing should not pose issues.

Objectives:

Minimize the register usage for calls into the [[gnu::fastcall]] convention,
the sole existing calling convention for Windows.
Retain caller-saved registers, consistent with Windows conventions.
Ensure compatibility with the Itanium C++ ABI, without impacting the sysvabi.
Implement the proposed "wincall" convention first and allow Microsoft and Clang
to adopt it subsequently.
Seamlessly integrate with the existing Itanium C++ ABI rule for C++ objects'
return behavior (as currently practiced by GCC, not MSVC).

Guidelines:

Eliminate the necessity for empty objects to claim register slots.
Return the first parameter using the rax register and the second parameter
using the rdx register (similar to the 32-bit x86 convention).
When dealing with structures of 16 bits, split them into two parameters (unless
the object is empty, in which case no registers are used). Objects of lengths
1, 2, 4, 8, 16, 32, or 64 bits employ a single register. A 128-bit object uses
two registers, with the remaining bits passed using the object's address.
Adhere to the Itanium ABI rule for C++ objects' return, consistent with GCC's
practice.
Preserve the caller-saved parameter approach utilized in current Windows
conventions.


floating-point and __m128
stack XMM8  XMM7  XMM6  XMM5  XMM4  XMM3  XMM2  XMM1  XMM0

__m256
stack YMM8  YMM7  YMM6  YMM5  YMM4  YMM3  YMM2  YMM1  YMM0

__m512
stack ZMM8  ZMM7  ZMM6  ZMM5  ZMM4  ZMM3  ZMM2  ZMM1  ZMM0

bool, integer and __uint128_t/__int128_t and std::float128_t
stack R19   R18   R17   R16   R9R8RDX   RCX

Aggregates (8, 16, 32, or 64 bits. 128 bits split to 2) and __m64
stack R19   R18   R17   R16   R9R8RDX   RCX

Other aggregates, as pointers
stack R19   R18   R17   R16   R9    R8    RDX   RCX

Return values

A scalar return value that can fit into 64 bits, including the __m64 type, is
returned through RAX.
A scalar return value that can fit into 128 bits, is returned through RAX (low)
and RDX (high).

Non-scalar types including floats, doubles, and vector types such as __m128,
__m128i, __m128d are returned in XMM0. The state of unused bits in the value
returned in RAX or XMM0 is undefined.

User-defined types can be returned by value from global functions and static
member functions. To return a user-defined type by value in RAX (RDX for 128
bits), it must have a length of 1, 2, 4, 8, 16, 32, 64 or 128 bits.

The "Herbception" concept involves a structure named std::error:

struct error
{
void * domain;
uintptr_t code;
};

In the context of this convention, std::error is passed using rax for the
"domain" and rdx for the "code." Additionally, a carry flag is employed to
handle exceptions. Herbception triggers an exception when the carry flag is
set.

[Bug tree-optimization/108397] Missed optimization with [0, 0][-1U,-1U] range arithmetics

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108397

Andrew Pinski  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #5 from Andrew Pinski  ---
I am going to change/fix vr-values.cc:test_for_singularity to implement this.

The comment before this function even says:
```
/* We are comparing trees OP0 and OP1 using COND_CODE.  OP0 has
   a known value range VR.

   If there is one and only one value which will satisfy the
   conditional, then return that value.  Else return NULL.
```
Which does make it sound like it should have done that too.

[Bug middle-end/77432] warn about null check after pointer dereference

2023-08-08 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77432

--- Comment #8 from Eric Gallager  ---
(In reply to David Malcolm from comment #7)
> > Not sure if these are dupes or not (would we want a non-analyzer
> > implementation of this warning?)
> 
> Do we want a non-analyzer implementation of this warning, or should we close
> this out?

Well how difficult would a non-analyzer implementation be?

[Bug c++/87504] inconsistent diagnostic style between C and C++ for binary operators

2023-08-08 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87504

Eric Gallager  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|WAITING |RESOLVED

--- Comment #11 from Eric Gallager  ---
(In reply to Eric Gallager from comment #10)
> (In reply to Eric Gallager from comment #9)
> > (In reply to David Malcolm from comment #8)
> > > Author: dmalcolm
> > > Date: Thu Dec 20 14:18:48 2018
> > > New Revision: 267299
> > > 
> > > URL: https://gcc.gnu.org/viewcvs?rev=267299&root=gcc&view=rev
> > > Log:
> > > -Wtautological-compare: fix comparison of macro expansions
> > > 
> > > gcc/c-family/ChangeLog:
> > >   PR c++/87504
> > >   * c-warn.c (get_outermost_macro_expansion): New function.
> > >   (spelled_the_same_p): Use it to unwind the macro expansions, and
> > >   compare the outermost macro in each nested expansion, rather than
> > >   the innermost.
> > > 
> > > gcc/testsuite/ChangeLog:
> > >   PR c++/87504
> > >   * c-c++-common/Wtautological-compare-8.c: New test.
> > > 
> > > 
> > > Added:
> > > trunk/gcc/testsuite/c-c++-common/Wtautological-compare-8.c
> > > Modified:
> > > trunk/gcc/c-family/ChangeLog
> > > trunk/gcc/c-family/c-warn.c
> > > trunk/gcc/testsuite/ChangeLog
> > 
> > so is this fixed now?
> 
> WAITING on a reply

no reply; assuming this is fixed now. Feel free to reopen if there's still more
to do

[Bug target/21824] [meta-bug] bootstrap bugs for *-gnu*

2023-08-08 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21824

Eric Gallager  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Eric Gallager  ---
(In reply to Eric Gallager from comment #3)
> (In reply to Eric Gallager from comment #2)
> > (In reply to Alfred M. Szmidt from comment #1)
> > > Could someone go over these bugs and commit the pending patches?
> > 
> > Only bug 21823 is left now.
> 
> OK, that's closed now; can this one be closed now, too?

No reply; I'm going to assume that this is fixed now. Feel free to reopen if
there are new bootstrap bugs with *-gnu*.

[Bug target/55656] objc/obj-c++ failures present under darwin11

2023-08-08 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55656

--- Comment #10 from Eric Gallager  ---
(In reply to Iain Sandoe from comment #9)
> (In reply to Eric Gallager from comment #8)
> > (In reply to Iain Sandoe from comment #7)
> > > fixed on master
> > > e.g.
> > > https://gcc.gnu.org/pipermail/gcc-testresults/2021-March/669584.html
> > 
> > So... are we keeping this open for backports, then? 
> 
> yes, most of the fixes are code-gen ones - so I think we need them on at
> least one branch that can be built with c++98.
> 
> > If so, to which branches?
> 
> 10.x .. not sure if it's necessary to go back further (let's see there might
> be a subset of the changes that are worth doing).

10.x is closed now...

[Bug tree-optimization/100457] [meta-bug] Bugs relating to the enabling of vectorization at -O2 in GCC 12+

2023-08-08 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100457

Eric Gallager  changed:

   What|Removed |Added

Summary|[meta-bug] Enabling O2  |[meta-bug] Bugs relating to
   |vectorization in GCC 12 |the enabling of
   ||vectorization at -O2 in GCC
   ||12+

--- Comment #1 from Eric Gallager  ---
retitling slightly for clarity

[Bug libstdc++/110952] New: Allocator::pointer is required to be implicitly convertible from and into a native pointer

2023-08-08 Thread kamkaz at windowslive dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110952

Bug ID: 110952
   Summary: Allocator::pointer is required to be implicitly
convertible from and into a native pointer
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kamkaz at windowslive dot com
  Target Milestone: ---

For std::list and containers based on _Rb_tree (std::(multi_)set,
std::(multi_)map) there are some non-standard requirements currently imposed on
`std::allocator_traits::pointer` type.

In the implementation of these containers, this pointer type (which might be a
fancy_pointer) is required to be implicitly convertible from and into native
pointers, which in this case are equivalent to
`std::pointer_traits::element_type *`.

This bug is present in all the GCC versions I managed to test, from 6.2 until
13.2.

The proper way to convert from/into these custom pointer types is to use:
- std::__to_address(__ptr) to obtain the native pointer (which either calls
`__ptr.operator->()` or `std::pointer_traits::to_address(__ptr)`)
- std::pointer_traits::pointer_to(*__ptr) to get back the potentially
"fancy" pointer.

This proper way of handling allocator pointer is already implemented in
std::forward_list.

To fix this bug, the following changes must be made:

In bits/stl_tree.h:
Current:
protected:
  _Link_type
  _M_get_node()
  { return _Alloc_traits::allocate(_M_get_Node_allocator(), 1); }

  void
  _M_put_node(_Link_type __p) _GLIBCXX_NOEXCEPT
  { _Alloc_traits::deallocate(_M_get_Node_allocator(), __p, 1); }

Fixed:
protected:
  _Link_type
  _M_get_node()
  {
    auto __ptr = _Alloc_traits::allocate(_M_get_Node_allocator(), 1);
    return std::__to_address(__ptr);
  }

  void
  _M_put_node(_Link_type __p) _GLIBCXX_NOEXCEPT
  {
    typedef typename _Alloc_traits::pointer _Ptr;
    auto __ptr = std::pointer_traits<_Ptr>::pointer_to(*__p);
    _Alloc_traits::deallocate(_M_get_Node_allocator(), __ptr, 1);
  }

In bits/stl_list.h:
Current:
  typename _Node_alloc_traits::pointer
  _M_get_node()
  { return _Node_alloc_traits::allocate(_M_impl, 1); }

  void
  _M_put_node(typename _Node_alloc_traits::pointer __p) _GLIBCXX_NOEXCEPT
  { _Node_alloc_traits::deallocate(_M_impl, __p, 1); }

Fixed:
  typename _Node_alloc_traits::value_type*
  _M_get_node()
  {
    auto __ptr = _Node_alloc_traits::allocate(_M_impl, 1);
    return std::__to_address(__ptr);
  }

  void
  _M_put_node(typename _Node_alloc_traits::value_type* __p) _GLIBCXX_NOEXCEPT
  {
    typedef typename _Node_alloc_traits::pointer _Ptr;
    auto __ptr = std::pointer_traits<_Ptr>::pointer_to(*__p);
    _Node_alloc_traits::deallocate(_M_impl, __ptr, 1);
  }

This fix does not follow the coding style (81 characters in a line), so it
might require some extra typedefs.

It is NOT a duplicate of Bug 57272 - it's not about the internal representation
of the nodes, just handling and requirements imposed on the allocator pointer.

There are no ABI issues here that I can think of.
There is a minuscule possibility it might be a breaking change for someone - if
their Fancy Pointer's implicit conversions behaved differently than its
`pointer_to` and `.operator->()` (or if they didn't provide them and relied on
implicit conversions, which are not part of the standard).

Here is a small example reproducing the issue:
https://godbolt.org/z/fnno3jGYs

Note that if implicit construction from `T*` and `operator T*()` are added to
the fancy pointer type, the example compiles.

(Yes, ppointer there doesn't meet the requirement of RandomAccessIterator that
is required for Allocator::pointer. However, since these functionalities are
not used by the mentioned containers, it doesn't matter here).

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #11 from Alexander Monakov  ---
(In reply to Alexander Monakov from comment #8)
> inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x)
> {
> memcpy(p, &x, sizeof(x));
> }
> 
> 
> Are we deciding not to inline this, while inlining its get_unaligned
> counterpart? Seems bizarre.

I can reproduce this part, and on my side it's caused by _FORTIFY_SOURCE: with
fortification, put_unaligned indeed looks bigger during inlining:

mbedtls_put_unaligned_uint32 (void * p, uint32_t x)
{
  long unsigned int _3;

   [local count: 1073741824]:
  _3 = __builtin_object_size (p_2(D), 0);
  __builtin___memcpy_chk (p_2(D), &x, 4, _3);
  return;

}

mbedtls_get_unaligned_uint64 (const void * p)
{
  long unsigned int _3;

   [local count: 1073741824]:
  _3 = MEM  [(char * {ref-all})p_2(D)];
  return _3;

}

[Bug libstdc++/106611] std::is_nothrow_copy_constructible returns wrong result

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106611

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #14 from Andrew Pinski  ---
Since it was agreed to be a dup, let's mark it as such.

*** This bug has been marked as a duplicate of bug 100470 ***

[Bug c++/100470] std::is_nothrow_move_constructible incorrect behavior for explicitly defaulted members

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100470

Andrew Pinski  changed:

   What|Removed |Added

 CC||nikolasklauser at berlin dot de

--- Comment #5 from Andrew Pinski  ---
*** Bug 106611 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/108397] Missed optimization with [0, 0][-1U,-1U] range arithmetics

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108397

--- Comment #4 from Andrew Pinski  ---
Also:
```
int f(int a, int b)
{
int c = a == b;
c = -c;
return c <= -1;
}
```
At `-O2 -fwrapv` this is not optimized at the gimple level, but it is at the
RTL level.

[Bug libstdc++/106611] std::is_nothrow_copy_constructible returns wrong result

2023-08-08 Thread arthur.j.odwyer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106611

--- Comment #13 from Arthur O'Dwyer  ---
(In reply to Andrew Pinski from comment #12)
> I suspect this is a dup of bug 100470 then.

Yep, I agree. My previous comment was a longwinded version of jwakely's
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100470#c1 :)

[Bug tree-optimization/108397] Missed optimization with [0, 0][-1U,-1U] range arithmetics

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108397

--- Comment #3 from Andrew Pinski  ---
I have seen other bug reports having a similar issue too.

[Bug tree-optimization/108397] Missed optimization with [0, 0][-1U,-1U] range arithmetics

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108397

--- Comment #2 from Andrew Pinski  ---
I think the simple missed optimization here is:
```
int f2(int a)
{
 if(a != -1 && a != 0)
   __builtin_unreachable();
  unsigned c = a;
  if(c > 121212)
return 1;
  return 0;
}
```
This should be optimized to just:
```
int f2_(int a)
{
  return a != 0;
}
```


That is if we have:
```
  # RANGE [irange] unsigned int [0, 0][+INF, +INF]
  a.0_1 = (unsigned int) a_4(D);
  if (a.0_1 > 121212)
```
Since we have two values for a.0_1 we can just compare to one or the other in
the above case.

Then that will optimize the original testcase as we can optimize away the
nop_convert and then negative and then we get:
t1 <= t1
and that is folded trivially to true.

[Bug testsuite/110951] [13/14] RISCV: rv32 newlib gcc.c-torture testsuite fails with xgcc: fatal error: Cannot find suitable multilib set for '-march=rv32imafdc_zicsr_zifencei'/'-mabi=ilp32d'

2023-08-08 Thread ewlu at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110951

--- Comment #1 from Edwin Lu  ---
On rv32 newlib GCC 12 the issue is different and might be unrelated:

- ABI is incompatible with that of the selected emulation:
  target emulation `elf64-littleriscv' does not match `elf32-littleriscv'
- ./20031020-1.exe(.text.exit): relocation "_exit+0x0 (type R_RISCV_CALL_PLT)"
goes out of range
- file class ELFCLASS64 incompatible with ELFCLASS32
- final link failed: file in wrong format

On rv32 linux GCC 13/14 the issue is also different and might be unrelated:
FAIL: gcc.c-torture/execute/20031020-1.c   -O0  (test for excess errors)
Testing execute/20031020-1.c,   -O1
Executing on host:
/scratch/ewlu/ci/triage/torture/build/build-gcc-linux-stage2/gcc/xgcc
-B/scratch/ewlu/ci/triage/torture/build/build-gcc-linux-stage2/gcc/ 
/scratch/ewlu/ci/triage/torture/gcc/gcc/testsuite/gcc.c-torture/execute/20031020-1.c
 -march=rv32gc -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output-O1
 -w  -lm  -o ./20031020-1.exe(timeout = 600)
output is In file included from
/scratch/ewlu/ci/triage/torture/build/sysroot/usr/include/features.h:515,
 from
/scratch/ewlu/ci/triage/torture/build/sysroot/usr/include/bits/libc-header-start.h:33,
 from
/scratch/ewlu/ci/triage/torture/build/sysroot/usr/include/limits.h:26,
 from
/scratch/ewlu/ci/triage/torture/build/lib/gcc/riscv64-unknown-linux-gnu/14.0.0/include/limits.h:205,
 from
/scratch/ewlu/ci/triage/torture/build/build-gcc-linux-stage2/gcc/include/limits.h:205,
 from
/scratch/ewlu/ci/triage/torture/build/build-gcc-linux-stage2/gcc/include/syslimits.h:7,
 from
/scratch/ewlu/ci/triage/torture/build/build-gcc-linux-stage2/gcc/include/limits.h:34,
 from
/scratch/ewlu/ci/triage/torture/gcc/gcc/testsuite/gcc.c-torture/execute/20031020-1.c:6:
/scratch/ewlu/ci/triage/torture/build/sysroot/usr/include/gnu/stubs.h:11:11:
fatal error: gnu/stubs-ilp32d.h: No such file or directory
compilation terminated.
 status 1
Checking pattern "sparc-*-sunos*" with x86_64-pc-linux-gnu
Checking pattern "alpha*-*-*" with x86_64-pc-linux-gnu
Checking pattern "hppa*-*-hpux*" with x86_64-pc-linux-gnu
compiler exited with status 1

[Bug libstdc++/106611] std::is_nothrow_copy_constructible returns wrong result

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106611

--- Comment #12 from Andrew Pinski  ---
I suspect this is a dup of bug 100470 then.

[Bug testsuite/110951] New: [13/14] RISCV: rv32 newlib gcc.c-torture testsuite fails with xgcc: fatal error: Cannot find suitable multilib set for '-march=rv32imafdc_zicsr_zifencei'/'-mabi=ilp32d'

2023-08-08 Thread ewlu at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110951

Bug ID: 110951
   Summary: [13/14] RISCV: rv32 newlib gcc.c-torture testsuite
fails with xgcc: fatal error: Cannot find suitable
multilib set for
'-march=rv32imafdc_zicsr_zifencei'/'-mabi=ilp32d'
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ewlu at rivosinc dot com
  Target Milestone: ---

Seeing the following error on newlib rv32 builds for most (all?)
gcc.c-torture/execute tests without the dg directives (for example,
gcc.c-torture/execute/20031020-1.c).

Error message: xgcc: fatal error: Cannot find suitable multilib set for
'-march=rv32imafdc_zicsr_zifencei'/'-mabi=ilp32d'
results in a compilation failure

Confirmed that the error exists on gcc-13.2.0 and gcc-14. 

Example can be found on
https://github.com/patrick-rivos/riscv-gnu-toolchain/issues/137

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #10 from Alexander Monakov  ---
Ah, the non-static inlines are intentional, the corresponding extern
declarations appear in library/platform_util.c. Sorry, I missed that file the
first time around.

[Bug libstdc++/106611] std::is_nothrow_copy_constructible returns wrong result

2023-08-08 Thread arthur.j.odwyer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106611

Arthur O'Dwyer  changed:

   What|Removed |Added

 CC||arthur.j.odwyer at gmail dot com

--- Comment #11 from Arthur O'Dwyer  ---
Jiang An wrote:
> I've mailed the LWG Chair to submit an LWG issue that requests clarification 
> of "is known not to throw any exceptions".
> FYI, there's at least one library implementor holding the same opinion as 
> yours.
> https://quuxplusone.github.io/blog/2023/04/17/noexcept-false-equals-default/

Quuxplusone here. :) I don't think this is LWG jurisdiction at all. This isn't
even a bug in libstdc++'s <type_traits>. This is purely a GCC core-language
bug. GCC's builtin __is_nothrow_constructible(T, T&&) simply returns the wrong
answer when the selected constructor is "trivial, but noexcept(false)."

// https://godbolt.org/z/5szW6KeWq
struct C {
  C(C&&) noexcept(false) = default;
};
static_assert(!__is_nothrow_constructible(C, C&&));
  // GCC+EDG fail; Clang+MSVC succeed

Notice that the builtin returns the correct answer when the selected
constructor is "non-trivial, noexcept(false), but still defaulted so we know it
can't throw." The problem is specifically with *trivial* ctors.

@jwakely, I propose that this issue should be recategorized as a compiler bug.
(And I'm also voting effectively "NAD" on LWG3967.)

[Bug c++/100482] namespaces as int in decltype expression

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100482

--- Comment #3 from CVS Commits  ---
The trunk branch has been updated by Jason Merrill :

https://gcc.gnu.org/g:a90bd3ea6d1ba27b15476f0a768d7952c6723420

commit r14-3087-ga90bd3ea6d1ba27b15476f0a768d7952c6723420
Author: Nathaniel Shead 
Date:   Tue Aug 8 12:48:43 2023 +1000

c++: Report invalid id-expression in decltype [PR100482]

This patch ensures that any errors raised by finish_id_expression when
parsing a decltype expression are properly reported, rather than
potentially going ignored and causing invalid code to be accepted.

We can also now remove the separate check for templates without args as
this is also checked for in finish_id_expression.

PR c++/100482

gcc/cp/ChangeLog:

* parser.cc (cp_parser_decltype_expr): Report errors raised by
finish_id_expression.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/decltype-100482.C: New test.

Signed-off-by: Nathaniel Shead 

[Bug c++/110938] [11/12/13/14 Regression] miscompile if implicit special member is deleted and mutable

2023-08-08 Thread richard-gccbugzilla at metafoo dot co.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110938

--- Comment #4 from Richard Smith  ---
Looks like the trait difference only happens if the templated constructor is
not deleted, but the ABI mismatch happens regardless. Possibly there are two
separate issues here?

[Bug middle-end/110950] RISC-V vector ICE in expand_const_vector

2023-08-08 Thread jeremy.bennett at embecosm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110950

--- Comment #1 from Jeremy Bennett  ---
Created attachment 55709
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55709&action=edit
Script to run the compilation

[Bug middle-end/110950] New: RISC-V vector ICE in expand_const_vector

2023-08-08 Thread jeremy.bennett at embecosm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110950

Bug ID: 110950
   Summary: RISC-V vector ICE in expand_const_vector
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jeremy.bennett at embecosm dot com
  Target Milestone: ---

Created attachment 55708
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55708&action=edit
C source

The following code (testcase.c) causes an ICE when using RISC-V vector as
target.

a;
b() {
  long *c = 0;
  int *d;
  for (; a; ++a)
c[a] = d[-a];
}

Compiled with

riscv64-unknown-linux-gnu-gcc -march=rv64gcv -mabi=lp64d \
  -c -Ofast --param=riscv-autovec-preference=scalable \
  testcase.c

Output is

testcase.c:1:1: warning: data definition has no type or storage class
1 | a;
  | ^
testcase.c:1:1: warning: type defaults to 'int' in declaration of 'a'
[-Wimplicit-int]
testcase.c:2:1: warning: return type defaults to 'int' [-Wimplicit-int]
2 | b() {
  | ^
during RTL pass: expand
testcase.c: In function 'b':
testcase.c:6:10: internal compiler error: in expand_const_vector, at
config/riscv/riscv-v.cc:1510
6 | c[a] = d[-a];
  | ~^~~
0x8e70b6 expand_const_vector
/home/jeremy/gittrees/mustang/gcc/gcc/config/riscv/riscv-v.cc:1510
0x14a4df4 riscv_vector::legitimize_move(rtx_def*, rtx_def*)
/home/jeremy/gittrees/mustang/gcc/gcc/config/riscv/riscv-v.cc:1524
0x184044f gen_movrvvm8qi(rtx_def*, rtx_def*)
/home/jeremy/gittrees/mustang/gcc/gcc/config/riscv/vector.md:1054
0xc56b57 rtx_insn* insn_gen_fn::operator()(rtx_def*,
rtx_def*) const
/home/jeremy/gittrees/mustang/gcc/gcc/recog.h:407
0xc56b57 emit_move_insn_1(rtx_def*, rtx_def*)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:4164
0xc56f65 emit_move_insn(rtx_def*, rtx_def*)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:4334
0xc2c4cd force_reg(machine_mode, rtx_def*)
/home/jeremy/gittrees/mustang/gcc/gcc/explow.cc:693
0x14a2cee shuffle_generic_patterns
/home/jeremy/gittrees/mustang/gcc/gcc/config/riscv/riscv-v.cc:3120
0x14a2cee expand_vec_perm_const_1
/home/jeremy/gittrees/mustang/gcc/gcc/config/riscv/riscv-v.cc:3151
0x14a32b3 riscv_vector::expand_vec_perm_const(machine_mode, machine_mode,
rtx_def*, rtx_def*, rtx_def*, vec_perm_indices const&)
/home/jeremy/gittrees/mustang/gcc/gcc/config/riscv/riscv-v.cc:3203
0xefe0ce expand_vec_perm_const(machine_mode, rtx_def*, rtx_def*,
int_vector_builder > const&, machine_mode, rtx_def*)
/home/jeremy/gittrees/mustang/gcc/gcc/optabs.cc:6508
0xc4f682 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:10453
0xc53c58 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:10805
0xc4cb7a expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier,
rtx_def**, bool)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:9010
0xc4cb7a expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.h:310
0xc4cb7a expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:9345
0xc53c58 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:10805
0xc6062d store_expr(tree_node*, rtx_def*, int, bool, bool)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:6325
0xc62201 expand_assignment(tree_node*, tree_node*, bool)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:6043
0xb1f05c expand_gimple_stmt_1
/home/jeremy/gittrees/mustang/gcc/gcc/cfgexpand.cc:3946
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

System information
--

Using built-in specs.
COLLECT_GCC=./riscv64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/home/jeremy/gittrees/mustang/install/libexec/gcc/riscv64-unknown-linux-gnu/14.0.0/lto-wrapper
Target: riscv64-unknown-linux-gnu
Configured with: /home/jeremy/gittrees/mustang/gcc/configure
--target=riscv64-unknown-linux-gnu
--prefix=/home/jeremy/gittrees/mustang/install
--with-sysroot=/home/jeremy/gittrees/mustang/install/sysroot
--with-pkgversion=gf9d93f8cc24 --with-system-zlib --enable-shared --enable-tls
--enable-languages=c,c++,fortran --disable-libmudflap --disable-libssp
--disable-libquadmath --disable-libsanitizer --disable-nls --disable-bootstrap
--src=/home/jeremy/gittrees/mustang/gcc --enable-multilib --with-abi=lp64d
--with-arch=rv64gc --with-tune= --with-isa-spec=20191213 'CFLAGS_FOR_TARGET=-O2
   -mcmodel=medany' 'CXX

[Bug tree-optimization/110949] New: ((cast)cmp) - 1 should be tranformed into (cast)cmp` where cmp` is the inverse of cmp

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110949

Bug ID: 110949
   Summary: ((cast)cmp) - 1 should be tranformed into (cast)cmp`
where cmp` is the inverse of cmp
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take:
```
int f1(int a, int t)
{
  auto _6 = a == 115;
  auto _7 = (signed int) _6;
  return _6 - 1;
}
```
This should be the same as:
```
int f2(int a, int t)
{
  auto _6 = a != 115;
  auto _7 = (signed int) _6;
  return -_6;
}
```

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #9 from Alexander Monakov  ---
(In reply to Alexander Monakov from comment #2)
> Note that inline functions in mbedtls/library/alignment.h all miss the
> 'static' qualifier, which affects inlining decisions, and looks like a
> mistake anyway (if they are really meant to be non-static inlines, shouldn't
> there be a comment?)

Can you address this on the mbedtls side? Even if it doesn't help with the
observed slowdown, it will remain a problem for the future if left unfixed.

[Bug rtl-optimization/110939] [14 Regression] 14.0 ICE at rtl.h:2297 while bootstrapping on loongarch64

2023-08-08 Thread stefansf at linux dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110939

--- Comment #9 from Stefan Schulze Frielinghaus  ---
Thanks for the reproducer and sorry for the hassle.

The normal form of a constant for a mode with fewer bits than in HOST_WIDE_INT
is a sign extended version of the original constant.  This even holds for
unsigned constants which I missed.  The following should fix this:

diff --git a/gcc/combine.cc b/gcc/combine.cc
index e46d202d0a7..9e5bf96a09d 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -12059,7 +12059,7 @@ simplify_compare_const (enum rtx_code code, machine_mode mode,
   : (GET_MODE_SIZE (int_mode)
  - GET_MODE_SIZE (narrow_mode_iter)));
  *pop0 = adjust_address_nv (op0, narrow_mode_iter, offset);
- *pop1 = GEN_INT (n);
+ *pop1 = gen_int_mode (n, narrow_mode_iter);
  return adjusted_code;
}
 }

Can you give this a try?

[Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

--- Comment #11 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:ad5b757d99b5a121198b79a6a42c1f15ae86a190

commit r14-3085-gad5b757d99b5a121198b79a6a42c1f15ae86a190
Author: Uros Bizjak 
Date:   Tue Aug 8 18:53:51 2023 +0200

i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math
[PR110832]

Also introduce -m[no-]partial-vector-fp-math option to disable trapping
V2SF named patterns in order to avoid generation of partial vector V4SFmode
trapping instructions.

The new option is enabled by default, because even with sanitization,
a small but consistent speed up of 2 to 3% with Polyhedron capacita
benchmark can be achieved vs. scalar code.

Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
vs. scalar code.  This is what clang does by default, as it defaults
to -fno-trapping-math.

PR target/110832

gcc/ChangeLog:

* config/i386/i386.opt (mpartial-vector-fp-math): New option.
* config/i386/mmx.md (movq__to_sse): Do not sanitize
upper part of V2SFmode register with -fno-trapping-math.
(v2sf3): Enable for ix86_partial_vec_fp_math.
(divv2sf3): Ditto.
(v2sf3): Ditto.
(sqrtv2sf2): Ditto.
(*mmx_haddv2sf3_low): Ditto.
(*mmx_hsubv2sf3_low): Ditto.
(vec_addsubv2sf3): Ditto.
(vec_cmpv2sfv2si): Ditto.
(vcondv2sf): Ditto.
(fmav2sf4): Ditto.
(fmsv2sf4): Ditto.
(fnmav2sf4): Ditto.
(fnmsv2sf4): Ditto.
(fix_truncv2sfv2si2): Ditto.
(fixuns_truncv2sfv2si2): Ditto.
(floatv2siv2sf2): Ditto.
(floatunsv2siv2sf2): Ditto.
(nearbyintv2sf2): Ditto.
(rintv2sf2): Ditto.
(lrintv2sfv2si2): Ditto.
(ceilv2sf2): Ditto.
(lceilv2sfv2si2): Ditto.
(floorv2sf2): Ditto.
(lfloorv2sfv2si2): Ditto.
(btruncv2sf2): Ditto.
(roundv2sf2): Ditto.
(lroundv2sfv2si2): Ditto.
* doc/invoke.texi (x86 Options): Document
-mpartial-vector-fp-math option.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110832-1.c: New test.
* gcc.target/i386/pr110832-2.c: New test.
* gcc.target/i386/pr110832-3.c: New test.

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #8 from Alexander Monakov  ---
Why? There's no bswap here, in particular mbedtls_put_unaligned_uint64 is a
straightforward wrapper for memcpy:

inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x)
{
memcpy(p, &x, sizeof(x));
}


Are we deciding not to inline this, while inlining its get_unaligned
counterpart? Seems bizarre.

[Bug tree-optimization/110941] [14 Regression] Dead Code Elimination Regression at -O3 since r14-2379-gc496d15954c

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110941

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-08-08
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from Andrew Pinski  ---
  # RANGE [irange] unsigned int [0, 16][20, 20][24, 24][65532,
4294901776][4294967293, +INF] MASK 0xfffc VALUE 0x0
  # ivtmp.18_37 = PHI 
  # RANGE [irange] unsigned short [0, +INF] MASK 0xfffc VALUE 0xfffc
  _21 = (unsigned short) ivtmp.18_37;
  if (_21 <= 24)


Confirmed. The range for _21 is way too conservative.
It should have been: `[0, 16][20, 20][24, 24]`.

[Bug ipa/110378] IPA-SRA for destructors

2023-08-08 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110378

--- Comment #8 from Christophe Lyon  ---
Created attachment 55707
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55707&action=edit
pr110378-1.C.083i.sra

[Bug ipa/110378] IPA-SRA for destructors

2023-08-08 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110378

Christophe Lyon  changed:

   What|Removed |Added

 CC||clyon at gcc dot gnu.org

--- Comment #7 from Christophe Lyon  ---
The new test fails on arm:
FAIL: g++.dg/ipa/pr110378-1.C -std=gnu++14  scan-ipa-dump sra "Will split
parameter 0"
FAIL: g++.dg/ipa/pr110378-1.C -std=gnu++14  scan-tree-dump-not optimized
"shouldnotexist"
FAIL: g++.dg/ipa/pr110378-1.C -std=gnu++17  scan-ipa-dump sra "Will split
parameter 0"
FAIL: g++.dg/ipa/pr110378-1.C -std=gnu++17  scan-tree-dump-not optimized
"shouldnotexist"
FAIL: g++.dg/ipa/pr110378-1.C -std=gnu++20  scan-ipa-dump sra "Will split
parameter 0"
FAIL: g++.dg/ipa/pr110378-1.C -std=gnu++20  scan-tree-dump-not optimized
"shouldnotexist"
FAIL: g++.dg/ipa/pr110378-1.C -std=gnu++98  scan-ipa-dump sra "Will split
parameter 0"
FAIL: g++.dg/ipa/pr110378-1.C -std=gnu++98  scan-tree-dump-not optimized
"shouldnotexist"

I'm attaching pr110378-1.C.083i.sra

[Bug libstdc++/110860] std::format("{:f}",2e304) invokes undefined behaviour

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110860

Jonathan Wakely  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Jonathan Wakely  ---
Fixed for 13.3, thanks for the report.

[Bug libstdc++/110862] format out of bounds read on format string "{0:{0}"

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110862

Jonathan Wakely  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Jonathan Wakely  ---
Fixed for 13.3, thanks for the report.

[Bug libstdc++/110917] std::format_to(int*, ...) fails to compile because of _S_make_span

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110917

Jonathan Wakely  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #6 from Jonathan Wakely  ---
Fixed for 13.3, thanks for the report.

[Bug libstdc++/110860] std::format("{:f}",2e304) invokes undefined behaviour

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110860

--- Comment #6 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Jonathan Wakely:

https://gcc.gnu.org/g:a059403794add2934961780662e320ba77798a7e

commit r13-7699-ga059403794add2934961780662e320ba77798a7e
Author: Jonathan Wakely 
Date:   Mon Aug 7 15:30:03 2023 +0100

libstdc++: Fix incorrect use of abs and log10 in std::format [PR110860]

The std::formatter implementation for floating-point types uses
__builtin_abs and __builtin_log10 to avoid including all of <cmath>, but
those functions are not generic. The result of abs(2e304) is -INT_MIN
which is undefined, and then log10(INT_MIN) is NaN. As well as being
undefined, we fail to grow the buffer correctly, and then loop more
times than needed to allocate a buffer and try formatting the value into
it again.

We can use if-constexpr to choose the correct form of log10 to use for
the type, and avoid using abs entirely. This avoids the undefined
behaviour and should mean we only reallocate and retry std::to_chars
once.

libstdc++-v3/ChangeLog:

PR libstdc++/110860
* include/std/format (__formatter_fp::format): Do not use
__builtin_abs and __builtin_log10 with arbitrary floating-point
types.

(cherry picked from commit bb3ceeb6520c13fc5ca08af7d43fbd3f975e72b0)
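
The idea in the commit message above (pick the log10 form that matches the floating-point type, and never go through the integer abs) can be sketched roughly as follows. The helper name `decimal_exponent` is hypothetical and this is not the libstdc++ code; it assumes a positive, finite argument and a compiler that provides the GCC-style `__builtin_log10*` builtins.

```cpp
#include <type_traits>

// Hypothetical helper: floor(log10(v)) for a positive, finite v,
// using the builtin whose parameter type matches T.
template<typename T>
int decimal_exponent(T v)
{
    static_assert(std::is_floating_point_v<T>, "floating-point types only");
    long double lg;
    if constexpr (std::is_same_v<T, float>)
        lg = __builtin_log10f(v);
    else if constexpr (std::is_same_v<T, double>)
        lg = __builtin_log10(v);
    else
        lg = __builtin_log10l(v);   // long double (assumed for any other type)
    return static_cast<int>(__builtin_floorl(lg));
}
```

With this shape a value such as 2e304 is never converted to int, so the out-of-range conversion described in the report does not arise.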

[Bug libstdc++/110917] std::format_to(int*, ...) fails to compile because of _S_make_span

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110917

--- Comment #5 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Jonathan Wakely:

https://gcc.gnu.org/g:0f0152a93d15b24ebc7f6c7455baaded6a63fb2e

commit r13-7698-g0f0152a93d15b24ebc7f6c7455baaded6a63fb2e
Author: Jonathan Wakely 
Date:   Mon Aug 7 14:37:25 2023 +0100

libstdc++: Constrain __format::_Iter_sink for contiguous iterators
[PR110917]

We can't write to a span<_CharT> if the contiguous iterator has a value
type that isn't _CharT.

libstdc++-v3/ChangeLog:

PR libstdc++/110917
* include/std/format (__format::_Iter_sink):
Constrain partial specialization for contiguous iterators to
require the value type to be CharT.
* testsuite/std/format/functions/format_to.cc: New test.

(cherry picked from commit c5ea5aecac323e9094e4dc967f54090cb244bc6a)
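
A rough illustration of the constraint described above, written with C++17 type traits rather than the concepts the library uses. The variable template name is hypothetical, and raw pointers stand in for "contiguous iterator" as a simplifying assumption.

```cpp
#include <type_traits>

// Hypothetical trait: writing through a span<CharT> only makes sense when
// the output iterator points at CharT objects. Pointers stand in for
// contiguous iterators in this sketch.
template<typename Out, typename CharT>
inline constexpr bool can_write_through_span =
    std::is_pointer_v<Out>
    && std::is_same_v<std::remove_pointer_t<Out>, CharT>;

static_assert(can_write_through_span<char*, char>);
static_assert(!can_write_through_span<int*, char>);   // the format_to(int*, ...) case
```

An `int*` output iterator is contiguous but must not take the span fast path, which is what the added requirement on the value type expresses.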

[Bug libstdc++/110862] format out of bounds read on format string "{0:{0}"

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110862

--- Comment #5 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Jonathan Wakely:

https://gcc.gnu.org/g:55eb7e92a60adfae43aaf58bb9c81050d39d82c9

commit r13-7697-g55eb7e92a60adfae43aaf58bb9c81050d39d82c9
Author: Jonathan Wakely 
Date:   Thu Aug 3 08:45:43 2023 +0100

libstdc++: Fix past-the-end increment in std::format [PR110862]

At the end of a replacement field we should check that the closing brace
is actually present before incrementing past it.

libstdc++-v3/ChangeLog:

PR libstdc++/110862
* include/std/format (_Scanner::_M_on_replacement_field):
Check for expected '}' before incrementing iterator.
* testsuite/std/format/string.cc: Check "{0:{0}" format string.

(cherry picked from commit 5d87f71bb462ccb78dd3d9d810ea08d96869cb4b)
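
The check described in this commit can be sketched as below. `consume_closing_brace` is a hypothetical helper, not the `_Scanner` member itself; it only shows the "test before increment" pattern on a plain character range.

```cpp
// Hypothetical helper: advance past '}' only if it is really there.
// Returns false, leaving `it` untouched, when the range is exhausted
// or the next character is not a closing brace.
inline bool consume_closing_brace(const char*& it, const char* end)
{
    if (it == end || *it != '}')
        return false;   // never step past the end of the format string
    ++it;
    return true;
}
```

For a string like "{0:{0}" the scanner reaches the end right after the nested field, so the helper reports failure instead of incrementing past the end.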

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

Jonathan Wakely  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |redi at gcc dot gnu.org
 Status|NEW |ASSIGNED

[Bug other/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

Andrew Pinski  changed:

   What|Removed |Added

 Depends on||92716

--- Comment #7 from Andrew Pinski  ---
I am 99% sure this is basically PR 92716.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92716
[Bug 92716] -Os doesn't inline byteswap function even though it's a single
instruction

[Bug c++/94162] ICE [neg] bad return type in defaulted <=>

2023-08-08 Thread arthur.j.odwyer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94162

Arthur O'Dwyer  changed:

   What|Removed |Added

 CC||arthur.j.odwyer at gmail dot com

--- Comment #15 from Arthur O'Dwyer  ---
The test case in #c10 seems to be fixed since GCC 12; the rest were fixed since
GCC 11. Should this bug be RESOLVED FIXED at this point?
https://godbolt.org/z/d16x181xh

[Bug tree-optimization/103281] [12/13/14 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103281

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|10.5|14.0
 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #12 from Andrew Pinski  ---
Fixed for GCC 14 and since this is a missed optimization with generated code,
it is less likely to show up in real code so closing as fixed.

[Bug c++/101943] ICE: Segmentation fault (in cat_tag_for)

2023-08-08 Thread arthur.j.odwyer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101943

Arthur O'Dwyer  changed:

   What|Removed |Added

 CC||arthur.j.odwyer at gmail dot com

--- Comment #4 from Arthur O'Dwyer  ---
This seems to be fixed since GCC 11.1; should it be RESOLVED FIXED at this
point?
https://godbolt.org/z/Tox8f716q

[Bug c++/110948] New: Incorrect -Winvalid-constexpr on virtual defaulted operator==

2023-08-08 Thread arthur.j.odwyer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110948

Bug ID: 110948
   Summary: Incorrect -Winvalid-constexpr on virtual defaulted
operator==
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: arthur.j.odwyer at gmail dot com
  Target Milestone: ---

Bug #98712 seems related.

// https://godbolt.org/z/eKKxcovEn
struct D;
struct B {
    bool operator==(const B&) const;
    virtual bool operator==(const D&) const;
};
struct D : B {
    bool operator==(const D&) const override = default;
};

GCC alone gives this bogus -Winvalid-constexpr warning:

: In member function 'virtual constexpr bool D::operator==(const D&)
const':
:10:10: warning: call to non-'constexpr' function 'bool
B::operator==(const B&) const' [-Winvalid-constexpr]
   10 | bool operator==(const D&) const override = default;
  |  ^~~~
:5:10: note: 'bool B::operator==(const B&) const' declared here
5 | bool operator==(const B&) const;
  |  ^~~~

This is obviously contrived code, but the symptom might indicate that GCC is
too eager to pretend that the user actually wrote `constexpr`, in situations
where the compiler is merely supposed to make an implicitly defined function
constexpr-friendly if possible.

The AFAICT-analogous situation with `operator=` instead of `operator==`
correctly compiles without warning: https://godbolt.org/z/Mof1qaadr

[Bug tree-optimization/28794] missed optimization with non-COND_EXPR and vrp and comparisions

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28794

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
   Target Milestone|--- |14.0

--- Comment #9 from Andrew Pinski  ---
Fixed finally.

[Bug tree-optimization/103281] [12/13/14 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103281
Bug 103281 depends on bug 28794, which changed state.

Bug 28794 Summary: missed optimization with non-COND_EXPR and vrp and 
comparisions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28794

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug c/110947] Should -Wmissing-variable-declarations not trigger on register variables?

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110947

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||diagnostic
   Last reconfirmed||2023-08-08
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from Andrew Pinski  ---
Confirmed.

[Bug c/65213] Extend -Wmissing-declarations to variables [i.e. add -Wmissing-variable-declarations]

2023-08-08 Thread ndesaulniers at google dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65213

Nick Desaulniers  changed:

   What|Removed |Added

 CC||ndesaulniers at google dot com

--- Comment #6 from Nick Desaulniers  ---
Thanks for implementing this!

I filed a follow-up on how this diagnostic interacts with `register`
storage variables.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110947 PTAL

[Bug tree-optimization/103281] [12/13/14 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103281

--- Comment #11 from CVS Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:aadc5c07feb0ab08729ab25d0d896b55860ad9e6

commit r14-3084-gaadc5c07feb0ab08729ab25d0d896b55860ad9e6
Author: Andrew Pinski 
Date:   Mon Aug 7 00:05:21 2023 -0700

VR-VALUES [PR28794]: optimize compare assignments also

This patch fixes the oldish (2006) bug where VRP was not
optimizing the comparison for assignments while handling
them for GIMPLE_COND only.
It just happens to also solve PR 103281 by allowing
`c < 1` to be optimized to `c == 0`, and then we get
`(c == 0) == c` (which was handled by r14-2501-g285c9d04).

OK? Bootstrapped and tested on x86_64-linux-gnu with no
regressions.

PR tree-optimization/103281
PR tree-optimization/28794

gcc/ChangeLog:

* vr-values.cc
(simplify_using_ranges::simplify_cond_using_ranges_1): Split out
majority to ...
(simplify_using_ranges::simplify_compare_using_ranges_1): Here.
(simplify_using_ranges::simplify_casted_cond): Rename to ...
(simplify_using_ranges::simplify_casted_compare): This
and change arguments to take op0 and op1.
(simplify_using_ranges::simplify_compare_assign_using_ranges_1):
New method.
(simplify_using_ranges::simplify): For tcc_comparison assignments
call
simplify_compare_assign_using_ranges_1.
* vr-values.h (simplify_using_ranges): Add
new methods, simplify_compare_using_ranges_1 and
simplify_compare_assign_using_ranges_1.
Rename simplify_casted_cond and simplify_casted_compare and
update argument types.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr103281-1.c: New test.
* gcc.dg/tree-ssa/vrp-compare-1.c: New test.
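
To make the commit message above concrete, here is a small sketch of a comparison whose result is assigned rather than branched on. It is an illustration only, not the testcase from either PR, and the function name is made up.

```cpp
// c is known to lie in [0, 1], so `c < 1` is equivalent to `c == 0`.
// The comparison result is stored in a variable (an assignment), not used
// directly in an `if` (a GIMPLE_COND).
bool compare_assigned(unsigned x)
{
    unsigned c = x & 1u;        // value range of c: [0, 1]
    bool is_zero = c < 1;       // can be rewritten as c == 0
    return is_zero == c;        // (c == 0) == c, false for both 0 and 1
}
```

Once `c < 1` is treated as `c == 0`, the whole function can fold to `return false`, which is the kind of dead code elimination the regression reports were about.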

[Bug tree-optimization/28794] missed optimization with non-COND_EXPR and vrp and comparisions

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28794

--- Comment #8 from CVS Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:aadc5c07feb0ab08729ab25d0d896b55860ad9e6

commit r14-3084-gaadc5c07feb0ab08729ab25d0d896b55860ad9e6
Author: Andrew Pinski 
Date:   Mon Aug 7 00:05:21 2023 -0700

VR-VALUES [PR28794]: optimize compare assignments also

This patch fixes the oldish (2006) bug where VRP was not
optimizing the comparison for assignments while handling
them for GIMPLE_COND only.
It just happens to also solve PR 103281 by allowing
`c < 1` to be optimized to `c == 0`, and then we get
`(c == 0) == c` (which was handled by r14-2501-g285c9d04).

OK? Bootstrapped and tested on x86_64-linux-gnu with no
regressions.

PR tree-optimization/103281
PR tree-optimization/28794

gcc/ChangeLog:

* vr-values.cc
(simplify_using_ranges::simplify_cond_using_ranges_1): Split out
majority to ...
(simplify_using_ranges::simplify_compare_using_ranges_1): Here.
(simplify_using_ranges::simplify_casted_cond): Rename to ...
(simplify_using_ranges::simplify_casted_compare): This
and change arguments to take op0 and op1.
(simplify_using_ranges::simplify_compare_assign_using_ranges_1):
New method.
(simplify_using_ranges::simplify): For tcc_comparison assignments
call
simplify_compare_assign_using_ranges_1.
* vr-values.h (simplify_using_ranges): Add
new methods, simplify_compare_using_ranges_1 and
simplify_compare_assign_using_ranges_1.
Rename simplify_casted_cond and simplify_casted_compare and
update argument types.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr103281-1.c: New test.
* gcc.dg/tree-ssa/vrp-compare-1.c: New test.

[Bug c/110947] New: Should -Wmissing-variable-declarations not trigger on register variables?

2023-08-08 Thread ndesaulniers at google dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110947

Bug ID: 110947
   Summary: Should -Wmissing-variable-declarations not trigger on
register variables?
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ndesaulniers at google dot com
  Target Milestone: ---

Via:
https://lore.kernel.org/all/CAKwvOd=8kxkD9p+WW-F047ShN=r32slyyfpgzhydw3bxtdd...@mail.gmail.com/

I'm looking to enable -Wmissing-variable-declarations in the Linux kernel
(gcc-14 just gained support for this warning).

In one case I noticed that:

register unsigned long current_stack_pointer asm("rsp");

declared in a header would trigger this:

> no previous declaration for 'current_stack_pointer' 
> [-Wmissing-variable-declarations]

So we could add:

extern unsigned long current_stack_pointer;

before the

register unsigned long current_stack_pointer asm("rsp");

but that seems excessive. Perhaps we can simply not diagnose in that case?

Filed this bug report against clang as well:
https://github.com/llvm/llvm-project/issues/64509

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-08-08 Thread muecker at gwdg dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

--- Comment #15 from Martin Uecker  ---
GCC seems to allocate enough for sizeof(struct foo) + n * sizeof(char) but not
for sizeof(struct { int a; char b; char t[n]; }).

[Bug fortran/109684] compiling failure: complaining about a final subroutine of a type being not PURE (while it is indeed PURE)

2023-08-08 Thread pault at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109684

--- Comment #13 from Paul Thomas  ---
(In reply to Steve Kargl from comment #12)
> On Mon, Aug 07, 2023 at 10:04:54PM +, kargl at gcc dot gnu.org wrote:
> > 
> > diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
> > index 3cd470ddcca..b0bb8bc1471 100644
> > --- a/gcc/fortran/resolve.cc
> > +++ b/gcc/fortran/resolve.cc
> > @@ -17966,7 +17966,9 @@ resolve_types (gfc_namespace *ns)
> > 
> >for (n = ns->contained; n; n = n->sibling)
> >  {
> > -  if (gfc_pure (ns->proc_name) && !gfc_pure (n->proc_name))
> > +  if (gfc_pure (ns->proc_name)
> > + && !gfc_pure (n->proc_name)
> > + && !n->proc_name->attr.artificial)
> > gfc_error ("Contained procedure %qs at %L of a PURE procedure must "
> >"also be PURE", n->proc_name->name,
> >&n->proc_name->declared_at);
> > 
> > pault, does the above look correct?
> > 
> 
> This patch passes a regression test with no new regressions
> on x86_64-*-*freebsd.

Hi Steve,

That will certainly fix the bug. An alternative crosses my mind, which is to
check the pureness of the final routines as the wrapper is being built and give
the wrapper the pure attribute if they are all pure.

Cheers

Paul

[Bug other/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread dave.rodgman at arm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #6 from Dave Rodgman  ---
Under clang, we see that mbedtls_xor being inlined, or not, causes an
equivalent perf difference. Note that mbedtls_xor is inline in the gcc O2
version and not in the gcc Os version.

Not inline mbedtls_xor, -Os clang:
  AES-XTS-128  : 834549 KiB/s,  0 cycles/byte
  AES-XTS-256  : 674383 KiB/s,  0 cycles/byte

Inline mbedtls_xor, -Os clang:
  AES-XTS-128  :2664799 KiB/s,  0 cycles/byte
  AES-XTS-256  :2278008 KiB/s,  0 cycles/byte


However, if I mark mbedtls_xor as static inline (actually, for testing
purposes, I created a static inline copy in aes.c), gcc still does not inline
it. I am not sure why. If I use "__attribute__((always_inline))" gcc will
inline it.

So it looks like gcc is overly averse to inlining this function, or is getting
the cost/benefit of inlining wrong here?

For 3/5 cases, we know at compile time that n == 16, so the function will
compile to four instructions:

139c:   3dc00021    ldr q1, [x1]
13a0:   3dc00040    ldr q0, [x2]
13a4:   6e211c00    eor v0.16b, v0.16b, v1.16b
13a8:   3d80        str q0, [x0]

so it does seem surprising that gcc doesn't want to inline this.
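
As a sketch of the workaround mentioned above, a byte-wise XOR helper can be forced inline with the GCC/Clang `always_inline` attribute. `xor_bytes` and `xor_block16` are hypothetical stand-ins for mbedtls_xor and one of its fixed-size callers, not the mbedtls code.

```cpp
#include <cstddef>

// Hypothetical stand-in for mbedtls_xor: r[i] = a[i] ^ b[i] for i < n.
// always_inline (a GCC/Clang extension) overrides the size heuristics.
static inline __attribute__((always_inline))
void xor_bytes(unsigned char *r, const unsigned char *a,
               const unsigned char *b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        r[i] = static_cast<unsigned char>(a[i] ^ b[i]);
}

// Caller where n is a compile-time constant, as in the n == 16 cases above.
void xor_block16(unsigned char *r, const unsigned char *a,
                 const unsigned char *b)
{
    xor_bytes(r, a, b, 16);
}
```

With the length visible at the call site the loop can be reduced to a single 16-byte load/xor/store sequence like the one shown above.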

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #14 from Jonathan Wakely  ---
Ah, and the patch will pessimize cases like str.assign(str2.begin(),
str2.end()) where
str.capacity() >= str2.capacity().

The current implementation in terms of replace(begin(), end(), first, last)
handles that, because replace is overloaded for string iterators and pointers.

Also solvable, but it's getting more complicated.
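
One way to picture the trade-off is a free function that keeps the in-place path when the existing capacity suffices and only builds a temporary otherwise. This is a sketch under assumptions (the name `assign_range` is made up, forward iterators are required so the range can be measured first), not the change that went into libstdc++.

```cpp
#include <iterator>
#include <string>

// Hypothetical sketch: reuse the existing buffer when it is big enough,
// otherwise allocate once and move the result in.
template<typename FwdIt>
void assign_range(std::string& dst, FwdIt first, FwdIt last)
{
    const auto n =
        static_cast<std::string::size_type>(std::distance(first, last));
    if (n <= dst.capacity())
        dst.replace(dst.begin(), dst.end(), first, last);  // in place
    else
        dst = std::string(first, last, dst.get_allocator());
}
```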

[Bug target/110899] RFE: Attributes preserve_most and preserve_all

2023-08-08 Thread matz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899

--- Comment #11 from Michael Matz  ---
(In reply to Florian Weimer from comment #10)
> (In reply to Michael Matz from comment #9)
> > > > I don't see how that helps.  Imagine a preserve_all function foo that
> > > > calls printf.  How do you propose that 'foo' saves all parts of the SSE
> > > > registers, even those that aren't invented yet, or those that can't be
> > > > touched by the current ISA?  (printf might clobber all of these)
> > > 
> > > Vector registers are out of scope for this.
> > 
> > Why do you say that?  From clang: "Furthermore it also preserves all
> > floating-point registers (XMMs/YMMs)."  (for preserve_all, but this
> > bugreport does include that variant of the attribute).
> 
> Ugh, I preferred not to look at it because it's likely that the Clang
> implementation is broken (not future-proof).

I see, then we need to make clear that we aren't going to do anything about
preserve_all with clang's wording, in the context of this report.

FWIW, in my implementation referred to above I chose to also have two variants:
one saving/restoring only the SSE2 parts of *mm8-*mm15 (i.e. xmm8-xmm15),
and one guaranteeing to not clobber anything of *mm8-*mm15.  (No guarantees
about the *mm16 upwards).  The first variant can call foreign functions,
the second variant simply is allowed to only call functions that also give
that guarantee.

(There is also the question of mask registers, the clang docu doesn't talk
about them.  And I still would like to know the reason for the seemingly
arbitrary choice to leave some regs call clobbered for aarch64).

> > > But lets look at APX. If printf is recompiled to use APX, then it will
> > > clobber the extended register file. If we define __preserve_most__ the way
> > > we do in my psABI proposal (i.e., *not* as everything but %r11), the
> > > extended APX registers are still caller-saved.
> > 
> > Right, for preserve_most _with your wording_ it works out.  preserve_all
> > or preserve_most with clang wording doesn't.
> 
> In glibc, we already use a full context switch with XSAVE for the dynamic
> loader trampoline. As far as I understand it, it's not future-proof. The
> kernel could provide an interface that is guaranteed to work because it only
> enables those parts of the register file that it can context-switch. I can
> probably get the userspace-only implementation into glibc, but the kernel
> interface seems unlikely. We'd also have to work out the interaction of
> preserve_all and unwinding, setjmp etc.; not sure if there is a proper
> solution for that.

There are a couple possibilities to implement a halfway solution for this,
via XSAVE and friends, or via runtime dispatching dependent on current CPU
(e.g. provide a generic save/restore-stuff function in libgcc).  The problem
will always be where the memory for this save/restore pattern should come from,
its size isn't constant at compile time.  That's also solvable, but it's 
becoming more and more hairy.

That's why I chose to simply disallow calling foreign functions from those
that want to give very strict guarantees.  We could do the same for
preserve_all, if we absolutely want to have it.

[Bug other/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread dave.rodgman at arm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #5 from Dave Rodgman  ---
(In reply to Richard Biener from comment #3)
> Note you shouldn't use -Os if you care about performance.  GCC is quite
> reasonable with code size increases at -O2 (as compared to other compilers).
> Instead I suggest you use -flto with -O2 to decrease the size of the final
> executable/library and give GCC better knowledge on unit growth.

Understood, but I think it depends on the magnitude of the perf difference. I'd
expect a smallish perf drop, say 10%, from -Os to be reasonable, but I'd
consider a 3x perf difference to be a compiler issue.

(In reply to Alexander Monakov from comment #2)
> So basically missed inlining at -Os, even memcpy wrappers are not inlined.
> 
> Can you provide a reproducible testcase?
> 
> Note that inline functions in mbedtls/library/alignment.h all miss the
> 'static' qualifier, which affects inlining decisions, and looks like a
> mistake anyway (if they are really meant to be non-static inlines, shouldn't
> there be a comment?)
> 
> Does making them 'static inline' rectify the problem?

The easiest way to reproduce is to use the benchmark tool:

make programs/test/benchmark CC=gcc CFLAGS="-Os"
programs/test/benchmark aes_xts

I don't have a compact reproducer, sorry.

[Bug tree-optimization/49955] Fails to do partial basic-block SLP

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49955

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
   Target Milestone|--- |14.0

--- Comment #7 from Richard Biener  ---
This is fixed now.

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 49955, which changed state.

Bug 49955 Summary: Fails to do partial basic-block SLP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49955

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/49955] Fails to do partial basic-block SLP

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49955

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:d9f3ea61fe36e2de3354b90b65ff8245099114c9

commit r14-3078-gd9f3ea61fe36e2de3354b90b65ff8245099114c9
Author: Richard Biener 
Date:   Mon Aug 7 14:44:20 2023 +0200

tree-optimization/49955 - BB reduction with odd number of lanes

The following enhances BB reduction vectorization to support
vectorizing only a subset of the lanes, keeping the rest as
scalar ops.  For now we try to make the number of lanes even
by leaving alone the "last" lane.  That's because SLP discovery
with all lanes will fail too soon to get us any hint on which
lane to strip and likewise we don't know what vector modes the
target supports so restricting ourselves to power-of-two or
other cases isn't easy.

This is enough to get at the vectorization opportunity for the
testcase in the PR - albeit with the chosen lanes not optimal
but at least vectorizable.

PR tree-optimization/49955
* tree-vectorizer.h (_slp_instance::remain_stmts): New.
(SLP_INSTANCE_REMAIN_STMTS): Likewise.
* tree-vect-slp.cc (vect_free_slp_instance): Release
SLP_INSTANCE_REMAIN_STMTS.
(vect_build_slp_instance): Make the number of lanes of
a BB reduction even.
(vectorize_slp_instance_root_stmt): Handle unvectorized
defs of a BB reduction.

* gfortran.dg/vect/pr49955.f: New testcase.
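
Conceptually, the transformation described above treats an odd-length reduction as an even-length vectorizable part plus one scalar leftover. The sketch below spells that out by hand for five integer lanes; it is only an illustration of the shape of the result, not what the vectorizer emits, and the function name is made up.

```cpp
// Five-lane sum computed the way a partially vectorized reduction would:
// lanes 0..3 are combined pairwise (standing in for 2-wide vector adds),
// and the "last" lane stays a scalar operation.
int sum5_partial(const int *a)
{
    int v0 = a[0] + a[2];       // first element of the 2-wide partial sum
    int v1 = a[1] + a[3];       // second element of the 2-wide partial sum
    return (v0 + v1) + a[4];    // final reduction plus the scalar lane
}
```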

[Bug other/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread dave.rodgman at arm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

Dave Rodgman  changed:

   What|Removed |Added

   Keywords|missed-optimization |
  Component|ipa |other
 Target|aarch64 |

--- Comment #4 from Dave Rodgman  ---
From a quick test, it doesn't look like the unaligned access inlining is the
issue:

Not static inline, -Os:
  AES-XTS-128  : 853799 KiB/s,  0 cycles/byte
  AES-XTS-256  : 749919 KiB/s,  0 cycles/byte

Static inline, -Os:
  AES-XTS-128  : 885380 KiB/s,  0 cycles/byte
  AES-XTS-256  : 752995 KiB/s,  0 cycles/byte

Not static inline, -O2:
  AES-XTS-128  :2822656 KiB/s,  0 cycles/byte
  AES-XTS-256  :2425721 KiB/s,  0 cycles/byte

Static inline, -O2:
  AES-XTS-128  :2692321 KiB/s,  0 cycles/byte
  AES-XTS-256  :2446391 KiB/s,  0 cycles/byte

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

Richard Biener  changed:

   What|Removed |Added

  Component|other   |ipa
 Target||aarch64
   Keywords||missed-optimization
 CC||marxin at gcc dot gnu.org

--- Comment #3 from Richard Biener  ---
Note you shouldn't use -Os if you care about performance.  GCC is quite
reasonable with code size increases at -O2 (as compared to other compilers). 
Instead I suggest you use -flto with -O2 to decrease the size of the final
executable/library and give GCC better knowledge on unit growth.

[Bug tree-optimization/110924] [14 Regression] ICE on valid code at -O{s,2,3} on x86_64-linux-gnu: verify_ssa failed

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110924

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #14 from Richard Biener  ---
Fixed.

[Bug tree-optimization/110924] [14 Regression] ICE on valid code at -O{s,2,3} on x86_64-linux-gnu: verify_ssa failed

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110924

--- Comment #13 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:31ec413098bd334115aff73fc755e49afd3ac371

commit r14-3076-g31ec413098bd334115aff73fc755e49afd3ac371
Author: Richard Biener 
Date:   Tue Aug 8 12:46:42 2023 +0200

tree-optimization/110924 - fix vop liveness for noreturn const CFG parts

The virtual operand live problem used by sinking assumes we have
virtual uses at each end point of the CFG but as shown in the PR
this isn't true for parts for example ending in __builtin_unreachable.
The following removes the optimization made possible by this and
now requires marking backedges.

PR tree-optimization/110924
* tree-ssa-live.h (virtual_operand_live): Update comment.
* tree-ssa-live.cc (virtual_operand_live::get_live_in): Remove
optimization, look at each predecessor.
* tree-ssa-sink.cc (pass_sink_code::execute): Mark backedges.

* gcc.dg/torture/pr110924.c: New testcase.

[Bug other/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #2 from Alexander Monakov  ---
So basically missed inlining at -Os, even memcpy wrappers are not inlined.

Can you provide a reproducible testcase?

Note that inline functions in mbedtls/library/alignment.h all miss the 'static'
qualifier, which affects inlining decisions, and looks like a mistake anyway
(if they are really meant to be non-static inlines, shouldn't there be a
comment?)

Does making them 'static inline' rectify the problem?

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #13 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #11)
> With this change we make a single allocation+copy and then do a cheap move
> assignment:
> 
> --- a/libstdc++-v3/include/bits/basic_string.h
> +++ b/libstdc++-v3/include/bits/basic_string.h
> @@ -1711,4 +1711,4 @@
>  basic_string&
>  assign(_InputIterator __first, _InputIterator __last)
> -{ return this->replace(begin(), end(), __first, __last); }
> +{ return *this = basic_string(__first, __last, get_allocator()); }

Except it's not cheap for C++98 because it's a copy, so we're back to
allocating and copying everything twice. That's solvable though.
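
Outside the library, the "build a temporary, then move-assign" idea from the quoted patch can be sketched as a free function. `assign_via_temporary` is a hypothetical name; the point is only that with C++11 and later the assignment steals the temporary's buffer, whereas a C++98 build would have to copy it.

```cpp
#include <string>
#include <utility>

// Hypothetical sketch of "single allocation + copy, then a cheap move".
template<typename InputIt>
std::string& assign_via_temporary(std::string& s, InputIt first, InputIt last)
{
    std::string tmp(first, last, s.get_allocator());  // one allocation + copy
    s = std::move(tmp);                               // move assignment (C++11)
    return s;
}
```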

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #12 from Jonathan Wakely  ---
That would also benefit this overload:

  basic_string&
  assign(initializer_list<_CharT> __l)
  { return this->assign(__l.begin(), __l.size()); }

That currently goes via replace(begin(), end(), l.begin(), l.end()) but we know
that an initializer_list cannot possibly alias the string's contents.

But we can do even better and avoid any copy if __l.size() <= capacity():

  basic_string&
  assign(initializer_list<_CharT> __l)
  {
    const size_type __n = __l.size();
    if (__n > capacity())
      *this = basic_string(__l.begin(), __l.end(), get_allocator());
    else
      {
        if (__n)
          _S_copy(_M_data(), __l.begin(), __n);
        _M_set_length(__n);
      }
    return *this;
  }
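
The strategy above can be tried outside libstdc++ with a small stand-in. The
sketch below is only an illustration: ToyString and its allocations() counter
are hypothetical names, not library code, and allocator handling is left out
entirely. It shows that a second assign that fits in the existing capacity
does not allocate again.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <initializer_list>
#include <memory>
#include <utility>

// Hypothetical toy buffer (not libstdc++): mirrors only the
// "reuse the storage when it is large enough" decision.
class ToyString {
public:
    ToyString& assign(std::initializer_list<char> l) {
        const std::size_t n = l.size();
        if (n > cap_) {
            // Too big for the current storage: allocate once, copy once,
            // then adopt the new buffer (the "move assignment" step).
            std::unique_ptr<char[]> fresh(new char[n]);
            std::memcpy(fresh.get(), l.begin(), n);
            data_ = std::move(fresh);
            cap_ = n;
            ++allocations_;
        } else if (n != 0) {
            // Fits: an initializer_list cannot alias our storage,
            // so a plain copy with no overlap check is enough.
            std::memcpy(data_.get(), l.begin(), n);
        }
        len_ = n;
        return *this;
    }

    std::size_t size() const { return len_; }
    std::size_t capacity() const { return cap_; }
    int allocations() const { return allocations_; }
    char operator[](std::size_t i) const {
        assert(i < len_);
        return data_[i];
    }

private:
    std::unique_ptr<char[]> data_;
    std::size_t len_ = 0;
    std::size_t cap_ = 0;
    int allocations_ = 0;
};
```

Assigning four characters and then two leaves allocations() at 1 and
capacity() at 4, which is the saving the in-place branch is after.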

[Bug modula2/110779] SysClock can not read the clock

2023-08-08 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110779

--- Comment #9 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #8 from Gaius Mulley  ---
> Created attachment 55703
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55703&action=edit
> Proposed fix (addendum)
>
> Here is a patch which tests for all the functions and structs in wrapclock.cc.

That patch does restore i386-pc-solaris2.11 bootstrap, thanks.

Testsuite results are good, with two exceptions: for 32-bit compilation
(only, 64-bit is fine), two tests reliably time out at all optimization
levels:

WARNING: gm2/iso/run/pass/m2date.mod execution,  -O  program timed out.

WARNING: gm2/iso/run/pass/testclock2.mod execution,  -O  program timed out.

Running them under truss shows that the last system call in each is

5032:   clock_settime(3, 0x080975F0)                    Err#1 EPERM [sys_time]

Running testclock2.mod under gdb shows

Thread 2 received signal SIGINT, Interrupt.
[Switching to Thread 1 (LWP 1)]
0x08064ad4 in daysInMonth (year=42582828, month=7)
at
/vol/gcc/src/hg/master/local/libgm2/libm2iso/../../gcc/m2/gm2-libs-iso/SysClock.mod:225
225 BEGIN
(gdb) bt
#0  0x08064ad4 in daysInMonth (year=42582828, month=7)
at
/vol/gcc/src/hg/master/local/libgm2/libm2iso/../../gcc/m2/gm2-libs-iso/SysClock.mod:225
#1  0x08065152 in daysInYear (year=42582828, month=, 
day=)
at
/vol/gcc/src/hg/master/local/libgm2/libm2iso/../../gcc/m2/gm2-libs-iso/SysClock.mod:132
#2  ExtractDate (day=@0x808748c: 0, month=@0x8087484: 0, year=@0x8087480: 0, 
days=53508608997914)
at
/vol/gcc/src/hg/master/local/libgm2/libm2iso/../../gcc/m2/gm2-libs-iso/SysClock.mod:152
#3  m2iso_SysClock_GetClock (userData=...)
at
/vol/gcc/src/hg/master/local/libgm2/libm2iso/../../gcc/m2/gm2-libs-iso/SysClock.mod:204
#4  0x0805f652 in _M2_testclock2_init ()
#5  0x08068d7a in m2pim_M2Dependent_ConstructModules (
applicationmodule=0x805c2b4, libname=0x805c2bf, 
overrideliborder=0x805c2d8, argc=1, argv=0xfeffdafc, envp=0xfeffdb04)
at
/vol/gcc/src/hg/master/local/libgm2/libm2pim/../../gcc/m2/gm2-libs/M2Dependent.mod:809
#6  0x0805fca9 in _M2_init ()
#7  0x0805fcee in main ()

That year value seems very strange.

[Bug other/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread dave.rodgman at arm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #1 from Dave Rodgman  ---
Disassembly under -Os:

139c :
139c:   a9b67bfdstp x29, x30, [sp, #-160]!
13a0:   910003fdmov x29, sp
13a4:   a9046bf9stp x25, x26, [sp, #64]
13a8:   aa0003f9mov x25, x0
13ac:   9000adrpx0, 0 <__stack_chk_guard>
13b0:   a90153f3stp x19, x20, [sp, #16]
13b4:   f940ldr x0, [x0]
13b8:   a9025bf5stp x21, x22, [sp, #32]
13bc:   2a0103f6mov w22, w1
13c0:   a90363f7stp x23, x24, [sp, #48]
13c4:   a90573fbstp x27, x28, [sp, #80]
13c8:   f941ldr x1, [x0]
13cc:   f9004fe1str x1, [sp, #152]
13d0:   d281mov x1, #0x0// #0
13d4:   710006dfcmp w22, #0x1
13d8:   54000c28b.hi155c   //
b.pmore
13dc:   d1004041sub x1, x2, #0x10
13e0:   aa0203f3mov x19, x2
13e4:   b27c4fe0mov x0, #0xf0   //
#16777200
13e8:   eb3fcmp x1, x0
13ec:   54000bc8b.hi1564   //
b.pmore
13f0:   9101a3f5add x21, sp, #0x68
13f4:   aa0303e2mov x2, x3
13f8:   aa0403f8mov x24, x4
13fc:   aa0503f7mov x23, x5
1400:   aa1503e3mov x3, x21
1404:   91048320add x0, x25, #0x120
1408:   52800021mov w1, #0x1// #1
140c:   9400bl  1210 
1410:   2a0003f4mov w20, w0
1414:   35000540cbnzw0, 14bc 
1418:   520002dbeor w27, w22, #0x1
141c:   d344fe7alsr x26, x19, #4
1420:   1200037band w27, w27, #0x1
1424:   92400e73and x19, x19, #0xf
1428:   910223fcadd x28, sp, #0x88
142c:   d100075asub x26, x26, #0x1
1430:   b100075fcmn x26, #0x1
1434:   54000541b.ne14dc   //
b.any
1438:   b4000433cbz x19, 14bc 
143c:   710002dfcmp w22, #0x0
1440:   d10042fbsub x27, x23, #0x10
1444:   9101e3faadd x26, sp, #0x78
1448:   aa1303e2mov x2, x19
144c:   9a95035acselx26, x26, x21, eq  // eq = none
1450:   aa1b03e1mov x1, x27
1454:   910223f5add x21, sp, #0x88
1458:   aa1703e0mov x0, x23
145c:   9400bl  0 
1460:   d2800217mov x23, #0x10  // #16
1464:   aa1303e3mov x3, x19
1468:   aa1a03e2mov x2, x26
146c:   aa1803e1mov x1, x24
1470:   aa1503e0mov x0, x21
1474:   9400bl  0 
1478:   cb1302e3sub x3, x23, x19
147c:   8b130342add x2, x26, x19
1480:   8b130361add x1, x27, x19
1484:   8b1302a0add x0, x21, x19
1488:   9400bl  0 
148c:   aa1503e3mov x3, x21
1490:   aa1503e2mov x2, x21
1494:   2a1603e1mov w1, w22
1498:   aa1903e0mov x0, x25
149c:   9400bl  1210 
14a0:   2a0003f4mov w20, w0
14a4:   35c0cbnzw0, 14bc 
14a8:   aa1703e3mov x3, x23
14ac:   aa1a03e2mov x2, x26
14b0:   aa1503e1mov x1, x21
14b4:   aa1b03e0mov x0, x27
14b8:   9400bl  0 
14bc:   9000adrpx0, 0 <__stack_chk_guard>
14c0:   f940ldr x0, [x0]
14c4:   f9404fe2ldr x2, [sp, #152]
14c8:   f941ldr x1, [x0]
14cc:   eb010042subsx2, x2, x1
14d0:   d281mov x1, #0x0// #0
14d4:   54000500b.eq1574   //
b.none
14d8:   9400bl  0 <__stack_chk_fail>
14dc:   f100027fcmp x19, #0x0
14e0:   1a9f07e0csetw0, ne  // ne = any
14e4:   6a1b001ftst w0, w27
14e8:   54e0b.eq1504   //
b.none
14ec:   b5dacbnzx26, 1504 
14f0:   a94687e0ldp x0, x1, [sp, #104]
14f4:   a90787e0stp x0, x1, [sp, #120]
14f8:   aa1503e1mov x1, x21
14fc:   aa1503e0mov x0, x21
1500:   97fffb63bl  28c 
1504:   aa1503e2mov x2, x2

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #11 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #6)
> And _M_replace_dispatch creates a new copy anyway:
> 
>   _M_replace_dispatch(const_iterator __i1, const_iterator __i2,
> _InputIterator __k1, _InputIterator __k2,
> std::__false_type)
>   {
>   // _GLIBCXX_RESOLVE_LIB_DEFECTS
>   // 2788. unintentionally require a default constructible allocator
>   const basic_string __s(__k1, __k2, this->get_allocator());
>   const size_type __n1 = __i2 - __i1;
>   return _M_replace(__i1 - begin(), __n1, __s._M_data(),
> __s.size());
>   }

When distance(k1, k2) > this->capacity() this function will make two copies of
[k1,k2) and allocate twice. So even with the checks for disjunct strings, we do
a lot more work than the copy construction benchmarks.

With this change we make a single allocation+copy and then do a cheap move
assignment:

--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -1711,4 +1711,4 @@
 basic_string&
 assign(_InputIterator __first, _InputIterator __last)
-{ return this->replace(begin(), end(), __first, __last); }
+{ return *this = basic_string(__first, __last, get_allocator()); }
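
A minimal sketch of the same idea using only the public std::string
interface, assuming C++11 or later so that the assignment is a move. The
helper name assign_via_temporary is hypothetical, and allocator propagation
is not handled. Because the temporary is fully built before the destination
is modified, a source range that points into the destination is still safe.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build a temporary first, then move-assign it.
// One allocation+copy for the temporary; the assignment itself is cheap.
template <typename InputIt>
std::string& assign_via_temporary(std::string& s, InputIt first, InputIt last) {
    // The temporary is complete before s is touched, so [first, last)
    // may alias s without any explicit overlap check.
    s = std::string(first, last, s.get_allocator());
    return s;
}

// Usage: the source range lies inside the destination string.
inline void demo_aliasing() {
    std::string s = "hello world";
    assign_via_temporary(s, s.data() + 1, s.data() + 5);
    assert(s == "ello");
}
```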

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #10 from Jonathan Wakely  ---
(In reply to Jan Schultke from comment #8)
> From what I could read in the `char_traits::move` code that presumably gets
> called, this function explicitly tests for overlap between the memory
> regions, and dispatches to cheap functions if possible. The input size was 8
> MiB, so it is unlikely that the overhead from this overlap detection is
> contributing in any relevant capacity.

I think you're reading it wrong. The overlap detection in char_traits::move is
only for constant evaluation, because we can't use memmove.

The overlap detection that matters here is in _M_replace, long before we use
char_traits::move.

> Basically, due to this overlap testing, `assign` SHOULD be just as fast as
> other methods if there is no overlap (and in this case, there clearly is
> none). However, it is 14x slower, so something is off.
> 
> Either I haven't followed the logic correctly, or there is a mistake in this
> dispatching logic which leads to much worse performance for .assign().

Or the optimizers don't optimize away all the checks in _M_replace and so we
don't unroll everything to a simple memmove, but do all the runtime checks
every time. Which is what I think is happening.
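
For illustration, the kind of runtime aliasing test being discussed can be
sketched as below. This is not the libstdc++ implementation; points_outside
is a hypothetical name, and the sketch only shows why such a check costs a
couple of pointer comparisons on every call unless the optimizer can prove
the result at compile time.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for an aliasing check: returns true when p does
// not point into the character storage of s.
inline bool points_outside(const std::string& s, const char* p) {
    // std::less yields a total order over pointers, even unrelated ones,
    // which the built-in < does not guarantee.
    std::less<const char*> lt;
    const char* first = s.data();
    const char* last = first + s.size();
    return lt(p, first) || lt(last, p);
}

// Usage: a pointer into the string aliases it, another string does not.
inline void demo_overlap_check() {
    std::string a = "abcdef";
    std::string b = "xyz";
    assert(!points_outside(a, a.data() + 2));
    assert(points_outside(a, b.data()));
}
```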

[Bug other/110946] New: 3x perf regression with -Os on M1 Pro

2023-08-08 Thread dave.rodgman at arm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

Bug ID: 110946
   Summary: 3x perf regression with -Os on M1 Pro
   Product: gcc
   Version: 12.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dave.rodgman at arm dot com
  Target Milestone: ---

Please see
https://github.com/Mbed-TLS/mbedtls/pull/7784/commits/6cfd9b54ae0d06451c1a46a10e57fa099878bb03
for details.

On M1 Pro, under -Os, we see a 3.1x performance regression for AES-XTS, which
can be solved by forcing -O2 for two functions. For comparison, clang -Os gives
around 5% perf regression (which is more in the ballpark that I'd expect). So
it looks to me like gcc is getting something wrong when compiling these two
functions with -Os.

We measured a smaller but still significant difference (20-25%) on x86-64.

Affects all versions of gcc that I was able to test with (9 .. 12).
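
One way to force -O2 for individual functions in a file otherwise built
with -Os is GCC's optimize function attribute. The sketch below is an
assumption about how that might look, not the code from the linked commit:
HOT_O2 and xor_block are hypothetical names, and the GCC manual describes
this attribute as intended for debugging rather than production use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical macro: request -O2 for one function under GCC only.
// Other compilers get an empty macro and use the file-level options.
#if defined(__GNUC__) && !defined(__clang__)
#  define HOT_O2 __attribute__((optimize("O2")))
#else
#  define HOT_O2
#endif

// Hypothetical hot loop standing in for a block-cipher helper.
HOT_O2
void xor_block(std::uint8_t* out, const std::uint8_t* a,
               const std::uint8_t* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<std::uint8_t>(a[i] ^ b[i]);
}

// Usage: the result is the same at any optimization level.
inline void demo_xor_block() {
    const std::uint8_t a[2] = {0x0f, 0xaa};
    const std::uint8_t b[2] = {0xff, 0xaa};
    std::uint8_t out[2] = {0, 0};
    xor_block(out, a, b, 2);
    assert(out[0] == 0xf0 && out[1] == 0x00);
}
```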

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #9 from Jonathan Wakely  ---
That improves things:


Benchmark                  Time        CPU   Iterations
-------------------------------------------------------
BenchmarkInit          38136 ns   38069 ns        18243
BenchmarkAssignmentOp  45476 ns   45382 ns        15038
BenchmarkAssign        45767 ns   45653 ns        15457
BenchmarkCopy          56617 ns   56515 ns        11721

[Bug rtl-optimization/110939] [14 Regression] 14.0 ICE at rtl.h:2297 while bootstrapping on loongarch64

2023-08-08 Thread panchenghui at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110939

--- Comment #8 from Chenghui Pan  ---
(In reply to Stefan Schulze Frielinghaus from comment #6)
> I tried to reproduce it with a cross compiler while using the reproducer
> from PR110867 without getting an ICE.  Can you attach a pre processed source
> file and a corresponding gcc invocation?

I have attached the preprocessed file that triggers the ICE during bootstrap.
Sorry for not adding it at first.

The command used to compile this file is:
/home/panchenghui/upstream-unmodded/stuff/gcc/./prev-gcc/xg++ -save-temps
-B/home/panchenghui/upstream-unmodded/stuff/gcc/./prev-gcc/
-B/home/panchenghui/upstream-unmodded/install/loongarch64-unknown-linux-gnu/bin/
-nostdinc++
-B/home/panchenghui/upstream-unmodded/stuff/gcc/prev-loongarch64-unknown-linux-gnu/libstdc++-v3/src/.libs
-B/home/panchenghui/upstream-unmodded/stuff/gcc/prev-loongarch64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs

-I/home/panchenghui/upstream-unmodded/stuff/gcc/prev-loongarch64-unknown-linux-gnu/libstdc++-v3/include/loongarch64-unknown-linux-gnu

-I/home/panchenghui/upstream-unmodded/stuff/gcc/prev-loongarch64-unknown-linux-gnu/libstdc++-v3/include
 -I/home/panchenghui/upstream-unmodded/gcc/libstdc++-v3/libsupc++
-L/home/panchenghui/upstream-unmodded/stuff/gcc/prev-loongarch64-unknown-linux-gnu/libstdc++-v3/src/.libs
-L/home/panchenghui/upstream-unmodded/stuff/gcc/prev-loongarch64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs
 -fno-PIE -c   -g -O2 -fno-checking -gtoggle -DIN_GCC -fno-exceptions
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings
-Wcast-qual -Wmissing-format-attribute -Wconditionally-supported
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
-Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -fno-PIE -I. -I.
-I/home/panchenghui/upstream-unmodded/gcc/gcc
-I/home/panchenghui/upstream-unmodded/gcc/gcc/.
-I/home/panchenghui/upstream-unmodded/gcc/gcc/../include 
-I/home/panchenghui/upstream-unmodded/gcc/gcc/../libcpp/include
-I/home/panchenghui/upstream-unmodded/gcc/gcc/../libcody 
-I/home/panchenghui/upstream-unmodded/gcc/gcc/../libdecnumber
-I/home/panchenghui/upstream-unmodded/gcc/gcc/../libdecnumber/dpd
-I../libdecnumber -I/home/panchenghui/upstream-unmodded/gcc/gcc/../libbacktrace
  -o tree-cfgcleanup.o -MT tree-cfgcleanup.o -MMD -MP -MF
./.deps/tree-cfgcleanup.TPo
/home/panchenghui/upstream-unmodded/gcc/gcc/tree-cfgcleanup.cc

[Bug rtl-optimization/110939] [14 Regression] 14.0 ICE at rtl.h:2297 while bootstrapping on loongarch64

2023-08-08 Thread panchenghui at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110939

--- Comment #7 from Chenghui Pan  ---
Created attachment 55706
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55706&action=edit
preprocessed file of gcc/tree-cfgcleanup.cc, ICE occurred in this place.

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread janschultke at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #8 from Jan Schultke  ---
(In reply to Jonathan Wakely from comment #4)
> Please provide the testcase in a usable form, not just a link to an external
> site (which uses its own custom benchmark macros). This is requested at
> https://gcc.gnu.org/

Thanks, I will remember to do that in the future.

> This is not equivalent to the other forms of copying in the benchmark,
> because string::assign has to handle possible aliasing. It's valid to do
> things like str.assign(str.data()+1, str.data()+2).

From what I could read in the `char_traits::move` code that presumably gets
called, this function explicitly tests for overlap between the memory regions,
and dispatches to cheap functions if possible. The input size was 8 MiB, so it
is unlikely that the overhead from this overlap detection is contributing in
any relevant capacity.

Basically, due to this overlap testing, `assign` SHOULD be just as fast as
other methods if there is no overlap (and in this case, there clearly is none).
However, it is 14x slower, so something is off.

Either I haven't followed the logic correctly, or there is a mistake in this
dispatching logic which leads to much worse performance for .assign().

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #7 from Jonathan Wakely  ---
(except with correct allocator propagation)
