[Bug middle-end/116230] Testsuite of liborcus fails with GCC 14 on i586 since r14-1891-g154c69039571c6

2024-08-06 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116230

--- Comment #6 from Martin Jambor  ---
Right, when I saw the equality test of doubles I thought it must be the test. I
forgot about the discrepancy of representation in memory and in the FPU. 
Thanks a lot for taking a look.
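
For readers who have not run into this before, here is a minimal sketch of
the discrepancy.  The helper names are made up for this note and are not
taken from liborcus or the testcase.  On x87 a double can live in an 80-bit
register, so == may see extra precision that disappears once the value is
stored to a 64-bit memory slot, which is all a memcmp-based check looks at.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: compare the 64-bit in-memory representations,
// which is what a memcmp-based check inspects.
bool same_bits(double a, double b)
{
  std::uint64_t ua, ub;
  std::memcpy(&ua, &a, sizeof ua);
  std::memcpy(&ub, &b, sizeof ub);
  return ua == ub;
}

// Hypothetical helper: force both operands through 64-bit memory before
// comparing, so any excess register precision is rounded away first.
bool equal_after_store(double a, double b)
{
  volatile double sa = a;
  volatile double sb = b;
  return sa == sb;
}
```

The two notions of equality differ in general (0.0 and -0.0 compare equal
but are not bit-identical), so a bit-identical memcmp result alone does not
say which branch an == on register values will take.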

[Bug middle-end/116230] Testsuite of liborcus fails with GCC 14 on i586 since r14-1891-g154c69039571c6

2024-08-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116230

--- Comment #1 from Martin Jambor  ---
Created attachment 58830
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58830&action=edit
minimized test-case

I have tried to minimize the testcase with cvise and came up with the
attached file.  However, note that unlike the original, the minimized
version fails with a segfault, not an abort.  But it starts failing
with the same revision.

mjambor@fischer:/home/mjambor/gcc/mine/test/liborcus$
/home/mjambor/gcc/13/inst/bin/g++ -O2 -m32 -w min.ii
mjambor@fischer:/home/mjambor/gcc/mine/test/liborcus$ ./a.out  && echo OK
OK
mjambor@fischer:/home/mjambor/gcc/mine/test/liborcus$
/home/mjambor/gcc/14/inst/bin/g++ -O2 -m32 -w min.ii
mjambor@fischer:/home/mjambor/gcc/mine/test/liborcus$ ./a.out  && echo OK
' BFMLPSVZ -6000
Segmentation fault (core dumped)

[Bug middle-end/116230] New: Testsuite of liborcus fails with GCC 14 on i586 since r14-1891-g154c69039571c6

2024-08-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116230

Bug ID: 116230
   Summary: Testsuite of liborcus fails with GCC 14 on i586 since
r14-1891-g154c69039571c6
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: vmakarov at gcc dot gnu.org
  Target Milestone: ---
  Host: x86_64-linux-gnu
Target: i586-linux-gnu

Created attachment 58829
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58829&action=edit
pre-processed source

The openSUSE Tumbleweed package of liborcus does not build because of a
failed test in its test-suite.  The failure looks eerily like a
miscompilation to me: a comparison of two doubles ends up taking the
branch for the case when they are not equal even when a follow-up
check with memcmp (added by me) claims they are bit-identical.

I am attaching pre-processed source that I extracted from the
test-suite and which, when compiled with GCC 13 on an x86_64-linux
host with options -O2 -m32 to get a 32-bit binary, runs fine but
asserts when built with GCC 14 and the same options.  I could bisect
this behavior to r14-1891-g154c69039571c6 (Vladimir N. Makarov: RA:
Ignore conflicts for some pseudos from insns throwing a final
exception).

mjambor@fischer:/home/mjambor/gcc/mine/test/liborcus$
/home/mjambor/gcc/13/inst/bin/g++ -O2 -m32 orig.ii
mjambor@fischer:/home/mjambor/gcc/mine/test/liborcus$
LD_LIBRARY_PATH=/home/mjambor/gcc/13/inst/lib64:/home/mjambor/gcc/13/inst/lib
./a.out && echo OK
OK
mjambor@fischer:/home/mjambor/gcc/mine/test/liborcus$
/home/mjambor/gcc/14/inst/bin/g++ -O2 -m32 orig.ii  
mjambor@fischer:/home/mjambor/gcc/mine/test/liborcus$
LD_LIBRARY_PATH=/home/mjambor/gcc/14/inst/lib64:/home/mjambor/gcc/14/inst/lib
./a.out && echo OK
ehm?
'-6.e3' was expected to be parsed as (-6000 ), but the parser parsed it as
(-6000
a.out: tst.cc:298: void test_generic_number_parsing(): Assertion
`run_checks(checks)' failed.
Aborted (core dumped)

[Bug ipa/115815] [13/14/15 Regression] ICE: in purge_all_uses, at ipa-param-manipulation.cc:632 with -O2 -flto and incorrect usage of attribute destructor

2024-07-26 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115815

Martin Jambor  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #4 from Martin Jambor  ---
I have proposed a fix on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6a5i4bc6k@virgil.suse.cz/T/#u

[Bug target/58416] Incorrect x87-based union copying code

2024-07-24 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58416

Martin Jambor  changed:

   What|Removed |Added

  Attachment #58724 is obsolete|0   |1

--- Comment #24 from Martin Jambor  ---
Created attachment 58752
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58752&action=edit
Another wip patch

To give sort-of a status update, this is the current state of my WIP fix.  It
clearly needs more thinking about the reverse storage cases and still fails
g++.dg/vect/pr64410.cc - but I'm afraid that would require a target hook to
identify problematic FP modes.  Otherwise it passes bootstrap and testsuite on
x86_64-linux.

[Bug target/58416] Incorrect x87-based union copying code

2024-07-23 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58416

--- Comment #19 from Martin Jambor  ---
(In reply to Richard Biener from comment #18)
> (In reply to Martin Jambor from comment #15)
> > Created attachment 58724 [details]
> > simple (wip) fix
> > 
> > I'm wondering whether just simply something like this would not be enough. 
> > I have looked at total scalarization and we will not replace a type found in
> > the IL with another one there. Similarly, only propagation through
> > assignments fiddles with existing types (when they are not aggregate) only
> > when propagating from RHS to LHS and not the other way round.
> > 
> > If we want to be more aggressive, we can add a flag when the new predicate
> > fails but there is a good bitwise_type_for_mode and then when the flag is
> > set, use that type instead in analyze_access_subtree.
> > 
> > Note that so far I have only tested the attached patch with
> >   make -k check-gcc RUNTESTFLAGS="tree-ssa.exp=*sra*.c"
> >   make -k check-g++ RUNTESTFLAGS="dg.exp=*sra*.c"
> >   make -k check-gcc RUNTESTFLAGS="dg.exp=*sra*.c"
> > 
> > I'll have a look at full test results tomorrow morning.
> 
> I think it should be OK to fix the wrong-code issue in this bug but it
> prevents scalarization completely, likely at least failing
> gcc.dg/tree-ssa/sra-6.c

I did check that even yesterday night and it passes but g++.dg/vect/pr64410.cc
and gcc.dg/tree-ssa/pr32964.c fail.  I'm investigating.

> 
> Note the name of the predicate should probably reflect the problem,
> like can_use_type_as_storage_for (tree storage_type, tree value_type)
> or so.

Yes, I'm aware it is less than ideal so far.  I was not sure what the final
version would be like.  The version from yesterday unnecessarily pessimizes
cases where the "other" type is a structure containing just the float or a
one-element array, or a combination; that at least should be addressed.

> 
> With my patch I tried to instead use a bit-pattern preserving load
> similar as what we do with non-mode precision integer prevailing types.
> 
> Note I plan to add a new target hook to identify problematic FP modes.

That would be super-helpful.

[Bug target/58416] Incorrect x87-based union copying code

2024-07-22 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58416

--- Comment #15 from Martin Jambor  ---
Created attachment 58724
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58724&action=edit
simple (wip) fix

I'm wondering whether just simply something like this would not be enough.  I
have looked at total scalarization and we will not replace a type found in the
IL with another one there. Similarly, propagation through assignments
fiddles with existing types (when they are not aggregate) only when propagating
from RHS to LHS and not the other way round.

If we want to be more aggressive, we can add a flag when the new predicate
fails but there is a good bitwise_type_for_mode and then when the flag is set,
use that type instead in analyze_access_subtree.
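
As a source-level analogy of why a bitwise type helps (this is not SRA code;
the union and the helper are hypothetical names for this sketch): an integer
of the same size carries the bits without going through a floating-point
register, whereas an x87 load/store of a double can alter some bit patterns
such as signaling NaNs.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical type standing in for a union with a floating-point member.
union fp_or_bits
{
  double d;
  std::uint64_t u;
};

static_assert(sizeof(fp_or_bits) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Copy via an integer of the same size; no floating-point load or store
// is expressed here, so the bit pattern is preserved exactly.
inline void copy_bitwise(fp_or_bits &dst, const fp_or_bits &src)
{
  std::uint64_t raw;
  std::memcpy(&raw, &src, sizeof raw);
  std::memcpy(&dst, &raw, sizeof raw);
}
```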

Note that so far I have only tested the attached patch with
  make -k check-gcc RUNTESTFLAGS="tree-ssa.exp=*sra*.c"
  make -k check-g++ RUNTESTFLAGS="dg.exp=*sra*.c"
  make -k check-gcc RUNTESTFLAGS="dg.exp=*sra*.c"

I'll have a look at full test results tomorrow morning.

[Bug rtl-optimization/115876] [15 regression] ext-dce.cc has ubsan issues; shifting negative values

2024-07-22 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115876

--- Comment #11 from Martin Jambor  ---
Our weekend ubsan bootstrap and test (of revision
r15-2173-ge0d997e913f811) still reported failures when compiling
testcase gfortran.dg/ieee/large_1.f90 (at -O2 and higher).

[Bug ipa/108007] [12 Regression] wrong code at -Os and above with "-fno-dce -fno-tree-dce" on x86_64-linux-gnu since r10-3311-gff6686d2e5f797

2024-07-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108007

Martin Jambor  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|REOPENED|RESOLVED

--- Comment #26 from Martin Jambor  ---
And on 12 this is fixed.

[Bug middle-end/115967] New: ubsan: shift exponent 64 is too large for 64-bit type HOST_WIDE_INT in ext-dce.cc on line 600 since r15-1901-g98914f9eba5f19

2024-07-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115967

Bug ID: 115967
   Summary: ubsan: shift exponent 64 is too large for 64-bit type
HOST_WIDE_INT in ext-dce.cc on line 600 since
r15-1901-g98914f9eba5f19
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: law at gcc dot gnu.org
Blocks: 63426
  Target Milestone: ---
  Host: x86_64-linux
Target: x86_64-linux

Undefined behavior sanitizer reports a failure when running Fortran
testcase gfortran.dg/ieee/large_1.f90 at -O2 and higher:

  /home/mjambor/gcc/mine/src/gcc/ext-dce.cc:600:15: runtime error: shift
exponent 64 is too large for 64-bit type 'long unsigned int'
  /home/mjambor/gcc/mine/src/gcc/ext-dce.cc:404:23: runtime error: left shift
of negative value -1
  FAIL: gfortran.dg/ieee/large_1.f90   -O2  (test for excess errors)

The failure is present since the introduction of the source file
ext-dce.cc with commit r15-1901-g98914f9eba5f19 (Jeff Law:
[to-be-committed][RISC-V][V3] DCE analysis for extension elimination)

One way to reproduce the issue is to bootstrap GCC with Fortran
enabled and with --with-build-config=bootstrap-ubsan and then run the
test case as usual.

It is however much easier to (on an x86_64-linux at least) simply
apply the following patch and then run
  make -k check-gfortran RUNTESTFLAGS="ieee.exp=large_1.f90"

--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -597,6 +597,7 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj, bitmap
live_tmp)
  bit = subreg_lsb (y).to_constant ();
  if (dst_mask)
{
+ gcc_assert (bit < 64);
  dst_mask <<= bit;
  if (!dst_mask)
dst_mask = -0x1ULL;
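
For illustration only, a minimal sketch of the kind of guard that avoids the
first report.  The helper name is hypothetical and this is not a proposed fix
for ext-dce.cc; it only shows that a shift count equal to the width of the
type has to be handled before the shift is evaluated.

```cpp
#include <cstdint>

// Hypothetical helper: shift left, treating counts >= 64 as shifting
// every bit out instead of invoking undefined behavior.
constexpr std::uint64_t shl_or_zero(std::uint64_t mask, unsigned bit)
{
  return bit < 64 ? mask << bit : 0;
}
```

Keeping the shifted operand unsigned, as above, also sidesteps the second
report about left-shifting a negative value.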


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63426
[Bug 63426] [meta-bug] Issues found with -fsanitize=undefined

[Bug target/109130] [13/14/15 Regression] 464.h264ref regressed by 6.5% on a Neoverse-N1 CPU with PGO, LTO, -Ofast and -march=native

2024-07-03 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109130

Martin Jambor  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Martin Jambor  ---
(In reply to Andrew Pinski from comment #5)
> This looks like it was fixed already, back to 355 which is GCC 13 .

Indeed.

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2024-07-03 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 109130, which changed state.

Bug 109130 Summary: [13/14/15 Regression] 464.h264ref regressed by 6.5% on a 
Neoverse-N1 CPU with PGO, LTO, -Ofast and -march=native
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109130

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug target/115739] New: Building cross-compiler to sparc-wrs-vxworks fails since r15-1594-g55947b32c38a40

2024-07-01 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115739

Bug ID: 115739
   Summary: Building cross-compiler to sparc-wrs-vxworks fails
since r15-1594-g55947b32c38a40
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: linkw at gcc dot gnu.org
  Target Milestone: ---
  Host: x86_64-linux-gnu
Target: sparc-wrs-vxworks

Starting with r15-1594-g55947b32c38a40 (Kewen Lin: Replace
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook mode_for_floating_type),
building a cross compiler from x86_64-linux to sparc-wrs-vxworks fails
with error:

/home/mjambor/gcc/mine/src/gcc/config/sparc/sparc.cc:13986:12: error:
‘SPARC_LONG_DOUBLE_TYPE_SIZE’ was not declared in this scope
13986 | return SPARC_LONG_DOUBLE_TYPE_SIZE == 128 ? TFmode : DFmode;
  |^~~


To test yourself, configure GCC with:

  ../src/configure --prefix=/home/mjambor/gcc/mine/inst
--enable-languages=c,c++ --enable-checking=yes --disable-bootstrap
--disable-libsanitizer --disable-multilib --disable-libcilkrts
--target=sparc-wrs-vxworks 

and then make the compiler proper with:

  make -j64 all-host CXXFLAGS="-O0 -g" CFLAGS="-O0 -g"

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2024-06-14 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

Martin Jambor  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-06-14
 Ever confirmed|0   |1

--- Comment #5 from Martin Jambor  ---
Re-confirmed with the released GCC 14.1.

[Bug target/115463] New: 526.blender_r regressed 5% on Zen2 with -Ofast -flto -march=native since r15-1058-gc989e59fc99d99

2024-06-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115463

Bug ID: 115463
   Summary: 526.blender_r regressed 5% on Zen2 with -Ofast -flto
-march=native since r15-1058-gc989e59fc99d99
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: hongyuw at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---

The run-time of benchmark 526.blender_r from SPEC FPrate 2017 has
regressed 5.3% on Zen2-based CPUs when compiled with -Ofast -flto
-march=native since r15-1058-gc989e59fc99d99 (Hongyu Wang: [APX CCMP]
Support APX CCMP).  I was not expecting this patch to cause any
changes in code generated for non-APX CPUs but I have double checked.

The regression can be seen/tracked at
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=287.487.0


There are also smaller regressions that happened around the same time:

- zen2 -Ofast -flto -fprofile-use -march=native: 3%
  https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=286.487.0

- skylake -Ofast -flto -march=native: 4%
  https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=801.487.0

- skylake -Ofast -flto -fprofile-use -march=native: 3%
  https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=792.487.0

- zen3 -Ofast -flto -march=native: 2%
  https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=475.487.0


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug target/115462] New: 416.gamess regressed 4-6% on x86_64 since r15-882-g1d6199e5f8c1c0

2024-06-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115462

Bug ID: 115462
   Summary: 416.gamess regressed 4-6% on x86_64 since
r15-882-g1d6199e5f8c1c0
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: crazylht at gmail dot com
Blocks: 26163
  Target Milestone: ---

Benchmark 416.gamess from SPEC FP 2006 recently regressed on all
x86_64 CPUs we track using many of the compiler options we track.  I
have bisected the one on Zen3 CPU using -O2 -flto (so -march=generic)
to r15-882-g1d6199e5f8c1c0 (liuhongt: Reduce cost of MEM (A + imm)).

Regressing hosts and options:

  - zen2 -O2 -flto: 5%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=292.50.0
  - zen2 -O2 -march=native: 6%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=291.50.0
  - zen2 -O2 -flto -march=native: 5%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=290.50.0
  - zen2 -Ofast: 4%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=300.50.0

  - skylake -O2: 5%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=784.50.0
  - skylake -O2 -flto: 4%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=799.50.0
  - skylake -O2 -march=native: 6%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=787.50.0
  - skylake -O2 -flto -march=native: 5%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=788.50.0
  - skylake -Ofast: 4%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=789.50.0

  - zen3 -O2 -flto: 5%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=469.50.0
  - zen3 -O2 -flto -fprofile-use: 5%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=464.50.0
  - zen3 -O2 -flto -march=native: 5%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=465.50.0
  - zen3 -Ofast: 5%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=466.50.0

  - zen4 -O2 -flto: 4%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=956.50.0
  - zen4 -O2 -march=native: 5%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=961.50.0
  - zen4 -O2 -flto -march=native: 5%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=993.50.0
  - zen4 -Ofast: 4%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=967.50.0
  - zen4 -Ofast -march=native: 6%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=965.50.0
  - zen4 -Ofast -flto -march=native: 5%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=992.50.0


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug tree-optimization/115438] New: 503.bwaves_r regressed 5-11% on different x86_64 machines at -Ofast -march=native since r15-1006-gd93353e6423eca

2024-06-11 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115438

Bug ID: 115438
   Summary: 503.bwaves_r regressed 5-11% on different x86_64
machines at -Ofast -march=native since
r15-1006-gd93353e6423eca
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---

The run-time of 503.bwaves_r from SPEC FPrate 2017 regressed by 5-11%
on different x86_64 machines at -Ofast -march=native (specifically
without LTO) since r15-1006-gd93353e6423eca (Richard Biener: Do
single-lane SLP discovery for reductions).  I have bisected the issue
on zen3, the other regressions however appeared around the same time:

- zen3: 11% https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.427.0
- zen2: 7%  https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.427.0
- skylake: 5%
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=801.427.0
- zen4: 5%  https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=970.427.0


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug target/115329] New: [15 Regression] ICE in extract_insn, at recog.cc:2812 since r15-930-ge715204f203d31

2024-06-03 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115329

Bug ID: 115329
   Summary: [15 Regression] ICE in extract_insn, at recog.cc:2812
since r15-930-ge715204f203d31
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: ubizjak at gmail dot com
  Target Milestone: ---

Compiling the testcase (minimized from grub2):

int grub_swap_bytes32_x, grub_load_public_key___trans_tmp_1;
void grub_load_public_key() {
  grub_load_public_key___trans_tmp_1 = __builtin_bswap32(grub_swap_bytes32_x);
}

with options:

-Os -m32 -S -fno-common -std=gnu99 -march=i386 test.c

leads to the following ICE since revision r15-930-ge715204f203d31
(Uros Bizjak: i386: Rewrite bswaphi2 handling [PR115102]):

test.c: In function ‘grub_load_public_key’:
test.c:4:1: error: unrecognizable insn:
4 | }
  | ^
(insn 5 2 6 2 (set (reg:SI 102)
(ior:SI (and:SI (mem/c:SI (symbol_ref:SI ("grub_swap_bytes32_x") [flags
0x2] ) [1 grub_swap_bytes32_x+0 S4
A32])
(const_int -65536 [0x]))
(lshiftrt:SI (bswap:SI (mem/c:SI (symbol_ref:SI
("grub_swap_bytes32_x") [flags 0x2] ) [1 grub_swap_bytes32_x+0 S4 A32]))
(const_int 16 [0x10] "test.c":3:40 -1
 (nil))
during RTL pass: vregs
test.c:4:1: internal compiler error: in extract_insn, at recog.cc:2812
0x807d4c _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
/home/mjambor/gcc/mine/src/gcc/rtl-error.cc:108
0x807d68 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/home/mjambor/gcc/mine/src/gcc/rtl-error.cc:116
0x806289 extract_insn(rtx_insn*)
/home/mjambor/gcc/mine/src/gcc/recog.cc:2812
0xca0b90 instantiate_virtual_regs_in_insn
/home/mjambor/gcc/mine/src/gcc/function.cc:1612
0xca0b90 instantiate_virtual_regs
/home/mjambor/gcc/mine/src/gcc/function.cc:1995
0xca0b90 execute
/home/mjambor/gcc/mine/src/gcc/function.cc:2042
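
For reference, the operation the testcase asks for is a plain 32-bit byte
swap.  A portable sketch of it (the function name is made up for this note)
looks as follows; it only shows what value __builtin_bswap32 is expected to
produce and says nothing about the i386 patterns involved in the ICE.

```cpp
#include <cstdint>

// Hypothetical portable equivalent of a 32-bit byte swap.
constexpr std::uint32_t bswap32_portable(std::uint32_t x)
{
  return (x >> 24)                 // byte 3 -> byte 0
         | ((x >> 8) & 0x0000ff00u)  // byte 2 -> byte 1
         | ((x << 8) & 0x00ff0000u)  // byte 1 -> byte 2
         | (x << 24);              // byte 0 -> byte 3
}
```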

[Bug c/115310] Option -Werror=return-type is too aggressive with -std=gnu89

2024-06-01 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115310

--- Comment #4 from Martin Jambor  ---
(In reply to Sam James from comment #2)
> In such environments, you don't need an explicit
> -Werror=return-type.

I agree I don't need it but it is there.

> So, you're asking presumably about testing with < GCC 14 to emulate
> >= GCC 14 behaviour?

No.  Specifically in openSUSE, -Werror=return-type is part of the
default compiler flags.  We would like to silence the new errors such
as implicit-int in packages written in pre-C99 C by putting
-std=gnu89 rather than -fpermissive in package spec files.  We think
that -std=gnu89 option really better describes what is going on,
i.e. that the code is old rather than somewhat broken.

But because of the behavior described in this bug we cannot - without
either also explicitly specifying -Wno-error=return-type or
persuading the project to weaken the default warnings.

> It works fine without the explicit -Werror=return-type on GCC 14.

The compiler sees an explicit -Werror=return-type but it is not
explicitly spelled out in the package spec files.

[Bug c/115310] New: Option -Werror=return-type is too aggressive with -std=gnu89

2024-05-31 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115310

Bug ID: 115310
   Summary: Option -Werror=return-type is too aggressive with
-std=gnu89
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org
  Target Milestone: ---

Consider:

 echo 'main () { return 0; }' > t.c

and then

  gcc -S -Werror=return-type t.c

works but

  gcc -S -Werror=return-type -std=gnu89 t.c

causes an error.

This reduces the ability to use -std=gnu89 to avoid new errors which
used to be warnings in environments where -Werror=return-type is the
default.

[Bug middle-end/115277] [13/14/15 regression] ICF needs to match loop bound estimates

2024-05-30 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115277

--- Comment #2 from Martin Jambor  ---
(In reply to Jan Hubicka from comment #1)
> Reproduces on 14 and trunk. GCC 12 is not able to determine the loop bound
> during early optimizations

What about gcc 13?

[Bug other/115174] New test case gcc.dg/lto/pr113359-2 fails

2024-05-21 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115174

Martin Jambor  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Martin Jambor  ---
Should be fixed with r13-8785-gc827f46d8652d7

Sorry for forgetting to backport the testcase fix.

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64 and x86_64

2024-05-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #32 from Martin Jambor  ---
(In reply to Marc Poulhiès from comment #31)
> Hello Martin,
> 
> Any chance the fix that fixes the new test for 32bits can be also backported?
> 
> 4923ed49b93352bcf9e43cafac38345e4a54c3f8
> https://gcc.gnu.org/g:4923ed49b93352bcf9e43cafac38345e4a54c3f8
> 
> Not sure why it's not tagged so that it would appear here.

My apologies for not including this commit, I completely forgot about it. 
Unfortunately I'm afraid it will have to wait until after the 13.3 release, but
I will backport it quickly afterwards.  Sorry again.

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-15 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #20 from Martin Jambor  ---
The IL we generate the jump function from is:
  
  _1 = cclauses_2(D) != 0B;
  c_parser_omp_all_clauses (_1);

Which translates to the expected jump function:
  callsite  void c_parser_omp_teams(int**)/3 -> int*
c_parser_omp_all_clauses(bool)/1 :
 param 0: PASS THROUGH: 0, op ne_expr 0B

so IPA looks like it's doing what it should.
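
At the source level the same pass-through corresponds to an implicit
pointer-to-bool conversion at the call site.  A small sketch, with
hypothetical stand-ins for the functions from the testcase:

```cpp
// Hypothetical stand-in for c_parser_omp_all_clauses (bool).
bool takes_flag(bool finish_p)
{
  return finish_p;
}

// Hypothetical stand-in for c_parser_omp_teams (int **).
bool forwards_pointer(int **cclauses)
{
  // The pointer is implicitly converted to bool, i.e. cclauses != nullptr,
  // which is the "op ne_expr 0B" recorded in the jump function.
  return takes_flag(cclauses);
}
```

So the value reaching the callee is a comparison result, even though the
caller's parameter itself is a pointer.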

(In reply to Aldy Hernandez from comment #6)
> I wonder if something like this would work.
> 
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index 5781f50..ea8a685 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -1730,6 +1730,8 @@ ipa_value_range_from_jfunc (vrange ,
> }
>else
> {
> + if (TREE_CODE_CLASS (operation) == tcc_comparison)
> +   vr_type = boolean_type_node;
>   Value_Range op_res (vr_type);
>   Value_Range res (vr_type);
>   tree op = ipa_get_jf_pass_through_operand (jfunc);

This looks OKish and we also do a similar thing in
ipa_get_jf_arith_result.

Also note that the ipa_value_range_from_jfunc already has a parameter
that tells it what type the result should be.  It is called parm_type,
which is boolean_type in the case that ICEs.  So we can even bail out
if we really encounter jump function created from bad IL.

I was thinking of using parm_type from the beginning, to
initialize op_res with it, but there are jump functions representing
an operation followed by a truncation, for example for:

  _2 = complain_6(D) & 1;
  _3 = (int) std_alignof_7(D);
  cxx_sizeof_or_alignof_type (_3, _2);

where _r is in fact bool (has smaller size and precision) and trying
to make ranger do the bit_and_expr directly to bool leads to a failed
assert in fold_range (the test of m_operator->operand_check_p).

So doing the operation in the original type - unless it is a
comparison - and then using ipa_vr_operation_and_type_effects seems to
be the right thing to do.
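
A source-level analogy of why the order matters (hypothetical functions,
not ranger or IPA code): performing the bit-and in the original type and
converting afterwards is not the same as converting first and operating on
the bool.

```cpp
// Operation in the original (int) type, conversion to bool afterwards.
bool op_then_convert(int complain)
{
  return (complain & 1) != 0;
}

// Conversion to bool first, then the operation on the converted value.
bool convert_then_op(int complain)
{
  return static_cast<bool>(complain) & 1;
}
```

For an argument of 6 the first variant yields false and the second yields
true, so the operation has to be evaluated on the original type before the
effects of the parameter type are applied.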

But I am really curious why propagate_vr_across_jump_function does not
need the same check for tcc_comparison operators and generally why is
it so different (in the non-scc case)?  Why is ipa_supports_p (this
predicate has a really really really bad name BTW and I am completely
at loss as to what it does and how or why) used there and not in
ipa_value_range_from_jfunc?

(I also cannot prevent myself from ranting a little that it would
really help if all the ranger (helper) classes and functions were
better documented.)

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-05-15 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247

Martin Jambor  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Martin Jambor  ---
Fixed.

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64 and x86_64

2024-05-14 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

Martin Jambor  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #30 from Martin Jambor  ---
...so set to fixed as well.

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64 and x86_64

2024-05-14 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #29 from Martin Jambor  ---
Fixed

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-13 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #19 from Martin Jambor  ---
The following minimized testcase ICEs with r15-312-g36e877996936ab
cross-compiler to ppc64le with -O2 nicely:


void omp_clause_elt_check(int *, const char *, const char *);
enum { C_OMP_CLAUSE_SPLIT_COUNT };
enum c_omp_region_type { C_ORT_OMP };
void c_finish_omp_clauses(int *, c_omp_region_type);
int *c_parser_omp_all_clauses_prev;
int *c_parser_omp_all_clauses(bool finish_p) {
  if (finish_p)
c_finish_omp_clauses(c_parser_omp_all_clauses_prev, C_ORT_OMP);
  return c_parser_omp_all_clauses_prev;
}
int c_parser_omp_teams___trans_tmp_104;
static void c_parser_omp_teams(int **cclauses) {
  c_parser_omp_all_clauses(cclauses);
  omp_clause_elt_check(&c_parser_omp_teams___trans_tmp_104, "", __FUNCTION__);
}
void c_parser_omp_target() {
  int *cclauses[C_OMP_CLAUSE_SPLIT_COUNT];
  c_parser_omp_teams(cclauses);
}

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985

--- Comment #16 from Martin Jambor  ---
I'll have a look, hopefully on Monday.

[Bug ipa/106935] [11/12/13/14/15 Regression] ICE in redirect_call_stmt_to_callee, at cgraph.cc:1505 since r10-5098-g9b14fc3326e08797

2024-05-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106935

Martin Jambor  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #4 from Martin Jambor  ---
We hit an assert guarding that we have not already massaged call
arguments before modifying them during call redirection as that would
end up in wrong code.  We do that by looking first whether the decl in
the statement is the same as the decl of the cgraph_edge callee and if
not, if the node associated with the decl from the statement has any
parameter adjustment info.

The issue here is that we are in the process of inlining an artificial
thunk, which calls to a cgraph_node clone with adjustments from its
inception.  That would normally not be a problem because of the first
check above (both decls would be the same, we don't really redirect
these calls, not even in this case).  But the call is actually
recursive, and so the decl from the call graph edge is one created by
save_inline_function_body whereas the one in the statement is the
original one.

I guess we need to detect this particular situation.
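
To make the two checks described above easier to follow, here is a toy
model of them.  It is only a sketch: Decl, Node, Verdict and
check_before_modifying are hypothetical stand-ins invented for this
illustration and are not GCC's tree, cgraph_node or the code in
cgraph.cc.

```cpp
// Toy model only: every name here is hypothetical, not a GCC internal.
struct Decl
{
  const char *name;
};

struct Node
{
  const Decl *decl;
  bool has_param_adjustments; // true for a clone created with adjustments
};

enum class Verdict
{
  same_decl_nothing_to_do,
  safe_to_modify_arguments,
  assert_fires
};

// stmt_node is the node associated with the decl found in the call statement.
Verdict
check_before_modifying (const Node &stmt_node, const Decl *edge_callee_decl)
{
  // First check: statement and call graph edge agree on the callee.
  if (stmt_node.decl == edge_callee_decl)
    return Verdict::same_decl_nothing_to_do;
  // Second check: the arguments may already have been adjusted.
  if (stmt_node.has_param_adjustments)
    return Verdict::assert_fires;
  return Verdict::safe_to_modify_arguments;
}

// Usual case: the call to the clone is not really redirected, both decls
// are the same and the first check passes.
Verdict
ordinary_case ()
{
  Decl original = { "clone" };
  Node clone_node = { &original, true };
  return check_before_modifying (clone_node, &original);
}

// Recursive case: the edge points to the copy of the body kept for
// inlining while the statement still refers to the original decl.
Verdict
recursive_thunk_case ()
{
  Decl original = { "clone" };
  Decl saved_body = { "copy of clone kept for inlining" };
  Node clone_node = { &original, true };
  return check_before_modifying (clone_node, &saved_body);
}
```

In this model the recursive case falls through the first check and trips
the second one even though nothing has actually been adjusted twice,
which is the situation that would need to be detected.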

[Bug c++/114935] New: Miscompilation of initializer_list in presence of exceptions

2024-05-03 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114935

Bug ID: 114935
   Summary: Miscompilation of initializer_list in
presence of exceptions
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: jason at gcc dot gnu.org
  Target Milestone: ---
  Host: x86_64-linux-gnu
Target: x86_64-linux-gnu

The following testcase:

#include <initializer_list>
#include <string>

void __attribute__((noipa))
tata(std::initializer_list<std::string> init)
{
  throw 1;
}

int
main()
{
  try
    {
      tata({ "0123456789012346" }); // using shorter string or "..."s works
    }
  catch (...)
    {
    }
}

aborts when compiled with GCC 14 even when not optimizing.

I have bisected the failure to r14-1705-g2764335bd336f2 (Jason
Merrill: c++: build initializer_list in a loop [PR105838])

This has been extracted from libstorage-ng testsuite and originally
filed as https://bugzilla.opensuse.org/show_bug.cgi?id=1223820

[Bug tree-optimization/107021] [13 Regression] 511.povray_r error with -Ofast -march=znver2 -flto since r13-2810-gb7fd7fb5011106

2024-05-02 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107021

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org

--- Comment #11 from Martin Jambor  ---
It seems that clang is hitting the same problem now:
https://discourse.llvm.org/t/fast-math-spec-2017-fp-failure-for-povray/74959

[Bug ipa/106935] [11/12/13/14/15 Regression] ICE in redirect_call_stmt_to_callee, at cgraph.cc:1505 since r10-5098-g9b14fc3326e08797

2024-04-30 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106935

--- Comment #3 from Martin Jambor  ---
This ICE no longer happens with GCC 13, in fact after r13-4240-gfeeb0d68f1c708
(Martin Jambor: ipa-cp: Do not consider useless aggregate constants).  From the
patch description, it does not look to be a fix of the underlying issue.

[Bug ipa/102310] [11/12 Regression] ICE in visit_ref_for_mod_analysis with OpenACC

2024-04-30 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102310

Martin Jambor  changed:

   What|Removed |Added

  Known to work||13.1.0
Summary|[11/12/13/14/15 Regression] |[11/12 Regression] ICE in
   |ICE in  |visit_ref_for_mod_analysis
   |visit_ref_for_mod_analysis  |with OpenACC
   |with OpenACC|

--- Comment #10 from Martin Jambor  ---
This has been fixed in GCC 13 by r13-2665-g23baa717c991d7 (Julian Brown:
OpenMP/OpenACC struct sibling list gimplification extension and rework).

[Bug tree-optimization/113964] [11/12/13/14/15 Regression] repeat copy of struct

2024-04-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113964

--- Comment #5 from Martin Jambor  ---
(In reply to Richard Biener from comment #2)
> No, I think the issue is that ESRA leaves e.f0 alone:
> 
>   e$f3_7 = e.f3;
>   e$f0$f4_8 = e.f0.f4;
>   _1 = e$f0$f4_8;
>   _2 = (unsigned char) _1;
>   e$f3_9 = _2;
>   e.f0 = g_50;
>   e$f3_10 = MEM  [(struct S1 *)_50];
>   e$f0$f4_11 = MEM  [(struct S1 *)_50 + 24B];
>   MEM  [(union U8 *)] = e$f3_10;
>   MEM  [(union U8 *) + 24B] = e$f0$f4_11;
>   g_16 = e.f0;
> 
> it looks like it materializes the e.f0 = g_15 copy but fails to elide that
> (maybe assuming sth else will?)?  And then for some reason the final
> g_16 = e.f90 copy isn't replaced?!
> 
> So somehow SRAs heuristics go off.
> 
> Martin?

I am afraid this is just another example of what flow-insensitive SRA cannot
optimize well.  I'll keep it in the list of testcases to hopefully one day
improve on when we make it flow sensitive.

[Bug rtl-optimization/114452] Functions invoked through compile-time table of function pointers not inlined

2024-04-11 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

--- Comment #6 from Martin Jambor  ---
(In reply to Paweł Bylica from comment #5)
> (In reply to Martin Jambor from comment #4)
> > In this testcase all (well, both) functions referenced from the array
> > are semantically equivalent which is recognized by ICF but making it
> > be able to pass this information to the inliner would be
> > non-trivial... and is this the common case worth optimizing for?
> 
> I reduced the original code to the array of two identical functions.
> Originally, there weren't identical. I can update the test case if this make
> more sense.

Probably not.  But how many elements does the array have in the original code? 
Perhaps we could speculatively inline them if there are only few.
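
For illustration, a minimal sketch of the pattern and of what
speculative inlining would conceptually amount to.  The functions and
the table below are made up for this example (they are not the
reporter's code), and the second function is a hand-written
approximation, not compiler output.

```cpp
// Hypothetical small operations referenced from a constant table.
static int add_one (int x) { return x + 1; }
static int times_two (int x) { return x * 2; }

typedef int (*op_t) (int);
static const op_t table[] = { add_one, times_two };
static const unsigned table_len = sizeof (table) / sizeof (table[0]);

// The pattern from the report: calls through a compile-time table.
int
run_table (int x)
{
  for (unsigned i = 0; i < table_len; i++)
    x = table[i] (x);
  return x;
}

// Hand-written equivalent of speculating on the few known targets:
// compare the loaded pointer and use an inlined body on a match.
int
run_table_speculated (int x)
{
  for (unsigned i = 0; i < table_len; i++)
    {
      op_t fn = table[i];
      if (fn == add_one)
        x = x + 1;
      else if (fn == times_two)
        x = x * 2;
      else
        x = fn (x); // fallback, keeps the code correct for any target
    }
  return x;
}
```

Each extra candidate adds a comparison on every call, which is why the
number of elements in the array matters.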

[Bug testsuite/114662] [14 regression] new test case c_lto_pr113359-2 from r14-9841-g1e3312a25a7b34 fails

2024-04-10 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114662

--- Comment #5 from Martin Jambor  ---
Thanks a lot for taking care of it before I had a chance to.

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled on x86 since r14-5109-ga291237b628f41

2024-04-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907

--- Comment #75 from Martin Jambor  ---
The above fixes the testcase from comment #58.  I am not sure if any other
testcases discussed here remain unresolved.  I am also not sure to what extent
we want that patch of mine; I guess I'll re-visit the idea in a few weeks.

[Bug ipa/113359] [13/14 Regression] LTO miscompilation of ceph on aarch64 and x86_64

2024-04-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #26 from Martin Jambor  ---
This should be fixed on master, I'll backport the fix in a few weeks to at
least gcc-13 where it was reported.

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247

--- Comment #9 from Martin Jambor  ---
On master this has been fixed by r14-9813-g8cd0d29270d4ed where I
unfortunately copy-pasted a wrong bug number :-/

I assume this needs backporting to at least gcc-13 and gcc-12. I'll do
that in a week or two.

[Bug tree-optimization/113964] [11/12/13/14/15 Regression] repeat copy of struct

2024-04-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113964

--- Comment #4 from Martin Jambor  ---
Oops. I made a mistake, the commit above fixes PR 114247, sorry :-/
This one is the next in my queue.  Sorry again.

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247

--- Comment #7 from Martin Jambor  ---
Thanks, I will bootstrap and test the patch on x86_64 and submit it
for review then.

Could I ask you to please modify the testcase so that it does not
use printf but simply calls __builtin_abort in the miscompiled case
and just returns zero from main if it is OK?  That way we could
include it in our test suite.  Thanks a lot.
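
A minimal sketch of the shape I mean.  The function compute and the
expected value are placeholders for whatever the real testcase
computes, and run_test stands for the body that would be main in a
testsuite file.

```cpp
// Placeholder for the function that gets miscompiled; noipa keeps the
// call opaque to interprocedural optimizations in the caller.
static int __attribute__((noipa))
compute (int x)
{
  return x * 2 + 1;
}

// In a real testsuite file this would be main (): no output, abort on
// a wrong result, return zero otherwise.
int
run_test ()
{
  if (compute (20) != 41)
    __builtin_abort ();
  return 0;
}
```

A test written this way needs no output matching: any wrong result shows
up as an abnormal exit.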

[Bug ipa/113359] [13/14 Regression] LTO miscompilation of ceph on aarch64 and x86_64

2024-04-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #24 from Martin Jambor  ---
(In reply to Jan Hubicka from comment #23)
> I however wonder if we really guarantee to copy the paddings everywhere else
> then the total scalarization part?
> (i.e. in all paths through the RTL expansion)

In PR 80689 I proposed that we sometimes skip copying the padding, and
the idea was rejected.  As far as I can recall the code, though, I don't
think we guarantee it everywhere.

Anyway, I have sent the patch to the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6jzlc25db@virgil.suse.cz/T/#u

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled on x86 since r14-5109-ga291237b628f41

2024-04-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907

--- Comment #71 from Martin Jambor  ---
I have sent the patch to the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6le5s25kl@virgil.suse.cz/T/#u

[Bug ipa/111571] [13 Regression] ICE in modify_call, at ipa-param-manipulation.cc:656

2024-04-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111571

Martin Jambor  changed:

   What|Removed |Added

Summary|[13/14 Regression] ICE in   |[13 Regression] ICE in
   |modify_call, at |modify_call, at
   |ipa-param-manipulation.cc:6 |ipa-param-manipulation.cc:6
   |56  |56

--- Comment #6 from Martin Jambor  ---
Fixed on master, fix queued for backporting to gcc 13 branch.

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247

--- Comment #4 from Martin Jambor  ---
I don't seem to be able to get riscv64 qemu running in reasonable
time.  Can someone please verify that the following patch fixes
the issue?

diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
index 3e0df6a6f77..b4ca78b652e 100644
--- a/gcc/ipa-param-manipulation.cc
+++ b/gcc/ipa-param-manipulation.cc
@@ -740,6 +740,12 @@ ipa_param_adjustments::modify_call (cgraph_edge *cs,
  }
   if (repl)
{
+ if (!useless_type_conversion_p(apm->type, repl->typed.type))
+   {
+ repl = force_value_to_type (apm->type, repl);
+ repl = force_gimple_operand_gsi (&gsi, repl,
+  true, NULL, true, GSI_SAME_STMT);
+   }
  vargs.quick_push (repl);
  continue;
}

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-03 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247

Martin Jambor  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org

--- Comment #3 from Martin Jambor  ---
Mine.

[Bug ipa/113359] [13/14 Regression] LTO miscompilation of ceph on aarch64 and x86_64

2024-03-28 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #22 from Martin Jambor  ---
Created attachment 57828
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57828=edit
Potential fix

I'm testing this patch

[Bug rtl-optimization/114452] Functions invoked through compile-time table of function pointers not inlined

2024-03-27 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

Martin Jambor  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|DUPLICATE   |---
   Last reconfirmed||2024-03-27
 Ever confirmed|0   |1

--- Comment #4 from Martin Jambor  ---
This does not look like a duplicate of PR 111573.

Nevertheless, it is not quite obvious what to do here.  Inlining
happens before unrolling and I am not sure we'd consider unrolling in
early optimizations.  And without unrolling, the load from the array
is not easy to fold.

In this testcase all (well, both) functions referenced from the array
are semantically equivalent which is recognized by ICF but making it
be able to pass this information to the inliner would be
non-trivial... and is this the common case worth optimizing for?

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-03-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907

--- Comment #66 from Martin Jambor  ---
Created attachment 57750
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57750=edit
Patch comparing jump functions

I'm testing this patch.  (Not sure how to best check that it does not
inadvertently pessimize ICF too much, except for ICF testcases.)

[Bug ipa/114254] [11/12/13 regression] Indirect inlining through C++ member pointers fails if the underlying class has a virtual function

2024-03-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114254

Martin Jambor  changed:

   What|Removed |Added

Summary|[11/12/13/14 regression]|[11/12/13 regression]
   |Indirect inlining through   |Indirect inlining through
   |C++ member pointers fails   |C++ member pointers fails
   |if the underlying class has |if the underlying class has
   |a virtual function  |a virtual function

--- Comment #3 from Martin Jambor  ---
Fixed on trunk.  I may consider backporting to GCC 13 but probably not to
earlier versions.

[Bug ipa/108802] [11/12/13 Regression] missed inlining of call via pointer to member function

2024-03-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108802

Martin Jambor  changed:

   What|Removed |Added

Summary|[11/12/13/14 Regression]|[11/12/13 Regression]
   |missed inlining of call via |missed inlining of call via
   |pointer to member function  |pointer to member function

--- Comment #10 from Martin Jambor  ---
Fixed on trunk.  I may consider backporting to GCC 13 but probably not to
earlier versions.

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-03-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907

--- Comment #65 from Martin Jambor  ---
I hope to have some jump-function comparison functions ready for testing later
today.

[Bug target/112980] 64-bit powerpc ELFv2 does not allow nops to be generated before function global entry point

2024-03-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112980

--- Comment #5 from Martin Jambor  ---
I'd like to ping this, are there plans to implement this in the near-ish term?

[Bug ipa/111571] [13/14 Regression] ICE in modify_call, at ipa-param-manipulation.cc:656

2024-03-15 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111571

--- Comment #4 from Martin Jambor  ---
I have proposed a fix on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6r0gbwf7l@virgil.suse.cz/T/#u

[Bug tree-optimization/113757] [14 regression] ICE when building legion-23.03.0 since r14-8398

2024-03-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113757

Martin Jambor  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Martin Jambor  ---
Fixed.

[Bug ipa/114254] [11/12/13/14 regression] Indirect inlining through C++ member pointers fails if the underlying class has a virtual function

2024-03-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114254

--- Comment #1 from Martin Jambor  ---
I have proposed a patch on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6r0gkzvi4@virgil.suse.cz/T/#u

[Bug ipa/108802] [11/12/13/14 Regression] missed inlining of call via pointer to member function

2024-03-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108802

--- Comment #8 from Martin Jambor  ---
I have proposed an improved patch on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6r0gkzvi4@virgil.suse.cz/T/#u

[Bug ipa/114254] New: Indirect inlining through C++ member pointers fails if the underlying class has a virtual function

2024-03-06 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114254

Bug ID: 114254
   Summary: Indirect inlining through C++ member pointers fails if
the underlying class has a virtual function
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ipa
  Assignee: jamborm at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57634
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57634=edit
testcase

Just add a virtual method to the class in our test
testsuite/g++.dg/ipa/iinline-2.C and it will unfortunately stop
working.

At some point the C++ FE got clever and stopped emitting the complex
code checking if a member pointer points to a virtual method or a
normal one when the base class does not have any virtual method.  But
that meant that our testcases stopped exercising the pattern matching
code in ipa_analyze_indirect_call_uses and when that code changed with
r10-917-g3b47da42de621c (Martin Jambor: Make SRA re-construct original
memory accesses when easy) because of a small mistake, we lost the
intended ability to inline also these cases.

So this is a regression against 9.5, unfortunately.
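
For readers without the attachment at hand, a rough sketch of the kind
of code affected.  This is not the attached testcase nor iinline-2.C;
the class S and all function names are invented for illustration.

```cpp
struct S
{
  int m_value;
  explicit S (int v) : m_value (v) {}

  // The method we would like to see inlined through the member pointer.
  int twice () const { return 2 * m_value; }

  // The mere presence of a virtual method means a call through a member
  // pointer has to check whether the pointer refers to a virtual method.
  virtual int unrelated () const { return 0; }
};

typedef int (S::*getter_t) () const;

// Indirect call through a pointer to member function; this is the call
// indirect inlining should be able to resolve once apply is inlined.
static inline int
apply (const S &s, getter_t fn)
{
  return (s.*fn) ();
}

int
call_twice (int v)
{
  S s (v);
  return apply (s, &S::twice);
}
```

Without the virtual method the front end emits a plain indirect call,
which is the only form our existing tests currently exercise.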

[Bug tree-optimization/114238] New: Multiple 554.roms_r run-time regressions (4%-20%) since r14-9193-ga0b1798042d033

2024-03-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114238

Bug ID: 114238
   Summary: Multiple 554.roms_r run-time regressions (4%-20%)
since r14-9193-ga0b1798042d033
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-linux, aarch64-linux
Target: x86_64-linux, aarch64-linux

Our LNT instance has detected that runtime of benchmark 554.roms_r
from the SPEC 2017 FPrate suite regressed on all machines on most
configurations by 4-20%.

For example:

simple -O2 -flto on AMD Zen 3 regressed by 14%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=470.537.0

on Zen2 -O2 -flto regression is the worst, 20%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=298.537.0

-Ofast -march=native -flto on AMD Zen 4 regressed by 7%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=959.537.0

-Ofast -march=native on AMD Zen 2 regressed by 17%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.537.0

but it also happens on Intel Skylake:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=800.537.0

or Aarch64:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=587.537.0

and there are smaller regressions on the PGO configurations too.

I have bisected the Zen3 -O2 -flto case to r14-9193-ga0b1798042d033
(Richard Biener: tree-optimization/114074 - CHREC multiplication and
undefined overflow).  I have then verified that the Zen 4 -Ofast
-march=native -flto and Zen 2 -Ofast -march=native cases have also
been introduced by it:

commit a0b1798042d033fd2cc2c806afbb77875dd2909b
Author: Richard Biener 
Date:   Mon Feb 26 13:33:21 2024 +0100

tree-optimization/114074 - CHREC multiplication and undefined overflow

When folding a multiply CHRECs are handled like {a, +, b} * c
is {a*c, +, b*c} but that isn't generally correct when overflow
invokes undefined behavior.  The following uses unsigned arithmetic
unless either a is zero or a and b have the same sign.

I've used simple early outs for INTEGER_CSTs and otherwise use
a range-query since we lack a tree_expr_nonpositive_p and
get_range_pos_neg isn't a good fit.

PR tree-optimization/114074
* tree-chrec.h (chrec_convert_rhs): Default at_stmt arg to NULL.
* tree-chrec.cc (chrec_fold_multiply): Canonicalize inputs.
Handle poly vs. non-poly multiplication correctly with respect
to undefined behavior on overflow.

* gcc.dg/torture/pr114074.c: New testcase.
* gcc.dg/pr68317.c: Adjust expected location of diagnostic.
* gcc.dg/vect/vect-early-break_119-pr114068.c: Do not expect
loop to be vectorized.
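
As a small worked illustration of the identity the commit message
talks about, the sketch below checks that ({a, +, b} * c) and
{a*c, +, b*c} agree at every iteration when everything is computed in
wrapping unsigned arithmetic.  The helper names are made up for this
example and have nothing to do with the functions in tree-chrec.cc.

```cpp
#include <cstdint>

// Value of the chrec {a, +, b} at iteration i, in wrapping arithmetic.
static std::uint32_t
chrec_at (std::uint32_t a, std::uint32_t b, std::uint32_t i)
{
  return a + b * i;
}

// Check ({a, +, b} * c)(i) == {a*c, +, b*c}(i) for all i < n.
bool
fold_matches (std::uint32_t a, std::uint32_t b, std::uint32_t c,
              std::uint32_t n)
{
  for (std::uint32_t i = 0; i < n; i++)
    if (chrec_at (a, b, i) * c != chrec_at (a * c, b * c, i))
      return false;
  return true;
}
```

With unsigned types the two sides always match because wrap-around is
well defined.  With signed types the products a*c and b*c may overflow
even when the original expression does not, which is the situation in
which the commit resorts to unsigned arithmetic.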


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug ipa/108802] [11/12/13/14 Regression] missed inlining of call via pointer to member function

2024-02-21 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108802

--- Comment #7 from Martin Jambor  ---
I have proposed a patch on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6y1bdx3yg@virgil.suse.cz/T/#u

[Bug ipa/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-21 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

Martin Jambor  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #16 from Martin Jambor  ---
Fixed.

[Bug ipa/111573] lambda functions often not inlined and optimized out

2024-02-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111573

--- Comment #2 from Martin Jambor  ---
I cannot see any difference at -O3 with or without -fno-early-inlining.

[Bug tree-optimization/112312] -O3 produces worse code than -O2 for std::ranges::lower_bound in some cases, not marking a loop as finite

2024-02-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112312

--- Comment #4 from Martin Jambor  ---
It seems this has been fixed in current master (which is to become gcc 14).
If my bisecting is correct, it has been fixed by r14-5628-g53ba8d669550d3 (Jan
Hubicka: inter-procedural value range propagation).

I guess it would be nice to add this testcase to the testsuite, so I'm keeping
this bug open (and on my TODO list).

[Bug ipa/108802] [11/12/13/14 Regression] missed inlining of call via pointer to member function

2024-02-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108802

Martin Jambor  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #6 from Martin Jambor  ---
I think I know what to do.

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64

2024-02-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #15 from Martin Jambor  ---
Created attachment 57462
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57462=edit
Simple testcase (needs disabling early - and only early - SRA)

This is a simpler testcase which exhibits the problem on x86_64-linux
and current master.  Steps to reproduce:

$ ~/gcc/trunk/inst/bin/gcc -O2 -fno-strict-aliasing -fno-ipa-cp --disable-tree-esra -flto pr113359.c -c -o 1.o
cc1: note: disable pass tree-esra for functions in the range of [0, 4294967295]

$ ~/gcc/trunk/inst/bin/gcc -O2 -fno-strict-aliasing -fno-ipa-cp --disable-tree-esra -flto -DFILE2 pr113359.c -c -o 2.o
cc1: note: disable pass tree-esra for functions in the range of [0, 4294967295]

$ ~/gcc/trunk/inst/bin/gcc -flto 1.o 2.o -o test.exe

$ ./test.exe 
Aborted (core dumped)


If you add -fno-ipa-icf to the "compilation" commands, the test will
pass.

Late (post ICF) intra-procedural SRA is necessary to exhibit the
problem.  On the other hand, early SRA must be suppressed or it will
scalarize the aggregate assignment too early and the results will look
different to IPA-ICF.  Instead of using --disable-tree-esra we could
pass the address of tmp in both geta() and getb() to an empty function
coming from a third compilation unit.

Disabling strict aliasing is also necessary to show the problem; with
strict aliasing IPA-ICF takes the alias class of types into account
when hashing and considers geta() and getb() different from the start.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-02-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #6 from Martin Jambor  ---
I have proposed a patch on the mailing list that converts the array of lattices
to a vector:
https://inbox.sourceware.org/gcc-patches/ri6frxoxzpk@virgil.suse.cz/T/#u

[Bug lto/113712] [11/12/13/14 Regression] lto crash: when building 641.leela_s peek with Example-gcc-linux-x86.cfg (SPEC2017 1.1.9) since r10-3311-gff6686d2e5f797

2024-02-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113712

--- Comment #20 from Martin Jambor  ---
I have access to the benchmark and building it with -fprofile-generate
it fails for me (with an ICE in add_symbol_to_partition_1) only when I
use -fno-use-linker-plugin and either -std=c++11 or -std=c++03. Using
-std=c++14 also avoids the issue.  In any event, -fno-use-linker-plugin
looks necessary.

[Bug lto/113712] [11/12/13/14 Regression] lto crash: when building 641.leela_s peek with Example-gcc-linux-x86.cfg (SPEC2017 1.1.9) since r10-3311-gff6686d2e5f797

2024-02-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113712

--- Comment #18 from Martin Jambor  ---
(In reply to Filip Kastl from comment #17)
> I've bisected this (using the test from Andrew Pinski) to
> r10-3311-gff6686d2e5f797

That's a coincidence, with -fno-ipa-sra the testcase fails even earlier,
IPA-SRA was just hiding it, most probably by localizing some symbol before the
linking stage.

Bugs that are only reproducible with -fno-use-linker-plugin are unlikely to get
a high priority.  But I understand that the original issue does not need it?

(Also, the issue is supposed to be reproducible on x86_64-linux, right?)

[Bug target/113847] [14 Regression] 10% slowdown of 462.libquantum on AMD Ryzen 7700X and Ryzen 7900X

2024-02-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113847

--- Comment #6 from Martin Jambor  ---
(In reply to Richard Biener from comment #5)
> CCing also Martin who should know how/why IPA SRA doesn't reconstruct the
> component ref chain here 

I have not had a look at this specific case (yet), but IPA-SRA just
doesn't (unlike intraprocedural SRA) and always creates MEM_REFs (in
callers).  I guess we could stream field offsets and/or array_ref
indices and attempt to reconstruct it for simple (non-union,
non-otherwise-overlapping) types, even if it would make the
ipa_adjusted_param type (and thus ipa_param_adjustments) slightly
bigger and add another vector.

> or why it choses the dynamic type as it does
> (possibly local SRA when fully scalarizing an aggregate copy does the same).

That is unlikely.  Total scalarization in intraprocedural SRA just
follows the type of the decl whereas IPA-SRA (and intra-SRA too when
not totally scalarizing) takes all types from existing memory
accesses.

[Bug tree-optimization/113833] 435.gromacs fails verification on with -Ofast -march={cascadelake,icelake-server} and PGO after r14-7272-g57f611604e8bab

2024-02-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113833

--- Comment #4 from Martin Jambor  ---
Created attachment 57397
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57397=edit
-fopt-info-vec before/after comparison

(In reply to Richard Biener from comment #3)
> A compare before/after the patch of -fopt-info-vec output might show the few
> cases that are affected by the patch.

I hope I have not messed anything up.  I have added -fopt-info-vec right after
-fprofile-use into the spec config and then grepped the output for
':[^:]*:[^:]*: optimized'.  Then I sorted (because the build was parallel) and
compared the output and it seems there are quite a few *fewer* instances of
vectorization happening.

[Bug tree-optimization/110422] asm goto vs SRA

2024-02-09 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

Martin Jambor  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Martin Jambor  ---
Fixed on all open release branches too.

[Bug tree-optimization/113833] New: 435.gromacs fails verification on with -Ofast -march={cascadelake,icelake-server} and PGO after r14-7272-g57f611604e8bab

2024-02-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113833

Bug ID: 113833
   Summary: 435.gromacs fails verification on with -Ofast
-march={cascadelake,icelake-server} and PGO after
r14-7272-g57f611604e8bab
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: fxue at os dot amperecomputing.com
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-linux
Target: x86_64-linux

After r14-7272-g57f611604e8bab (Feng Xue: Do not count unused scalar
use when marking STMT_VINFO_LIVE_P [PR113091]), our runs of SPEC 2006
CPU benchmark 435.gromacs on Icelake-server CPU compiled with -Ofast
-march=native and PGO (with and without LTO) started failing with a
miscompare error:

  0002:  3.07684e+02
 3.03476e+02
   ^

I subsequently verified the failure on an Intel CascadeLake and
bisected it to the aforementioned commit.  We don't see it on our AMD
or Ampere testers (using -march=native).

I guess the miscomparison error may be well within what is expected
when using -Ofast but even in that case it would be nice to have it
documented here that that is indeed expected.
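
To illustrate why small differences can be legitimate with -Ofast,
which enables -ffast-math and thus allows floating-point operations to
be reassociated, here is a tiny sketch.  It has nothing to do with the
gromacs sources; the two functions merely spell out two evaluation
orders of the same sum, and the values mentioned below are chosen only
to make the effect visible.

```cpp
// The sum evaluated in the order the source prescribes.
double
sum_as_written (double a, double b, double c)
{
  return (a + b) + c;
}

// The same sum after reassociation, as fast-math would be allowed to do.
double
sum_reassociated (double a, double b, double c)
{
  return a + (b + c);
}
```

Called with 1e30, -1e30 and 1.0, the first function yields 1.0 whereas
the second yields 0.0, because 1.0 is lost when it is added to -1e30
first.  A vectorized reduction changes the order of additions in a
similar way.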


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug tree-optimization/113757] [14 regression] ICE when building legion-23.03.0 since r14-8398

2024-02-08 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113757

--- Comment #8 from Martin Jambor  ---
I have proposed a fix on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6bk8r5kfi@virgil.suse.cz/T/#u

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64

2024-02-07 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #14 from Martin Jambor  ---
(In reply to rguent...@suse.de from comment #13)
> Might be also an interaction with IPA ICF in case there's a pointer to
> the pair involved?

Yes, this is exactly what seems to be happening.  The problem goes
away with -fno-ipa-icf.

(Possibly because the testcase uses -fno-strict-aliasing,) IPA-ICF
merges two functions which copy a structure, and that access type is
what IPA-SRA saves, but it loads only the one from one of the merged
functions.  SRA then uses the (wrong) type to split aggregate copies
into copies by individual fields.

I have talked to Honza about this.  It seems that IPA-ICF needs to be
careful about aggregates with holes in different places.  The ideal next
step would be to create a testcase not dependent on IPA-SRA.
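
To show why holes matter, here is a small sketch which does not involve
the compiler at all.  The two struct layouts are hypothetical and only
mimic the situation (a 64-bit member versus a 32-bit member followed by
padding at the same offset); the field-wise copy is done by hand to
imitate what splitting an aggregate copy according to the wrong type
amounts to.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layouts, 16 bytes each.  WithWide uses all 16 bytes,
// WithHole leaves its last 4 bytes as padding.
struct WithWide
{
  std::uint32_t a;
  std::uint32_t b;
  std::uint64_t wide;
};

struct alignas (8) WithHole
{
  std::uint32_t a;
  std::uint32_t b;
  std::uint32_t narrow;
};

static_assert (sizeof (WithWide) == sizeof (WithHole),
               "the sketch assumes both layouts have the same size");

// Copy only the bytes that belong to fields of WithHole.
static void
copy_by_fields_of_WithHole (unsigned char *dst, const unsigned char *src)
{
  const std::size_t sz = sizeof (std::uint32_t);
  std::memcpy (dst + offsetof (WithHole, a),
               src + offsetof (WithHole, a), sz);
  std::memcpy (dst + offsetof (WithHole, b),
               src + offsetof (WithHole, b), sz);
  std::memcpy (dst + offsetof (WithHole, narrow),
               src + offsetof (WithHole, narrow), sz);
}

// Returns true if a WithWide value is intact after being copied field
// by field according to the layout of WithHole.
bool
survives_fieldwise_copy (const WithWide &value)
{
  unsigned char src[sizeof (WithWide)];
  unsigned char dst[sizeof (WithWide)] = {};
  std::memcpy (src, &value, sizeof value);
  copy_by_fields_of_WithHole (dst, src);
  return std::memcmp (src, dst, sizeof src) == 0;
}
```

Any value whose last four bytes are not zero is corrupted by such a
copy, which corresponds to a pointer losing half of its bits.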

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64

2024-02-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359

--- Comment #9 from Martin Jambor  ---
SRA creates the replacements (in GCC 13) during total scalarization,
i.e. the bit that is not driven by pre-existing accesses to
aggregates but by the fact that it sees an aggregate that is small and
regular, which it therefore splits according to its type in the hope
that it will go away.

Unfortunately in the LTO and non-LTO case, they see a different type.
I have added a dumping of types and fields of totally scalarized
records and got the following.

In the non-LTO case, the type of the aggregate is:
   constant 128>
unit-size  constant 16>
align:64 warn_if_not_align:0 symtab:1430035184 alias-set -1 canonical-type
0x553cabd0
...

and specifically its third field is a pointer:
  
pointer_to_this >
unsigned DI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x562729d8 reference_to_this >
used unsigned nonlocal decl_3 DI /usr/include/c++/13/bits/stl_pair.h:194:11
size  constant 64>
unit-size  constant 8>
align:64 warn_if_not_align:0 offset_align 128 decl_not_flexarray: 0
offset  constant 0>
bit-offset  constant 64> context >


However, in the LTO case the type of the aggregate is:
   constant 128>
unit-size  constant 16>
align:64 warn_if_not_align:0 symtab:0 alias-set 98 canonical-type
0x61cc1498
...

which however has an unsigned int as its third field:
 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x62410690 precision:32 min  max 
pointer_to_this  reference_to_this
>
unsigned nonlocal SI /usr/include/c++/13/bits/stl_pair.h:194:11
size  constant 32>
unit-size  constant 4>
align:32 warn_if_not_align:0 offset_align 128 decl_not_flexarray: 0
offset  constant 0>
bit-offset  constant 64> context >

And so only an unsigned int replacement is created.

The name of the aggregate indicates it has been created by IPA-SRA and
so that is where I am looking right now, but IPA-SRA simply takes (and
streams) the type of the access in the original function body for
these.  Can't this perhaps be some type-merging issue?
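
As a rough illustration of why a replacement created from the wrong
field type loses data, the following C++ sketch copies a 16-byte
aggregate using two different widths for its third field.  The struct
names and the first two fields are hypothetical; only the sizes (a
128-bit aggregate, third field at bit offset 64, DI versus SI mode) are
taken from the dumps above, and the pointer is modelled as a 64-bit
integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical view matching the non-LTO dump: 64-bit third field.
struct SeenWithoutLTO {
  std::uint32_t first;
  std::uint32_t second;
  std::uint64_t third;
};

// Hypothetical view matching the LTO dump: 32-bit third field at the
// same offset, in an aggregate of the same size.
struct alignas(8) SeenWithLTO {
  std::uint32_t first;
  std::uint32_t second;
  std::uint32_t third;
};

static_assert(sizeof(SeenWithoutLTO) == 16 && sizeof(SeenWithLTO) == 16,
              "both views are 128 bits wide");
static_assert(offsetof(SeenWithoutLTO, third) == 8
              && offsetof(SeenWithLTO, third) == 8,
              "third field at bit offset 64 in both views");

// Copy SRC one field at a time, transferring only THIRD_SIZE bytes of
// the third field, the way a too-narrow scalar replacement would.
SeenWithoutLTO scalarized_copy(const SeenWithoutLTO &src,
                               std::size_t third_size)
{
  assert(third_size <= sizeof(src.third));
  SeenWithoutLTO dst = {};
  dst.first = src.first;
  dst.second = src.second;
  std::memcpy(reinterpret_cast<unsigned char *>(&dst) + 8,
              reinterpret_cast<const unsigned char *>(&src) + 8,
              third_size);
  return dst;
}

// Returns true if the 64-bit third field survives the copy.
bool third_field_survives(std::size_t third_size)
{
  SeenWithoutLTO src = {1u, 2u, 0x1122334455667788ull};
  return scalarized_copy(src, third_size).third == src.third;
}
```

Copying with sizeof(SeenWithLTO::third) drops half of the 64-bit value,
whereas copying with sizeof(SeenWithoutLTO::third) keeps it intact.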

[Bug tree-optimization/113757] [14 regression] ICE when building legion-23.03.0 since r14-8398

2024-02-05 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113757

Martin Jambor  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #7 from Martin Jambor  ---
This is a very particular interaction of the patch with speculative
devirtualization.  Mine.

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2024-01-31 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

--- Comment #3 from Martin Jambor  ---
(In reply to Richard Biener from comment #1)
> Did you try with -fprofile-partial-training (is that default on?  it
> probably should ...).  Can you please try training with the rate data
> instead of train
> to rule out a mismatch?

With -fprofile-partial-training the znver4 LTO vs LTOPGO regression (on a newer
master) goes down from 66% to 54%.  

So far I did not find a way to easily train with the reference run (when I add
"train_with = refrate" to the config, I always get "ERROR: The workload
specified by train_with MUST be a training workload!")

[Bug target/113655] New: Cross compiling to mips64-elf fails because "MIPS_EXPLICIT_RELOCS was not declared" after r14-8386-g58af788d1d0825

2024-01-29 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113655

Bug ID: 113655
   Summary: Cross compiling to mips64-elf fails because
"MIPS_EXPLICIT_RELOCS was not declared" after
r14-8386-g58af788d1d0825
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: syq at gcc dot gnu.org
  Target Milestone: ---
  Host: x86_64-linux
Target: mips64-elf

Starting with r14-8386-g58af788d1d0825 (MIPS: Accept arguments for
-mexplicit-relocs), when I try to test that cross compilation from
x86_64-linux to target mips64-elf still works by configuring gcc with:

../src/configure --prefix=/home/mjambor/gcc/mine/inst --enable-languages=c,c++
--enable-checking=yes --disable-bootstrap --disable-multilib --enable-obsolete
--target=mips64-elf

and then building just the compiler with make -j64 all-host,

the compilation fails with:

options.cc:3474:3: error: ‘MIPS_EXPLICIT_RELOCS’ was not declared in this
scope; did you mean ‘MIPS_EXPLICIT_RELOCS_NONE’?
 3474 |   MIPS_EXPLICIT_RELOCS, /* mips_opt_explicit_relocs */
      |   ^~~~
      |   MIPS_EXPLICIT_RELOCS_NONE


Our buildbot reports failures when building a cross-compiler for
mips64el-st-linux-gnu, mips64octeon-linux, mipsisa64r2-linux,
mipsisa32r2-linux-gnu, mipsisa64r2-sde-elf, mipsisa32-elfoabi,
mipsisa64-elfoabi, mipsisa64r2el-elf, mipsisa64sr71k-elf,
mipsisa64sb1-elf, mips64-elf, mipsel-elf, mips64vr-elf,
mips64orion-elf, mips-rtems, mips-wrs-vxworks, mipstx39-elf and I
suspect the problem is the same or similar.

[Bug gcov-profile/113646] New: PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2024-01-28 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

Bug ID: 113646
   Summary: PGO hurts run-time of 538.imagick_r as much as 68% at
-Ofast -march=native
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: hubicka at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-linux, aarch64-linux
Target: x86_64-linux, aarch64-linux

Using profile guided optimization is very detrimental when compiling SPEC 2017
FPrate benchmark 538.imagick_r at -Ofast -march=native (with or without LTO) on
all machines where I have tried.

On Zen4, using PGO results in a binary that is 68% slower than without PGO when
not using LTO, and 65% slower with LTO:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=970.507.0=966.507.0=959.507.0=958.507.0;

On Zen3, using PGO slows the binary down by 22% when not using LTO and by 30%
with LTO:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.507.0=473.507.0=475.507.0=477.507.0;

On Zen2, PGO regresses by 16% without LTO and by 28% with it:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.507.0=293.507.0=287.507.0=286.507.0;

On our Altra CPU, the slowdowns are 26% and 45%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=584.507.0=583.507.0=587.507.0=589.507.0;

On an Intel CascadeLake machine, they are 24% and 41%. (Our LNT Intel machine
is temporarily offline, unfortunately).

It is of course possible that the training workload does not match the
reference one very well.  However, this was not a problem in the past
(apparently the problem is that our non-PGO results improved but our PGO ones
did not).  Also, other compilers such as LLVM achieve better run-times with PGO
than without.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug target/113641] New: 510.parest_r with PGO at O2 slower than GCC 12 (7% on Zen 3&2, 4% on CascadeLake) since r13-4272-g8caf155a3d6e23

2024-01-28 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113641

Bug ID: 113641
   Summary: 510.parest_r with PGO at O2 slower than GCC 12 (7% on
Zen 3&2, 4% on CascadeLake) since
r13-4272-g8caf155a3d6e23
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-linux-gnu
Target: x86_64-linux-gnu

During the development of GCC 13, the run-time of 510.parest_r regressed on
x86_64 when built with profile guided optimization and just plain -O2: binaries
built with master are slower than those built with GCC 12.  The difference is
not big but fairly clear cut, about 7.6% on Zen3:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=740.457.0=892.457.0=694.457.0;

and about 7.2% on Zen2:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=777.457.0=932.457.0=687.457.0;

The graphs above show use of both LTO and PGO but LTO is not necessary.

I was able to bisect the regression to commit r13-4272-g8caf155a3d6e23 (i386:
Only enable small loop unrolling in backend [PR 107692]).  parest_r is also
about 4% slower when compiled with this revision than with the previous one on
Intel CascadeLake.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug target/113600] [14 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-01-26 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600

--- Comment #4 from Martin Jambor  ---
(In reply to Hongtao Liu from comment #2)
> A patch is posted at
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640276.html
> 
> Would you give a try to see if it fixes the regression, I don't currently
> have a znver4 machine for testing.

Unfortunately it does not.

(In reply to Richard Biener from comment #3)
> I think we need to figure out what exactly gets slower (and hope it's not
> scattered all over the place)

I have collected some profiles:

r14-5602-ge6269bb69c0734

# Samples: 516K of event 'cycles:u'
# Event count (approx.): 468008188417
# Overhead   Samples  Command          Shared Object                          Symbol
# ........  ........  ...............  .....................................  ............................
#
    13.55%     69886  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] mc_chroma
    11.05%     57017  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_16x16
     9.24%     47693  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_8x8
     8.67%     44733  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] get_ref
     4.84%     24984  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub16x16_dct
     4.16%     21484  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_me_search_ref
     3.30%     17033  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_hadamard_ac_16x16
     2.28%     11770  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_4x4
     2.10%     10824  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_trellis_cabac
     2.07%     10694  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] hpel_filter
     2.05%     10616  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub8x8_dct
     1.86%      9593  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] refine_subpel
     1.70%      8788  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_4x4
     1.57%      8077  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sad_16x16
     1.16%      6324  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] frame_init_lowres_core
     1.14%      5867  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sa8d_8x8
     1.11%      5738  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_cabac_encode_decision_c
     1.08%      5736  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_var_16x16



r14-5603-g2b59e2b4dff421

# Samples: 550K of event 'cycles:u'
# Event count (approx.): 498834737657
# Overhead   Samples  Command          Shared Object                          Symbol
# ........  ........  ...............  .....................................  ............................
#
    18.21%    100151  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_16x16
    12.37%     68006  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] mc_chroma
     8.51%     46815  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_8x8
     7.56%     41560  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] get_ref
     4.53%     24901  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub16x16_dct
     3.92%     21561  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_me_search_ref
     3.08%     16963  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_hadamard_ac_16x16
     2.41%     13239  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_4x4
     1.99%     10931  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_trellis_cabac
     1.96%     10801  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] hpel_filter
     1.95%     10764  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub8x8_dct
     1.56%      8587  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_4x4
     1.49%      8166  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] refine_subpel
     1.48%      8124  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sad_16x16
     1.09%      6328  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] frame_init_lowres_core
     1.07%      5901  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sa8d_8x8
     1.04%      5703  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_cabac_encode_decision_c
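
The two profiles can be compared with some simple arithmetic.  The
following C++ sketch computes the relative change of the total event
count and of the sample count of x264_pixel_satd_16x16.  All figures are
copied from the two perf reports; the function and variable names are
made up for this illustration, and it assumes that 'cycles:u' is a
reasonable proxy for run-time.

```cpp
#include <cassert>

// Relative change from BEFORE to AFTER, in percent.
double percent_change(double before, double after)
{
  assert(before > 0.0);
  return (after - before) / before * 100.0;
}

// Approximate event counts of 'cycles:u' from the two perf reports.
const double cycles_r14_5602 = 468008188417.0;
const double cycles_r14_5603 = 498834737657.0;

// Sample counts attributed to x264_pixel_satd_16x16.
const double satd16_samples_r14_5602 = 57017.0;
const double satd16_samples_r14_5603 = 100151.0;

// Increase in total cycles between the two revisions, in percent.
double total_cycles_increase()
{
  return percent_change(cycles_r14_5602, cycles_r14_5603);
}

// Increase in samples in x264_pixel_satd_16x16, in percent.
double satd16_samples_increase()
{
  return percent_change(satd16_samples_r14_5602, satd16_samples_r14_5603);
}
```

With the numbers above this comes out at roughly 6.6% more cycles
overall and roughly 76% more samples in x264_pixel_satd_16x16, while the
other listed symbols change comparatively little, which suggests that
the slowdown is concentrated in that one function.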

[Bug tree-optimization/107946] [13/14 Regression] 507.cactuBSSN_r regresses by ~9% on znver3 with PGO since r13-3875-g9e11ceef165bc0

2024-01-26 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107946

Martin Jambor  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-26
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #7 from Martin Jambor  ---
This regression is still there (as the graphs linked in the summary show).

[Bug target/113600] New: 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-01-25 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600

Bug ID: 113600
   Summary: 525.x264_r run-time regresses by 8% with PGO -Ofast
-march=znver4
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: liuhongt at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-linux-gnu
Target: x86_64-linux-gnu

With profile-feedback, -Ofast and -march=native on an AMD Zen 4, there is a
recent 8% regression:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=979.377.0=966.377.0;

With both PGO and LTO, the situation is similar (6%):
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=977.377.0=958.377.0;

On a Zen3 machine, there is a 2% bump around the same time:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=900.377.0=473.377.0;

I have bisected the (non-LTO) Zen 4 case to commit r14-5603-g2b59e2b4dff421:

2b59e2b4dff42118fe3a505f07b9a6aa4cf53bdf is the first bad commit
commit 2b59e2b4dff42118fe3a505f07b9a6aa4cf53bdf
Author: liuhongt 
Date:   Thu Nov 16 18:38:39 2023 +0800

Support reduc_{plus,xor,and,ior}_scal_m for vector integer mode.

BB vectorizer relies on the backend support of
.REDUC_{PLUS,IOR,XOR,AND} to vectorize reduction.

gcc/ChangeLog:

PR target/112325
* config/i386/sse.md (reduc__scal_): New expander.
(REDUC_ANY_LOGIC_MODE): New iterator.
(REDUC_PLUS_MODE): Extend to VxHI/SI/DImode.
(REDUC_SSE_PLUS_MODE): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112325-1.c: New test.
* gcc.target/i386/pr112325-2.c: New test.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug target/105275] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718

2024-01-24 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275

--- Comment #3 from Martin Jambor  ---
I have re-checked this year again (using master revision
r14-7200-g95440171d0e615)  but this time on a high-frequency Zen3 CPU (EPYC
75F3). Run-time of 525.x264_r built with master with PGO and -O2 improved by
5.49% compared to GCC 13 and so compared to GCC 11 the regression dropped to
4.2%.

Run-time of 538.imagick_r compiled with the same options and master is 5.8%
slower on this CPU than when compiling it with GCC 11.

With both PGO and LTO, 525.x264_r is now only 2.8% slower than GCC 11.  In case
of 538.imagick_r the regression is 2.01% on the zen3, but it is 7.49% on a zen4
machine :-/

[Bug ipa/112616] [11/12/13/14 Regression] wrong code at -O{s, 2, 3} on x86_64-linux-gnu since r10-3311

2024-01-24 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112616

--- Comment #8 from Martin Jambor  ---
Fixed on trunk.  I did not want to backport this but because this variant does
not require disabling DCE, I will probably do so after a few weeks on master, if
there are no issues.

[Bug ipa/108007] [11/12/13/14 Regression] wrong code at -Os and above with "-fno-dce -fno-tree-dce" on x86_64-linux-gnu since r10-3311-gff6686d2e5f797

2024-01-24 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108007

--- Comment #22 from Martin Jambor  ---
Fixed on trunk.  I did not want to backport this but because of PR 112616 I
will probably do so after a few weeks on master, if there are no issues.

[Bug ipa/113490] [14 Regression] ICE: in propagate_vals_across_arith_jfunc, at ipa-cp.cc:2425 at -O3 since r14-285

2024-01-24 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113490

Martin Jambor  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Martin Jambor  ---
Fixed.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-01-22 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

--- Comment #4 from Martin Jambor  ---
The right place to free stuff in lattices post-IPA would be in
ipa_node_params::~ipa_node_params() where we should iterate over lattices and
deinitialize them or perhaps destruct the array, because ipcp_vr_lattice
directly contains Value_Range which AFAIU directly contains int_range_max which
has a virtual destructor... so it does not look like a POD anymore.  This
escaped me when I was looking at the IPA-VR changes but hopefully it should not
be too difficult to deal with.
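
The general problem can be sketched in a few lines of C++.  This is not
the GCC code: range_like and lattice_like are hypothetical stand-ins for
int_range_max and ipcp_vr_lattice, and the counter exists only to make
the leak observable.  The sketch assumes the array lives in raw memory
that is released without running destructors.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <type_traits>

// Hypothetical stand-in for a range class that may own heap storage.
struct range_like {
  int *extra = nullptr;
  static int live_buffers;   // number of heap buffers currently owned

  void grow()
  {
    if (!extra) {
      extra = new int[16];
      ++live_buffers;
    }
  }

  virtual ~range_like()
  {
    if (extra) {
      delete[] extra;
      --live_buffers;
    }
  }
};
int range_like::live_buffers = 0;

// Hypothetical stand-in for a lattice that directly contains the range.
struct lattice_like { range_like m_vr; };

static_assert(!std::is_trivially_destructible<lattice_like>::value,
              "releasing the raw memory is not enough for this type");

// Build COUNT lattices in raw memory, grow each range, then release the
// memory.  Returns how many heap buffers are still alive afterwards.
int buffers_left_after(unsigned count, bool run_destructors)
{
  if (count == 0)
    return 0;
  int before = range_like::live_buffers;
  void *raw = std::malloc(count * sizeof(lattice_like));
  assert(raw != nullptr);
  lattice_like *lats = static_cast<lattice_like *>(raw);
  for (unsigned i = 0; i < count; ++i) {
    new (&lats[i]) lattice_like();
    lats[i].m_vr.grow();
  }
  if (run_destructors)
    for (unsigned i = 0; i < count; ++i)
      lats[i].~lattice_like();   // iterate and deinitialize each element
  std::free(raw);
  return range_like::live_buffers - before;
}
```

Without the destructor loop every grown range leaves its buffer behind;
with it, nothing is left, which is the effect that iterating over the
lattices in a destructor would be expected to have.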

[Bug ipa/113490] [14 Regression] ICE: in propagate_vals_across_arith_jfunc, at ipa-cp.cc:2425 at -O3 since r14-285

2024-01-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113490

--- Comment #5 from Martin Jambor  ---
I have proposed a fix on the mailing list: 
https://inbox.sourceware.org/gcc-patches/ri6cytv3eyy.fsf@/T/#u

[Bug other/94629] 10 issues located by the PVS-studio static analyzer

2024-01-20 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94629

--- Comment #28 from Martin Jambor  ---
(In reply to David Binderman from comment #27)
> The original article checked gcc-10.
> gcc-13 is checked in the following article:
> 
> https://pvs-studio.com/en/blog/posts/cpp/1067/
> 
> I suspect it would be most unwise if any release of gcc after 13 
> introduced new bugs that were known to pvs-studio.

And is there already a bugzilla bug about these (or should I create one)?
I believe a new one would be better than re-using this one.

[Bug ipa/113490] [14 Regression] ICE: in propagate_vals_across_arith_jfunc, at ipa-cp.cc:2425 at -O3 since r14-285

2024-01-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113490

Martin Jambor  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org

--- Comment #3 from Martin Jambor  ---
Still, let me have a look.

[Bug tree-optimization/110422] asm goto vs SRA

2024-01-19 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

--- Comment #5 from Martin Jambor  ---
Fixed on trunk, I plan to backport to open release branches in the upcoming
weeks.

[Bug other/89863] [meta-bug] Issues in gcc that other static analyzers (cppcheck, clang-static-analyzer, PVS-studio) find that gcc misses

2024-01-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89863
Bug 89863 depends on bug 94629, which changed state.

Bug 94629 Summary: 10 issues located by the PVS-studio static analyzer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94629

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug other/94629] 10 issues located by the PVS-studio static analyzer

2024-01-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94629

Martin Jambor  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #26 from Martin Jambor  ---
(In reply to Martin Liška from comment #25)
> No, there's still the 'ipa_polymorphic_call_context::set_by_invariant' issue
> that's waiting for Honza.

Finally fixed with:

https://gcc.gnu.org/g:4f4820964ebffc03249d98239a4ad2b43dd1a486

commit r14-8191-g4f4820964ebffc03249d98239a4ad2b43dd1a486
Author: Jan Hubicka 
Date:   Wed Jan 17 19:16:47 2024 +0100

Remove accidental hack in ipa_polymorphic_call_context::set_by_invariant

I managed to commit a hack setting offset to 0 in
ipa_polymorphic_call_context::set_by_invariant.  This makes it to give up
on multiple inheritance, but most likely won't give bad code since the other
base will be of different type.

gcc/ChangeLog:

* ipa-polymorphic-call.cc
(ipa_polymorphic_call_context::set_by_invariant): Remove
accidental hack resetting offset.

[Bug tree-optimization/110422] asm goto vs SRA

2024-01-17 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

Martin Jambor  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org

--- Comment #3 from Martin Jambor  ---
Mine.

[Bug ipa/112616] [11/12/13/14 Regression] wrong code at -O{s, 2, 3} on x86_64-linux-gnu since r10-3311

2024-01-16 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112616

--- Comment #6 from Martin Jambor  ---
(In reply to Andrew Pinski from comment #1)
>   # q_11 = PHI <0B(2), removed_return.14_14(D)(4),
> removed_return.14_14(D)(3)>
>   _12 = *q_11;
> 
> 
> WTF

Well, _12 is not used anywhere, so the code expects the entire load to be DCEd.
But it gets optimized to

  _2 = MEM[(int *)0B]; 

before DCE sees it and then even if _2 is never used anywhere, apparently the
statement is kept there as an intended trap (I guess).

I have adjusted my patch to make DCE for removed returns part of IPA edge
redirection so that it does not have compare-debug problems and submitted it
for review in: https://inbox.sourceware.org/gcc-patches/ri6cyu1e9kw.fsf@/T/#u

[Bug ipa/108007] [11/12/13/14 Regression] wrong code at -Os and above with "-fno-dce -fno-tree-dce" on x86_64-linux-gnu since r10-3311-gff6686d2e5f797

2024-01-16 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108007

--- Comment #20 from Martin Jambor  ---
I have submitted a slightly modified patch to the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6cyu1e9kw.fsf@/T/#u

[Bug target/113296] [14 Regression] SPEC 2006 434.zeusmp segfaults on Aarch64 when built with -Ofast -march=native -flto

2024-01-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113296

Martin Jambor  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Martin Jambor  ---
According to our buildbot results, this has resolved itself sometime between 1
and 2 days ago.

I assume nobody wants to go and investigate what issue it was if it does not
reappear, so let me close the bug.

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2024-01-12 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 113296, which changed state.

Bug 113296 Summary: [14 Regression] SPEC 2006 434.zeusmp segfaults on Aarch64 
when built with -Ofast -march=native -flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113296

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
