[Bug target/113549] float simd crash on windows in gcc.dg/vect/vect-simd-clone-16b.c

2024-01-22 Thread nightstrike at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113549

--- Comment #2 from nightstrike  ---
Test 16e uses double instead of float, which also crashes.

[Bug c++/109642] False Positive -Wdangling-reference with std::span-like classes

2024-01-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109642

--- Comment #15 from GCC Commits  ---
The trunk branch has been updated by Marek Polacek :

https://gcc.gnu.org/g:c596ce03120cc22e141186401c6656009ddebdaa

commit r14-8339-gc596ce03120cc22e141186401c6656009ddebdaa
Author: Marek Polacek 
Date:   Mon Jan 22 16:12:33 2024 -0500

c++: extend Wdangling-reference17.C [PR109642]

This patch extends g++.dg/warn/Wdangling-reference17.C with code
from PR109642.  I'm not creating a new test because this one
already #includes the required headers.

PR c++/109642

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wdangling-reference17.C: Additional testing.

[Bug target/113549] float simd crash on windows in gcc.dg/vect/vect-simd-clone-16b.c

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113549

--- Comment #3 from Andrew Pinski  ---
Either a stack size or a stack alignment issue.

I suspect a stack alignment issue.

[Bug testsuite/113548] gcc.dg/vect/vect-ifcvt-19.c ICEs on LLP64 target

2024-01-22 Thread nightstrike at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113548

--- Comment #3 from nightstrike  ---
Seeing as how this is a testsuite issue, it seems that the crash in the same
location applies to the following:

FAIL: gcc.dg/vect/vect-ifcvt-19.c (internal compiler error: in build2, at
tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10a.c (internal compiler error: in build2, at
tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10b.c (internal compiler error: in build2, at
tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10c.c (internal compiler error: in build2, at
tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10d.c (internal compiler error: in build2, at
tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10e.c (internal compiler error: in build2, at
tree.cc:5097)
FAIL: gcc.dg/vect/vect-ifcvt-19.c -flto -ffat-lto-objects (internal compiler
error: in build2, at tree.cc:5097)
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects (internal
compiler error: in build2, at tree.cc:5097)
FAIL: gcc.dg/vect/vect-cond-arith-2.c (internal compiler error: in build2, at
tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10b.c -flto -ffat-lto-objects (internal compiler
error: in build2, at tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10c.c -flto -ffat-lto-objects (internal compiler
error: in build2, at tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10d.c -flto -ffat-lto-objects (internal compiler
error: in build2, at tree.cc:5097)
FAIL: gcc.dg/vect/slp-reduc-10e.c -flto -ffat-lto-objects (internal compiler
error: in build2, at tree.cc:5097)

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #10 from JuzheZhong  ---
(In reply to Tamar Christina from comment #9)
> So on SVE the change is cost modelling.
> 
> Bisect landed on g:33c2b70dbabc02788caabcbc66b7baeafeb95bcf which changed
> the compiler's defaults to using the new throughput matched cost modelling
> used by newer cores.
> 
> It looks like this changes which mode the compiler picks for when using a
> fixed register size.
> 
> This is because the new cost model (correctly) models the costs for FMAs and
> promotions.
> 
> Before:
> 
> array1[0][_1] 1 times scalar_load costs 1 in prologue
> (int) _2 1 times scalar_stmt costs 1 in prologue
> 
> after:
> 
> array1[0][_1] 1 times scalar_load costs 1 in prologue 
> (int) _2 1 times scalar_stmt costs 0 in prologue 
> 
> and the cost goes from:
> 
> Vector inside of loop cost: 125
> 
> to
> 
> Vector inside of loop cost: 83 
> 
> so far, nothing sticks out, and in fact the profitability for VNx4QI drops
> from
> 
> Calculated minimum iters for profitability: 5
> 
> to
> 
> Calculated minimum iters for profitability: 3
> 
> This causes a clash, as this is now exactly the same cost as VNx2QI which
> used to be what it preferred before.
> 
> Which then leads it to pick the higher VF.
> 
> In the end smaller VF shows:
> 
> ;; Guessed iterations of loop 4 is 0.500488. New upper bound 1.
> 
> and now we get:
> 
> Vectorization factor 16 seems too large for profile prevoiusly believed to
> be consistent; reducing.  
> ;; Guessed iterations of loop 4 is 0.500488. New upper bound 0.
> ;; Scaling loop 4 with scale 66.6% (guessed) to reach upper bound 0
> 
> which I guess is the big difference.
> 
> There is a weird costing going on in the PHI nodes though:
> 
> m_108 = PHI  1 times vector_stmt costs 0 in body 
> m_108 = PHI  2 times scalar_to_vec costs 0 in prologue
> 
> they have collapsed to 0. which can't be right..

I don't think this change causes the regression, since the regression happens
not only on ARM SVE but also on RVV.
It should be a middle-end issue.

I believe you'd better use -fno-vect-cost-model.

[Bug c++/113547] [13 Regression] c++: In function ‘std::vector package_b_info()’: cc1plus: internal compiler error: Segmentation fault

2024-01-22 Thread csfore at posteo dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113547

--- Comment #3 from Christopher Fore  ---
Backtrace:

In function ‘std::vector package_b_info()’:
cc1plus: internal compiler error: Segmentation fault
0xe4dfcf crash_signal
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/toplev.cc:314
0x759446 error_operand_p(tree_node const*)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/tree.h:4501
0x759446 cp_gimplify_expr(tree_node**, gimple**, gimple**)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/cp/cp-gimplify.cc:550
0xb9c8d7 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*),
int)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:16292
0xba3f45 gimplify_stmt(tree_node**, gimple**)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:7226
0xba3f45 gimplify_compound_expr
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:6412
0xb9eaba gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*),
int)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:16373
0xb9d3a2 gimplify_stmt(tree_node**, gimple**)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:7226
0xb9d3a2 gimplify_and_add(tree_node*, gimple**)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:492
0xb9d3a2 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*),
int)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:16751
0xb9e065 gimplify_stmt(tree_node**, gimple**)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:7226
0xb9e065 gimplify_statement_list
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:2019
0xb9e065 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*),
int)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:16828
0xba576b gimplify_stmt(tree_node**, gimple**)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:7226
0xba576b gimplify_bind_expr
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:1430
0xb9dcd1 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*),
int)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:16584
0xba1629 gimplify_stmt(tree_node**, gimple**)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:7226
0xba1629 gimplify_body(tree_node*, bool)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:17645
0xba1a02 gimplify_function_tree(tree_node*)
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/gimplify.cc:17844
0xa10d77 cgraph_node::analyze()
   
/var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/cgraphunit.cc:684
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug c++/113547] [13 Regression] c++: In function ‘std::vector package_b_info()’: cc1plus: internal compiler error: Segmentation fault

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113547

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from Andrew Pinski  ---
(In reply to Christopher Fore from comment #3)
> Backtrace:
> 
> In function ‘std::vector package_b_info()’:
> cc1plus: internal compiler error: Segmentation fault
> 0xe4dfcf crash_signal
>   
> /var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/
> toplev.cc:314
> 0x759446 error_operand_p(tree_node const*)
>   
> /var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/
> tree.h:4501
> 0x759446 cp_gimplify_expr(tree_node**, gimple**, gimple**)
>   
> /var/tmp/portage/sys-devel/gcc-13.2.1_p20240113-r1/work/gcc-13-20240113/gcc/
> cp/cp-gimplify.cc:550


Yep, a dup of bug 113347.

*** This bug has been marked as a duplicate of bug 113347 ***

[Bug c++/113347] [12/13 Regression] ICE during gimplification building TVM since r13-8079-gd237e7b291ff52

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113347

Andrew Pinski  changed:

   What|Removed |Added

 CC||csfore at posteo dot net

--- Comment #9 from Andrew Pinski  ---
*** Bug 113547 has been marked as a duplicate of this bug. ***

[Bug rtl-optimization/113546] [13/14 Regression] aarch64: bootstrap-debug-lean broken with -fcompare-debug failure since r13-2921-gf1adf45b17f7f1

2024-01-22 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113546

--- Comment #5 from Alex Coplan  ---
FWIW the original preprocessed testcase (regex.i) also started failing with the
same commit (as the reduced testcase).

[Bug target/113550] New: data512_t initializers dereference a clobbered register

2024-01-22 Thread ianthompson at microsoft dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113550

Bug ID: 113550
   Summary: data512_t initializers dereference a clobbered
register
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ianthompson at microsoft dot com
  Target Milestone: ---
Target: aarch64

Created attachment 57189
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57189&action=edit
Additional non-minimal failing cases

When initializing or copying a data512_t, the compiler is generating code which
clobbers the register containing the source pointer of the copy. Initially
observed on Arm GNU Toolchain 12.2.Rel1, but this also reproduces on trunk.

Minimal reproduction, hits a segfault when compiled with "aarch64-none-elf-gcc
-march=armv9-a+ls64":

#include 
void test_data512_init() {
data512_t my_value = {};
}

This code generates this assembly snippet for initializing my_value:
adrp x0, .LC0
add x0, x0, :lo12:.LC0
ldp x0, x1, [x0]
ldp x2, x3, [x0, 16]
ldp x4, x5, [x0, 32]
ldp x6, x7, [x0, 48]

Notice that the first ldp clobbers x0, redirecting the remaining 3 loads to
whatever address happens to be in val[0] of the initializer.

Similar incorrect code is generated in many other situations that involve
copying a data512_t (passing a global variable to a function, dereferencing a
data512_t*, etc). See the attached source file for the other failing cases I'm
seeing.
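
The clobber described above can be replayed in a toy model. This is a sketch for illustration only: the `machine` struct, the `ldp` helper, and the word-indexed memory are assumptions made for the example, not GCC or hardware definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model: 8 registers and a small memory addressed in 8-byte words.
   "Addresses" are indexes into mem[], not real pointers. */
enum { NREGS = 8, NWORDS = 32 };

typedef struct {
    uint64_t x[NREGS];
    uint64_t mem[NWORDS];
} machine;

/* Models "ldp xA, xB, [xBASE, off]" with off counted in words.
   The address is formed from the base register before either
   destination is written.  Returns 0 on success, -1 if the address
   falls outside the toy memory (standing in for a fault). */
static int ldp(machine *m, int a, int b, int base, uint64_t off_words)
{
    assert(a < NREGS && b < NREGS && base < NREGS);
    uint64_t addr = m->x[base] + off_words;
    if (addr >= NWORDS - 1)
        return -1;
    m->x[a] = m->mem[addr];
    m->x[b] = m->mem[addr + 1];
    return 0;
}

/* Builds a machine whose 8-word initializer lives at index 8. */
static machine make_machine(void)
{
    machine m;
    memset(&m, 0, sizeof m);
    for (int i = 0; i < 8; i++)
        m.mem[8 + i] = 100 + (uint64_t)i;
    return m;
}

/* Replays the four loads in the order shown in the report and returns
   how many of them faulted in the toy model. */
static int replay_reported_order(machine *m, uint64_t src)
{
    int faults = 0;
    m->x[0] = src;                       /* adrp/add: x0 = &.LC0 */
    faults += ldp(m, 0, 1, 0, 0) != 0;   /* overwrites x0 */
    faults += ldp(m, 2, 3, 0, 2) != 0;
    faults += ldp(m, 4, 5, 0, 4) != 0;
    faults += ldp(m, 6, 7, 0, 6) != 0;
    return faults;
}
```

With the initializer at index 8, the first load replaces x0 with the first element (100), so the remaining three loads form their addresses from 100 instead of 8 and fall outside the toy memory.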

[Bug target/113549] float simd crash on windows in gcc.dg/vect/vect-simd-clone-16b.c

2024-01-22 Thread nightstrike at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113549

--- Comment #4 from nightstrike  ---
(In reply to Andrew Pinski from comment #3)
> Either a stack size or a stack alignment issue.
> 
> I suspect a stack alignment issue.

Possibly related: PR110273

[Bug target/113550] data512_t initializers dereference a clobbered register

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113550

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-22
 Ever confirmed|0   |1
   Keywords||wrong-code
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot 
gnu.org

--- Comment #1 from Andrew Pinski  ---
Should be an easy fix.

The pattern:
(define_insn "*aarch64_movv8di"
  [(set (match_operand:V8DI 0 "nonimmediate_operand" "=r,m,r")
(match_operand:V8DI 1 "general_operand" " r,r,m"))]
  "(register_operand (operands[0], V8DImode)
|| register_operand (operands[1], V8DImode))"
  "#"
  [(set_attr "type" "multiple,multiple,multiple")
   (set_attr "length" "32,16,16")]
)

Is missing a & on the r/m case.

Or the split could be improved such that the one that gets loaded last is the
one that might conflict:
(define_split
  [(set (match_operand:V8DI 0 "nonimmediate_operand")
(match_operand:V8DI 1 "general_operand"))]
  "reload_completed"
  [(const_int 0)]

aarch64_simd_emit_reg_reg_move handles this case already too.

I am going to go with improving the define_split ...
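
The ordering idea behind the second option can be sketched as follows. This is not the actual patch: `order_pairs` is a hypothetical helper, and it assumes the 512-bit value occupies eight consecutive registers loaded as four pairs.

```c
#include <assert.h>

/* Hypothetical helper, not GCC code.  Pair i covers registers
   dest_first + 2*i and dest_first + 2*i + 1.  Fills order[0..3] with
   the pair indexes so that the pair containing base_reg, if there is
   one, is loaded last and the base stays intact until then. */
static void order_pairs(int dest_first, int base_reg, int order[4])
{
    assert(order != 0);
    int conflict = -1;
    int n = 0;
    for (int i = 0; i < 4; i++) {
        int lo = dest_first + 2 * i;
        if (base_reg == lo || base_reg == lo + 1)
            conflict = i;          /* defer the conflicting pair */
        else
            order[n++] = i;
    }
    if (conflict >= 0)
        order[n++] = conflict;
}
```

For the sequence in the report (destination x0..x7, base x0) this yields the order 1, 2, 3, 0, so x0 is overwritten only by the final load.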

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-22 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #8 from Roger Sayle  ---
Created attachment 57190
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57190&action=edit
proposed patch

Proposed patch to provide a sane/saner set of rtx_costs for SH.  There's plenty
more that could be done, but these changes are (more than) sufficient to
resolve the code quality regression caused by improved fwprop.  If someone
could try this out on SH, and report back the results, that would be great.

[Bug c++/113531] [14 Regression] AddressSanitizer: stack-use-after-scope when iterating over initializer list

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113531

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||wrong-code
   Target Milestone|--- |14.0
Summary|AddressSanitizer:   |[14 Regression]
   |stack-use-after-scope when  |AddressSanitizer:
   |iterating over initializer  |stack-use-after-scope when
   |list|iterating over initializer
   ||list

[Bug c++/113531] [14 Regression] AddressSanitizer: stack-use-after-scope when iterating over initializer list since r14-1500-g4d935f52b0d5c0

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113531

--- Comment #1 from Andrew Pinski  ---
It would be useful to get a reduced testcase without the use of the Catch2Main
library.

[Bug c++/90463] Documentation: -Wunused not listed among the options enabled by -Wall

2024-01-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90463

--- Comment #3 from GCC Commits  ---
The master branch has been updated by Sandra Loosemore :

https://gcc.gnu.org/g:7e758890a4c86db790a5f9aef0191eef77047f65

commit r14-8342-g7e758890a4c86db790a5f9aef0191eef77047f65
Author: Sandra Loosemore 
Date:   Mon Jan 22 22:38:49 2024 +

Correct lists of options enabled by -Wall and -Wextra [PR90463]

gcc/ChangeLog
PR c++/90463
* doc/invoke.texi (Warning Options): Correct lists of options
enabled by -Wall and -Wextra by checking against common.opt
and c-family/c.opt.

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-22 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #9 from Oleg Endo  ---
(In reply to Roger Sayle from comment #8)
> Created attachment 57190 [details]
> proposed patch
> 
> Proposed patch to provide a sane/saner set of rtx_costs for SH.  There's
> plenty more that could be done, but these changes are (more than) sufficient
> to resolve the code quality regression caused by improved fwprop.  If
> someone could try this out on SH, and report back the results, that would be
> great.


You've added differentiation for 'speed ?' in 'sh_address_cost'.  Like this
one.

   /* 'GBR + 0'.  Account one more because of R0 restriction.  */
   if (REG_P (x) && REGNO (x) == GBR_REG)
-return 2;
+return speed ? 2 : 0;

What's the intention here?  Why is the cost of the address computation
reduced when not optimizing for speed?  It distorts the address costs and makes
them all equal.

[Bug c/89180] [meta-bug] bogus/missing -Wunused warnings

2024-01-22 Thread sandra at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89180
Bug 89180 depends on bug 90463, which changed state.

Bug 90463 Summary: Documentation: -Wunused not listed among the options enabled 
by -Wall
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90463

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug c++/90463] Documentation: -Wunused not listed among the options enabled by -Wall

2024-01-22 Thread sandra at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90463

sandra at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from sandra at gcc dot gnu.org ---
Marking this fixed now.

[Bug target/53929] [meta-bug] -masm=intel with global symbol

2024-01-22 Thread lh_mouse at 126 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53929

--- Comment #25 from LIU Hao  ---
Created attachment 57191
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57191&action=edit
Draft patch

This is a draft patch, bootstrapped on {i686,x86_64}-w64-mingw32 successfully.
Haven't run tests though.

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-22 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #10 from Roger Sayle  ---
Hi Oleg.  Great question.  The "speed" parameter passed to rtx_costs and
address_cost indicates whether the middle-end is optimizing for performance, and
interested in the number of cycles taken by each instruction, or optimizing
for size, and interested in the number of bytes used to encode the instruction.
Previously, this speed parameter was ignored by the SH backend, so the costs
were the same independent of the objective function.

In my proposed patch, the address cost (1) when optimizing for size attempts to
return the additional size of an instruction based on the addressing mode.  For
register, and reg+reg addressing modes there is no size increase (overhead),
and for addressing modes with displacements, and displacements to address
pointers, there is a cost.  (2) when optimizing for speed, address cost remains
between 0 and 3, and is used to prioritize between (equivalent numbers of)
instructions.  Normally, rtx_costs are defined in terms of COSTS_N_INSNS, which
multiplies by 4.  Hence on many platforms a single instruction that references
memory may be encoded as COSTS_N_INSNS(1)+1 (or a more complex addressing mode
as COSTS_N_INSNS(1)+2) to show that this is disfavored relative to a single
instruction that doesn't reference memory, COSTS_N_INSNS(1)+0.

This is the fix for this particular regression; SIGN_EXTEND of a register now
costs COSTS_N_INSNS(1), and SIGN_EXTEND of a MEM now costs COSTS_N_INSNS(1)+1.
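
The arithmetic behind this encoding, as a minimal sketch. The macro mirrors the multiply-by-4 definition described above; `insn_cost` is a hypothetical helper introduced only for the example.

```c
#include <assert.h>

/* One instruction is worth 4 cost units, as described above. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper: cost of n_insns instructions plus an address
   cost used as a tie-breaker, expected to be in the range 0..3. */
static int insn_cost(int n_insns, int address_cost)
{
    assert(address_cost >= 0 && address_cost <= 3);
    return COSTS_N_INSNS(n_insns) + address_cost;
}
```

Here `insn_cost(1, 0)` is 4 for the register form and `insn_cost(1, 1)` is 5 for the memory form. Because the address cost never exceeds 3, even the most expensive addressing mode (7) stays below the cost of two instructions (8).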

A useful way to debug rtx_costs is to use the -dP command line option, and then
look at the [c=X, l=Y] annotations in the assembly language file.  One way to
check/confirm that these are sensible is that ideally they should be correlated
when optimizing for size (with -Os or -Oz).

I've found an interesting table of SH cycle counts (for different CPUs) at
http://www.shared-ptr.com/sh_insns.html and these could be used to improve
sh_rtx_costs further.  For example, SH currently reports multiplications as
a single cycle operation, which doesn't match the hardware specs, and prevents
GCC from using synth_mult to produce faster (or shorter) sequences using shifts
and additions.  Likewise, sh_rtx_costs doesn't distinguish the machine mode,
so the costs of SImode multiplications are the same as DImode multiplications.

In comment #5 you mention GCC's defaults; it turns out that for rtx_costs the
default values that would be provided by the middle-end, may be more accurate
than the values (currently) specified by the backend.

I hope this answers your question.

[Bug rtl-optimization/113551] New: Miscompilation with -O1 -funswitch-loops -fno-strict-overflow

2024-01-22 Thread yshuiv7 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113551

Bug ID: 113551
   Summary: Miscompilation with -O1 -funswitch-loops
-fno-strict-overflow
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: yshuiv7 at gmail dot com
  Target Milestone: ---

Code:

struct obj {
int __pad;
int i;
};

/* aborts when called with NULL */
int assert_not_null(void *); 

void bug(struct obj **root, struct obj *dso) {
while (1) {
struct obj *this = *root;

if (dso == (void *)0)
// should return here
return;

if (dso == this)
return;

// shouldn't reach here
assert_not_null(dso);

if (!&dso->i)
break;
}
}

// call like this: bug(&obj, NULL);

Result:

* -O1: ok
* -O1 -funswitch-loops: ok
* -O1 -fno-strict-overflow: ok
* -O1 -funswitch-loops -fno-strict-overflow: abort
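
A self-contained driver for the testcase might look like the sketch below. The body of `assert_not_null` and the `run_reproducer` wrapper are assumptions based on the comments in the report; only `struct obj` and `bug` come from the report itself.

```c
#include <stdlib.h>

struct obj {
    int __pad;
    int i;
};

/* Assumed implementation: aborts when called with NULL. */
int assert_not_null(void *p)
{
    if (p == NULL)
        abort();
    return 0;
}

void bug(struct obj **root, struct obj *dso)
{
    while (1) {
        struct obj *this = *root;

        if (dso == (void *)0)
            /* should return here */
            return;

        if (dso == this)
            return;

        /* shouldn't reach here */
        assert_not_null(dso);

        if (!&dso->i)
            break;
    }
}

/* Hypothetical wrapper performing the call suggested in the report.
   Returns 0 if bug() returned normally. */
int run_reproducer(void)
{
    struct obj *root = NULL;
    bug(&root, NULL);
    return 0;
}
```

When compiled correctly, `run_reproducer()` returns 0 without ever calling `assert_not_null`. To observe the reported behavior it is probably necessary to keep `bug` from being inlined into its caller, for example by placing it in a separate translation unit; that detail is an assumption, not something stated in the report.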

[Bug rtl-optimization/113551] Miscompilation with -O1 -funswitch-loops -fno-strict-overflow

2024-01-22 Thread yshuiv7 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113551

--- Comment #1 from Yuxuan Shui  ---
code is reduced from perf, source file util/dsos.c

[Bug rtl-optimization/113551] Miscompilation with -O1 -funswitch-loops -fno-strict-overflow

2024-01-22 Thread yshuiv7 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113551

--- Comment #2 from Yuxuan Shui  ---
regression from 12.3 -> 13.2

[Bug tree-optimization/113551] Miscompilation with -O1 -funswitch-loops -fno-strict-overflow

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113551

Andrew Pinski  changed:

   What|Removed |Added

  Component|rtl-optimization|tree-optimization
   Keywords||wrong-code

--- Comment #3 from Andrew Pinski  ---
Looks like the unswitch is happening when it should not be ...

[Bug tree-optimization/113551] Miscompilation with -O1 -funswitch-loops -fno-strict-overflow

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113551

Andrew Pinski  changed:

   What|Removed |Added

  Known to fail||5.4.0

--- Comment #4 from Andrew Pinski  ---
The incorrect unswitch has been happening since at least GCC 5 ...

[Bug target/113507] can't build a cross compiler to rs6000-ibm-aix7.2

2024-01-22 Thread dje at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113507

--- Comment #4 from David Edelsohn  ---
rs6000-ibm-aix doesn't exist anymore.  This should have been configured as
powerpc-ibm-aix7.2 .  Maybe there is some magic about the "powerpc" name?

Those variables are provided by generated files and apparently something is not
generating them when building a cross compiler.

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

2024-01-22 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #11 from Oleg Endo  ---
(In reply to Roger Sayle from comment #10)

> I've found an interesting table of SH cycle counts (for different CPUs) at
> http://www.shared-ptr.com/sh_insns.html

Yeah, I know.  I did that ;)

> In my proposed patch, the address cost (1) when optimizing for size attempts
> to return the additional size of an instruction based on the addressing
> mode.  For register, and reg+reg addressing modes there is no size increase
> (overhead), and for addressing modes with displacements, and displacements to
> address pointers, there is a cost.

AFAIR, I've added the 'sh_address_cost' function.  The intention was/is to
encourage/discourage usage of certain address modes based on the side effects
and impact on the surrounding code.  All insns/addr modes have the same length
and basically same execution time.  However, e.g. @(reg+reg) has a constraint
on 'r0' usage, so I weighted that heavier.  If there's anything that could use
@(reg+disp) as an alternative, that'd be better in some cases. (not sure if
such optimizations actually are done...)

> (2) when optimizing for speed, address
> cost remains between 0 and 3, and is used to prioritize between (equivalent
> numbers of) instructions.  Normally, rtx_costs are defined in terms of
> COSTS_N_INSNS, which multiplies by 4.  Hence on many platforms a single
> instruction that references memory may be encoded as COSTS_N_INSNS(1)+1 (or
> a more complex addressing mode as COSTS_N_INSNS(1)+2) to show that this is
> disfavored to a single instruction that doesn't reference memory,
> COSTS_N_INSNS(1)+0.

That's actually what sh_rtx_costs was supposed to do as well.  I think in usual
cases it does that, only that apparently I've screwed up the {SIGN|ZERO}_EXTEND
for the case of the mem load and it shows up only now, many years later.

It's still not entirely clear to me why we would want to squash the costs of
addresses to 0 when optimizing for size.  What effect does it have on the
generated code?  I can't imagine how it could possibly produce any smaller
code.

With your patch, in case of the SIGN_EXTEND with mem operand, it would make the
address cost 0 with -Os, which would return COSTS_N_INSNS(1) for reg operand as
well as mem operand.  So both insns are equally weighted and could be
considered interchangeable.  And we might bump into this type of regression
again, if some (future) optimization decides that it can interchange/substitute
insns of the same cost... 
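
The tie described here can be written down as a small model. It follows the reading of the patch given in this comment and is not taken from the patch itself; `sign_extend_cost` is a hypothetical function.

```c
#include <assert.h>
#include <stdbool.h>

#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical model of the concern: the memory form adds an address
   cost of 1 when optimizing for speed and 0 when optimizing for size. */
static int sign_extend_cost(bool operand_is_mem, bool speed)
{
    int address_cost = (operand_is_mem && speed) ? 1 : 0;
    return COSTS_N_INSNS(1) + address_cost;
}
```

Under this model the register form is strictly cheaper when optimizing for speed, while both forms cost COSTS_N_INSNS(1) when optimizing for size, so a pass comparing costs would see them as interchangeable.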


> For example, SH currently reports multiplications as a single cycle operation,

That doesn't seem to be the case.  It's supposed to be using the function
'multcosts' in sh.cc, which returns at least a cost of '2'.  Note that on SH1
and SH2 there is no dynamic (barrel) shift.  So actually some multiplications
could be faster than stitched shifts.
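
A rough sketch of why a multiply can win without a barrel shifter. The set of fixed shift amounts (16, 8, 2, 1) and the greedy decomposition are assumptions for illustration and are not taken from sh.cc.

```c
#include <assert.h>
#include <stdint.h>

/* x * 10 rewritten as (x << 3) + (x << 1): two shifts and one add. */
static uint32_t mul10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Assumption for illustration: only fixed left shifts by 16, 8, 2 and
   1 exist, so a shift by n is built greedily from those amounts.
   Returns the number of shift instructions needed. */
static int fixed_shift_steps(unsigned n)
{
    static const unsigned amounts[] = { 16, 8, 2, 1 };
    int steps = 0;
    assert(n < 32);
    for (int i = 0; i < 4; i++) {
        while (n >= amounts[i]) {
            n -= amounts[i];
            steps++;
        }
    }
    return steps;
}
```

Under these assumptions a shift by 3 takes two instructions and a shift by 7 takes four, so a shift-and-add sequence for some constants can end up longer than a multiply with a cost of 2.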


> sh_rtx_costs doesn't distinguish the machine mode, so the costs of SImode 
> multiplications are the same as DImode multiplications.

I guess this is because SH doesn't have real DImode multiplication (64 x 64 ->
64/128 bit).  It can only do 32 x 32 -> 64 bit widening multiplication.  Any
real DImode multiplication will result in either an expanded sequence that
calculates the sum of partial products or a libcall, AFAIR.
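
The partial-product expansion can be sketched in C. This shows the general technique only, not the sequence GCC emits for SH; `widen_mul32` and `mul64_partial` are names made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* The only primitive assumed: a 32 x 32 -> 64 bit widening multiply. */
static uint64_t widen_mul32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* 64 x 64 -> 64 bit multiply as a sum of partial products.  The
   hi * hi term only affects bits 64 and above, so it is dropped. */
static uint64_t mul64_partial(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint64_t low   = widen_mul32(a_lo, b_lo);
    uint64_t cross = widen_mul32(a_lo, b_hi) + widen_mul32(a_hi, b_lo);

    /* Only the low 32 bits of cross survive the shift. */
    return low + (cross << 32);
}
```

Three widening multiplies plus shifts and adds are needed for one 64-bit result, which fits the point that a DImode multiplication is considerably more work than an SImode one on this kind of target.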

[Bug tree-optimization/113551] [13/14 Regression] Miscompilation with -O1 -funswitch-loops -fno-strict-overflow

2024-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113551

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
Summary|Miscompilation with -O1 |[13/14 Regression]
   |-funswitch-loops|Miscompilation with -O1
   |-fno-strict-overflow|-funswitch-loops
   ||-fno-strict-overflow
   Last reconfirmed||2024-01-23
 Status|UNCONFIRMED |NEW
  Known to fail|5.4.0   |13.2.0
   Target Milestone|--- |13.3

--- Comment #5 from Andrew Pinski  ---
Confirmed at least for the bad unswitch which causes the other wrong code to
happen.

[Bug target/113507] can't build a cross compiler to rs6000-ibm-aix7.2

2024-01-22 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113507

Kewen Lin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 CC||segher at gcc dot gnu.org
   Last reconfirmed||2024-01-23
 Ever confirmed|0   |1

--- Comment #5 from Kewen Lin  ---
(In reply to H.J. Lu from comment #3)
> (In reply to Kewen Lin from comment #2)
> > Guessing /usr/local/bin/ld is a gnu ld? Based on what I heard before, gnu ld
> > has some problems on aix, people pass object files to aix system and use aix
> > ld there. Not sure if the understanding still holds.
> 
> I am building a cross compiler.  No AIX tools are involved.

Thanks for clarifying, I was dull and misunderstood it.

Confirmed, some symbols are from rs6000-builtin.cc (which is not generated) but
it requires some symbols in rs6000-builtins.cc (which is generated). Neither
object file is included in linking. The below diff can fix it:

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b2d7d7dd475..6b62e4fe56c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -557,8 +557,10 @@ rs6000*-*-*)
 extra_options="${extra_options} g.opt fused-madd.opt
rs6000/rs6000-tables.opt"
 extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
 extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
+extra_objs="${extra_objs} rs6000-builtin.o rs6000-builtins.o"
 target_gtfiles="$target_gtfiles
\$(srcdir)/config/rs6000/rs6000-logue.cc
\$(srcdir)/config/rs6000/rs6000-call.cc"
 target_gtfiles="$target_gtfiles
\$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
+target_gtfiles="$target_gtfiles ./rs6000-builtins.h"
 ;;
 sparc*-*-*)
 cpu_type=sparc

Given David's comment that "rs6000-ibm-aix doesn't exist anymore", and since I
vaguely remember Segher also mentioning that rs6000*-*-*) has become stale,
maybe we can aggressively drop the whole rs6000*-*-*) case handling?

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #11 from Richard Biener  ---
(In reply to Tamar Christina from comment #9)
> There is a weird costing going on in the PHI nodes though:
> 
> m_108 = PHI  1 times vector_stmt costs 0 in body 
> m_108 = PHI  2 times scalar_to_vec costs 0 in prologue
> 
> they have collapsed to 0. which can't be right..

Note this is likely because of the backend going wrong.

bool
vectorizable_phi (vec_info *,
  stmt_vec_info stmt_info, gimple **vec_stmt,
  slp_tree slp_node, stmt_vector_for_cost *cost_vec)
{
..

  /* For single-argument PHIs assume coalescing which means zero cost
 for the scalar and the vector PHIs.  This avoids artificially
 favoring the vector path (but may pessimize it in some cases).  */
  if (gimple_phi_num_args (as_a  (stmt_info->stmt)) > 1)
record_stmt_cost (cost_vec, SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node),
  vector_stmt, stmt_info, vectype, 0, vect_body);

You could check if we call this with sane values.

[Bug tree-optimization/113476] [14 Regression] irange::maybe_resize leaks memory via IPA VRP

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113476

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-23
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot 
gnu.org

--- Comment #5 from Richard Biener  ---
(In reply to Martin Jambor from comment #4)
> The right place where to free stuff in lattices post-IPA would be in
> ipa_node_params::~ipa_node_params() where we should iterate over lattices
> and deinitialize them or perhaps destruct the array because since
> ipcp_vr_lattice directly contains Value_Range which AFAIU directly contains
> int_range_max which has a virtual destructor... does not look like a POD
> anymore.  This has escaped me when I was looking at the IPA-VR changes but
> hopefully it should not be too difficult to deal with.

OK, that might work for the IPA side.

It's quite unusual to introduce a virtual DTOR in the middle of the class
hierarchy though.  Grepping I do see quite some direct uses of 'irange'
and also 'vrange' which do not have the DTOR visible but 'irange' already
exposes and uses 'maybe_resize'.  I think those should only be introduced
in the class exposing the virtual DTOR (but why virtual?!).

Would be nice to have a picture of the range class hierarchies with
pointers on which types to use in which circumstances ...

For example:

  Value_Range vr (parm_type);
...
   irange &r = as_a <irange> (vr);
   irange_bitmask bm = r.get_bitmask ();
...

should that really use 'irange'?  Why not int_range&?

All the complication might be because of GC (irange is GTY but int_range is
not), but re-allocation would happen with 'new', not ggc_alloc, so ...

But yes, please try to fix IPA CP, I'll see if this pops up elsewhere as well
then.

[Bug debug/112718] [11/12/13/14 Regression] ICE: in add_dwarf_attr, at dwarf2out.cc:4501 with -g -fdebug-types-section -flto -ffat-lto-objects

2024-01-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112718

--- Comment #3 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:7218f5050cb7163edae331f54ca163248ab48bfa

commit r14-8345-g7218f5050cb7163edae331f54ca163248ab48bfa
Author: Richard Biener 
Date:   Mon Jan 22 15:42:59 2024 +0100

debug/112718 - reset all type units with -ffat-lto-objects

When mixing -flto, -ffat-lto-objects and -fdebug-types-section we
fail to reset all type units after early output resulting in an
ICE when attempting to add then duplicate sibling attributes.

PR debug/112718
* dwarf2out.cc (dwarf2out_finish): Reset all type units
for the fat part of an LTO compile.

* gcc.dg/debug/pr112718.c: New testcase.

[Bug debug/112718] [11/12/13 Regression] ICE: in add_dwarf_attr, at dwarf2out.cc:4501 with -g -fdebug-types-section -flto -ffat-lto-objects

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112718

Richard Biener  changed:

   What|Removed |Added

  Known to work||14.0
   Priority|P3  |P2
Summary|[11/12/13/14 Regression]|[11/12/13 Regression] ICE:
   |ICE: in add_dwarf_attr, at  |in add_dwarf_attr, at
   |dwarf2out.cc:4501 with -g   |dwarf2out.cc:4501 with -g
   |-fdebug-types-section -flto |-fdebug-types-section -flto
   |-ffat-lto-objects   |-ffat-lto-objects

--- Comment #4 from Richard Biener  ---
Fixed on trunk so far.

[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8

2024-01-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

--- Comment #12 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:a98d5130a6dcff2ed4db371e500550134777b8cf

commit r14-8346-ga98d5130a6dcff2ed4db371e500550134777b8cf
Author: Richard Biener 
Date:   Mon Jan 15 12:55:20 2024 +0100

rtl-optimization/113255 - base_alias_check vs. pointer difference

When the x86 backend generates code for cpymem with the rep_8byte
strategy for the 8 byte aligned main rep movq it needs to compute
an adjusted pointer to the source after doing a prologue aligning
the destination.  It computes that via

  src_ptr + (dest_ptr - orig_dest_ptr)

which is perfectly fine.  On RTL this is then

8: r134:DI=const(`g'+0x44)
9: {r133:DI=frame:DI-0x4c;clobber flags:CC;}
  REG_UNUSED flags:CC
   56: r129:DI=const(`g'+0x4c)
   57: {r129:DI=r129:DI&0xfff8;clobber flags:CC;}
  REG_UNUSED flags:CC
  REG_EQUAL const(`g'+0x4c)&0xfff8
   58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;}
  REG_DEAD r134:DI
  REG_UNUSED flags:CC
  REG_EQUAL const(`g'+0x44)-r129:DI
   59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;}
  REG_DEAD r133:DI
  REG_UNUSED flags:CC

but as written find_base_term happily picks the first candidate
it finds for the MINUS which means it picks const(`g') rather
than the correct frame:DI.  The way find_base_term (but also
the unfixed find_base_value used by init_alias_analysis to
initialize REG_BASE_VALUE) performs pointer analysis isn't
sound.  The following restricts the handling of multi-operand
operations to the case we know only one can be a pointer.

This for example causes gcc.dg/tree-ssa/pr94969.c to miss some
RTL PRE (I've opened PR113395 for this).  A more drastic patch,
removing base_alias_check results in only gcc.dg/guality/pr41447-1.c
regressing (so testsuite coverage is bad).  I've looked at
gcc.dg/tree-ssa tests and mostly scheduling changes are present,
the cc1plus .text size is only 230 bytes worse.  With this
less drastic patch below most scheduling changes are gone.

x86_64 might not be the very best target to test for impact, but
test coverage on other targets is unlikely to be very much better.

PR rtl-optimization/113255
* alias.cc (find_base_term): Remove PLUS/MINUS handling
when both operands are not CONST_INT_P.

* gcc.dg/torture/pr113255.c: New testcase.
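
The problem described in the commit message can be sketched with a toy
expression tree.  This is only an illustration, not the alias.cc code: the
struct expr type, the kind enumerators and both functions are hypothetical,
and the restricted variant only checks the second operand for a constant.

```c
#include <assert.h>
#include <stddef.h>

enum kind { K_SYMBOL, K_FRAME, K_CONST_INT, K_PLUS, K_MINUS };

struct expr
{
  enum kind kind;
  const struct expr *op0, *op1;   /* operands of K_PLUS / K_MINUS */
};

/* Unsound variant: for a binary node, return the first base found in
   either operand, even if that operand is really an offset.  */
static const struct expr *
first_base (const struct expr *x)
{
  assert (x != NULL);
  switch (x->kind)
    {
    case K_SYMBOL:
    case K_FRAME:
      return x;
    case K_PLUS:
    case K_MINUS:
      {
        const struct expr *b = first_base (x->op0);
        return b ? b : first_base (x->op1);
      }
    default:
      return NULL;
    }
}

/* Restricted variant: look through PLUS/MINUS only when the second
   operand is a constant integer, so only one operand can be a pointer.  */
static const struct expr *
restricted_base (const struct expr *x)
{
  assert (x != NULL);
  switch (x->kind)
    {
    case K_SYMBOL:
    case K_FRAME:
      return x;
    case K_PLUS:
    case K_MINUS:
      if (x->op1->kind == K_CONST_INT)
        return restricted_base (x->op0);
      return NULL;
    default:
      return NULL;
    }
}
```

For an expression shaped like (dest_ptr - orig_dest_ptr) + src_ptr, with the
destination based on a symbol and the source based on the frame, first_base
returns the symbol although the real base is the frame, while restricted_base
gives up and returns NULL.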

[Bug target/113255] [11/12/13 Regression] wrong code with -O2 -mtune=k8

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
  Known to work||14.0
 Status|NEW |ASSIGNED
Summary|[11/12/13/14 Regression]|[11/12/13 Regression] wrong
   |wrong code with -O2 |code with -O2 -mtune=k8
   |-mtune=k8   |

--- Comment #13 from Richard Biener  ---
Fixed on trunk so far.

[Bug middle-end/113540] missing -Warray-bounds warning with malloc and a simple loop

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113540

Richard Biener  changed:

   What|Removed |Added

 Blocks||56456
   Keywords||diagnostic
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2024-01-23

--- Comment #1 from Richard Biener  ---
If you remove the volatile, like

#include <stdlib.h>

char *foo (void)
{
  char *t;
  t = malloc (4);
  for (int i = 0; i <= 4; i++)
    t[i] = 0;
  return t;
}

you get

t.c: In function 'foo':
t.c:8:10: warning: '__builtin_memset' writing 5 bytes into a region of size 4
[-Wstringop-overflow=]
8 | t[i] = 0;
  | ~^~~
t.c:6:7: note: destination object of size 4 allocated by 'malloc'
6 |   t = malloc (4);
  |   ^~

note this is because we then unroll the loop.  If you change it like

#include <stdlib.h>

short *foo (void)
{
  short *t;
  t = malloc (8);
  for (int i = 0; i <= 4; i++)
    t[i] = 13;
  return t;
}

you get

t.c: In function 'foo':
t.c:8:6: warning: array subscript 4 is outside array bounds of 'short int[4]'
[-Warray-bounds=]
8 | t[i] = 13;
  | ~^~~
t.c:6:7: note: at offset 8 into object of size 8 allocated by 'malloc'
6 |   t = malloc (8);
  |   ^~

because we unroll the loop.  Upping the bounds like

#include <stdlib.h>

short *foo (void)
{
  short *t;
  t = malloc (64);
  for (int i = 0; i <= 32; i++)
    t[i] = 13;
  return t;
}

no longer warns because we hit unroll limits.  This is also the reason
we do not diagnose the original testcase - there's currently no analysis
done to compute the set of values 'i' must reach for the purpose of
array-bound diagnostics.  Instead we use value-ranges which are
conservative, aka [-INF, INF] is "correct".  But that means we only
diagnose cases where _all_ values of the range fall outside of the
array.

Using niter analysis and SCEV we could do a better job in cases like the
one in this bug.

I'm quite sure we have related/duplicate bugreports for this already.
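
To make the idea concrete, here is a minimal sketch of the arithmetic such an
analysis would need for the loops above.  It is not GCC code; the function
bytes_past_end and its parameters are hypothetical, and it assumes the loop
has the form "for (i = 0; i <= last_index; i++) t[i] = ...".

```c
#include <assert.h>
#include <stddef.h>

/* The highest index written is last_index, so the last store ends at
   (last_index + 1) * elem_size bytes from the start of the object.
   Return how many bytes that is past the allocation, or 0 if the
   loop stays in bounds.  */
static size_t
bytes_past_end (size_t alloc_bytes, size_t elem_size, size_t last_index)
{
  assert (elem_size > 0);
  size_t end_of_last_store = (last_index + 1) * elem_size;
  if (end_of_last_store > alloc_bytes)
    return end_of_last_store - alloc_bytes;
  return 0;
}
```

With the known iteration count all three examples are out of bounds,
including the malloc (64) one that currently escapes diagnosis:
bytes_past_end (64, sizeof (short), 32) is nonzero when short is 2 bytes.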


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56456
[Bug 56456] [meta-bug] bogus/missing -Warray-bounds

[Bug c++/113541] Rejects __attribute__((section)) on explicit instantiation declaration of ctor/dtor

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113541

Richard Biener  changed:

   What|Removed |Added

   Keywords||rejects-valid
Version|unknown |14.0
  Known to work||4.9.4
  Known to fail||5.1.0

--- Comment #2 from Richard Biener  ---
It sounds like an issue with the C++ mandated aliases.

But I'll note that the template instantiations have to adhere to certain
linkage so I wonder if simply putting them into a different section isn't going
to break the ABI.

[Bug testsuite/113548] gcc.dg/vect/vect-ifcvt-19.c ICEs on LLP64 target

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113548

--- Comment #4 from Richard Biener  ---
Note for 'sizetype' you want to use '__SIZETYPE__', not '__SIZE_TYPE__'

[Bug middle-end/113552] New: [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Bug ID: 113552
   Summary: [11/12/13/14 Regression] vectorizer generates calls to
vector math routines with 1 simd lane.
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: link-failure
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64-*

In GCC 7 the Arm vector PCS was implemented to support libmvec but the libmvec
component never made it into glibc until now.

GLIBC 2.39 which will be paired with GCC 14 now implements the vector math
routines.

However consider this function:

> cat cosmo.fppized3.f
  SUBROUTINE a(b)
  DIMENSION b(3,0)
  COMMON c
  DO 4 m=1,c
 DO 4 d=1,3
 b(d,m)=b(d,m)+COS(5.0D00*m)
   4  CONTINUE
  END
  DIMENSION e(53)
  DIMENSION f(6,91),g(6,91),h(6,91),
 *  i(6,91),j(6,91),k(6,86)
  DIMENSION l(107)
  END

and compiled with headers from a glibc 2.39:

> aarch64-unknown-linux-gnu-gfortran -S -o - -Ofast 
> -L/data/repro/glibc/usr/lib64 -I/data/repro/glibc/include 
> --sysroot=/data/repro/glibc -w cosmo.fppized3.f

produces:

fmul    v13.2d, v13.2d, v19.2d
fmov    d0, d13
bl      _ZGVnN1v_cos
fmov    d12, d0
dup     d0, v13.d[1]
bl      _ZGVnN1v_cos
fmov    d31, d0
stp     d12, d31, [sp, 96]

which has deconstructed the vector to scalar and performs a vector call with 1
element.
This is not just inefficient: _ZGVnN1v_cos does not exist in glibc, so the
code we produce cannot be linked.

It looks like the vectorizer starts with 4 floats and widens to 2x 2 double. 
But then during vectorizable simd this is again split into multiple vectors,
even though the operation already fits in a vector:

cosmo.fppized3.f:4:13: note:   -->vectorizing SLP node starting from: _49 =
__builtin_cos (_48);
cosmo.fppized3.f:4:13: note:   vect_is_simple_use: operand _47 * 5.0e+0, type
of def: internal
cosmo.fppized3.f:4:13: note:   transform call.
cosmo.fppized3.f:4:13: note:   add new stmt: _132 = BIT_FIELD_REF
;
cosmo.fppized3.f:4:13: note:   add new stmt: _133 = cos.simdclone.0 (_132);
cosmo.fppized3.f:4:13: note:   add new stmt: _134 = BIT_FIELD_REF
;
cosmo.fppized3.f:4:13: note:   add new stmt: _135 = cos.simdclone.0 (_134);
cosmo.fppized3.f:4:13: note:   add new stmt: vect__49.27_136 = {_133, _135};
cosmo.fppized3.f:4:13: note:   add new stmt: _137 = BIT_FIELD_REF
;
cosmo.fppized3.f:4:13: note:   add new stmt: _138 = cos.simdclone.0 (_137);
cosmo.fppized3.f:4:13: note:   add new stmt: _139 = BIT_FIELD_REF
;
cosmo.fppized3.f:4:13: note:   add new stmt: _140 = cos.simdclone.0 (_139);
...

Because we happen to have a V1DF mode that is meant to only be used by some
intrinsics the operation succeeds.

So several issues here:

1. We should stop the new libmvec headers in glibc from applying to GCC 10, 9,
8 and 7, since we can't fix those anymore.  So we need a GCC version check on
them, however glibc is now frozen for release.
2. The vectorizer should not decompose a simd call if the input and result
don't require it.
3. We shouldn't generate a call with simdlen 1.  That said in theory this could
still be beneficial because it would allow the rest of the code to vectorize
and the vector pcs is cheaper to call.
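
For reference, the symbol name in the assembly above encodes the lane count.
The sketch below rebuilds such names from their parts; it only models the
pieces visible in _ZGVnN1v_cos (ISA letter 'n', unmasked 'N', the simdlen and
one 'v' per vector argument).  The helper names simd_clone_name and
simdlen_is_useful are hypothetical, and the second one is just one possible
policy for issue 3, not what the vectorizer does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a name of the form _ZGVnN<simdlen><params>_<scalar_name>,
   with one 'v' per vector argument.  Returns what snprintf returns.  */
static int
simd_clone_name (char *buf, size_t size, unsigned simdlen,
                 unsigned nvecargs, const char *scalar_name)
{
  char params[16];
  assert (nvecargs < sizeof params);
  memset (params, 'v', nvecargs);
  params[nvecargs] = '\0';
  return snprintf (buf, size, "_ZGVnN%u%s_%s", simdlen, params, scalar_name);
}

/* Hypothetical policy: a clone with a single lane is not worth calling.  */
static int
simdlen_is_useful (unsigned simdlen)
{
  return simdlen > 1;
}
```

Calling simd_clone_name with simdlen 1 and one vector argument for "cos"
yields the name seen in the failing link, while simdlen 2 yields _ZGVnN2v_cos.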

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Tamar Christina  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
   Priority|P3  |P1
  Component|middle-end  |tree-optimization

[Bug tree-optimization/113551] [13/14 Regression] Miscompilation with -O1 -funswitch-loops -fno-strict-overflow

2024-01-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113551

Richard Biener  changed:

   What|Removed |Added

   Keywords||needs-bisection
 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #6 from Richard Biener  ---
trunk doesn't unswitch for me (needs bisection).  Let me check what happens on
the branch.
