date:20231013

[Bug c++/111806] g++ generates better code for variant at -Os compared to -O3

2023-10-13 Thread hiraditya at msn dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111806

--- Comment #1 from AK  ---
It seems like we could 'sink' the 4 common instructions (of .L2) at -O3

.L2:
add rsp, 48
xor eax, eax
pop rbx
ret


Or is it due to some kind of tail duplication?

[Bug c++/111806] New: g++ generates better code for variant at -Os compared to -O3

2023-10-13 Thread hiraditya at msn dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111806

Bug ID: 111806
   Summary: g++ generates better code for variant at
-Os compared to -O3
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include
#include
#include

int foo() {
std::variant v {"abc"};
std::cout << std::get<0>(v);
return 0;
}

g++ -O3 -std=c++20 -g0  -fno-exceptions

foo():
.LFB2484:
push rbx
mov eax, 25185
mov edx, 3
mov edi, OFFSET FLAT:_ZSt4cout
sub rsp, 48
lea rbx, [rsp+16]
mov WORD PTR [rsp+16], ax
mov rsi, rbx
mov QWORD PTR [rsp], rbx
mov BYTE PTR [rsp+18], 99
mov QWORD PTR [rsp+8], 3
mov BYTE PTR [rsp+19], 0
mov BYTE PTR [rsp+32], 0
call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
cmp BYTE PTR [rsp+32], 0
je  .L5
.L2:
add rsp, 48
xor eax, eax
pop rbx
ret
.L5:
mov rdi, QWORD PTR [rsp]
cmp rdi, rbx
je  .L2
mov rax, QWORD PTR [rsp+16]
lea rsi, [rax+1]
call operator delete(void*, unsigned long)
add rsp, 48
xor eax, eax
pop rbx
ret
.LFE2484:


g++ -Os -std=c++20 -g0  -fno-exceptions


foo():
.LFB2463:
push rbx
mov edx, 3
mov edi, OFFSET FLAT:_ZSt4cout
sub rsp, 48
lea rbx, [rsp+24]
mov WORD PTR [rsp+24], 25185
mov rsi, rbx
mov QWORD PTR [rsp+8], rbx
mov BYTE PTR [rsp+26], 99
mov QWORD PTR [rsp+16], 3
mov BYTE PTR [rsp+27], 0
mov BYTE PTR [rsp+40], 0
call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
cmp BYTE PTR [rsp+40], 0
jne .L2
mov rdi, QWORD PTR [rsp+8]
cmp rdi, rbx
je  .L2
mov rax, QWORD PTR [rsp+24]
lea rsi, [rax+1]
call operator delete(void*, unsigned long)
.L2:
add rsp, 48
xor eax, eax
pop rbx
ret
.LFE2463:


https://godbolt.org/z/3xKh35Mrv

[Bug c++/111805] New: suboptimal codegen of variant

2023-10-13 Thread hiraditya at msn dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111805

Bug ID: 111805
   Summary: suboptimal codegen of variant
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include
#include

std::string foo() {
std::variant v {"abc"};
return std::get<0>(v);
}

g++-13.2 -O2 -std=c++20


foo[abi:cxx11]():
lea rdx, [rdi+16]
mov BYTE PTR [rdi+18], 99
mov rax, rdi
mov QWORD PTR [rdi], rdx
mov edx, 25185
mov WORD PTR [rdi+16], dx
mov QWORD PTR [rdi+8], 3
mov BYTE PTR [rdi+19], 0
ret


clang++ -O2 -std=c++20

foo():                                  # @foo()
mov rax, rdi
mov byte ptr [rdi], 6
mov dword ptr [rdi + 1], 6513249
ret


https://godbolt.org/z/nTv5rYanM

[Bug c/111804] wrong code with '-O3 -fno-inline-functions-called-once -fno-inline-small-functions -fno-toplevel-reorder -fno-tree-fre'

2023-10-13 Thread 19373742 at buaa dot edu.cn via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111804

--- Comment #1 from CTC <19373742 at buaa dot edu.cn> ---
Created attachment 56104
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56104&action=edit
The compiler output

[Bug c/111804] New: wrong code with '-O3 -fno-inline-functions-called-once -fno-inline-small-functions -fno-toplevel-reorder -fno-tree-fre'

2023-10-13 Thread 19373742 at buaa dot edu.cn via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111804

Bug ID: 111804
   Summary: wrong code with '-O3 -fno-inline-functions-called-once
-fno-inline-small-functions -fno-toplevel-reorder
-fno-tree-fre'
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: 19373742 at buaa dot edu.cn
  Target Milestone: ---

Created attachment 56103
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56103&action=edit
The preprocessed file

***
OS and Platform:
Ubuntu 20.04.4 LTS
***
gcc version:
$ gcc -v
Using built-in specs.
COLLECT_GCC=/home/ctc/gcc-releases/gcc-14/bin/gcc
COLLECT_LTO_WRAPPER=/home/ctc/gcc-releases/gcc-14/libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ./configure --prefix=/home/cuisk/ctc/gcc-releases/gcc-14
--disable-multilib --enable-language=c,c++
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20231008 (experimental) (GCC) 
***
Command Lines:
$ gcc -I ~/csmith/include/csmith-2.3.0/ -O3 -fno-inline-functions-called-once
-fno-inline-small-functions -fno-toplevel-reorder -fno-tree-fre a.c -o w
2>w.txt
$ ./w
w: a.c:292: func_69: Assertion `l_71 == &l_72 || l_71 == 0' failed.
Aborted (core dumped)

$ gcc -I ~/csmith/include/csmith-2.3.0/ -fsanitize=undefined a.c -o w
$ ./w
checksum = 0

[Bug tree-optimization/103216] missed optimization, phiopt/vrp?

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103216

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||TREE

--- Comment #10 from Andrew Pinski  ---
So this is almost fixed on the trunk at -O2:
  d_4 = (signed char) a_2(D);
  if (d_4 < 0)
    goto <bb 4>; [41.00%]
  else
    goto <bb 3>; [59.00%]

  <bb 3> [local count: 633507680]:
  _6 = (signed char) a_2(D);

  <bb 4> [local count: 1073741824]:
  # prephitmp_8 = PHI <_6(3), d_4(2)>

The only thing missing is seeing that d_4 and _6 are the same ...

Maybe factor_out_conditional_operation could detect this ...

  d_4 = (signed char) a_2(D);
  if (d_4 < 0)
    goto <bb 4>; [41.00%]
  else
    goto <bb 3>; [59.00%]

  <bb 3> [local count: 633507680]:
  _6 = (signed char) a_2(D);

  <bb 4> [local count: 1073741824]:
  # prephitmp_8 = PHI <_6(3), d_4(2)>

factor_out_conditional_operation could factor out the cast to the definition of
d_4 ...

Anyways the original code is now optimized (via RTL level) so maybe it is not
as important.
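For readers less used to GIMPLE dumps, the redundancy can be shown at source level. This is only a sketch: the function names `branchy` and `folded` are made up for illustration and are not from the PR. `branchy` mirrors the shape of the dump, where both PHI arguments are the same cast of `a`, and `folded` is what is left once the two are recognised as equal.

```cpp
// Mirrors the GIMPLE shape: both PHI arguments are the same cast of `a`.
signed char branchy(int a) {
    signed char d = static_cast<signed char>(a);  // d_4
    if (d < 0)
        return d;                                  // PHI argument d_4(2)
    signed char t = static_cast<signed char>(a);  // _6, recomputed on the other path
    return t;                                      // PHI argument _6(3)
}

// What remains once d_4 and _6 are seen to be the same value.
signed char folded(int a) {
    return static_cast<signed char>(a);
}
```

Both functions return the same value for every input, so the branch carries no information.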

[Bug c++/111803] Template deduction failure for baseclass member pointer with template data type

2023-10-13 Thread grbrown93 at sbcglobal dot net via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111803

--- Comment #3 from grbrown  ---
(In reply to grbrown from comment #2)
> From my tests though, removing the 'template' will allow it to
> work, so I don't know if your explanation is entirely correct.

I.e. replace

template <typename DataType>
void print(DataType ClassType::*member)
{
}

with

void print(float ClassType::*member)
{
}
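A small sketch of the difference, as far as I understand the deduction rules. The names `read_fixed` and `read_deduced` are hypothetical and only serve the illustration. With a fixed parameter type the argument `float A::*` is implicitly converted to `float B::*`. With a deduced `DataType` the argument has to match `DataType B::*` without such a conversion, so deduction fails unless `DataType` is given explicitly.

```cpp
struct A { float data; };
struct B : A {};

// Non-template: the parameter type is fixed, so `float A::*` converts
// implicitly to `float B::*` (pointer-to-member conversion).
inline float read_fixed(const B& b, float B::*member) { return b.*member; }

// Template: DataType is deduced by matching the argument against
// `DataType B::*`. Passing `&A::data` without explicit template
// arguments is what GCC and clang reject in this report.
template <typename DataType>
DataType read_deduced(const B& b, DataType B::*member) { return b.*member; }
```

Calling `read_fixed(b, &A::data)` and `read_deduced<float>(b, &A::data)` both compile, which matches the workarounds in the original report.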

[Bug c++/111803] Template deduction failure for baseclass member pointer with template data type

2023-10-13 Thread grbrown93 at sbcglobal dot net via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111803

--- Comment #2 from grbrown  ---
(In reply to Andrew Pinski from comment #1)
> Note clang also rejects this code for the same reason as GCC.
> 
> Here is a reduced version which shows the difference between ICC/MSVC and
> GCC/clang:
> ```
> struct A
> {
>   int data;
> };
> struct B : A{};
> 
> struct Container1
> {
>   template <typename DataType>
>   void print(DataType B::*member);
> };
> 
> void setup1(Container1& container)
> {
>   container.print(&A::data); // BROKEN IN GCC!
> }
> ```
> 
> Basically MSVC/ICC can cast between `float B::*` and `float A::*` ...

From my tests though, removing the 'template' will allow it to work,
so I don't know if your explanation is entirely correct.

[Bug c++/111803] Template deduction failure for baseclass member pointer with template data type

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111803

--- Comment #1 from Andrew Pinski  ---
Note clang also rejects this code for the same reason as GCC.

Here is a reduced version which shows the difference between ICC/MSVC and
GCC/clang:
```
struct A
{
  int data;
};
struct B : A{};

struct Container1
{
  template <typename DataType>
  void print(DataType B::*member);
};

void setup1(Container1& container)
{
  container.print(&A::data); // BROKEN IN GCC!
}
```

Basically MSVC/ICC can cast between `float B::*` and `float A::*` ...

[Bug c++/111803] New: Template deduction failure for baseclass member pointer with template data type

2023-10-13 Thread grbrown93 at sbcglobal dot net via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111803

Bug ID: 111803
   Summary: Template deduction failure for baseclass member
pointer with template data type
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: grbrown93 at sbcglobal dot net
  Target Milestone: ---

Compiler error tested on versions 4.4, 6.1, 7.1, 10.1, 12.1, 13.2
Compiles fine on icpc and MSVC

> g++ test.cpp -o test
error: no matching function for call to 'Container<B>::print(float A::*)'

It should be trying to call 'Container<B>::print(float B::*)' but it's deducing
'A' from the 'data' member rather than 'B' from the 'ClassType' template arg.
Weirdly, it gets the correct call when using 'template print<float>' though
(guess that makes it follow a simpler logic path?).


Reproducible sample:

template <typename ClassType>
class Container
{
public:
    template <typename DataType>
    void print(DataType ClassType::*member)
    {
    }
};

class A
{
public:
    float data;

    template <typename ClassType>
    static void setup(Container<ClassType>& container)
    {
        container.print(&ClassType::data); // BROKEN IN GCC!
        container.template print<float>(&ClassType::data); // WORKAROUND
        container.print((float ClassType::*)&ClassType::data); // WORKAROUND
    }
};

class B : public A
{
public:
    double data2;

    template <typename ClassType>
    static void setup(Container<ClassType>& container)
    {
        A::setup(container);
        container.print(&ClassType::data2); // WORKS FINE
    }
};

int main()
{
    Container<B> container;
    B::setup(container);
    return 0;
}

[Bug fortran/111218] Conflict in BIND(C) INTERFACEs in two Modules leads to ICE.

2023-10-13 Thread kargl at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111218

--- Comment #7 from kargl at gcc dot gnu.org ---
An alternative patch that allows the original code to compile is

diff --git a/gcc/fortran/symbol.cc b/gcc/fortran/symbol.cc
index a6078bc608a..34eb3a65e8f 100644
--- a/gcc/fortran/symbol.cc
+++ b/gcc/fortran/symbol.cc
@@ -3160,7 +3160,7 @@ static void
 ambiguous_symbol (const char *name, gfc_symtree *st)
 {

-  if (st->n.sym->error)
+  if (st->n.sym->error || (st->n.sym->module && !gfc_current_locus.lb))
 return;

   if (st->n.sym->module)

This one simply returns.  If one actually tries to reference the
conflicting entity an error message is generated.

[Bug fortran/111218] Conflict in BIND(C) INTERFACEs in two Modules leads to ICE.

2023-10-13 Thread kargl at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111218

kargl at gcc dot gnu.org changed:

   What|Removed |Added

   Priority|P3  |P4

--- Comment #6 from kargl at gcc dot gnu.org ---

>if (st->n.sym->module)
> -gfc_error ("Name %qs at %C is an ambiguous reference to %qs "
> -"from module %qs", name, st->n.sym->name, st->n.sym->module);
> +{
> +  if (gfc_current_locus.lb)
> + gfc_error ("Name %qs at %C is an ambiguous reference to %qs from"

A space is missing after "... from".  Should be "... from "

> +"module %qs", name, st->n.sym->name, st->n.sym->module);
> +  else
> + gfc_error ("Name %qs is an ambiguous reference to %qs, which is "
> +"available through USE association from module %qs at %L",
> +name, st->n.sym->name, st->n.sym->module,
> +&st->n.sym->declared_at);
> +}
>else
>  gfc_error ("Name %qs at %C is an ambiguous reference to %qs "
>  "from current program unit", name, st->n.sym->name);

The original code is too painful to read (aka uppercase and no formatting).
The claim is that the code is valid.  Maybe it is as the conflicting 
names in the modules are not referenced in the program unit.  If OP
references SET_ABOR1_EXCEPTION_HANDLER, good luck.
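To make the missing-space remark concrete: adjacent string literals are concatenated with nothing inserted between them. The sketch below uses shortened copies of the message text, and the function names are hypothetical.

```cpp
#include <string>

// As written in the patch: "from" runs straight into "module".
inline std::string message_as_patched() {
    return "is an ambiguous reference to %qs from"
           "module %qs";
}

// With the trailing space restored.
inline std::string message_with_space() {
    return "is an ambiguous reference to %qs from "
           "module %qs";
}
```

The first variant yields "... fromodule"-style run-together text ("frommodule %qs"), the second yields "from module %qs".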

[Bug fortran/111218] Conflict in BIND(C) INTERFACEs in two Modules leads to ICE.

2023-10-13 Thread kargl at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111218

--- Comment #5 from kargl at gcc dot gnu.org ---
Interesting bug.  If one puts a break point ...

0x75917d gfc_format_decoder
/home/toon/compilers/gcc/gcc/fortran/error.cc:1078
0x2153e1f pp_format(pretty_printer*, text_info*)
/home/toon/compilers/gcc/gcc/pretty-print.cc:1475
0x21315be diagnostic_report_diagnostic(diagnostic_context*, diagnostic_info*)
/home/toon/compilers/gcc/gcc/diagnostic.cc:1606
0x9b628e gfc_report_diagnostic
/home/toon/compilers/gcc/gcc/fortran/error.cc:890
0x9b628e gfc_error_opt
/home/toon/compilers/gcc/gcc/fortran/error.cc:1460
0x9b7470 gfc_error(char const*, ...)
/home/toon/compilers/gcc/gcc/fortran/error.cc:1489
0xa6205b ambiguous_symbol
/home/toon/compilers/gcc/gcc/fortran/symbol.cc:3167

here, one ends up at 

gfc_error ("Name %qs at %C is an ambiguous reference to %qs from"
   "module %qs", name, st->n.sym->name, st->n.sym->module);

%C means one should use gfc_current_locus to print the offending line.
If one looks at gfc_current_locus, one sees gfc_current_locus.lb = 0x0,
and an infamous NULL pointer dereference then occurs.  Ouch.  This patch
fixes the issue:

diff --git a/gcc/fortran/symbol.cc b/gcc/fortran/symbol.cc
index a6078bc608a..5b726f8b715 100644
--- a/gcc/fortran/symbol.cc
+++ b/gcc/fortran/symbol.cc
@@ -3164,8 +3164,16 @@ ambiguous_symbol (const char *name, gfc_symtree *st)
 return;

   if (st->n.sym->module)
-gfc_error ("Name %qs at %C is an ambiguous reference to %qs "
-  "from module %qs", name, st->n.sym->name, st->n.sym->module);
+{
+  if (gfc_current_locus.lb)
+   gfc_error ("Name %qs at %C is an ambiguous reference to %qs from"
+  "module %qs", name, st->n.sym->name, st->n.sym->module);
+  else
+   gfc_error ("Name %qs is an ambiguous reference to %qs, which is "
+  "available through USE association from module %qs at %L",
+  name, st->n.sym->name, st->n.sym->module,
+  &st->n.sym->declared_at);
+}
   else
 gfc_error ("Name %qs at %C is an ambiguous reference to %qs "
   "from current program unit", name, st->n.sym->name);
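The idea of the patch, reduced to a standalone sketch. `Locus` and `ambiguous_message` are hypothetical stand-ins and not gfortran's types or API, and the message text is simplified. The point is only that the branch using the current location is taken when that location exists, and a location-free message is produced otherwise.

```cpp
#include <string>

// Hypothetical stand-in: lb == nullptr means "no current source line".
struct Locus { const char* lb; };

inline std::string ambiguous_message(const Locus& current,
                                     const std::string& name,
                                     const std::string& module) {
    if (current.lb != nullptr)
        // Safe to mention the current location.
        return "Name '" + name + "' at " + current.lb +
               " is an ambiguous reference from module '" + module + "'";
    // No current location: do not touch it, describe the USE association instead.
    return "Name '" + name +
           "' is an ambiguous reference, available through USE association"
           " from module '" + module + "'";
}
```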

[Bug tree-optimization/111432] `bool & (a|1)` is not optimized to just `bool`

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111432

Andrew Pinski  changed:

   What|Removed |Added

URL||https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632988.html
   Keywords||patch

--- Comment #2 from Andrew Pinski  ---
Patch posted:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632988.html

[Bug tree-optimization/111791] RISC-V: Strange loop vectorizaion on popcount function

2023-10-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

JuzheZhong  changed:

   What|Removed |Added

 CC||juzhe.zhong at rivai dot ai

--- Comment #3 from JuzheZhong  ---
This is because RISC-V doesn't vectorize popcount.

It is just a scalar call to popcount:

call __popcountdi2

I have a patch that supports the popcount optab:

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625870.html

However, I don't think we should enable the popcount optab if we don't have
a single vector popcount instruction.

An ideal way is to add a fallback popcount in the loop vectorizer if the target
doesn't enable the vector popcount optab. Robin is working on that:

https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626567.html

So that's why I dropped my patch.

Robin can comment with more details.
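For context, a fallback popcount can be built from shifts, masks, adds and one multiply, all of which vectorize element-wise. The sketch below is the well-known bit-parallel (SWAR) formulation. It is an illustration only and not necessarily what the linked patches implement. The function names are hypothetical.

```cpp
#include <cstdint>

// Bit-parallel population count for one 64-bit element.
inline std::uint64_t popcount_swar(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // 4-bit sums
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;                            // 8-bit sums
    return (x * 0x0101010101010101ULL) >> 56;                              // add all bytes
}

// Reference: clear the lowest set bit until none remain.
inline std::uint64_t popcount_naive(std::uint64_t x) {
    std::uint64_t n = 0;
    for (; x != 0; x &= x - 1) ++n;
    return n;
}
```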

[Bug target/104698] Inefficient code for DI to TI sign extend on power10

2023-10-13 Thread meissner at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104698

Michael Meissner  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Michael Meissner  ---
The patch was committed to the GCC 13 branch on March 5th, 2022 and later
backported to GCC 12.

[Bug target/106271] Bootstrap on RISC-V on Ubuntu 22.04 LTS: bits/libc-header-start.h: No such file or directory

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106271

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug fortran/45129] I/O edit descriptors: Warn if the format field is too small for the E and F edit descriptor

2023-10-13 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45129

Jerry DeLisle  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jvdelisle at gcc dot gnu.org
 CC||jvdelisle at gcc dot gnu.org

--- Comment #8 from Jerry DeLisle  ---
Taking this one.

[Bug libfortran/53962] Tab handling with formatted stream output

2023-10-13 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53962

Jerry DeLisle  changed:

   What|Removed |Added

 CC||jvdelisle at gcc dot gnu.org
 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jvdelisle at gcc dot gnu.org

--- Comment #4 from Jerry DeLisle  ---
Assigning to myself. This may be a duplicate of other tabbing bugs.

[Bug fortran/66499] Letters with accents change format behavior for X and T descriptors.

2023-10-13 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66499

Jerry DeLisle  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||jvdelisle at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |jvdelisle at gcc dot gnu.org

--- Comment #5 from Jerry DeLisle  ---
assigning to myself

[Bug libfortran/83282] missing comma in format changes output

2023-10-13 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83282

Jerry DeLisle  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

[Bug libfortran/83282] missing comma in format changes output

2023-10-13 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83282

Jerry DeLisle  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jvdelisle at gcc dot gnu.org
 CC||jvdelisle at gcc dot gnu.org

--- Comment #4 from Jerry DeLisle  ---
Assigning to myself

[Bug fortran/83829] Implement runtime checks for DT format specifier and allignment with effective items

2023-10-13 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83829

Jerry DeLisle  changed:

   What|Removed |Added

 CC||jvdelisle at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |jvdelisle at gcc dot gnu.org

--- Comment #1 from Jerry DeLisle  ---
Taking

[Bug fortran/88052] Format contravening constraint C1002 permitted

2023-10-13 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88052

Jerry DeLisle  changed:

   What|Removed |Added

 CC||jvdelisle at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |jvdelisle at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #10 from Jerry DeLisle  ---
I will assign this one so I can eventually close it out.

[Bug libfortran/97017] The function determine_precision is called twice for each formatted real write

2023-10-13 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97017

Jerry DeLisle  changed:

   What|Removed |Added

 CC||jvdelisle at gcc dot gnu.org
 Status|WAITING |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jvdelisle at gcc dot gnu.org

--- Comment #2 from Jerry DeLisle  ---
I will take another look before closing this.

[Bug fortran/104626] ICE in gfc_format_decoder, at fortran/error.cc:1071

2023-10-13 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104626

Jerry DeLisle  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jvdelisle at gcc dot gnu.org
 CC||jvdelisle at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #3 from Jerry DeLisle  ---
I will take this to get the fix committed if it tests OK here.

[Bug fortran/109105] Error-prone format string building in resolve.cc

2023-10-13 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109105

Jerry DeLisle  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
 Ever confirmed|0   |1
   Last reconfirmed||2023-10-13

--- Comment #2 from Jerry DeLisle  ---
Roland, where do we stand on this PR?

[Bug fortran/104351] ICE in gfc_generate_initializer, at fortran/expr.cc:5140

2023-10-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104351

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Harald Anlauf :

https://gcc.gnu.org/g:d78fef5371759849944966dec65d9e987efba509

commit r14-4632-gd78fef5371759849944966dec65d9e987efba509
Author: Harald Anlauf 
Date:   Wed Oct 11 21:29:35 2023 +0200

Fortran: name conflict between internal procedure and derived type
[PR104351]

gcc/fortran/ChangeLog:

PR fortran/104351
* decl.cc (get_proc_name): Extend name conflict detection between
internal procedure and previous declaration also to derived type.

gcc/testsuite/ChangeLog:

PR fortran/104351
* gfortran.dg/derived_function_interface_1.f90: Adjust pattern.
* gfortran.dg/pr104351.f90: New test.

[Bug fortran/110957] -ffpe-trap and -ffpe-summary options issues

2023-10-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110957

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Harald Anlauf :

https://gcc.gnu.org/g:458c253ccdae9df439b9a452d04e325101e5756e

commit r14-4631-g458c253ccdae9df439b9a452d04e325101e5756e
Author: Harald Anlauf 
Date:   Fri Oct 6 22:21:56 2023 +0200

fortran: fix handling of options -ffpe-trap and -ffpe-summary [PR110957]

gcc/fortran/ChangeLog:

PR fortran/110957
* invoke.texi: Update documentation to reflect '-ffpe-trap=none'.
* options.cc (gfc_handle_fpe_option): Fix mixup up of error
messages
for options -ffpe-trap and -ffpe-summary.  Accept '-ffpe-trap=none'
to clear FPU traps previously set on command line.

[Bug libfortran/93550] Implement control of leading zero in formatted numeric output

2023-10-13 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93550

Jerry DeLisle  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||jvdelisle at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |jvdelisle at gcc dot gnu.org

--- Comment #3 from Jerry DeLisle  ---
I am going to add this to my list since we are now going through the 2018
compliance matrix.  I will update this as I get these sorted out.

[Bug fortran/110644] Error in gfc_format_decoder

2023-10-13 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110644

Jerry DeLisle  changed:

   What|Removed |Added

 CC||jvdelisle at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2023-10-13

--- Comment #2 from Jerry DeLisle  ---
Need more information.

[Bug analyzer/111802] New: New analyser diagram failures since commit b365e9d57ad4

2023-10-13 Thread thiago.bauermann at linaro dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111802

Bug ID: 111802
   Summary: New analyser diagram failures since commit
b365e9d57ad4
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: thiago.bauermann at linaro dot org
  Target Milestone: ---
Target: armv8l-linux-gnueabihf

Created attachment 56102
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56102&action=edit
Testsuite logs of analyser.exp from today's trunk.

After commit b365e9d57ad4 ("analyzer: improvements to out-of-bounds diagrams
[PR55]"), the following failures started appearing on
armv8l-linux-gnueabihf:

=== g++ tests ===

Running g++:g++.dg/analyzer/analyzer.exp ...
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c -std=c++14  2
blank line(s) in output
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c -std=c++14 
expected multiline pattern lines 50-73
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c -std=c++14 (test
for excess errors)
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c -std=c++17  2
blank line(s) in output
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c -std=c++17 
expected multiline pattern lines 50-73
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c -std=c++17 (test
for excess errors)
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c -std=c++20  2
blank line(s) in output
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c -std=c++20 
expected multiline pattern lines 50-73
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c -std=c++20 (test
for excess errors)
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c -std=c++98  2
blank line(s) in output
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c -std=c++98 
expected multiline pattern lines 50-73
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c -std=c++98 (test
for excess errors)
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat.c -std=c++14  2 blank
line(s) in output
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat.c -std=c++14  expected
multiline pattern lines 42-65
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat.c -std=c++14 (test for
excess errors)
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat.c -std=c++17  2 blank
line(s) in output
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat.c -std=c++17  expected
multiline pattern lines 42-65
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat.c -std=c++17 (test for
excess errors)
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat.c -std=c++20  2 blank
line(s) in output
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat.c -std=c++20  expected
multiline pattern lines 42-65
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat.c -std=c++20 (test for
excess errors)
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat.c -std=c++98  2 blank
line(s) in output
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat.c -std=c++98  expected
multiline pattern lines 42-65
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat.c -std=c++98 (test for
excess errors)
=== gcc tests ===

Running gcc:gcc.dg/analyzer/analyzer.exp ...
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c (test for excess
errors)
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c 2 blank line(s) in
output
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c expected multiline
pattern lines 23-46
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat.c (test for excess
errors)
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat.c 2 blank line(s) in
output
FAIL: c-c++-common/analyzer/out-of-bounds-diagram-strcat.c expected multiline
pattern lines 15-38
FAIL: gcc.dg/analyzer/out-of-bounds-diagram-17.c expected multiline pattern
lines 14-35
FAIL: gcc.dg/analyzer/out-of-bounds-diagram-18.c expected multiline pattern
lines 14-43
FAIL: gcc.dg/analyzer/out-of-bounds-diagram-19.c expected multiline pattern
lines 25-46

They appear on builds configured with ‘--with-build-config=bootstrap-lto’, but
not on builds with just ‘--enable-bootstrap’, nor in builds configured with
‘--disable-bootstrap’, so perhaps it's an issue with LTO?

The failures happen because the string literal indexes in the diagrams are all
“[1]” instead of the expected “[0]”, “[1]”, “[2]”, etc. E.g.:

FAIL: gcc.dg/analyzer/out-of-bounds-diagram-17.c expected multiline pattern
lines 14-35
FAIL: gcc.dg/analyzer/out-of-bounds-diagram-17.c 2 blank line(s) in output
FAIL: gcc.dg/analyzer/out-of-bounds-diagram-17.c (test for excess errors)
Excess errors:
 ┌┬┬┬┬┐┌─┬─┬─┐
 │[1] │[1] │[1] │[1] │[1] ││ [1] │ [1] │ [1] │

[Bug c++/111222] ICE on basic_string_view and alias templates with missing template argument

2023-10-13 Thread ppalka at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111222

Patrick Palka  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |ppalka at gcc dot gnu.org
 Status|NEW |ASSIGNED

[Bug tree-optimization/111800] [14 Regression] ICE with -fdump-tree-ccp1-details

2023-10-13 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111800

--- Comment #8 from Jakub Jelinek  ---
Even in the gcc 13 -fdump-tree-ccp1-details I see
PHI node value: CONSTANT
0xffe2
(0x19)
so that isn't exactly readable either, but my changes made it much worse.

[Bug tree-optimization/111800] [14 Regression] ICE with -fdump-tree-ccp1-details

2023-10-13 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111800

--- Comment #7 from Jakub Jelinek  ---
Created attachment 56101
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56101&action=edit
gcc14-pr111800.patch

Untested patch to avoid the buffer overflows when printing wide_int/widest_int.

The more important question is if we don't want to change how we print the
stuff.
As mentioned earlier, perhaps teach print_dec{s,u} to print even larger
constants decimally rather than hexadecimally.
And consider whether we shouldn't have print_hex with signop where we'd decide
whether to print negative values if SIGNED as -0x.. rather than some huge
0x..
Because, -fdump-tree-ccp1-details on the testcase from this PR used to print
say -1 as
0x
but now it prints it as
0xf
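A sketch of the "-0x.." idea on a plain 64-bit value, to show what a sign-aware print_hex would change. This is not GCC's print_hex and not the attached patch. `hex_unsigned` and `hex_signed` are hypothetical helpers working on `std::int64_t` rather than wide_int.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Prints the two's-complement bit pattern, so negative values become long runs of f.
inline std::string hex_unsigned(std::int64_t v) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "0x%llx", static_cast<unsigned long long>(v));
    return buf;
}

// Prints negative values as "-0x<magnitude>" instead.
inline std::string hex_signed(std::int64_t v) {
    if (v >= 0)
        return hex_unsigned(v);
    // Negate in unsigned arithmetic so the most negative value does not overflow.
    unsigned long long magnitude = 0ULL - static_cast<unsigned long long>(v);
    char buf[32];
    std::snprintf(buf, sizeof buf, "-0x%llx", magnitude);
    return buf;
}
```

For the value -30 mentioned in this PR, `hex_unsigned` gives `0xffffffffffffffe2` while `hex_signed` gives `-0x1e`.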

[Bug c++/111222] ICE on basic_string_view and alias templates with missing template argument

2023-10-13 Thread mpolacek at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111222

Marek Polacek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-10-13
 Ever confirmed|0   |1
 CC||mpolacek at gcc dot gnu.org,
   ||ppalka at gcc dot gnu.org

--- Comment #3 from Marek Polacek  ---
I think the ICE below started with r11-3261.


In file included from 111222.C:1:
/home/mpolacek/x/gcc11/x86_64-pc-linux-gnu/libstdc++-v3/include/string_view:158:16:
internal compiler error: in add_outermost_template_args, at cp/pt.c:601
  158 |   && (!requires (_DRange& __d) {
  |^
  159 | __d.operator ::std::basic_string_view<_CharT,
_Traits>();
  |
~
  160 |   })
  |   ~ 
0xd52027 add_outermost_template_args(tree_node*, tree_node*)
/home/mpolacek/src/gcc11/gcc/cp/pt.c:601
0xd8838e add_extra_args(tree_node*, tree_node*, int, tree_node*)
/home/mpolacek/src/gcc11/gcc/cp/pt.c:13103
0xb1d061 tsubst_requires_expr
/home/mpolacek/src/gcc11/gcc/cp/constraint.cc:2310
0xb1d30c tsubst_requires_expr(tree_node*, tree_node*, int, tree_node*)
/home/mpolacek/src/gcc11/gcc/cp/constraint.cc:2353
0xdb6406 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool,
bool)
/home/mpolacek/src/gcc11/gcc/cp/pt.c:21116
0xdb09af tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool,
bool)
/home/mpolacek/src/gcc11/gcc/cp/pt.c:20035
0xdb0c91 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool,
bool)
/home/mpolacek/src/gcc11/gcc/cp/pt.c:20102
0xdb0c57 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool,
bool)
/home/mpolacek/src/gcc11/gcc/cp/pt.c:20101
0xdadc55 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
/home/mpolacek/src/gcc11/gcc/cp/pt.c:19382
0xb1e33e tsubst_constraint(tree_node*, tree_node*, int, tree_node*)
/home/mpolacek/src/gcc11/gcc/cp/constraint.cc:2761
0xb1d390 tsubst_constraint_info(tree_node*, tree_node*, int, tree_node*)
/home/mpolacek/src/gcc11/gcc/cp/constraint.cc:2366
0xdd8ae3 alias_ctad_tweaks
/home/mpolacek/src/gcc11/gcc/cp/pt.c:29355
0xdd93f8 deduction_guides_for
/home/mpolacek/src/gcc11/gcc/cp/pt.c:29488
0xdd9efe do_class_deduction
/home/mpolacek/src/gcc11/gcc/cp/pt.c:29611
0xddae07 do_auto_deduction(tree_node*, tree_node*, tree_node*, int,
auto_deduction_context, tree_node*, int)
/home/mpolacek/src/gcc11/gcc/cp/pt.c:29781
0xb91d12 cp_finish_decl(tree_node*, tree_node*, bool, tree_node*, int)
/home/mpolacek/src/gcc11/gcc/cp/decl.c:7899
0xd05934 cp_parser_init_declarator
/home/mpolacek/src/gcc11/gcc/cp/parser.c:21931
0xcf750b cp_parser_simple_declaration
/home/mpolacek/src/gcc11/gcc/cp/parser.c:14476
0xcf70c7 cp_parser_block_declaration
/home/mpolacek/src/gcc11/gcc/cp/parser.c:14302
0xcf6d80 cp_parser_declaration
/home/mpolacek/src/gcc11/gcc/cp/parser.c:14173

[Bug tree-optimization/111800] [14 Regression] ICE with -fdump-tree-ccp1-details

2023-10-13 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111800

Jakub Jelinek  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
Ah, I see what's going on.
tree-ssa-ccp.cc uses widest_int extensively (I believe it is a mistake and
should be using wide_int instead) and in this case has a widest_int with value
-30 (get_len () == 1, get_precision () before my changes 576, with those
changes 32640 bits).
Now, print_hex (but also print_decu/print_decs/print_dec as they always print
larger precision (and when get_len () > 1) using print_hex) prints all numbers
according to the precision, so my checks to ensure the buffer is big enough:
  char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf;
  unsigned len = wi.get_len ();
  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS))
    p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4);
are not correct, they'd need to check precision for wi::neg_p (wi) values.
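
A minimal sketch of the sizing rule described above, assuming 64-bit host wide ints and that a negative value is printed sign-extended out to its full precision. The helper name `hex_buf_size` and the constant `HOST_BITS` are hypothetical, not GCC identifiers:

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: one host wide int holds 64 bits.  */
enum { HOST_BITS = 64 };

/* Bytes needed to print a value as hex: one digit per 4 bits,
   plus 4 bytes of slack as in the snippet above.  A negative value
   is sized by its precision, since its leading digits are all 'f'.  */
static size_t
hex_buf_size (unsigned len, unsigned precision, bool negative)
{
  size_t bits = (size_t) len * HOST_BITS;
  if (negative && precision > bits)
    bits = precision;
  return (bits + 3) / 4 + 4;
}
```

For the widest_int -30 case, `hex_buf_size (1, 32640, true)` asks for 8164 bytes, where sizing by length alone would give 20.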

[Bug target/111797] Code generation of -march=znver2 -O3 includes frame pointer

2023-10-13 Thread paulf at free dot fr via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111797

--- Comment #3 from Paul Floyd  ---
With clang 17.0.2 (also tried 14.0) I get

 :
   0:   55                      push   %rbp
   1:   41 57                   push   %r15
   3:   41 56                   push   %r14
   5:   41 55                   push   %r13
   7:   41 54                   push   %r12
   9:   53                      push   %rbx
   a:   48 81 ec c8 23 00 00    sub    $0x23c8,%rsp
  11:   c5 f9 28 c1             vmovapd %xmm1,%xmm0
  15:   4c 89 8c 24 98 21 00    mov    %r9,0x2198(%rsp)

With GCC if I add -mno-avx then I get back the base pointer. I presume that
this will turn off all vector extensions from avx onwards.

[Bug d/111537] ICE: in set_cell_span, at text-art/table.cc:148 with D front-end and -fanalyzer

2023-10-13 Thread ibuclaw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111537

ibuclaw at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ibuclaw at gcc dot gnu.org
   Assignee|dmalcolm at gcc dot gnu.org|ibuclaw at gcc dot gnu.org
  Component|analyzer|d

--- Comment #10 from ibuclaw at gcc dot gnu.org ---
Nice, thanks, David, for pointing me to specifically where the analyzer was
tripping over.

I think I can take over this PR from here.

---

oob.d: In function ‘D main’:
oob.d:5:11: warning: stack-based buffer overflow [CWE-121]
[-Wanalyzer-out-of-bounds]
5 | strcpy(arr.ptr, "hello world");
  |   ^
  ‘D main’: events 1-4
|
|4 | char[5] arr;
|  | ^
|  | |
|  | (1) capacity: 5 bytes
|  | (2) following ‘false’ branch...
|  | (3) ...to here
|5 | strcpy(arr.ptr, "hello world");
|  |   ~  
|  |   |
|  |   (4) out-of-bounds write from byte 5 till byte 11 but
‘arr’ ends at byte 5
|
oob.d:5:11: note: write of 7 bytes to beyond the end of ‘arr’
5 | strcpy(arr.ptr, "hello world");
  |   ^
oob.d:5:11: note: valid subscripts for ‘arr’ are ‘[0]’ to ‘[4]’

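  ┌─────┬─────┬─────┬─────┬───────┐┌─────┬─────┬─────┬────┬────┬────┬────┐
  │ [0] │ [1] │ [2] │ [3] │  [4]  ││ [5] │ [6] │ [7] │[8] │[9] │[10]│[11]│
  ├─────┼─────┼─────┼─────┼───────┤├─────┼─────┼─────┼────┼────┼────┼────┤
  │ ‘h’ │ ‘e’ │ ‘l’ │ ‘l’ │  ‘o’  ││ ‘ ’ │ ‘w’ │ ‘o’ │‘r’ │‘l’ │‘d’ │NUL │
  ├─────┴─────┴─────┴─────┴───────┴┴─────┴─────┴─────┴────┴────┴────┴────┤
  │               string literal (type: ‘const char[12]’)                │
  └──────────────────────────────────────────────────────────────────────┘
     │     │     │     │      │       │     │     │    │    │    │    │
     │     │     │     │      │       │     │     │    │    │    │    │
     v     v     v     v      v       v     v     v    v    v    v    v
  ┌─────┬─────────────────┬───────┐┌─────────────────────────────────────┐
  │ [0] │       ...       │  [4]  ││                                     │
  ├─────┴─────────────────┴───────┤│          after valid range          │
  │    ‘arr’ (type: ‘char[5]’)    ││                                     │
  └───────────────────────────────┘└─────────────────────────────────────┘
  ├───────────────┬───────────────┤├──────────────────┬──────────────────┤
                  │                                   │
         ╭────────┴────────╮              ╭───────────┴──────────╮
         │capacity: 5 bytes│              │⚠️  overflow of 7 bytes│
         ╰─────────────────╯              ╰──────────────────────╯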

[Bug tree-optimization/111800] [14 Regression] ICE with -fdump-tree-ccp1-details

2023-10-13 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111800

Jakub Jelinek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
   Last reconfirmed||2023-10-13
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED

--- Comment #5 from Jakub Jelinek  ---
I'll debug this.

[Bug tree-optimization/111800] [14 Regression] ICE with -fdump-tree-ccp1-details

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111800

--- Comment #4 from Andrew Pinski  ---
There looks to be some stack corruption going on; valgrind didn't catch it
though.

[Bug analyzer/111537] ICE: in set_cell_span, at text-art/table.cc:148 with D front-end and -fanalyzer

2023-10-13 Thread ibuclaw at gdcproject dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111537

--- Comment #9 from Iain Buclaw  ---
(In reply to Iain Buclaw from comment #8)
> I see in the olden days when D sat outside of GCC, this is what was done too.
> 
> https://github.com/D-Programming-GDC/gdc/commit/
> b9d36fc9d71ec4122d1c986599d87c6cb91ca55c
Although thinking it over, that did not take into consideration wide character
literals.  The number of 0's at the end of the raw string in STRING_CST nodes
would need to be the same as the width of the type it's representing - one,
two, or four 0's for char, wchar and dchar respectively.
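
As a rough illustration of that rule, the byte length of such a literal can be computed from the character width. This is only a sketch; `string_cst_bytes` is a hypothetical helper, not a function from the D front end:

```c
#include <stddef.h>

/* Hypothetical helper: total bytes for a literal of NCHARS characters,
   each CHAR_WIDTH bytes wide (1 = char, 2 = wchar, 4 = dchar),
   including a terminator that is one full character of zero bytes.  */
static size_t
string_cst_bytes (size_t nchars, size_t char_width)
{
  return (nchars + 1) * char_width;
}
```

For an 11-character literal such as "hello world" this gives 12, 24 and 48 bytes for char, wchar and dchar respectively.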

[Bug tree-optimization/111800] [14 Regression] ICE with -fdump-tree-ccp1-details

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111800

--- Comment #3 from Andrew Pinski  ---
(gdb) p this->get_val()
$2 = (const long *) 0x303030303030

That seems wrong.
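
One way to read that value, offered as an observation rather than something the comment states: each byte of 0x303030303030 is 0x30, the ASCII code for '0', which would fit a pointer overwritten by printed digits. A small check, with a hypothetical helper name:

```c
#include <stdint.h>

/* Count how many low-order bytes of VALUE equal ASCII '0' (0x30),
   stopping at the first byte that differs.  */
static unsigned
count_ascii_zero_bytes (uint64_t value)
{
  unsigned n = 0;
  while (n < 8 && ((value >> (8 * n)) & 0xff) == '0')
    n++;
  return n;
}
```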

[Bug tree-optimization/111801] [14 Regression] Missed Dead Code Elimination since r14-4141-gbf6b107e2a3

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111801

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-10-13
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski  ---
Confirmed.
What happens after the patch is that `if (j == &g == 0)` is still kept around,
and we decide to jump thread into that block and duplicate the `if (f)` there
(I think). There is then a missing PRE (I think), where we could see that f is
stored and then loaded.
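
To see why the call is dead, here is a reduced sketch of the store and load on `f`. It assumes nothing else writes `f` in between; `run` and the counting stub for `foo` are hypothetical stand-ins so the behaviour can be checked:

```c
static int f;
static int foo_calls;

/* Stub standing in for the external foo(): it only counts calls.  */
static void foo (void) { foo_calls++; }

/* f is stored as 1 or 8 and then loaded, so the else branch
   can never run.  */
static int
run (int b_a)
{
  f = 1;
  if (b_a)
    f = 8;
  if (f)
    ;
  else
    foo ();
  return f;
}
```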

[Bug tree-optimization/111801] [14 Regression] Missed Dead Code Elimination since r14-4141-gbf6b107e2a3

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111801

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
   Keywords||missed-optimization

[Bug tree-optimization/111801] New: [14 Regression] Missed Dead Code Elimination since r14-4141-gbf6b107e2a3

2023-10-13 Thread theodort at inf dot ethz.ch via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111801

Bug ID: 111801
   Summary: [14 Regression] Missed Dead Code Elimination since
r14-4141-gbf6b107e2a3
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: theodort at inf dot ethz.ch
  Target Milestone: ---

https://godbolt.org/z/47fbssWd3

Given the following code:

void foo(void);
static struct {
    signed a;
} b, c, *e = &b;
static struct {
    unsigned d;
} g = {109};
static int f;
static int *h;
static int **i = &h;
int main() {
    f = 1;
    if (b.a) f = 8;
    int *j = *i;
    if (j == &g == 0)
        if (!(((g.d) >= 109) && ((g.d) <= 109))) {
            __builtin_unreachable();
        }
    if (f)
        ;
    else {
        c = *e;
        foo();
    }
}

gcc-trunk -O2 does not eliminate the call to foo:

main:
        movl    $1, f(%rip)
        movl    b(%rip), %esi
        testl   %esi, %esi
        je      .L10
        movl    $8, f(%rip)
.L10:
        cmpq    $g, h(%rip)
        je      .L7
        movl    f(%rip), %ecx
        testl   %ecx, %ecx
        je      .L12
.L7:
        xorl    %eax, %eax
        ret
.L12:
        pushq   %rax
        call    foo
        xorl    %eax, %eax
        popq    %rdx
        ret

gcc-13.2.0 -O2 eliminates the call to foo:

main:
        movl    $1, f(%rip)
        movl    b(%rip), %eax
        testl   %eax, %eax
        je      .L2
        movl    $8, f(%rip)
.L2:
        xorl    %eax, %eax
        ret

Bisects to r14-4141-gbf6b107e2a3

[Bug tree-optimization/111800] [14 Regression] ICE with -fdump-tree-ccp1-details

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111800

--- Comment #2 from Andrew Pinski  ---
Worked at r14-4561-ge8d418df3dc609f27.

[Bug tree-optimization/111799] [14 Regression] Missed Dead Code Elimination since r14-2365-g2e406f0753e

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111799

--- Comment #2 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #1)
> Right now I get an ICE while doing -fdump-tree-ccp1-details even:during

Filed PR 111800 for that ...

[Bug tree-optimization/111800] [14 Regression] ICE with -fdump-tree-ccp1-details

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111800

Andrew Pinski  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org
   Target Milestone|--- |14.0

--- Comment #1 from Andrew Pinski  ---
ICE:
```
during GIMPLE pass: ccp
dump file: /app/output.cpp.034t.ccp1
: In function 'q':
:42:1: internal compiler error: Segmentation fault
   42 | }
  | ^
0x22fdf0e internal_error(char const*, ...)
???:0
0x14e8779 print_hex(generic_wide_int >
const&, char*)
???:0
0x14e8beb print_hex(generic_wide_int >
const&, _IO_FILE*)
???:0
g++: internal compiler error: Segmentation fault signal terminated program cc1
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
See  for instructions.
Compiler returned: 4
```

Maybe related to some of Jakub's recent wide_int changes ...

[Bug tree-optimization/111800] New: [14 Regression] ICE with -fdump-tree-ccp1-details

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111800

Bug ID: 111800
   Summary: [14 Regression] ICE with -fdump-tree-ccp1-details
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Testcase at -O3 -fdump-tree-ccp1-details:
```

void foo(void);
static struct d {
short e;
} g = {205}, *h = &g;
static int i;
static int *j, *a = &i;
static int **k = &j;
static int l;
static int *m;
static char(n)(char b, char c) { return b + c; }
static char(o)(char b, char c) { return b * c; }
static short(p)(short f) {
if (!(((f) >= 1) && ((f) <= 65459))) {
__builtin_unreachable();
}
return 0;
}
static int *q(short);
static void s(struct d) { *k = q(i); }
static int *q(short ad) {
int b = *a;
ad = -21;
for (; ad; ad = n(ad, 7)) p((ad ^ b && *a) <= *a);
return *k;
}
int main() {
i = 0;
for (;; i = 1) {
q(3);
char r = o(126 | 1, g.e);
p(r);
s(*h);
if (i) break;
m = &l;
}
if (m)
;
else
foo();
;
}
```

[Bug tree-optimization/111799] [14 Regression] Missed Dead Code Elimination since r14-2365-g2e406f0753e

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111799

--- Comment #1 from Andrew Pinski  ---
Right now I get an ICE while doing -fdump-tree-ccp1-details even:
during GIMPLE pass: ccp
dump file: /app/output.cpp.034t.ccp1
: In function 'q':
:42:1: internal compiler error: Segmentation fault
   42 | }
  | ^
0x22fdf0e internal_error(char const*, ...)
???:0
0x14e8779 print_hex(generic_wide_int >
const&, char*)
???:0
0x14e8beb print_hex(generic_wide_int >
const&, _IO_FILE*)
???:0
g++: internal compiler error: Segmentation fault signal terminated program cc1
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
See  for instructions.
Compiler returned: 4

[Bug tree-optimization/111799] [14 Regression] Missed Dead Code Elimination since r14-2365-g2e406f0753e

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111799

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
   Keywords||missed-optimization

[Bug tree-optimization/111799] New: [14 Regression] Missed Dead Code Elimination since r14-2365-g2e406f0753e

2023-10-13 Thread theodort at inf dot ethz.ch via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111799

Bug ID: 111799
   Summary: [14 Regression] Missed Dead Code Elimination since
r14-2365-g2e406f0753e
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: theodort at inf dot ethz.ch
  Target Milestone: ---

https://godbolt.org/z/bqcvMcbqn

Given the following code:

void foo(void);
static struct d {
short e;
} g = {205}, *h = &g;
static int i;
static int *j, *a = &i;
static int **k = &j;
static int l;
static int *m;
static char(n)(char b, char c) { return b + c; }
static char(o)(char b, char c) { return b * c; }
static short(p)(short f) {
if (!(((f) >= 1) && ((f) <= 65459))) {
__builtin_unreachable();
}
return 0;
}
static int *q(short);
static void s(struct d) { *k = q(i); }
static int *q(short ad) {
int b = *a;
ad = -21;
for (; ad; ad = n(ad, 7)) p((ad ^ b && *a) <= *a);
return *k;
}
int main() {
i = 0;
for (;; i = 1) {
q(3);
char r = o(126 | 1, g.e);
p(r);
s(*h);
if (i) break;
m = &l;
}
if (m)
;
else
foo();
;
}

gcc-trunk -O2 does not eliminate the call to foo:

main:
        movl    $2, %esi
        xorl    %ecx, %ecx
        xorl    %r9d, %r9d
        movl    $8, %edi
        movl    $0, i(%rip)
        .p2align 4,,10
        .p2align 3
.L10:
        movl    $-21, %eax
.L2:
        testl   %ecx, %ecx
        je      .L29
        leal    7(%rax), %edx
        movsbw  %dl, %r8w
        testb   %dl, %dl
        je      .L23
.L5:
        movl    $8, %eax
        cmpb    $1, %dl
        je      .L2
        movl    %r8d, %eax
        leal    7(%rax), %edx
        movsbw  %dl, %r8w
        testb   %dl, %dl
        jne     .L5
        .p2align 4,,10
        .p2align 3
.L23:
        movl    $-21, %eax
        addb    $7, %al
        je      .L30
.L9:
        movsbl  %al, %edx
        cmpl    %edx, %ecx
        cmove   %edi, %eax
        cbtw
        addb    $7, %al
        jne     .L9
.L30:
        cmpl    $1, %esi
        je      .L31
        movl    $1, %esi
        movl    $1, %ecx
        movl    $1, %r9d
        jmp     .L10
        .p2align 4,,10
        .p2align 3
.L29:
        testb   $1, %al
        je      .L4
        leal    7(%rax), %edx
        movsbw  %dl, %ax
        testb   %dl, %dl
        je      .L23
.L4:
        leal    14(%rax), %edx
        movsbw  %dl, %ax
        testb   %dl, %dl
        jne     .L4
        jmp     .L23
        .p2align 4,,10
        .p2align 3
.L31:
        testb   %r9b, %r9b
        je      .L11
        movq    $l, m(%rip)
        movl    %ecx, i(%rip)
.L25:
        xorl    %eax, %eax
        ret
.L11:
        cmpq    $0, m(%rip)
        jne     .L25
        pushq   %rax
        call    foo
        xorl    %eax, %eax
        popq    %rdx
        ret

gcc-13.2.0 -O2 eliminates the call to foo:

main:
        movq    $l, m(%rip)
        xorl    %eax, %eax
        movl    $1, i(%rip)
        ret

Bisects to r14-2365-g2e406f0753e

[Bug tree-optimization/111798] New: [14 Regression] Recent change causing testsuite regression and poor code on mcore-elf

2023-10-13 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111798

Bug ID: 111798
   Summary: [14 Regression] Recent change causing testsuite
regression and poor code on mcore-elf
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: law at gcc dot gnu.org
  Target Milestone: ---

This change:

commit 6decda1a35be5764101987c210b5693a0d914e58
Author: Richard Biener 
Date:   Thu Oct 12 11:34:57 2023 +0200

tree-optimization/111779 - Handle some BIT_FIELD_REFs in SRA

The following handles byte-aligned, power-of-two and byte-multiple
sized BIT_FIELD_REF reads in SRA.  In particular this should cover
BIT_FIELD_REFs created by optimize_bit_field_compare.

For gcc.dg/tree-ssa/ssa-dse-26.c we now SRA the BIT_FIELD_REF
appearing there leading to more DSE, fully eliding the aggregates.

This results in the same false positive -Wuninitialized as the
older attempt to remove the folding from optimize_bit_field_compare,
fixed by initializing part of the aggregate unconditionally.

PR tree-optimization/111779
gcc/
* tree-sra.cc (sra_handled_bf_read_p): New function.
(build_access_from_expr_1): Handle some BIT_FIELD_REFs.
(sra_modify_expr): Likewise.
(make_fancy_name_1): Skip over BIT_FIELD_REF.

gcc/fortran/
* trans-expr.cc (gfc_trans_assignment_1): Initialize
lhs_caf_attr and rhs_caf_attr codimension flag to avoid
false positive -Wuninitialized.

gcc/testsuite/
* gcc.dg/tree-ssa/ssa-dse-26.c: Adjust for more DSE.
* gcc.dg/vect/vect-pr111779.c: New testcase.

Causes execute/20040709-2.c to fail on mcore-elf at -O2.  It also results in
what appears to be significantly poorer code generation.

Note I haven't managed to get mcore-elf-gdb to work, so debugging is, umm,
painful.  And I wouldn't put a lot of faith in the simulator correctness.

I have simplified the test to this:
extern void abort (void);
extern void exit (int);

unsigned int
myrnd (void)
{
  static unsigned int s = 1388815473;
  s *= 1103515245;
  s += 12345;
  return (s / 65536) % 2048;
}

struct __attribute__((packed)) K
{
  unsigned int k:6, l:1, j:10, i:15;
};

struct K sK;

unsigned int
fn1K (unsigned int x)
{
  struct K y = sK;
  y.k += x;
  return y.k;
}

void
testK (void)
{
  int i;
  unsigned int mask, v, a, r;
  struct K x;
  char *p = (char *) &sK;
  for (i = 0; i < sizeof (sK); ++i)
    *p++ = myrnd ();
  v = myrnd ();
  a = myrnd ();
  sK.k = v;
  x = sK;
  r = fn1K (a);
  if (x.j != sK.j || x.l != sK.l)
    abort ();
}

int
main (void)
{
  testK ();
  exit (0);
}


Which should at least make the poor code gen obvious.  I don't expect to have
time to debug this further anytime in the near future.
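
For reference, a sketch of the value `fn1K` should return, assuming `k` is an unsigned 6-bit field as declared in `struct K`: the sum wraps modulo 64. The name `fn1k_model` is hypothetical and the model ignores the struct layout entirely:

```c
/* Expected result of fn1K: (k + x) truncated to the 6 bits of field k.  */
static unsigned int
fn1k_model (unsigned int k, unsigned int x)
{
  return (k + x) & 0x3fu;
}
```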

[Bug target/111778] [14 Regression] PowerPC constant code change uses an undefined shift

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111778

Andrew Pinski  changed:

   What|Removed |Added

 CC||zsojka at seznam dot cz

--- Comment #5 from Andrew Pinski  ---
*** Bug 111746 has been marked as a duplicate of this bug. ***

[Bug target/111746] [14 Regression] ICE: infinite recursion in try_split (emit-rtl.cc:3972) at -O2

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111746

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #3 from Andrew Pinski  ---
(In reply to Zdenek Sojka from comment #2)
> Maybe fixed by the PR111778 patch

Yes, it is a dup.

*** This bug has been marked as a duplicate of bug 111778 ***

[Bug target/111778] [14 Regression] PowerPC constant code change uses an undefined shift

2023-10-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111778

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
   Keywords||ice-on-valid-code
Summary|PowerPC constant code   |[14 Regression] PowerPC
   |change uses an undefined|constant code change uses
   |shift   |an undefined shift

[Bug fortran/111661] [OpenACC] Detach+Attach of DT component gives libgomp: [0x405140,96] is not mapped when running 'acc update' on DT var itself

2023-10-13 Thread patrick.begou--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111661

--- Comment #3 from Patrick Bégou  ---
Hi Tobias,

thanks for this information.
- Yes, removing the "finalize" makes this small test case work. In my
mind it should only remove the allocated attribute from the GPU by
setting the count to zero. Is it because the attribute is an allocatable
and not a pointer? Is the behaviour the same with a pointer as attribute?

- Unfortunately this modification doesn't bring significant progress
in porting my large code (things are much more complex), but with the GNU
compilers GDB is working, so it is a big step for investigating. I have
isolated the low-level data management to test this module
independently. It works with ftn and Nvidia but not with GNU at this
time. I have to investigate.
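
The reference counting behind 'finalize' discussed in this thread can be modelled in a few lines. This is a toy model and not libgomp code; the struct and function names are hypothetical, and it tracks only a single dynamic counter:

```c
#include <stdbool.h>

/* Toy model of one device mapping with a dynamic reference count.  */
struct mapping { unsigned refcount; };

static void
enter_data (struct mapping *m)
{
  m->refcount++;
}

/* Returns true when the mapping would be removed from the device.
   Without finalize the count drops by one; with finalize it is
   forced to zero.  */
static bool
exit_data (struct mapping *m, bool finalize)
{
  if (finalize)
    m->refcount = 0;
  else if (m->refcount > 0)
    m->refcount--;
  return m->refcount == 0;
}
```

In this model, two enters followed by one plain exit leave the data mapped, while a single exit with finalize removes it regardless of the count.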

Patrick

Le 13/10/2023 à 10:45, burnus at gcc dot gnu.org a écrit :
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111661
>
> --- Comment #2 from Tobias Burnus  ---
> @Patrick: It seems to work fine without "finalize".
>
> Can you check whether the big program then works for you?
> Usually, one should be able to do proper ref counting such that a
>   'finalize' -> force setting refcounts to zero
> shouldn't be needed.
>
> * * *
>
> Looking at the code more closely, the problem is:
>
>#pragma omp target oacc_exit_data map(delete:tab.val.data [len: 88])
>
> this tries to 'delete' the array descriptor - but as tab.val.data is part of
> 'tab', this deletes all of "tab".
>
>
> Compare the C example:
>
> struct t { int *a; int n; };
> void f() {
>struct t s;
>#pragma acc enter data copyin(s.a[:s.n])
>#pragma acc exit data delete(s.a[:s.n])
>// for completeness, not relevant here:
>#pragma acc exit data detach(s.a)
>#pragma acc exit data delete(s.a)
> }
>
>
> GCC does:
>
>   #pragma omp target oacc_enter_data map(struct:s [len: 1]) \
>   map(alloc:s.a [len: 8]) map(to:*_4 [len: _3]) map(attach:s.a [bias: 0])
>
>   #pragma omp target oacc_exit_data map(release:s.a [len: 8]) \
>   map(release:*_8 [len: _7]) map(detach:s.a [bias: 0])
>
>   #pragma omp target oacc_exit_data map(detach:s.a [bias: 0])
>   #pragma omp target oacc_exit_data map(release:s.a [len: 8])
>
> which seems to be at least consistent. Again, here a 'finalize' would force 
> the
> reference counts to zero and, hence, also delete 's' and not only the
> pointee/pointer target *s.a / s.a[0:.n] but also the pointer 's.a' itself.
>
> (BTW: Same result since GCC 10; GCC 9 rejects that code.)
>
>   * * *
>
> QUESTION: Is the current code for C (and Fortran) correct according to the
> OpenACC specification or not?
>
> FOLLOW UP QUESTION: If GCC's result is incorrect, what should the compiler do
> instead?
> And if it is correct, the question is: why do both ftn and nvfortran work in
> the same way?
>

[Bug tree-optimization/111622] [13 Regression] EVRP compile-time hog compiling risc-v insn-opinit.cc

2023-10-13 Thread amacleod at redhat dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111622

Andrew Macleod  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #9 from Andrew Macleod  ---
Fixed.

[Bug tree-optimization/111622] [13 Regression] EVRP compile-time hog compiling risc-v insn-opinit.cc

2023-10-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111622

--- Comment #8 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Andrew Macleod
:

https://gcc.gnu.org/g:1c2bd0839e3574ab2a76ec18faf906b2f64b5f81

commit r13-7949-g1c2bd0839e3574ab2a76ec18faf906b2f64b5f81
Author: Andrew MacLeod 
Date:   Thu Oct 12 17:06:36 2023 -0400

Do not add partial equivalences with no uses.

PR tree-optimization/111622
* value-relation.cc (equiv_oracle::add_partial_equiv): Do not
register a partial equivalence if an operand has no uses.

[Bug tree-optimization/111622] [13 Regression] EVRP compile-time hog compiling risc-v insn-opinit.cc

2023-10-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111622

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:8be20f3b0bded7f9b690b27cbee58b283dbe827b

commit r14-4630-g8be20f3b0bded7f9b690b27cbee58b283dbe827b
Author: Andrew MacLeod 
Date:   Thu Oct 12 17:06:36 2023 -0400

Do not add partial equivalences with no uses.

PR tree-optimization/111622
* value-relation.cc (equiv_oracle::add_partial_equiv): Do not
register a partial equivalence if an operand has no uses.

[Bug tree-optimization/111796] OMP SIMD call vectorization fails for arguments subject to integer promotion rules

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111796

--- Comment #6 from Richard Biener  ---
GCN handles this fine using simdlen(64).

   [local count: 939524096]:
  # ivtmp.31_10 = PHI 
  vectp_x.27_17 = (vector(64) int *) ivtmp.31_10;
  vect__4.24_14 = MEM  [(int *)vectp_x.27_17];
  vect__5.25_15 = (vector(64) short int) vect__4.24_14;
  vect__6.26_16 = foo.simdclone.0 (vect__4.24_14, vect__5.25_15);
  MEM  [(int *)vectp_x.27_17] = vect__6.26_16;
  ivtmp.31_11 = ivtmp.31_10 + 256;
  if (_8 != ivtmp.31_11)
goto ; [85.71%]

[Bug target/111797] Code generation of -march=znver2 -O3 includes frame pointer

2023-10-13 Thread paulf at free dot fr via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111797

--- Comment #2 from Paul Floyd  ---
(In reply to Richard Biener from comment #1)
> I think it's easiest to use a frame pointer when custom stack alignment is
> needed both for the return path and accessing arguments on the stack.

But is it faster, the same or slower?

[Bug tree-optimization/111796] OMP SIMD call vectorization fails for arguments subject to integer promotion rules

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111796

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-*

--- Comment #5 from Richard Biener  ---
On aarch64 I see

t.c:5:1: warning: GCC does not currently support mixed size types for 'simd'
functions
5 | foo (int a, short b)
  | ^~~

simdlen(8) is also not supported, but simdlen(4) isn't diagnosed.  The above
still remains though.

[Bug target/111591] ppc64be: miscompilation with -mstrict-align / -O3

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591

--- Comment #19 from Richard Biener  ---
So maybe it's the same issue as PR90348 (you can verify the RTL expansion dump
on whether the two involved decls are coalesced and see whether that's valid).

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 111795, which changed state.

Bug 111795 Summary: OMP SIMD inbranch call vectorization missing for AVX512
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111795

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/111795] OMP SIMD inbranch call vectorization missing for AVX512

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111795

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Richard Biener  ---
Fixed with similar restrictions as for non-AVX512 for now.

[Bug tree-optimization/111795] OMP SIMD inbranch call vectorization missing for AVX512

2023-10-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111795

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:3179ad72f67f31824c444ef30ef171ad7495d274

commit r14-4629-g3179ad72f67f31824c444ef30ef171ad7495d274
Author: Richard Biener 
Date:   Fri Oct 13 12:32:51 2023 +0200

OMP SIMD inbranch call vectorization for AVX512 style masks

The following teaches vectorizable_simd_clone_call to handle
integer mode masks.  The tricky bit is to second-guess the
number of lanes represented by a single mask argument - the following
uses simdlen and the number of mask arguments to calculate that,
assuming ABIs have them uniform.

Similar to the VOIDmode handling there's a restriction on not
supporting splitting/merging of incoming vector masks to
more/less SIMD call arguments.

PR tree-optimization/111795
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle
integer mode mask arguments.

* gcc.target/i386/vect-simd-clone-avx512-1.c: New testcase.
* gcc.target/i386/vect-simd-clone-avx512-2.c: Likewise.
* gcc.target/i386/vect-simd-clone-avx512-3.c: Likewise.
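
The lane calculation the commit message describes might look like the following sketch. It assumes, as the message does, that every mask argument covers the same number of lanes; the helper name and the zero return for a non-uniform split are hypothetical and not taken from tree-vect-stmts.cc:

```c
/* Lanes represented by one integer-mode mask argument, derived from
   simdlen and the number of mask arguments.  Returns 0 when the
   lanes cannot be split evenly.  */
static unsigned
lanes_per_mask_arg (unsigned simdlen, unsigned num_mask_args)
{
  if (num_mask_args == 0 || simdlen % num_mask_args != 0)
    return 0;
  return simdlen / num_mask_args;
}
```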

[Bug tree-optimization/111796] OMP SIMD call vectorization fails for arguments subject to integer promotion rules

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111796

--- Comment #4 from Richard Biener  ---
(In reply to Richard Biener from comment #3)
> (In reply to Richard Biener from comment #1)
> > It's the promote_prototypes hook btw.
> 
> Of the major targets (x86, arm, aarch64, powerpc, s390, riscv) only x86
> defines the hook to true.  But there are a lot of embedded archs doing so,
> and ia64 and the pa.

Interestingly the Ada frontend has

tree
create_param_decl (tree name, tree type)
{
  tree param_decl = build_decl (input_location, PARM_DECL, name, type);

  /* Honor TARGET_PROMOTE_PROTOTYPES like the C compiler, as not doing so
 can lead to various ABI violations.  */
  if (targetm.calls.promote_prototypes (NULL_TREE)
  && INTEGRAL_TYPE_P (type)
  && TYPE_PRECISION (type) < TYPE_PRECISION (integer_type_node))
{
...
type = integer_type_node;
}
...
  DECL_ARG_TYPE (param_decl) = type;

but the hook is nowhere called in other frontends besides C, C++ and Ada.
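
The promotion rule in that snippet reduces to a comparison of precisions. A standalone sketch, with a hypothetical function name and plain integers standing in for GCC's tree types:

```c
#include <stdbool.h>

/* Precision an integral parameter is passed with: types narrower than
   int are widened to int when the target promotes prototypes.  */
static unsigned
passed_precision (unsigned type_prec, unsigned int_prec, bool promote)
{
  if (promote && type_prec < int_prec)
    return int_prec;
  return type_prec;
}
```

With a 32-bit int, a 16-bit short parameter is passed as 32 bits when the hook returns true, which appears to be the mismatch the SIMD clone call runs into on x86.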

[Bug tree-optimization/111796] OMP SIMD call vectorization fails for arguments subject to integer promotion rules

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111796

--- Comment #3 from Richard Biener  ---
(In reply to Richard Biener from comment #1)
> It's the promote_prototypes hook btw.

Of the major targets (x86, arm, aarch64, powerpc, s390, riscv) only x86
defines the hook to true.  But there are a lot of embedded archs doing so,
and ia64 and the pa.

[Bug target/111591] ppc64be: miscompilation with -mstrict-align / -O3

2023-10-13 Thread linkw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591

--- Comment #18 from Kewen Lin  ---
(In reply to Richard Biener from comment #17)
> it stores to a different object - but seeing the CLOBBERs, does
> -fstack-reuse=none fix the issue?  Is r1 the stack pointer?

Just tried with -fstack-reuse=none, it can make it pass! Yes, r1 is stack
pointer.

> 
> ref-all is carried to RTL by MEM_ALIAS_SET == 0.

Got it, thanks!

[Bug tree-optimization/111796] OMP SIMD call vectorization fails for arguments subject to integer promotion rules

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111796

--- Comment #2 from Richard Biener  ---
We could also decide to only apply promote_prototype at RTL expansion time?

[Bug tree-optimization/111796] OMP SIMD call vectorization fails for arguments subject to integer promotion rules

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111796

--- Comment #1 from Richard Biener  ---
It's the promote_prototypes hook btw.

[Bug target/111797] Code generation of -march=znver2 -O3 includes frame pointer

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111797

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Target||x86_64-*-*

--- Comment #1 from Richard Biener  ---
I think it's easiest to use a frame pointer when custom stack alignment is
needed both for the return path and accessing arguments on the stack.

[Bug target/111797] New: Code generation of -march=znver2 -O3 includes frame pointer

2023-10-13 Thread paulf at free dot fr via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111797

Bug ID: 111797
   Summary: Code generation of -march=znver2 -O3 includes frame
pointer
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: paulf at free dot fr
  Target Milestone: ---

I was a bit surprised recently when I (unintentionally) ran perf record on the
exe that I work on with an -O3 build without -fno-omit-frame-pointer and I
could see the callstacks.

The function prolog that I see is

 :
   0:   4c 8d 54 24 08          lea    0x8(%rsp),%r10
   5:   48 83 e4 e0             and    $0xffffffffffffffe0,%rsp
   9:   41 ff 72 f8             push   -0x8(%r10)
   d:   55                      push   %rbp
   e:   48 89 e5                mov    %rsp,%rbp
  11:   41 57                   push   %r15
  13:   41 56                   push   %r14
  15:   41 55                   push   %r13
  17:   41 54                   push   %r12
  19:   41 52                   push   %r10
  1b:   53                      push   %rbx
  1c:   49 89 ce                mov    %rcx,%r14
  1f:   48 81 ec 40 10 00 00    sub    $0x1040,%rsp

I asked on SO and got pointed to this post

https://stackoverflow.com/questions/45423338/whats-up-with-gcc-weird-stack-manipulation-when-it-wants-extra-stack-alignment

That problem seems to be fixed

https://godbolt.org/z/qc6fqb5hn

I can't post the source code as it is proprietary, and it doesn't seem to
reproduce with trivial examples (the function that I tried is 23kloc plus it
#includes other stuff).

I was able to reproduce the problem with the following steps (Valgrind chosen
because I'm one of the maintainers and I'm in the habit of building it).

git clone https://sourceware.org/git/valgrind.git march_zen2
cd march_zen2
./autogen.sh
./configure CFLAGS=-march=znver2
make -j 16
objdump -d --disassemble=mc_pre_clo_init mc_pre_clo_init
.in_place/memcheck-amd64-linux | less

That shows

5800c220 <mc_pre_clo_init>:
5800c220:   41 55                   push   %r13
5800c222:   bf 8c 65 1d 58          mov    $0x581d658c,%edi
5800c227:   4c 8d 6c 24 10          lea    0x10(%rsp),%r13
5800c22c:   48 83 e4 e0             and    $0xffffffffffffffe0,%rsp
5800c230:   41 ff 75 f8             push   -0x8(%r13)
5800c234:   55                      push   %rbp
5800c235:   48 89 e5                mov    %rsp,%rbp
5800c238:   41 55                   push   %r13
5800c23a:   48 83 ec 08             sub    $0x8,%rsp

which I believe illustrates the same problem.

mc_pre_clo_init looks like this


static void mc_pre_clo_init(void)
{
   VG_(details_name)("Memcheck");
   VG_(details_version) (NULL);
   VG_(details_description) ("a memory error detector");
   VG_(details_copyright_author)(
  "Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.");
   VG_(details_bug_reports_to)  (VG_BUGS_TO);

VG_ is a macro that implements a kind of C namespace. The functions are all
outputting the memcheck startup banner.

I think that I understand that there is a need for a 32byte-aligned stack and
also to shuffle the return address. Is it really necessary to also use the
frame pointer?
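
As a rough illustration of the alignment step in the prologs above (a sketch
only; the helper names align_down and alignment_slack are hypothetical and
this is not GCC code): the `and` applied to %rsp rounds the stack pointer down
to a 32-byte boundary, so the distance between the incoming and the realigned
stack pointer is only known at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round ADDR down to a multiple of ALIGNMENT,
   which must be a power of two.  With ALIGNMENT == 32 this clears the
   low five bits, like the `and` applied to %rsp in the prologs above.  */
static uintptr_t
align_down (uintptr_t addr, uintptr_t alignment)
{
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
  return addr & ~(alignment - 1);
}

/* Hypothetical helper: number of bytes dropped by the realignment.
   The result depends on the incoming ADDR, so it is not a
   compile-time constant.  */
static uintptr_t
alignment_slack (uintptr_t addr, uintptr_t alignment)
{
  return addr - align_down (addr, alignment);
}
```

For example, align_down (0x7ffc1238, 32) gives 0x7ffc1220 with a slack of 24
bytes, while an incoming value of 0x7ffc1220 has a slack of 0.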

[Bug target/111591] ppc64be: miscompilation with -mstrict-align / -O3

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591

--- Comment #17 from Richard Biener  ---
it stores to a different object - but seeing the CLOBBERs, does
-fstack-reuse=none fix the issue?  Is r1 the stack pointer?

ref-all is carried to RTL by MEM_ALIAS_SET == 0.

[Bug target/111746] [14 Regression] ICE: infinite recursion in try_split (emit-rtl.cc:3972) at -O2

2023-10-13 Thread zsojka at seznam dot cz via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111746

--- Comment #2 from Zdenek Sojka  ---
Maybe fixed by the PR111778 patch

[Bug target/111746] [14 Regression] ICE: infinite recursion in try_split (emit-rtl.cc:3972) at -O2

2023-10-13 Thread zsojka at seznam dot cz via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111746

--- Comment #1 from Zdenek Sojka  ---
This seems to have got fixed between gdfb40855a08 (BAD) and g0f40e59f193
(GOOD).

[Bug tree-optimization/111796] New: OMP SIMD call vectorization fails for arguments subject to integer promotion rules

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111796

Bug ID: 111796
   Summary: OMP SIMD call vectorization fails for arguments
subject to integer promotion rules
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

For example

int x[1024];

#pragma omp declare simd simdlen(8)
__attribute__((noinline)) int
foo (int a, short b)
{
  return a + b;
}

void __attribute__((noipa))
bar (void)
{
#pragma omp simd
  for (int i = 0; i < 1024; i++)
    x[i] = foo (x[i], x[i]);
}

fails to vectorize because for the scalar code we see

  _4 = x[i_12];
  _5 = (short int) _4;
  _6 = (int) _5;
  _7 = foo (_4, _6);

thus the second argument to 'foo' is promoted to 'int', but the SIMD clone
at least on x86_64 expects vector(8) short int simd.6 as argument.

vectorizable_simd_clone_call has the following, which will result in
rejecting the call.

  for (i = 0; i < nargs; i++)
    {
      switch (n->simdclone->args[i].arg_type)
        {
        case SIMD_CLONE_ARG_TYPE_VECTOR:
          if (!useless_type_conversion_p
                (n->simdclone->args[i].orig_type,
                 TREE_TYPE (gimple_call_arg (stmt, i + arg_offset))))
            i = -1;

This argument promotion is exposed by the frontend, controlled by a target
hook.  IIRC it is intended to allow more optimization, so maybe it can be
disabled for calls to OMP SIMD functions.

Alternatively the vectorizer needs to deal with this somehow, for example
in vectorizable_simd_clone_call by allowing this and instead peeking 
through the conversion.  Possibly also done via pattern recognizing the call
itself.
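
A minimal scalar sketch of the promotion described above, in plain C.  The
helper names promote_short_arg and call_like_gimple are hypothetical; they
only mirror the `_5 = (short int) _4; _6 = (int) _5;` statements and are not
part of GCC.  The narrowing of an out-of-range int to short is
implementation-defined in ISO C (GCC reduces it modulo 2^16).

```c
#include <assert.h>
#include <limits.h>

/* The scalar function from the example above.  */
int
foo (int a, short b)
{
  return a + b;
}

/* Hypothetical helper mirroring the GIMPLE: narrow to short, then
   widen back to int before the call.  */
int
promote_short_arg (int v)
{
  return (int) (short) v;
}

/* Hypothetical helper: the call as the scalar code sees it.  The
   second argument has type int at the call site, although foo (and
   its SIMD clone) is declared to take a short.  */
int
call_like_gimple (int v)
{
  int promoted = promote_short_arg (v);
  /* The promoted value always fits in a short, so narrowing it again
     for the callee loses nothing.  */
  assert (promoted >= SHRT_MIN && promoted <= SHRT_MAX);
  return foo (v, (short) promoted);
}
```

The point of the sketch is that the int seen at the call site carries no more
information than the short the clone expects, which is why looking through
the conversion in the vectorizer seems plausible.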

[Bug target/111591] ppc64be: miscompilation with -mstrict-align / -O3

2023-10-13 Thread linkw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591

--- Comment #16 from Kewen Lin  ---
Tracing it down with template specialization, the abort happens on

  auto vn_b = Load(dn, in_b.get());
  HWY_ASSERT_VEC_EQ(
  dw, vw_signed_max,
  SatWidenMulPairwiseAdd(
  dw, InterleaveLower(dn_u, BitCast(dn_u, vn_b), vn_unsigned_max),
  InterleaveLower(dn, vn_b, vn_signed_max)));

with "void operator()(int8_t, CappedTag dn)"

by isolating, it doesn't get the expected result on "b0" for function

template 
HWY_API Vec SatWidenMulPairwiseAdd(DI16 di16, VU8 a, VI8 b) {
  RebindToUnsigned du16;
  auto a0 = And(BitCast(di16, a), Set(di16, 255));
  auto b0 = ShiftRight<8>(ShiftLeft<8>(BitCast(di16, b)));
  auto a1 = BitCast(di16, ShiftRight<8>(BitCast(du16, a)));
  auto b1 = ShiftRight<8>(BitCast(di16, b));
  return SaturatedAdd(Mul(a0, b0), Mul(a1, b1));
}

specialized with 
template <> HWY_API Vec128 SatWidenMulPairwiseAdd(Simd di16, Vec128 a, Vec128 b)
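
For reference, a scalar model of what one 16-bit lane of this function is
meant to compute, written in plain C.  This is only a sketch under the
assumption that ShiftRight on a signed lane is an arithmetic shift; the helper
names (sign_extend_8, saturate_i16, sat_widen_mul_pairwise_add_lane) are
hypothetical and not Highway API.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low 8 bits of V as a signed byte (portable sign
   extension, avoiding shifts of negative values).  */
static int
sign_extend_8 (int v)
{
  v &= 0xff;
  return v >= 0x80 ? v - 0x100 : v;
}

/* Clamp a wide sum into the int16_t range, like SaturatedAdd.  */
static int16_t
saturate_i16 (int v)
{
  if (v > INT16_MAX)
    return INT16_MAX;
  if (v < INT16_MIN)
    return INT16_MIN;
  return (int16_t) v;
}

/* One 16-bit lane: A holds two unsigned bytes, B two signed bytes.  */
static int16_t
sat_widen_mul_pairwise_add_lane (uint16_t a, uint16_t b)
{
  int a0 = a & 0xff;                /* And(a, 255) */
  int b0 = sign_extend_8 (b);       /* ShiftRight<8>(ShiftLeft<8>(b)) */
  int a1 = a >> 8;                  /* logical ShiftRight<8> */
  int b1 = sign_extend_8 (b >> 8);  /* arithmetic ShiftRight<8> */
  int p0 = a0 * b0;
  int p1 = a1 * b1;
  /* Each product fits in 16 bits; only the sum can overflow.  */
  assert (p0 >= INT16_MIN && p0 <= INT16_MAX);
  assert (p1 >= INT16_MIN && p1 <= INT16_MAX);
  return saturate_i16 (p0 + p1);
}
```

In this model "b0" is just the sign-extended low byte of each lane of b, so a
wrong value there points at the shift pair rather than at the multiply or the
saturating add.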

Further, I found that the unexpected values come from ShiftLeft<8>: the tree
optimized code looks as expected, but the final insn sequence appears to be in
the wrong order.  Either -fdisable-rtl-sched2 or -fdisable-rtl-sched1 makes it
pass.  With a counter, I see an unexpected insn movement in sched2 on insn 395.

...

 1436: %10:DI=0x70
  REG_EQUIV 0x70
 1438: %9:DI=0xc0
  REG_EQUIV 0xc0
 1437: %8:DI=0x1e0
  REG_EQUIV 0x1e0
 1441: %7:DI=0xd0
  REG_EQUIV 0xd0
  389: %0:V2DI=[%1:DI+%9:DI]
  REG_DEAD %9:DI
  REG_EQUAL [sfp:DI+0xc0]
 1445: %5:DI=0xb0
  REG_EQUIV 0xb0
 1714: %9:DI=0xff
  REG_EQUIV 0xff
  373: [%1:DI+0x70]=%4:DI
  REG_DEAD %4:DI
  375: [%1:DI+0x78]=%6:DI
  REG_DEAD %6:DI
 1715: %9:DI=%9:DI|0xff
 1785: %25:DI=high(unspec[`*.LC8',%2:DI] 47)
 1716: %9:DI=%9:DI&0x|%9:DI<<0x20
  REG_EQUIV 0xff00ff00ff00ff
  410: %28:DI=%1:DI+0xae
  REG_EQUAL sfp:DI+0xae
6: %31:SI=0
  REG_EQUAL 0
 1786: %25:DI=%25:DI+low(unspec[`*.LC8',%2:DI] 47)
  REG_DEAD %2:DI
  REG_EQUAL `*.LC8'
  392: [%1:DI+%7:DI]=%0:V2DI
  REG_DEAD %7:DI
 // unexpected version having insn 395 moved here.
 1738: %12:V2DI=[%1:DI+%10:DI]
  376: [%1:DI+%8:DI]=%12:V2DI
  REG_DEAD %12:V2DI
  REG_DEAD %8:DI
  REG_EQUIV [sfp:DI+%8:DI]
  REG_EQUAL [sfp:DI+0x70]
  390: [%1:DI+%10:DI]=%0:V2DI   // this store updates 16 bytes at [%1:DI+0x70],
 // so the read can't pass it
  REG_DEAD %0:V2DI
  395: %4:DI=zero_extend([%1:DI+0x70])   //  <-- this is expected
  398: %6:DI=zero_extend([%1:DI+0x72])
  401: %7:DI=zero_extend([%1:DI+0x74])
  404: %8:DI=zero_extend([%1:DI+0x76])
  396: %4:SI=%4:SI<<0x8
  399: %6:SI=%6:SI<<0x8
  402: %7:SI=%7:SI<<0x8
  405: %8:SI=%8:SI<<0x8

 

the tree optimized IR for this part looks expected?

   [local count: 119292722]:
  v = a;
  MEM  [(char * {ref-all})&D.38735] = MEM  [(char * {ref-all})&v];
  v ={v} {CLOBBER(eol)};
  vect_a_raw_0_1121.562_722 = MEM  [(short int *)&D.38735];
  _215 = VIEW_CONVERT_EXPR(vect_a_raw_0_1121.562_722);
  _830 = _215 & 71777214294589695;
  _1549 = BIT_FIELD_REF <_830, 16, 32>;
  _1537 = BIT_FIELD_REF <_830, 16, 16>;
  _323 = BIT_FIELD_REF <_830, 16, 0>;
  v = b;
  MEM  [(char * {ref-all})&b00] = MEM  [(char * {ref-all})&v];

  ==> ref-all here, so should be executed before any reads below?

  v ={v} {CLOBBER(eol)};
  v = b00;
  raw_u_1323 = v.raw[0];
  _1324 = raw_u_1323 << 8;
  v.raw[0] = _1324;
  raw_u_1403 = v.raw[1];
  _1404 = raw_u_1403 << 8;
  v.raw[1] = _1404;
  raw_u_1447 = v.raw[2];
  _1448 = raw_u_1447 << 8;
  v.raw[2] = _1448;
  raw_u_128 = v.raw[3];
  _129 = raw_u_128 << 8;
  v.raw[3] = _129;
  b01 = v;
  v ={v} {CLOBBER(eol)};
  ivtmp.577_734 = (unsigned long) &MEM  [(void *)&b01 + -2B];

...

I guess there is some way to keep this kind of aliasing information after
expansion; it needs more investigation into why sched considers the move safe.

[Bug tree-optimization/111793] OpenMP SIMD inbranch clones for AVX512 are highly sub-optimal

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111793

--- Comment #5 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #4)
> So, shouldn't we match.pd (or something else) pattern match
>   vect_cst__50 = {mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D),
> mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D),
> mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D),
> mask.48_7(D), mask.48_7(D)};
>   vect__8.132_51 = vect_cst__50 >> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
> 12, 13, 14, 15 };
>   vect__9.133_53 = vect__8.132_51 & { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1 };
>   mask__39.139_60 = vect__9.133_53 != { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0 };
> back into mask__39.139_60 = mask.48_7(D); ?

Yes, that's a possibility.  I wonder if it's possible to arrange things in the
vectorizer itself so that costing gets more accurate (probably not that
important for OMP SIMD though).

Maybe it works a bit better if we did mask & (1 << iv), but I guess we
canonicalize that back.

I've opened this for tracking for now, working on PR111795 first.

[Bug tree-optimization/111793] OpenMP SIMD inbranch clones for AVX512 are highly sub-optimal

2023-10-13 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111793

--- Comment #4 from Jakub Jelinek  ---
So, shouldn't we match.pd (or something else) pattern match
  vect_cst__50 = {mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D),
mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D),
mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D),
mask.48_7(D), mask.48_7(D)};
  vect__8.132_51 = vect_cst__50 >> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15 };
  vect__9.133_53 = vect__8.132_51 & { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1 };
  mask__39.139_60 = vect__9.133_53 != { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0 };
back into mask__39.139_60 = mask.48_7(D); ?
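
To spell out why that fold looks valid, here is a scalar sketch in C (an
illustration only, with hypothetical helper names; it is not match.pd
syntax).  Expanding a 16-bit mask into per-lane booleans via shift, AND and
compare, and then packing those booleans back into a bitmask, reproduces the
original mask.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16

/* lanes[i] = ((mask >> i) & 1) != 0, i.e. the per-lane form of the
   shift / and / compare statements above.  */
static void
expand_mask (uint16_t mask, unsigned char lanes[LANES])
{
  for (int i = 0; i < LANES; i++)
    lanes[i] = ((mask >> i) & 1) != 0;
}

/* Pack per-lane booleans back into a bitmask, one bit per lane.  */
static uint16_t
pack_lanes (const unsigned char lanes[LANES])
{
  uint16_t mask = 0;
  for (int i = 0; i < LANES; i++)
    {
      assert (lanes[i] <= 1);
      mask |= (uint16_t) (lanes[i] << i);
    }
  return mask;
}

/* Expanding and re-packing is the identity on 16-bit masks.  */
static uint16_t
mask_roundtrip (uint16_t mask)
{
  unsigned char lanes[LANES];
  expand_mask (mask, lanes);
  return pack_lanes (lanes);
}
```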

[Bug tree-optimization/111793] OpenMP SIMD inbranch clones for AVX512 are highly sub-optimal

2023-10-13 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111793

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
(In reply to Richard Biener from comment #2)
> which is much more reasonable - I'm not sure whether the compare against 0
> is required or if the ABI guarantees either 0 or -1 in the elements.

Yes, it does.  All ones elements that is, the element type can be unsigned
integral or floating point as well...

[Bug fortran/111661] [OpenACC] Detach+Attach of DT component gives libgomp: [0x405140,96] is not mapped when running 'acc update' on DT var itself

2023-10-13 Thread burnus at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111661

--- Comment #2 from Tobias Burnus  ---
@Patrick: It seems to work fine without "finalize".

Can you check whether the big program then works for you?
Usually, one should be able to do proper ref counting such that a
 'finalize' -> force setting refcounts to zero
shouldn't be needed.

* * *

Looking at the code more closely, the problem is:

  #pragma omp target oacc_exit_data map(delete:tab.val.data [len: 88])

this tries to 'delete' the array descriptor - but as tab.val.data is part of
'tab', this deletes all of "tab".


Compare the C example:

struct t { int *a; int n; };
void f() {
  struct t s;
  #pragma acc enter data copyin(s.a[:s.n])
  #pragma acc exit data delete(s.a[:s.n])
  // for completeness, not relevant here:
  #pragma acc exit data detach(s.a)
  #pragma acc exit data delete(s.a)
}


GCC does:

 #pragma omp target oacc_enter_data map(struct:s [len: 1]) \
 map(alloc:s.a [len: 8]) map(to:*_4 [len: _3]) map(attach:s.a [bias: 0])

 #pragma omp target oacc_exit_data map(release:s.a [len: 8]) \
 map(release:*_8 [len: _7]) map(detach:s.a [bias: 0])

 #pragma omp target oacc_exit_data map(detach:s.a [bias: 0])
 #pragma omp target oacc_exit_data map(release:s.a [len: 8])

which seems to be at least consistent. Again, here a 'finalize' would force the
reference counts to zero and, hence, also delete 's' and not only the
pointee/pointer target *s.a / s.a[0:.n] but also the pointer 's.a' itself.
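
A toy model of the reference counting involved, in C.  This is only a sketch
of the idea (enter increments, exit decrements, 'finalize' forces the count
to zero, and an update or exit on an unmapped object fails); the struct and
function names are hypothetical and it is not how libgomp is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one device mapping.  */
struct toy_mapping
{
  int present;   /* nonzero while the object is mapped */
  int refcount;  /* number of outstanding 'enter data' operations */
};

/* Model of 'enter data': map the object if needed and count the use.  */
static void
toy_enter_data (struct toy_mapping *m)
{
  assert (m != NULL);
  m->present = 1;
  m->refcount++;
}

/* Model of 'exit data'.  Returns 0 on success and -1 if the object
   is not mapped (the situation behind an "is not mapped" error).  */
static int
toy_exit_data (struct toy_mapping *m, int finalize)
{
  assert (m != NULL);
  if (!m->present)
    return -1;
  if (finalize)
    m->refcount = 0;   /* force the count to zero */
  else
    m->refcount--;     /* a present mapping has refcount >= 1 */
  if (m->refcount == 0)
    m->present = 0;    /* last reference gone: unmap */
  return 0;
}
```

With two enters, a plain exit leaves the object mapped, whereas an exit with
'finalize' unmaps it immediately, and any later exit on it reports "not
mapped".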

(BTW: Same result since GCC 10; GCC 9 rejects that code.)

 * * *

QUESTION: Is the current code for C (and Fortran) correct according to the
OpenACC specification or not?

FOLLOW UP QUESTION: If GCC's result is incorrect, what should the compiler do
instead?
And if it is correct, the question is: why do both ftn and nvfortran work in
the same way?

[Bug tree-optimization/111795] OMP SIMD inbranch call vectorization missing for AVX512

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111795

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Keywords||missed-optimization
   Last reconfirmed||2023-10-13
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Ever confirmed|0   |1
 Target||x86_64-*-*

--- Comment #1 from Richard Biener  ---
Mine.

[Bug tree-optimization/111795] New: OMP SIMD inbranch call vectorization missing for AVX512

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111795

Bug ID: 111795
   Summary: OMP SIMD inbranch call vectorization missing for
AVX512
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

We don't currently handle the following

int x[1024];

#pragma omp declare simd simdlen(16) inbranch
__attribute__((noinline)) int
foo (int a, int b)
{
  return a + b;
}

void __attribute__((noipa))
bar (void)
{
#pragma omp simd
  for (int i = 0; i < 1024; i++)
    {
      if (x[i] < 10)
        x[i] = foo (x[i], x[i]);
    }
}

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794

--- Comment #4 from Robin Dapp  ---
Just to mention here as well.  As this seems ninstance++ where the
adjust_precision thing comes back to bite us, I'm going to go back and check if
the issue why it was introduced (DCE?) cannot be solved differently.  I'd
rather have us not deviate from other backends at such a central part as mode
precisions.

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization

--- Comment #3 from Richard Biener  ---
I'm failing to see the issue as with -march=rv64gcv I run into

t.c:4:8: missed:   not vectorized: relevant stmt not supported: *x_50(D) = _6;
t.c:4:8: note:   removing SLP instance operations starting from: *x_50(D) = _6;
t.c:4:8: missed:  not vectorized: bad operation in basic block.

but just guessing, the issue is bool pattern recognition and

t.c:12:1: note:   using normal nonmask vectors for _2 = _1 == 1;
t.c:12:1: note:   using normal nonmask vectors for _4 = _3 == 2;
t.c:12:1: note:   using normal nonmask vectors for _5 = _2 & _4;
...

?  To vectorize you'd want to see

t.c:12:1: note:   using boolean precision 32 for _2 = _1 == 1;
t.c:12:1: note:   using boolean precision 16 for _4 = _3 == 2;
t.c:12:1: note:   using boolean precision 16 for _5 = _2 & _4;
...

and a pattern used for the value use:

t.c:12:1: note:   extra pattern stmt: patt_62 = _5 ? 1 : 0;

You need to see why this doesn't work (it's a very delicate area).

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-13 Thread rguenther at suse dot de via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600

--- Comment #27 from rguenther at suse dot de  ---
On Fri, 13 Oct 2023, rdapp at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600
> 
> --- Comment #26 from Robin Dapp  ---
> So insn-opinit.cc still takes 2-3 minutes to compile here, even though the file
> is not gigantic.
> With the same GCC 13.1 x86 host compiler I see:
> 
>  phase opt and generate          : 170.28 ( 99%)   0.75 ( 48%) 171.43 ( 99%)   561M ( 60%)
>  callgraph functions expansion   :  35.04 ( 20%)   0.50 ( 32%)  35.65 ( 21%)   356M ( 38%)
>  callgraph ipa passes            : 135.02 ( 79%)   0.24 ( 15%) 135.55 ( 78%)    96M ( 10%)
>  tree Early VRP                  : 127.83 ( 75%)   0.00 (  0%) 128.11 ( 74%)  3141k (  0%)
> Would need to re-check if this still happens with a GCC 14 host compiler.  If
> so, that might be worth investigating as it seems pretty localized.

See PR111622

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600

--- Comment #26 from Robin Dapp  ---
So insn-opinit.cc still takes 2-3 minutes to compile here, even though the file
is not gigantic.
With the same GCC 13.1 x86 host compiler I see:

 phase opt and generate          : 170.28 ( 99%)   0.75 ( 48%) 171.43 ( 99%)   561M ( 60%)
 callgraph functions expansion   :  35.04 ( 20%)   0.50 ( 32%)  35.65 ( 21%)   356M ( 38%)
 callgraph ipa passes            : 135.02 ( 79%)   0.24 ( 15%) 135.55 ( 78%)    96M ( 10%)
 tree Early VRP                  : 127.83 ( 75%)   0.00 (  0%) 128.11 ( 74%)  3141k (  0%)

Would need to re-check if this still happens with a GCC 14 host compiler.  If
so, that might be worth investigating as it seems pretty localized.

[Bug c++/111776] ICE on delete expression with multiple viable destroying operator delete

2023-10-13 Thread leni536 at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111776

--- Comment #2 from Lénárd Szolnoki  ---
Same ICE without destroying delete:

```
struct A {
void operator delete(void *);
};

struct B {
void operator delete(void *);
};

struct C : A, B {
using A::operator delete;
using B::operator delete;
};

void f(C* ptr) {
delete ptr;
}
```

This goes back to GCC 7.

GCC 6 accepts it and calls `A::operator delete`, which is not much better.

https://godbolt.org/z/cczfdKoqb

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794

--- Comment #2 from JuzheZhong  ---
Note that we adjust the mask mode precision here because of the DSE bug for
"small mask mode":


https://github.com/gcc-mirror/gcc/commit/247cacc9e381d666a492dfa4ed61b7b19e2d008f

This is the commit that shows why we adjust the precision.

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794

--- Comment #1 from JuzheZhong  ---
This is a RISC-V target-specific issue.

ARM SVE can vectorize it.

[Bug tree-optimization/111793] OpenMP SIMD inbranch clones for AVX512 are highly sub-optimal

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111793

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-10-13
 Ever confirmed|0   |1

--- Comment #2 from Richard Biener  ---
For AVX2 and simdlen(8) we get

   [local count: 1073741824]:
  vect__35.98_51 = VIEW_CONVERT_EXPR(simd.25_11(D));
  vect__36.99_52 = VIEW_CONVERT_EXPR(simd.26_13(D));
  vect__37.100_53 = vect__35.98_51 + vect__36.99_52;
  vect__3.101_54 = VIEW_CONVERT_EXPR(vect__37.100_53);
  mask__38.102_55 = mask.27_15(D) != { 0, 0, 0, 0, 0, 0, 0, 0 };
  if (mask__38.102_55 == { 0, 0, 0, 0, 0, 0, 0, 0 })
goto ; [20.00%]
  else
goto ; [80.00%]

   [local count: 858993419]:
  .MASK_STORE (&retval.21, 256B, mask__38.102_55, vect__3.101_54);

   [local count: 1073741824]:
  _8 = VIEW_CONVERT_EXPR(retval.21);
  return _8;

which is much more reasonable - I'm not sure whether the compare against 0
is required or if the ABI guarantees either 0 or -1 in the elements.  That
we end up with memory due to the use of a .MASK_STORE unfortunately persists
to the final assembly:

_ZGVdM8vv_foo:
.LFB3:
        .cfi_startproc
        vpaddd  %ymm1, %ymm0, %ymm0
        vpxor   %xmm1, %xmm1, %xmm1
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        vpcmpeqd        %ymm1, %ymm2, %ymm2
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        andq    $-32, %rsp
        vpcmpeqd        %ymm1, %ymm2, %ymm2
        vptest  %ymm2, %ymm2
        je      .L12
        vpmaskmovd      %ymm0, %ymm2, -32(%rsp)
.L12:
        vmovdqa -32(%rsp), %ymm0
        leave
        .cfi_def_cfa 7, 8
        ret

there's the opportunity to maybe rewrite .MASK_STORE to a VEC_COND_EXPR
and rewrite retval.21 to SSA.  Maybe if-conversion can see this already
given retval.21 is local automatic and not address taken.

[Bug c/111794] New: RISC-V: Missed SLP optimization due to mask mode precision

2023-10-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794

Bug ID: 111794
   Summary: RISC-V: Missed SLP optimization due to mask mode
precision
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

void
f (int *restrict x, short *restrict y)
{
  x[0] = x[0] == 1 & y[0] == 2;
  x[1] = x[1] == 1 & y[1] == 2;
  x[2] = x[2] == 1 & y[2] == 2;
  x[3] = x[3] == 1 & y[3] == 2;
  x[4] = x[4] == 1 & y[4] == 2;
  x[5] = x[5] == 1 & y[5] == 2;
  x[6] = x[6] == 1 & y[6] == 2;
  x[7] = x[7] == 1 & y[7] == 2;
}

I realized that we fail to vectorize this case:

https://godbolt.org/z/rWz9fjM4r

The root cause is the mask bit precision of "small mask mode" (which potentially
has a bit size smaller than 1 byte).

If we remove the following precision adjustments:

ADJUST_PRECISION (V1BI, 1);
ADJUST_PRECISION (V2BI, 2);
ADJUST_PRECISION (V4BI, 4);

ADJUST_PRECISION (RVVMF16BI, riscv_v_adjust_precision (RVVMF16BImode, 4));
ADJUST_PRECISION (RVVMF32BI, riscv_v_adjust_precision (RVVMF32BImode, 2));
ADJUST_PRECISION (RVVMF64BI, riscv_v_adjust_precision (RVVMF64BImode, 1));

Then GCC can vectorize this case, but it causes bugs in other situations.

Is it possible to fix that in GCC?

[Bug tree-optimization/111793] OpenMP SIMD inbranch clones for AVX are highly sub-optimal

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111793

Richard Biener  changed:

   What|Removed |Added

 Blocks||53947
 Target||x86_64-*-*
   Keywords||missed-optimization

--- Comment #1 from Richard Biener  ---
#pragma omp declare simd simdlen(16) inbranch
__attribute__((noinline)) int
foo (int a, int b)
{
  return a + b;
}

we present if-conversion with the following which I think is good:

   [local count: 1073741824]:
  # iter.49_5 = PHI <0(2), iter.49_6(7)>
  # ivtmp_19 = PHI <16(2), ivtmp_18(7)>
  b_2 = b.45[iter.49_5];
  a_1 = a.44[iter.49_5];
  _8 = mask.48_7(D) >> iter.49_5;
  _9 = _8 & 1;
  if (_9 == 0)
goto ; [20.00%]
  else
goto ; [80.00%]

   [local count: 214748360]:
  goto ; [100.00%]

   [local count: 858993464]:
  _3 = a_1 + b_2;
  retval.43[iter.49_5] = _3;

   [local count: 1073741824]:
  iter.49_6 = iter.49_5 + 1;
  ivtmp_18 = ivtmp_19 - 1;
  if (ivtmp_18 != 0)
goto ; [100.00%]
  else
goto ; [0.00%]

and we if-convert the body to

   [local count: 1073741824]:
  # iter.49_5 = PHI 
  # ivtmp_19 = PHI 
  b_2 = b.45[iter.49_5];
  a_1 = a.44[iter.49_5];
  _8 = mask.48_7(D) >> iter.49_5;
  _9 = _8 & 1; 
  _22 = _9 == 0;
  _36 = (unsigned int) a_1;
  _37 = (unsigned int) b_2;
  _38 = _36 + _37;
  _3 = (int) _38;
  _39 = ~_22;
  _40 = &retval.43[iter.49_5];
  .MASK_STORE (_40, 32B, _39, _3);
  iter.49_6 = iter.49_5 + 1;
  ivtmp_18 = ivtmp_19 - 1;
  if (ivtmp_18 != 0)
goto ; [100.00%]
  else
goto ; [0.00%]

but rather than recovering the mask in its original form we vectorize this as

  vect_cst__50 = {mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D),
mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D),
mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D),
mask.48_7(D), mask.48_7(D)};

   [local count: 10631108]:
  # vect_vec_iv_.125_42 = PHI <_43(7), { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15 }(2)>
  # vectp_b.126_44 = PHI 
  # vectp_a.129_47 = PHI 
  # vectp_retval.140_61 = PHI 
  # ivtmp_64 = PHI 
  _43 = vect_vec_iv_.125_42 + { 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16 };
  vect_b_2.128_46 = MEM  [(int *)vectp_b.126_44];
  vect_a_1.131_49 = MEM  [(int *)vectp_a.129_47];
  vect__8.132_51 = vect_cst__50 >> vect_vec_iv_.125_42;
  vect__9.133_53 = vect__8.132_51 & { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1 };
  mask__22.134_55 = vect__9.133_53 == { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0 };
  vect__36.135_56 = VIEW_CONVERT_EXPR(vect_a_1.131_49);
  vect__37.136_57 = VIEW_CONVERT_EXPR(vect_b_2.128_46);
  vect__38.137_58 = vect__36.135_56 + vect__37.136_57;
  vect__3.138_59 = VIEW_CONVERT_EXPR(vect__38.137_58);
  mask__39.139_60 = ~mask__22.134_55;
  if (mask__39.139_60 == { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 })
goto ; [20.00%]
  else
goto ; [80.00%]

   [local count: 8504886]:
  .MASK_STORE (vectp_retval.140_61, 512B, mask__39.139_60, vect__3.138_59);

   [local count: 10631108]:
  vectp_b.126_45 = vectp_b.126_44 + 64;
  vectp_a.129_48 = vectp_a.129_47 + 64;
  vectp_retval.140_62 = vectp_retval.140_61 + 64;
  ivtmp_65 = ivtmp_64 + 1;
  if (ivtmp_65 < 1)
goto ; [0.00%]
  else
goto ; [100.00%]

in the end leaving us with

   [local count: 1073741824]:
  vect_cst__50 = {mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D),
mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D),
mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D),
mask.48_7(D), mask.48_7(D)};
  vect__8.132_51 = vect_cst__50 >> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15 };
  vect__9.133_53 = vect__8.132_51 & { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1 };
  vect__36.135_56 = VIEW_CONVERT_EXPR(simd.46_13(D));
  vect__37.136_57 = VIEW_CONVERT_EXPR(simd.47_15(D));
  vect__38.137_58 = vect__36.135_56 + vect__37.136_57;
  vect__3.138_59 = VIEW_CONVERT_EXPR(vect__38.137_58);
  mask__39.139_60 = vect__9.133_53 != { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0 };
  if (mask__39.139_60 == { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 })
goto ; [20.00%]
  else
goto ; [80.00%]

   [local count: 858993419]:
  .MASK_STORE (&retval.43, 512B, mask__39.139_60, vect__3.138_59);

   [local count: 1073741824]:
  _10 = VIEW_CONVERT_EXPR(retval.43);
  return _10;

and

_ZGVeM16vv_foo:
.LFB4:
        .cfi_startproc
        movl    $1, %eax
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        vpbroadcastd    %edi, %zmm2
        vpaddd  %zmm1, %zmm0, %zmm0
        vpbroadcastd    %eax, %zmm3
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        andq    $-64, %rsp
        vpsrlvd .LC0(%rip), %zmm2, %zmm2
        vpxor   %xmm1, %xmm1, %xmm1
        vpandd  %zmm3, %zmm2, %zmm2
        vpcmpd  $4, %z

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600

--- Comment #25 from Robin Dapp  ---
At least here locally the maximum I saw was 1.4 GB of RES for insn-emit-10.cc. 
That's still not ideal (especially when 8 or 10 of those files compile in
parallel) but at least no 8 GB for a single file anymore.

[Bug tree-optimization/111793] New: OpenMP SIMD inbranch clones for AVX are highly sub-optimal

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111793

Bug ID: 111793
   Summary: OpenMP SIMD inbranch clones for AVX are highly
sub-optimal
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

[Bug c++/111788] g++ DWARF for void foo(...) missing unspecified parameters DIE

2023-10-13 Thread gprocida at google dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111788

--- Comment #3 from Giuliano Procida  ---
https://en.cppreference.com/w/cpp/language/variadic_arguments (see introduction
and Notes)

It's been allowed for longer than in C, but there is no portable way of
accessing the arguments.
