[Bug tree-optimization/105216] [12/13 regression] 8% regression for m-queens compared to gcc11 O2 on CLX. since r12-3876-g4a960d548b7d7d94

2022-12-30 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216

--- Comment #15 from Andrew Pinski  ---
Might be interesting to test it again to see if it has been fixed on the trunk.

[Bug target/105010] [12/13 regression] GCC 12 after 20220227 fails to build on powerpc64-freebsd with Error: invalid mfcr mask

2022-12-30 Thread pkubaj at anongoth dot pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105010

--- Comment #17 from Piotr Kubaj  ---
This also affects GCC 10.4 and the snapshots of GCC 11.

[Bug target/108255] Repeated address-of (lea) not optimized for size.

2022-12-30 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108255

--- Comment #1 from Andrew Pinski  ---
I suspect r0-127773-g3e7291458b96 changed the behavior for GCC 4.9+
I have not figured out what changed the behavior for GCC 4.8 yet though.

I suspect it was just a mistake that GCC 4.8 cost model was incorrect really.

LLVM might be not tuning correctly anyways ...

Also note ICC (not ICX) does the same as GCC ...
So I think this is just an LLVM issue rather than a GCC issue.

Someone who knows more about x86 processor behavior can explain more.

[Bug c/108255] New: Repeated address-of (lea) not optimized for size.

2022-12-30 Thread witold.baryluk+gcc at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108255

Bug ID: 108255
   Summary: Repeated address-of (lea) not optimized for size.
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: witold.baryluk+gcc at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/q5sx9e49j


void f(int *);

int g(int of) {
    int x = 13;
    f(&x);
    f(&x);
    f(&x);
    f(&x);
    f(&x);
    f(&x);
    f(&x);
    f(&x);
    return 0;
}


Got:

g(int):
        sub     rsp, 24
        lea     rdi, [rsp+12]
        mov     DWORD PTR [rsp+12], 13
        call    f(int*)
        lea     rdi, [rsp+12] # compute, 5 bytes
        call    f(int*)
        lea     rdi, [rsp+12] # recompute, 5 bytes
        call    f(int*)
        lea     rdi, [rsp+12] # recompute, 5 bytes
        call    f(int*)
        lea     rdi, [rsp+12]
        call    f(int*)
        lea     rdi, [rsp+12]
        call    f(int*)
        lea     rdi, [rsp+12]
        call    f(int*)
        lea     rdi, [rsp+12]
        call    f(int*)
        xor     eax, eax
        add     rsp, 24
        ret


But, note that lea is 5 bytes.

Expected (generated by clang 3.0 - 15.0):

g(int):                                 # @g(int)
        push    rbx                     # extra, but just 1 byte
        sub     rsp, 16
        mov     dword ptr [rsp + 12], 13 # CSE temp
        lea     rbx, [rsp + 12]
        mov     rdi, rbx                # use
        call    f(int*)@PLT
        mov     rdi, rbx                # reuse, 3 bytes
        call    f(int*)@PLT
        mov     rdi, rbx                # reuse, 3 bytes
        call    f(int*)@PLT
        mov     rdi, rbx
        call    f(int*)@PLT
        mov     rdi, rbx
        call    f(int*)@PLT
        mov     rdi, rbx
        call    f(int*)@PLT
        mov     rdi, rbx
        call    f(int*)@PLT
        mov     rdi, rbx
        call    f(int*)@PLT
        xor     eax, eax
        add     rsp, 16
        pop     rbx                     # extra, but just 1 byte
        ret


Technically this is more instructions, but `mov rdi, rbx` is 3 bytes, which is
shorter than the 5 bytes of `lea`. This comes at the minor expense of needing
to save and restore rbx.

PS. The same happens when using a temporary `int *const y = &x;`

The same also happens when optimizing for size (`-Os`).

It looks like gcc 4.8.5 produced expected code, but gcc 4.9.0 does not.

It is possible that the code produced by gcc 4.9.0 is faster, but it is also
likely it contributes quite a bit to binary size.

clang uses CSE even if there are just two uses of `&x` in the above example.
A somewhat higher threshold (3 or 4) is likely optimal (it can be calculated
from the encoding sizes).
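
The threshold can be estimated from the encoding sizes quoted in this report.
A minimal sketch, assuming only those sizes matter (5-byte lea, 3-byte mov,
1 byte each for push and pop) and ignoring alignment and speed; the function
names are made up for illustration:

```cpp
// Encoding sizes in bytes, as quoted in the report (x86-64).
constexpr int kLeaBytes = 5;      // lea rdi, [rsp+12]
constexpr int kMovBytes = 3;      // mov rdi, rbx
constexpr int kPushPopBytes = 2;  // push rbx + pop rbx, 1 byte each

// Bytes spent on the argument setup when the address is recomputed
// with lea before every call.
constexpr int recompute_size(int uses) { return kLeaBytes * uses; }

// Bytes spent when the address is computed once into rbx (one lea,
// plus saving and restoring rbx) and copied with mov before every call.
constexpr int cse_size(int uses) {
    return kPushPopBytes + kLeaBytes + kMovBytes * uses;
}

// Smallest number of uses at which the CSE form is strictly smaller.
constexpr int break_even_uses() {
    int uses = 1;
    while (cse_size(uses) >= recompute_size(uses)) ++uses;
    return uses;
}

static_assert(recompute_size(3) == 15 && cse_size(3) == 16, "3 uses: lea wins");
static_assert(recompute_size(4) == 20 && cse_size(4) == 19, "4 uses: CSE wins");
```

Under these assumptions the register copy only pays off from four uses on; for
the eight calls in the example it is 40 bytes against 31.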


Weirdly tho, gcc -m32 does this:

g():
        push    ebp
        mov     ebp, esp
        push    ebx
        lea     ebx, [ebp-12]
        sub     esp, 32
        mov     DWORD PTR [ebp-12], 13
        push    ebx
        call    f(int*)
        mov     DWORD PTR [esp], ebx
        call    f(int*)
        mov     DWORD PTR [esp], ebx
        call    f(int*)
        mov     ebx, DWORD PTR [ebp-4]
        xor     eax, eax
        leave
        ret

Here it does compute the address and store it in a temporary, but it passes it
on the stack instead of in a register (my guess is that there is no free
register to hold it, so it is spilled). In fact lea would likely be faster
here: `mov DWORD PTR [esp], ebx` requires a memory/cache access, whereas lea is
5 bytes but does not require a memory access.

[Bug c++/104944] [9/10 Regression] incorrect alignas(void) accepted (with warning if templated)

2022-12-30 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104944

Andrew Pinski  changed:

   What|Removed |Added

  Known to fail|12.0|11.2.0
   Target Milestone|9.5 |11.3
  Known to work||11.3.0, 12.1.0

[Bug middle-end/35560] Missing CSE/PRE for memory operations involved in virtual call.

2022-12-30 Thread witold.baryluk+gcc at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35560

Witold Baryluk  changed:

   What|Removed |Added

 CC||witold.baryluk+gcc at gmail dot com

--- Comment #15 from Witold Baryluk  ---
I know this is a pretty old bug, but I was exploring some assembly from gcc and
clang on godbolt, and stumbled into the same issue.

https://godbolt.org/z/qPzMhWse1

class A {
public:
    virtual int f7(int x) const;
};

int g(const A * const a, int x) {
    int r = 0;
    for (int i = 0; i < 1; i++)
        r += a->f7(x);
    return r;
}

(same happens without loop, when just calling a->f7 multiple times)



g(A const*, int):
        push    r13
        mov     r13d, esi
        push    r12
        xor     r12d, r12d
        push    rbp
        mov     rbp, rdi
        push    rbx
        mov     ebx, 1
        sub     rsp, 8
.L2:
        mov     rax, QWORD PTR [rbp+0]   # a vtable deref
        mov     esi, r13d
        mov     rdi, rbp
        call    [QWORD PTR [rax]]        # f7 indirect call
        add     r12d, eax
        dec     ebx
        jne     .L2

        add     rsp, 8
        pop     rbx
        pop     rbp
        mov     eax, r12d
        pop     r12
        pop     r13
        ret


I was expecting `mov rax, QWORD PTR [rbp+0]` and `call [QWORD PTR [rax]]`
to be hoisted out of the loop (the call converted to a lea plus a call through
a register).


A bit sad.

Is there some recent work done on this optimization?

Are there at least some cases where it is valid to do CSE, or change code so it
is moved out of the loop?
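
One source-level change that sidesteps the repeated vtable load is to make the
static type a `final` class, which permits (but does not oblige) the compiler
to call the function directly. A minimal sketch, assuming a hypothetical
derived class `B` and helper `g_final` that are not part of the report; the
inline body of `A::f7` and the virtual destructor are also added here only so
the example is complete:

```cpp
#include <cassert>

class A {
public:
    virtual ~A() = default;                    // added for the sketch
    virtual int f7(int x) const { return x; }  // body added for the sketch
};

// Hypothetical derived type. `final` guarantees that no class below B
// can override f7 again.
class B final : public A {
public:
    int f7(int x) const override { return x + 1; }
};

// Same loop shape as g() in the report, but the pointer's static type is
// the final class B, so a direct call to B::f7 is allowed.
int g_final(const B* const b, int x, int n) {
    int r = 0;
    for (int i = 0; i < n; i++)
        r += b->f7(x);
    return r;
}
```

Whether a given GCC version actually emits a direct call here has to be checked
in the generated assembly; the sketch only shows the source pattern.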

[Bug c++/105200] user-defined operator <=> for enumerated types is ignored

2022-12-30 Thread barry.revzin at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105200

Barry Revzin  changed:

   What|Removed |Added

 CC||barry.revzin at gmail dot com

--- Comment #6 from Barry Revzin  ---
This strikes me as a definite wording issue rather than actual design intent.
Patrick is correct as to what the wording says - it says non-member candidate,
so the rewritten candidates don't count. But I think really we should also
consider rewritten candidates. Clang and MSVC both do - which seems much more
in line with expectation and the original design.

For class types, you can just provide <=>, but for enums, you have to provide
<, >, <=, >=, and <=>?? 

I'm opening a Core issue for this: https://github.com/cplusplus/CWG/issues/205

[Bug c++/108254] New: Usage of requires expression with an immedietely invoked lambda expression results in compile error instead of evaluating to false

2022-12-30 Thread avr5309 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108254

Bug ID: 108254
   Summary: Usage of requires expression with an immedietely
invoked lambda expression results in compile error
instead of evaluating to false
   Product: gcc
   Version: 12.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avr5309 at gmail dot com
  Target Milestone: ---

Substitution failure of an immediately invoked lambda expression within a
requires expression results in a compile error instead of the requires
expression evaluating to 'false' (by my understanding of
http://eel.is/c++draft/expr.prim.req#general-5; Clang 15 conforms to my
expectation).

### SOURCE:

The single source file (bug.cpp):-

template <typename T>
concept Container = requires(T t) {
    {
        [](T const& t){
            for(auto&& v : t)
                ;
        }(t)
    };
};

int main() {
static_assert(!Container);
}

### COMPILER INVOCATION:

g++ -fsyntax-only -std=c++20 -Wall -Wextra -pedantic-errors -xc++ -

### ACTUAL OUTPUT:

The following error message:-

bug.cpp: In lambda function:
bug.cpp:5:13: error: 'begin' was not declared in this scope
5 | for(auto&& v : t)
  | ^~~
bug.cpp:5:13: error: 'end' was not declared in this scope
bug.cpp: In function 'int main()':
bug.cpp:12:19: error: static assertion failed
   12 | static_assert(!Container);
  |   ^~~

### EXPECTED OUTPUT:

(clean compile)

### COMPILER VERSION INFO (g++ -v):

Reading specs from /usr/lib64/gcc/x86_64-unknown-linux-gnu/12.2.0/specs
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-unknown-linux-gnu/12.2.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /builddir/gcc-12.2.0/configure
--build=x86_64-unknown-linux-gnu --enable-gnu-unique-object
--enable-vtable-verify --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --libexecdir=/usr/lib64 --libdir=/usr/lib64
--enable-threads=posix --enable-__cxa_atexit --disable-multilib
--with-system-zlib --enable-shared --enable-lto --enable-plugins
--enable-linker-build-id --disable-werror --disable-nls --enable-default-pie
--enable-default-ssp --enable-checking=release --disable-libstdcxx-pch
--with-isl --with-linker-hash-style=gnu --disable-sjlj-exceptions
--disable-target-libiberty
--enable-languages=c,c++,objc,obj-c++,fortran,lto,go,ada
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (GCC)
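
A possible way to express the same iterability check without a lambda is to
name the begin/end expressions directly, so that a failure occurs in the
expression being tested. A minimal sketch, assuming a hypothetical trait
`is_iterable` (not from the report), written as a C++17 detection trait so it
does not depend on concepts; unlike a range-based for it only considers
`std::begin`/`std::end`, not begin/end found by argument-dependent lookup:

```cpp
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

// Primary template: selected when the expressions below are ill-formed.
template <typename T, typename = void>
struct is_iterable : std::false_type {};

// Specialization: selected only when both std::begin and std::end accept
// a `T const&`. A failure here is a substitution failure, not a hard error.
template <typename T>
struct is_iterable<
    T, std::void_t<decltype(std::begin(std::declval<T const&>())),
                   decltype(std::end(std::declval<T const&>()))>>
    : std::true_type {};

static_assert(!is_iterable<int>::value, "int has no begin/end");
static_assert(is_iterable<std::vector<int>>::value, "vector is iterable");
static_assert(is_iterable<int[3]>::value, "arrays are iterable");
```

The same two expressions could be written as simple requirements inside a
requires expression taking `T const& t`, namely `std::begin(t);` and
`std::end(t);`.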

[Bug preprocessor/108244] [13 Regression] `pragma GCC diagnostic` and -E -fdirectives-only causes the preprocessor to become confused

2022-12-30 Thread lhyatt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108244

--- Comment #8 from Lewis Hyatt  ---
Here is the patch:
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609275.html

[Bug libstdc++/108235] FAIL: g++.dg/compat/abi/bitfield1 cp_compat_x_tst.o-cp_compat_y_tst.o link

2022-12-30 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108235

Jonathan Wakely  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2022-12-30
 Status|UNCONFIRMED |ASSIGNED

[Bug libstdc++/108235] FAIL: g++.dg/compat/abi/bitfield1 cp_compat_x_tst.o-cp_compat_y_tst.o link

2022-12-30 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108235

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|--- |13.0
   Assignee|unassigned at gcc dot gnu.org  |redi at gcc dot gnu.org

[Bug pch/105858] MinGW-w64 64-bit build with --libstdcxx-pch: fatal error: cannot write PCH file: required memory segment unavailable

2022-12-30 Thread brechtsanders at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105858

--- Comment #8 from Brecht Sanders ---
I seem to be having some success after applying patches based on:
https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-gcc/0010-Fix-using-large-PCH.patch
https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-gcc/0021-PR14940-Allow-a-PCH-to-be-mapped-to-a-different-addr.patch

My patch for GCC 12.2.0 looks like this:
patch -ulbf gcc/config/i386/host-mingw32.cc << EOF
@@ -46,5 +46,2 @@

-/* FIXME: Is this big enough?  */
-static const size_t pch_VA_max_size  = 128 * 1024 * 1024;
-
 /* Granularity for reserving address space.  */
@@ -90,5 +87,2 @@
   void* res;
-  size = (size + va_granularity - 1) & ~(va_granularity - 1);
-  if (size > pch_VA_max_size)
-return NULL;

@@ -102,3 +96,3 @@

-  res = VirtualAlloc (NULL, pch_VA_max_size,
+  res = VirtualAlloc (NULL, size,
  MEM_RESERVE | MEM_TOP_DOWN,
@@ -143,3 +137,2 @@
   OSVERSIONINFO version_info;
-  int r;

@@ -152,3 +145,3 @@
  this to work.  We can't change the offset. */
-  if ((offset & (va_granularity - 1)) != 0 || size > pch_VA_max_size)
+  if ((offset & (va_granularity - 1)) != 0)
 return -1;
@@ -177,21 +170,20 @@

-  /* Retry five times, as here might occure a race with multiple gcc's
- instances at same time.  */
-  for (r = 0; r < 5; r++)
-   {
-  mmap_addr = MapViewOfFileEx (mmap_handle, FILE_MAP_COPY, 0, offset,
-  size, addr);
-  if (mmap_addr == addr)
-   break;
-  if (r != 4)
-Sleep (500);
-   }
-
-  if (mmap_addr != addr)
+  /* Try mapping the file at \`addr\`.  */
+  mmap_addr = MapViewOfFileEx (mmap_handle, FILE_MAP_COPY, 0, offset,
+  size, addr);
+  if (mmap_addr == NULL)
 {
-  w32_error (__FUNCTION__, __FILE__, __LINE__, "MapViewOfFileEx");
-  CloseHandle(mmap_handle);
-  return  -1;
+  /* We could not map the file at its original address, so let the
+system choose a different one. The PCH can be relocated later.  */
+  mmap_addr = MapViewOfFileEx (mmap_handle, FILE_MAP_COPY, 0, offset,
+  size, NULL);
+  if (mmap_addr == NULL)
+   {
+ w32_error (__FUNCTION__, __FILE__, __LINE__, "MapViewOfFileEx");
+ CloseHandle(mmap_handle);
+ return  -1;
+   }
 }

+  addr = mmap_addr;
   return 1;
EOF

[Bug target/107714] MVE: Invalid addressing mode generated for VLD2

2022-12-30 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Stam Markianos-Wright:

https://gcc.gnu.org/g:4269a6567eb991e6838f40bda5be9e3a7972530c

commit r13-4935-g4269a6567eb991e6838f40bda5be9e3a7972530c
Author: Stam Markianos-Wright 
Date:   Fri Dec 30 11:25:22 2022 +

Fix memory constraint on MVE v[ld/st][2/4] instructions [PR107714]

In the M-Class Arm-ARM:

https://developer.arm.com/documentation/ddi0553/bu/?lang=en

these MVE instructions only have '!' writeback variant and at:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714

we found that the Um constraint would also allow through a
register offset writeback, resulting in an assembler error.

Here I have added a new constraint and predicate for these
instructions, which (uniquely, AFAICT), only support a `!` writeback
increment by the data size (inside the compiler this is a POST_INC).

No regressions in arm-none-eabi with MVE and MVE.FP.

gcc/ChangeLog:
PR target/107714
* config/arm/arm-protos.h (mve_struct_mem_operand): New prototype.
* config/arm/arm.cc (mve_struct_mem_operand): New function.
* config/arm/constraints.md (Ug): New constraint.
* config/arm/mve.md (mve_vst4q): Change constraint.
(mve_vst2q): Likewise.
(mve_vld4q): Likewise.
(mve_vld2q): Likewise.
* config/arm/predicates.md (mve_struct_operand): New predicate.

gcc/testsuite/ChangeLog:
PR target/107714
* gcc.target/arm/mve/intrinsics/vldst24q_reg_offset.c: New test.

[Bug target/108232] Bus & Segmentation error on lz4_decompress.c while make linux-raspi

2022-12-30 Thread registration at filiproland dot hu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108232

--- Comment #10 from Filip Roland  ---
Hi!

GCC 13 is not released yet, but I ran make with gcc-12 and that got past
lz4_decompress.o.

Thanks!