[Bug c/110910] New: weakref should allow incomplete array type

2023-08-04 Thread jrtc27 at jrtc27 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110910

Bug ID: 110910
   Summary: weakref should allow incomplete array type
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jrtc27 at jrtc27 dot com
  Target Milestone: ---

Consider:

extern char foo[];
static char weak_foo[] __attribute__((weakref("foo")));

Normally, being a tentative type with internal linkage, weak_foo would not be
allowed to have an incomplete type, and this is what GCC enforces today
("warning: array 'weak_foo' assumed to have one element" / "error: array size
missing in 'weak_foo'" depending on -pedantic). However, weakref is special,
and makes weak_foo not exist at all, but instead be an alias for foo, which can
legitimately have internal linkage. Therefore I believe this restriction,
namely C99 6.9.2p3, should be relaxed when weakref is used. This does mean the
diagnostic for something like:

static char weak_foo[];
...
static char weak_foo[] __attribute__((weakref("foo")));

needs to be delayed until the whole file has been parsed, but given GCC already
supports:

static char foo[];
...
static char foo[42];

as an extension that doesn't seem to be a problem.

[Bug libstdc++/110909] Missed optimization: Suboptimal codegen in vector copy assignment

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110909

Andrew Pinski  changed:

   What|Removed |Added

  Component|c++ |libstdc++

--- Comment #1 from Andrew Pinski  ---
_M_allocate_and_copy calls __uninitialized_copy_a which basically calls
std::copy (well kinda) which uses memmove.

[Bug c++/110909] New: Suboptimal codegen in vector copy assignment

2023-08-04 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110909

Bug ID: 110909
   Summary: Suboptimal codegen in vector copy assignment
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include

using Container = std::vector;
int copy_assignment(const Container , Container ) {
  v2 = v1;
  return 0;
}


I'd expect this to only generate a memcpy. but i'm not sure why memmoves are
generated?

$ gcc -std=c++2a -O3  -fno-exceptions

copy_assignment(std::vector > const&, std::vector >&):
cmp rsi, rdi
je  .L21
pushr13
pushr12
pushrbp
mov rbp, rdi
pushrbx
mov rbx, rsi
sub rsp, 8
mov rax, QWORD PTR [rdi+8]
mov r13, QWORD PTR [rdi]
mov rdx, QWORD PTR [rsi+16]
mov rdi, QWORD PTR [rsi]
mov r12, rax
sub r12, r13
sub rdx, rdi
cmp rdx, r12
jb  .L25
mov rcx, QWORD PTR [rsi+8]
mov rdx, rcx
sub rdx, rdi
cmp rdx, r12
jnb .L26
cmp rdx, 4
jle .L12
mov rsi, r13
callmemmove
mov rcx, QWORD PTR [rbx+8]
mov rdi, QWORD PTR [rbx]
mov rax, QWORD PTR [rbp+8]
mov r13, QWORD PTR [rbp+0]
mov rdx, rcx
sub rdx, rdi
.L13:
lea rsi, [r13+0+rdx]
sub rax, rsi
mov rdx, rax
cmp rax, 4
jle .L14
mov rdi, rcx
callmemmove
mov rax, QWORD PTR [rbx]
add rax, r12
.L8:
mov QWORD PTR [rbx+8], rax
add rsp, 8
xor eax, eax
pop rbx
pop rbp
pop r12
pop r13
ret
.L21:
xor eax, eax
ret
.L25:
movabs  rax, 9223372036854775804
cmp rax, r12
jb  .L27
mov rdi, r12
calloperator new(unsigned long)
mov rbp, rax
cmp r12, 4
jle .L5
mov rdx, r12
mov rsi, r13
mov rdi, rax
callmemcpy
.L6:
mov rdi, QWORD PTR [rbx]
testrdi, rdi
je  .L7
mov rsi, QWORD PTR [rbx+16]
sub rsi, rdi
calloperator delete(void*, unsigned long)
.L7:
lea rax, [rbp+0+r12]
mov QWORD PTR [rbx], rbp
mov QWORD PTR [rbx+16], rax
jmp .L8
.L26:
cmp r12, 4
jle .L10
mov rdx, r12
mov rsi, r13
callmemmove
mov rax, QWORD PTR [rbx]
add rax, r12
jmp .L8
.L14:
lea rax, [rdi+r12]
jne .L8
mov edx, DWORD PTR [rsi]
mov DWORD PTR [rcx], edx
jmp .L8
.L12:
jne .L13
mov esi, DWORD PTR [r13+0]
mov DWORD PTR [rdi], esi
jmp .L13
.L10:
lea rax, [rdi+r12]
jne .L8
mov edx, DWORD PTR [r13+0]
mov DWORD PTR [rdi], edx
jmp .L8
.L5:
mov eax, DWORD PTR [r13+0]
mov DWORD PTR [rbp+0], eax
jmp .L6
.L27:
callstd::__throw_bad_array_new_length()



Ideally, the above C++ code should translate to an equivalent of the following
C++ code:

using Container = std::vector;
int copy_assignment(const Container , Container ) {
  v2.reserve(v1.size());
  std::memcpy([0], [0], v1.size()*sizeof(int));
  // change the size: v2.size() = v1.size()
  return 0;
}

[Bug tree-optimization/96695] Failure to optimize combination of pointer comparison to nullptr and another pointer

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96695

Andrew Pinski  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

--- Comment #3 from Andrew Pinski  ---
We handle:
bool f2(unsigned x, unsigned y)
{
return (x == 0) && (x > y);
}

via:

/* x >  y  &&  x != XXX_MIN  -->  x > y
   x >  y  &&  x == XXX_MIN  -->  false . */

min_value is defined as:
(match min_value
 INTEGER_CST
 (if (INTEGRAL_TYPE_P (type)
  && wi::eq_p (wi::to_wide (t), wi::min_value (type)

I suspect we could add POINTER_TYPE_P there and it will work.

[Bug fortran/110888] Missing optimization for trivial MATMUL cases, requires -fno-signed-zeros

2023-08-04 Thread rimvydas.jas at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110888

--- Comment #5 from Rimvydas (RJ)  ---
It is more like this problem:
$ cat foo.c
void foo_(double *x, double *y, double *z)
{
  int i;
  __builtin_memset(z, 0, 8); /* z[0] = 0.0; */
  for (i=0; i<1 ; i++)
z[0] += x[0] * y[0];
}

$ gcc -O2 -Wall -Wextra -c foo.c -S -fdump-tree-optimized
   [local count: 536870913]:
  __builtin_memset (z_9(D), 0, 8);
  _17 = *x_11(D);
  _18 = *y_12(D);
  _19 = _17 * _18;
  _20 = _19 + 0.0;
  *z_9(D) = _20;
  return;

It would be beneficial for all frontends if the use of __builtin_memset() to
zero out accumulators would be be considered as !HONOR_SIGNED_ZEROS at least
during PRE pass in the middle-end.  If that would complicate things, then
easier solution is to add special case in gfortran frontend-passes that simply
transforms expression to drop accumulator: z[0] = x[0] * y[0];

* side note, in C the redundant __builtin_memset() does not get optimized out,
unlike in gfortran "zero" expr version.  Might be middle-end optimization
passes ordering issue.  At least PRE pass does take accumulator zeroing into
account.

[Bug target/108743] [objective-c, NeXT runtime] -fconstant-cfstrings not supported

2023-08-04 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108743

--- Comment #12 from Eric Gallager  ---
So I'm assuming this is staying open for backports to the 12 and 11 branches?

[Bug tree-optimization/56096] Sub-optimal code generated for conditional shift

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56096

--- Comment #4 from Andrew Pinski  ---
I think one way to improve the constant forming is during isel if we have
condition exec (e.g. arm or ia64) we should transform:
  _7 = _1 != 0;
  _8 = (int) _7;
  _9 = _8 << 3;
to:
_9 = _1 ? 8 : 0;
That will at least fix this part:
movne   r3, #1
moveq   r3, #0
lslsr3, r3, #3

To fix the other part, I don't know ...

[Bug tree-optimization/56096] Sub-optimal code generated for conditional shift

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56096

--- Comment #3 from Andrew Pinski  ---
GCC 7-10 produced:
movwr3, #32896
tst r1, r3
it  ne
lsrne   r0, r0, #8
bx  lr

But GCC 11 produces:
movwr3, #32896
tst r1, r3
ite ne
movne   r3, #1
moveq   r3, #0
lslsr3, r3, #3
lsrsr0, r0, r3

Which is definitely worse.

[Bug tree-optimization/70901] vectorized sin cos is wrongly optimized into scalar sincos

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70901

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug tree-optimization/88387] Possible code optimization when right shift count >= width of type

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88387

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |INVALID
   Target Milestone|--- |13.0
 Status|NEW |RESOLVED

--- Comment #2 from Andrew Pinski  ---
Starting in GCC 13 we remove the branch fully and assume n >> 222 is always 0.

In ethread:
Checking profitability of path (backwards):  bb:2 (3 insns) bb:0
  Control statement insns: 2
  Overall: 1 insns

 Registering killing_def (path_oracle) _1
Checking profitability of path (backwards): 
  [1] Registering jump thread: (0, 2) incoming edge;  (2, 4) nocopy; 
path: 0->2->4 SUCCESS

[Bug tree-optimization/84997] Optimize integer operations on floating point constants without -ffast-math but with -fno-trapping-math

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84997

Andrew Pinski  changed:

   What|Removed |Added

Summary|Optimize integer operations |Optimize integer operations
   |on floating point constants |on floating point constants
   |without -ffast-math |without -ffast-math but
   ||with -fno-trapping-math

--- Comment #6 from Andrew Pinski  ---
>clang optimizes to the following assembly

Note clang defaults to -fno-trapping-math.
Using -ftrapping-math with clang produces what GCC produces currently:
cvtsi2sd%edi, %xmm0
addsd   .LCPI0_0(%rip), %xmm0
cvttsd2si   %xmm0, %eax
retq

[Bug tree-optimization/20514] hoisting of label out of jumptable would take place at cse, should happen at trees

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=20514

--- Comment #9 from Andrew Pinski  ---
in GCC 13+ we get now (for the testcase in comment #0) at -O3:
```
   [local count: 452984831]:
  # i_lsm.4_2 = PHI 
  if (i_lsm.4_2 == 5)
goto ; [25.00%]
  else
goto ; [75.00%]

   [local count: 452984832]:
  goto ; [100.00%]
```
This is an infinite no matter which branch is taken ...
Coming in we get either `3->4->4->5` or `3->4->5` no matter what we end up in 5
always.

[Bug tree-optimization/80740] Aliasing with the return value

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80740

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed|2021-05-25 00:00:00 |2023-8-4
   Severity|normal  |enhancement

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 49773, which changed state.

Bug 49773 Summary: use of class data members prevent vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49773

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/49773] use of class data members prevent vectorization

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49773

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |6.0
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Andrew Pinski  ---
starting in GCC 6 can do it without aliasing checks.
Starting in GCC 8, the prologue code has improved.

So we can close this as fixed in GCC 6 really.

[Bug target/110908] [aarch64] Internal compiler error when using -ffixed-x30

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110908

--- Comment #3 from Andrew Pinski  ---
This is most likely the fix:
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 7cd230c4602..4d26c8789ce 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8470,7 +8470,8 @@ aarch64_layout_frame (void)
   /* ... and any callee saved register that dataflow says is live.  */
   for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
 if (df_regs_ever_live_p (regno)
-   && !fixed_regs[regno]
+   && (regno == R30_REGNUM
+   || !fixed_regs[regno])
&& (regno == R30_REGNUM
|| !crtl->abi->clobbers_full_reg_p (regno)))
   frame.reg_offset[regno] = SLOT_REQUIRED;

[Bug target/110908] [aarch64] Internal compiler error when using -ffixed-x30

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110908

--- Comment #2 from Andrew Pinski  ---
Does LLVM still save/restore LR with -ffixed-x30 ?

[Bug target/110908] [aarch64] Internal compiler error when using -ffixed-x30

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110908

--- Comment #1 from Andrew Pinski  ---
```
Treat the register named reg as a fixed register; generated code should never
refer to it (except perhaps as a stack pointer, frame pointer or in some other
fixed role).

```

[Bug tree-optimization/69489] missed vectorization for boolean loop, missed if-conversion

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69489

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed|2016-01-26 00:00:00 |2023-8-4

--- Comment #20 from Andrew Pinski  ---
Note one thing I noticed there is a slightly different IR between using the C
and C++ front-end. An extra cast when using the C front-end.

[Bug target/110908] New: [aarch64] Internal compiler error when using -ffixed-x30

2023-08-04 Thread zach-gcc at cs dot stanford.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110908

Bug ID: 110908
   Summary: [aarch64] Internal compiler error when using
-ffixed-x30
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zach-gcc at cs dot stanford.edu
  Target Milestone: ---

Example:

```
void bar();

int foo() {
bar();
return 0;
}
```

Compile:

```
$ aarch64-none-elf-gcc -ffixed-x30 -c test.c
during RTL pass: ira
test.c: In function 'foo':
test.c:6:1: internal compiler error: in aarch64_layout_frame, at
config/aarch64/aarch64.cc:8483
6 | }
  | ^
0x7f56c8d17d8f __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
0x7f56c8d17e3f __libc_start_main_impl
../csu/libc-start.c:392
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
```

I realize it's not clear exactly what GCC should do if the return address
register is fixed. Here's what LLVM/Clang does, and what I would like GCC to
do: it continues to use x30 as the return address, but never uses it as a
general-purpose register for any computations (I'm not sure if GCC will ever do
this to begin with, or if this behavior can be disabled via another mechanism).
At the very least, it probably shouldn't cause an internal compiler error.

Thanks!

Re: Re: cpymem for RISCV with v extension

2023-08-04 Thread 钟居哲
>> Umm, this patch has been queued up for at least a couple weeks now.

Oh. I am sorry I didn't see this patch since this patch doesn't CC me.
I didn't subscribe GCC-patch, so I may miss some patches that didn't explicitly 
CC me.

I just happen to see your reply email today then reply.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-08-05 07:17
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; rdapp.gcc; Joern Rennecke
Subject: Re: cpymem for RISCV with v extension
 
 
On 8/4/23 17:10, 钟居哲 wrote:
> Could you add testcases for this patch?
Testing what specifically?  Are you asking for correctness tests, 
performance/code quality tests?
 
 
> 
> +;; The (use (and (match_dup 1) (const_int 127))) is here to prevent the
> +;; optimizers from changing cpymem_loop_* into this.
> +(define_insn "@cpymem_straight"
> +  [(set (mem:BLK (match_operand:P 0 "register_operand" "r,r"))
> + (mem:BLK (match_operand:P 1 "register_operand" "r,r")))
> + (use (and (match_dup 1) (const_int 127)))
> +   (use (match_operand:P 2 "reg_or_int_operand" "r,K"))
> +   (clobber (match_scratch:V_WHOLE 3 "=,"))
> +   (clobber (reg:SI VL_REGNUM))
> +   (clobber (reg:SI VTYPE_REGNUM))]
> +  "TARGET_VECTOR"
> +  "@vsetvli zero,%2,e,m8,ta,ma\;vle.v %3,(%1)\;vse.v %3,(%0)
> +   vsetivli zero,%2,e,m8,ta,ma\;vle.v %3,(%1)\;vse.v %3,(%0)"
> +)
> +
> +(define_insn "@cpymem_loop"
> +  [(set (mem:BLK (match_operand:P 0 "register_operand" "+r"))
> + (mem:BLK (match_operand:P 1 "register_operand" "+r")))
> +   (use (match_operand:P 2 "register_operand" "+r"))
> +   (clobber (match_scratch:V_WHOLE 3 "="))
> +   (clobber (match_scratch:P 4 "="))
> +   (clobber (match_dup 0))
> +   (clobber (match_dup 1))
> +   (clobber (match_dup 2))
> +   (clobber (reg:SI VL_REGNUM))
> +   (clobber (reg:SI VTYPE_REGNUM))]
> +  "TARGET_VECTOR"
> +{ output_asm_insn ("\n0:\t" "vsetvli %4,%2,e,m8,ta,ma\;"
> +"vle.v %3,(%1)\;"
> +"sub %2,%2,%4", operands);
> +  if ( != 8)
> +{
> +  rtx xop[2];
> +  xop[0] = operands[4];
> +  xop[1] = GEN_INT (exact_log2 (/8));
> +  output_asm_insn ("slli %0,%0,%1", xop);
> +}
> +  output_asm_insn ("add %1,%1,%4\;"
> +"vse.v %3,(%0)\;"
> +"add %0,%0,%4\;"
> +"bnez %2,0b", operands);
> +  return "";
> +})
> +
> +;; This pattern (at bltu) assumes pointers can be treated as unsigned,
> +;; i.e.  objects can't straddle 0x / 0x .
> +(define_insn "@cpymem_loop_fast"
> +  [(set (mem:BLK (match_operand:P 0 "register_operand" "+r"))
> + (mem:BLK (match_operand:P 1 "register_operand" "+r")))
> +   (use (match_operand:P 2 "register_operand" "+r"))
> +   (clobber (match_scratch:V_WHOLE 3 "="))
> +   (clobber (match_scratch:P 4 "="))
> +   (clobber (match_scratch:P 5 "="))
> +   (clobber (match_scratch:P 6 "="))
> +   (clobber (match_dup 0))
> +   (clobber (match_dup 1))
> +   (clobber (match_dup 2))
> +   (clobber (reg:SI VL_REGNUM))
> +   (clobber (reg:SI VTYPE_REGNUM))]
> +  "TARGET_VECTOR"
> +{
> +  output_asm_insn ("vsetvli %4,%2,e,m8,ta,ma\;"
> +"beq %4,%2,1f\;"
> +"add %5,%0,%2\;"
> +"sub %6,%5,%4", operands);
> +  if ( != 8)
> +{
> +  rtx xop[2];
> +  xop[0] = operands[4];
> +  xop[1] = GEN_INT (exact_log2 (/8));
> +  output_asm_insn ("slli %0,%0,%1", xop);
> +}
> +  output_asm_insn ("\n0:\t" "vle.v %3,(%1)\;"
> +"add %1,%1,%4\;"
> +"vse.v %3,(%0)\;"
> +"add %0,%0,%4\;"
>>>"bltu %0,%6,0b\;"
>>>"sub %5,%5,%0", operands);
>>>   if ( != 8)
>>> {
>>>   rtx xop[2];
>>>   xop[0] = operands[4];
>>>   xop[1] = GEN_INT (exact_log2 (/8));
>>>   output_asm_insn ("srli %0,%0,%1", xop);
>>>  }
>>>   output_asm_insn ("vsetvli %4,%5,e,m8,ta,ma\n"
>>> "1:\t" "vle.v %3,(%1)\;"
>>>"vse.v %3,(%0)", operands);
>>>   return "";
>>> })
> 
> I don't think they are necessary.
What specifically do you think is not necessary?
 
 
> 
>>> Just post the update for archival purposes and consider
>>> it pre-approved for the trunk.
> 
> I am so sorry that I disagree approve this patch too fast.
Umm, this patch has been queued up for at least a couple weeks now.
 
> 
> It should be well tested.
If you refer to Joern's message he indicated how it was tested.  Joern 
is a long time GCC developer and is well aware of how to test code.
 
 
It was tested on this set of multilibs without regressions:
 
>riscv-sim
> 
> riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f
> 
> riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32
> 
> riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f
> 
> riscv-sim/-march=rv32imfdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32
> 
> 

Re: Re: cpymem for RISCV with v extension

2023-08-04 Thread 钟居哲
>> Testing what specifically?  Are you asking for correctness tests,
>> performance/code quality tests?

Add memcpy test using RVV instructions, just like we are adding testcases for 
auto-vectorization support.

For example:

#include 
#include 
#include 

void foo (int32_t * a, int32_t * b, int num)
{
  memcpy (a, b, num);
}


In my downstream LLVM/GCC codegen:
foo:
.L2:
vsetvli a5,a2,e8,m8,ta,ma
vle8.v  v24,(a1)
sub a2,a2,a5
vse8.v  v24,(a0)
add a1,a1,a5
add a0,a0,a5
bne a2,zero,.L2
ret

Another example:
void foo (int32_t * a, int32_t * b, int num)
{
  memcpy (a, b, 4);
}


My downstream LLVM/GCC assembly:

foo:
vsetvli zero,16,e8,m1,ta,ma
vle8.v v24,(a1)
vse8.v v24,(a0)
ret

>> What specifically do you think is not necessary?
> +(define_insn "@cpymem_loop"
> +  [(set (mem:BLK (match_operand:P 0 "register_operand" "+r"))
> + (mem:BLK (match_operand:P 1 "register_operand" "+r")))
> +   (use (match_operand:P 2 "register_operand" "+r"))
> +   (clobber (match_scratch:V_WHOLE 3 "="))
> +   (clobber (match_scratch:P 4 "="))
> +   (clobber (match_dup 0))
> +   (clobber (match_dup 1))
> +   (clobber (match_dup 2))
> +   (clobber (reg:SI VL_REGNUM))
> +   (clobber (reg:SI VTYPE_REGNUM))]
> +  "TARGET_VECTOR"
> +{ output_asm_insn ("\n0:\t" "vsetvli %4,%2,e,m8,ta,ma\;"
> +"vle.v %3,(%1)\;"
> +"sub %2,%2,%4", operands);
> +  if ( != 8)
> +{
> +  rtx xop[2];
> +  xop[0] = operands[4];
> +  xop[1] = GEN_INT (exact_log2 (/8));
> +  output_asm_insn ("slli %0,%0,%1", xop);
> +}
> +  output_asm_insn ("add %1,%1,%4\;"
> +"vse.v %3,(%0)\;"
> +"add %0,%0,%4\;"
> +"bnez %2,0b", operands);
> +  return "";
> +})

For example, this pattern, we could simpilfy emit insn with:

emit_label ...
emit_insn (gen_add...)
emit_insn (gen_pred_store...)
emit_insn (gen_add...)
emit_branch()

I don't see why it is necessary we should use such explicit pattern with 
explict multiple assembly.
More details, you can see "rvv-next" (a little bit different from my downstream 
but generally idea same).



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-08-05 07:17
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; rdapp.gcc; Joern Rennecke
Subject: Re: cpymem for RISCV with v extension
 
 
On 8/4/23 17:10, 钟居哲 wrote:
> Could you add testcases for this patch?
Testing what specifically?  Are you asking for correctness tests, 
performance/code quality tests?
 
 
> 
> +;; The (use (and (match_dup 1) (const_int 127))) is here to prevent the
> +;; optimizers from changing cpymem_loop_* into this.
> +(define_insn "@cpymem_straight"
> +  [(set (mem:BLK (match_operand:P 0 "register_operand" "r,r"))
> + (mem:BLK (match_operand:P 1 "register_operand" "r,r")))
> + (use (and (match_dup 1) (const_int 127)))
> +   (use (match_operand:P 2 "reg_or_int_operand" "r,K"))
> +   (clobber (match_scratch:V_WHOLE 3 "=,"))
> +   (clobber (reg:SI VL_REGNUM))
> +   (clobber (reg:SI VTYPE_REGNUM))]
> +  "TARGET_VECTOR"
> +  "@vsetvli zero,%2,e,m8,ta,ma\;vle.v %3,(%1)\;vse.v %3,(%0)
> +   vsetivli zero,%2,e,m8,ta,ma\;vle.v %3,(%1)\;vse.v %3,(%0)"
> +)
> +
> +(define_insn "@cpymem_loop"
> +  [(set (mem:BLK (match_operand:P 0 "register_operand" "+r"))
> + (mem:BLK (match_operand:P 1 "register_operand" "+r")))
> +   (use (match_operand:P 2 "register_operand" "+r"))
> +   (clobber (match_scratch:V_WHOLE 3 "="))
> +   (clobber (match_scratch:P 4 "="))
> +   (clobber (match_dup 0))
> +   (clobber (match_dup 1))
> +   (clobber (match_dup 2))
> +   (clobber (reg:SI VL_REGNUM))
> +   (clobber (reg:SI VTYPE_REGNUM))]
> +  "TARGET_VECTOR"
> +{ output_asm_insn ("\n0:\t" "vsetvli %4,%2,e,m8,ta,ma\;"
> +"vle.v %3,(%1)\;"
> +"sub %2,%2,%4", operands);
> +  if ( != 8)
> +{
> +  rtx xop[2];
> +  xop[0] = operands[4];
> +  xop[1] = GEN_INT (exact_log2 (/8));
> +  output_asm_insn ("slli %0,%0,%1", xop);
> +}
> +  output_asm_insn ("add %1,%1,%4\;"
> +"vse.v %3,(%0)\;"
> +"add %0,%0,%4\;"
> +"bnez %2,0b", operands);
> +  return "";
> +})
> +
> +;; This pattern (at bltu) assumes pointers can be treated as unsigned,
> +;; i.e.  objects can't straddle 0x / 0x .
> +(define_insn "@cpymem_loop_fast"
> +  [(set (mem:BLK (match_operand:P 0 "register_operand" "+r"))
> + (mem:BLK (match_operand:P 1 "register_operand" "+r")))
> +   (use (match_operand:P 2 "register_operand" "+r"))
> +   (clobber (match_scratch:V_WHOLE 3 "="))
> +   (clobber (match_scratch:P 4 "="))
> +   (clobber (match_scratch:P 5 "="))
> +   (clobber (match_scratch:P 6 "="))
> +   (clobber (match_dup 0))
> +   (clobber (match_dup 1))
> +   (clobber (match_dup 2))
> +   (clobber (reg:SI VL_REGNUM))
> +   (clobber (reg:SI VTYPE_REGNUM))]
> +  "TARGET_VECTOR"
> +{
> +  output_asm_insn ("vsetvli %4,%2,e,m8,ta,ma\;"
> +"beq %4,%2,1f\;"
> +"add %5,%0,%2\;"
> +"sub %6,%5,%4", operands);
> +  if ( != 8)
> +{
> +  rtx xop[2];
> +

Re: [PATCH-1, combine] Don't widen shift mode when target has rotate/mask instruction on original mode [PR93738]

2023-08-04 Thread Jeff Law via Gcc-patches




On 7/20/23 18:59, HAO CHEN GUI wrote:

Hi Jeff,

在 2023/7/21 5:27, Jeff Law 写道:

Wouldn't it make more sense to just try rotate/mask in the original mode before 
trying a shift in a widened mode?  I'm not sure why we need a target hook here.


There is no change to try rotate/mask with the original mode when
expensive_optimizations is set. The subst widens the shift mode.

But we can add it before the attempt in the wider mode.



   if (flag_expensive_optimizations)
 {
   /* Pass pc_rtx so no substitutions are done, just
  simplifications.  */
   if (i1)
 {
   subst_low_luid = DF_INSN_LUID (i1);
   i1src = subst (i1src, pc_rtx, pc_rtx, 0, 0, 0);
 }

   subst_low_luid = DF_INSN_LUID (i2);
   i2src = subst (i2src, pc_rtx, pc_rtx, 0, 0, 0);
 }

I don't know if the wider mode is helpful to other targets, so
I added the target hook.
In this scenario we're often better off relying on rtx_costs (even with 
all its warts) rather than adding yet another target hook.


I'd love to hear from Segher here to see if he's got other ideas.

jeff


Re: cpymem for RISCV with v extension

2023-08-04 Thread Jeff Law via Gcc-patches




On 8/4/23 17:10, 钟居哲 wrote:

Could you add testcases for this patch?
Testing what specifically?  Are you asking for correctness tests, 
performance/code quality tests?





+;; The (use (and (match_dup 1) (const_int 127))) is here to prevent the
+;; optimizers from changing cpymem_loop_* into this.
+(define_insn "@cpymem_straight"
+  [(set (mem:BLK (match_operand:P 0 "register_operand" "r,r"))
+   (mem:BLK (match_operand:P 1 "register_operand" "r,r")))
+   (use (and (match_dup 1) (const_int 127)))
+   (use (match_operand:P 2 "reg_or_int_operand" "r,K"))
+   (clobber (match_scratch:V_WHOLE 3 "=,"))
+   (clobber (reg:SI VL_REGNUM))
+   (clobber (reg:SI VTYPE_REGNUM))]
+  "TARGET_VECTOR"
+  "@vsetvli zero,%2,e,m8,ta,ma\;vle.v %3,(%1)\;vse.v %3,(%0)
+   vsetivli zero,%2,e,m8,ta,ma\;vle.v %3,(%1)\;vse.v %3,(%0)"
+)
+
+(define_insn "@cpymem_loop"
+  [(set (mem:BLK (match_operand:P 0 "register_operand" "+r"))
+   (mem:BLK (match_operand:P 1 "register_operand" "+r")))
+   (use (match_operand:P 2 "register_operand" "+r"))
+   (clobber (match_scratch:V_WHOLE 3 "="))
+   (clobber (match_scratch:P 4 "="))
+   (clobber (match_dup 0))
+   (clobber (match_dup 1))
+   (clobber (match_dup 2))
+   (clobber (reg:SI VL_REGNUM))
+   (clobber (reg:SI VTYPE_REGNUM))]
+  "TARGET_VECTOR"
+{ output_asm_insn ("\n0:\t" "vsetvli %4,%2,e,m8,ta,ma\;"
+  "vle.v %3,(%1)\;"
+  "sub %2,%2,%4", operands);
+  if ( != 8)
+{
+  rtx xop[2];
+  xop[0] = operands[4];
+  xop[1] = GEN_INT (exact_log2 (/8));
+  output_asm_insn ("slli %0,%0,%1", xop);
+}
+  output_asm_insn ("add %1,%1,%4\;"
+  "vse.v %3,(%0)\;"
+  "add %0,%0,%4\;"
+  "bnez %2,0b", operands);
+  return "";
+})
+
+;; This pattern (at bltu) assumes pointers can be treated as unsigned,
+;; i.e.  objects can't straddle 0x / 0x .
+(define_insn "@cpymem_loop_fast"
+  [(set (mem:BLK (match_operand:P 0 "register_operand" "+r"))
+   (mem:BLK (match_operand:P 1 "register_operand" "+r")))
+   (use (match_operand:P 2 "register_operand" "+r"))
+   (clobber (match_scratch:V_WHOLE 3 "="))
+   (clobber (match_scratch:P 4 "="))
+   (clobber (match_scratch:P 5 "="))
+   (clobber (match_scratch:P 6 "="))
+   (clobber (match_dup 0))
+   (clobber (match_dup 1))
+   (clobber (match_dup 2))
+   (clobber (reg:SI VL_REGNUM))
+   (clobber (reg:SI VTYPE_REGNUM))]
+  "TARGET_VECTOR"
+{
+  output_asm_insn ("vsetvli %4,%2,e,m8,ta,ma\;"
+  "beq %4,%2,1f\;"
+  "add %5,%0,%2\;"
+  "sub %6,%5,%4", operands);
+  if ( != 8)
+{
+  rtx xop[2];
+  xop[0] = operands[4];
+  xop[1] = GEN_INT (exact_log2 (/8));
+  output_asm_insn ("slli %0,%0,%1", xop);
+}
+  output_asm_insn ("\n0:\t" "vle.v %3,(%1)\;"
+  "add %1,%1,%4\;"
+  "vse.v %3,(%0)\;"
+  "add %0,%0,%4\;"

   "bltu %0,%6,0b\;"
   "sub %5,%5,%0", operands);
  if ( != 8)
{
  rtx xop[2];
  xop[0] = operands[4];
  xop[1] = GEN_INT (exact_log2 (/8));
  output_asm_insn ("srli %0,%0,%1", xop);
 }
  output_asm_insn ("vsetvli %4,%5,e,m8,ta,ma\n"
"1:\t" "vle.v %3,(%1)\;"
   "vse.v %3,(%0)", operands);
  return "";
})


I don't think they are necessary.

What specifically do you think is not necessary?





Just post the update for archival purposes and consider
it pre-approved for the trunk.


I am so sorry that I disagree approve this patch too fast.

Umm, this patch has been queued up for at least a couple weeks now.



It should be well tested.
If you refer to Joern's message he indicated how it was tested.  Joern 
is a long time GCC developer and is well aware of how to test code.



It was tested on this set of multilibs without regressions:


   riscv-sim

riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f

riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32

riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f

riscv-sim/-march=rv32imfdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32

riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d

riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zba_zbb_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d

riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d








We should at least these 2 following situations:

1. an unknown number bytes to be memcpy, this codegen should be as follows:

vsetvl a5,a2,e8,m8,ta,ma

vle

vse

bump counter


cpymem for RISCV with v extension

2023-08-04 Thread 钟居哲
Could you add testcases for this patch?

+;; The (use (and (match_dup 1) (const_int 127))) is here to prevent the
+;; optimizers from changing cpymem_loop_* into this.
+(define_insn "@cpymem_straight"
+  [(set (mem:BLK (match_operand:P 0 "register_operand" "r,r"))
+   (mem:BLK (match_operand:P 1 "register_operand" "r,r")))
+   (use (and (match_dup 1) (const_int 127)))
+   (use (match_operand:P 2 "reg_or_int_operand" "r,K"))
+   (clobber (match_scratch:V_WHOLE 3 "=,"))
+   (clobber (reg:SI VL_REGNUM))
+   (clobber (reg:SI VTYPE_REGNUM))]
+  "TARGET_VECTOR"
+  "@vsetvli zero,%2,e,m8,ta,ma\;vle.v %3,(%1)\;vse.v %3,(%0)
+   vsetivli zero,%2,e,m8,ta,ma\;vle.v %3,(%1)\;vse.v %3,(%0)"
+)
+
+(define_insn "@cpymem_loop"
+  [(set (mem:BLK (match_operand:P 0 "register_operand" "+r"))
+   (mem:BLK (match_operand:P 1 "register_operand" "+r")))
+   (use (match_operand:P 2 "register_operand" "+r"))
+   (clobber (match_scratch:V_WHOLE 3 "="))
+   (clobber (match_scratch:P 4 "="))
+   (clobber (match_dup 0))
+   (clobber (match_dup 1))
+   (clobber (match_dup 2))
+   (clobber (reg:SI VL_REGNUM))
+   (clobber (reg:SI VTYPE_REGNUM))]
+  "TARGET_VECTOR"
+{ output_asm_insn ("\n0:\t" "vsetvli %4,%2,e,m8,ta,ma\;"
+  "vle.v %3,(%1)\;"
+  "sub %2,%2,%4", operands);
+  if ( != 8)
+{
+  rtx xop[2];
+  xop[0] = operands[4];
+  xop[1] = GEN_INT (exact_log2 (/8));
+  output_asm_insn ("slli %0,%0,%1", xop);
+}
+  output_asm_insn ("add %1,%1,%4\;"
+  "vse.v %3,(%0)\;"
+  "add %0,%0,%4\;"
+  "bnez %2,0b", operands);
+  return "";
+})
+
+;; This pattern (at bltu) assumes pointers can be treated as unsigned,
+;; i.e.  objects can't straddle 0x / 0x .
+(define_insn "@cpymem_loop_fast"
+  [(set (mem:BLK (match_operand:P 0 "register_operand" "+r"))
+   (mem:BLK (match_operand:P 1 "register_operand" "+r")))
+   (use (match_operand:P 2 "register_operand" "+r"))
+   (clobber (match_scratch:V_WHOLE 3 "="))
+   (clobber (match_scratch:P 4 "="))
+   (clobber (match_scratch:P 5 "="))
+   (clobber (match_scratch:P 6 "="))
+   (clobber (match_dup 0))
+   (clobber (match_dup 1))
+   (clobber (match_dup 2))
+   (clobber (reg:SI VL_REGNUM))
+   (clobber (reg:SI VTYPE_REGNUM))]
+  "TARGET_VECTOR"
+{
+  output_asm_insn ("vsetvli %4,%2,e,m8,ta,ma\;"
+  "beq %4,%2,1f\;"
+  "add %5,%0,%2\;"
+  "sub %6,%5,%4", operands);
+  if ( != 8)
+{
+  rtx xop[2];
+  xop[0] = operands[4];
+  xop[1] = GEN_INT (exact_log2 (/8));
+  output_asm_insn ("slli %0,%0,%1", xop);
+}
+  output_asm_insn ("\n0:\t" "vle.v %3,(%1)\;"
+  "add %1,%1,%4\;"
+  "vse.v %3,(%0)\;"
+  "add %0,%0,%4\;"
>> "bltu %0,%6,0b\;"
>> "sub %5,%5,%0", operands);
>>   if ( != 8)
>> {
>>   rtx xop[2];
>>   xop[0] = operands[4];
>>   xop[1] = GEN_INT (exact_log2 (/8));
>>   output_asm_insn ("srli %0,%0,%1", xop);
>>  }
>>   output_asm_insn ("vsetvli %4,%5,e,m8,ta,ma\n"
>>  "1:\t" "vle.v %3,(%1)\;"
>> "vse.v %3,(%0)", operands);
>>   return "";
>>  })
I don't think they are necessary.

>>  considering that this code is usually memory-constrainted, limit this
>>  to -O3.  ??? It would make sense to differentiate here between in-order
>> and OOO microarchitectures.  */
>> else if (!size_p && optimize >= 3)
>>   emit_insn (gen_cpymem_loop_fast (Pmode, vmode, dst, src, end));
>>  else
>>   emit_insn (gen_cpymem_loop (Pmode, vmode, dst, src, end));
Why not just emit RVV pattern.
>> Just post the update for archival purposes and consider 
>> it pre-approved for the trunk.I am so sorry that I disagree approve this 
>> patch too fast.It should be well tested.
We should at least these 2 following situations:1. an unknown number bytes to 
be memcpy, this codegen should be as follows:   vsetvl a5,a2,e8,m8,ta,mavle 
   vsebump counterbranch2. a known number bytes to be memcpy, and the 
number bytes allow us to fine a VLS modes to hold it.For example, memcpy 16 
bytes QImode.Then, we can use V16QImode directly, the codegen should be:
vsetvli zero,16, vle vseSimple 3 instructions are enough. 
This patch should be well tested with these 2 situations before approved since 
LLVM does the same thing.We should be able to have the same behavior as LLVM.


juzhe.zh...@rivai.ai


[Bug c++/110905] GCC rejects constexpr code that may re-initialize union member

2023-08-04 Thread danakj at orodu dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110905

--- Comment #4 from danakj at orodu dot net ---
Ok it only happens if the VecIntoIter class has a base class, even when it's
empty the bug reproduces. But when I remove the IteratorBase base class, then
GCC compiles it correctly.

It's getting hard to remove anything and keep it to repro at this point.
Replacing Option with std::optional also made it stop, for whatever reason.

https://godbolt.org/z/a9PcsKTMf

```
#include 
#include 
#include 
#include 

template 
class Vec;

constexpr int from_sum(auto&& it) noexcept {
auto p = int(0);
while (true) {
auto i = it.next();
if (!i.has_value()) break;
p += *i;
}
return p;
}

template 
struct Storage final {
constexpr ~Storage()
requires(std::is_trivially_destructible_v)
= default;
constexpr ~Storage()
requires(!std::is_trivially_destructible_v)
{}

constexpr Storage(const Storage&)
requires(std::is_trivially_copy_constructible_v)
= default;
constexpr Storage& operator=(const Storage&)
requires(std::is_trivially_copy_assignable_v)
= default;
constexpr Storage(Storage&&)
requires(std::is_trivially_move_constructible_v)
= default;
constexpr Storage& operator=(Storage&&)
requires(std::is_trivially_move_assignable_v)
= default;

constexpr Storage() {}
constexpr Storage(const std::remove_cvref_t& t)
: val_(t), state_(true) {}
constexpr Storage(std::remove_cvref_t& t) : val_(t), state_(true) {}
constexpr Storage(std::remove_cvref_t&& t)
: val_(std::move(t)), state_(true) {}

__attribute__((pure)) constexpr const T& val() const { return val_; }
__attribute__((pure)) constexpr T& val_mut() { return val_; }

__attribute__((pure)) constexpr inline bool state() const noexcept {
return state_;
}

constexpr inline void construct_from_none(const T& t) noexcept
requires(std::is_copy_constructible_v)
{
std::construct_at(_, t);
state_ = true;
}
constexpr inline void construct_from_none(T&& t) noexcept {
std::construct_at(_, std::move(t));
state_ = true;
}

constexpr inline void set_some(const T& t) noexcept
requires(std::is_copy_constructible_v)
{
if (state_ == false)
construct_from_none(t);
else
val_ = t;
state_ = true;
}
constexpr inline void set_some(T&& t) noexcept {
if (state_ == false)
construct_from_none(std::move(t));
else
val_ = std::move(t);
state_ = true;
}

[[nodiscard]] constexpr inline T replace_some(T&& t) noexcept {
return std::exchange(val_, std::move(t));
}

[[nodiscard]] constexpr inline T take_and_set_none() noexcept {
state_ = false;
auto taken = T(static_cast(val_));
val_.~T();
return taken;
}

constexpr inline void set_none() noexcept {
state_ = false;
val_.~T();
}

constexpr inline void destroy() noexcept { val_.~T(); }

   private:
union {
T val_;
};
bool state_ = false;
};

template 
class Option final {
static_assert(!std::is_reference_v);
static_assert(!std::is_const_v);

   public:
inline constexpr Option() noexcept = default;

static inline constexpr Option with(const T& t) noexcept
requires(std::is_copy_constructible_v)
{
return Option(t);
}

static inline constexpr Option with(T&& t) noexcept {
if constexpr (std::is_move_constructible_v) {
return Option(std::move(t));
} else {
return Option(t);
}
}

constexpr ~Option() noexcept
requires(std::is_trivially_destructible_v)
= default;

constexpr inline ~Option() noexcept
requires(!std::is_trivially_destructible_v)
{
if (t_.state()) t_.destroy();
}

constexpr Option(Option&& o)
requires(std::is_trivially_move_constructible_v)
= default;

constexpr Option(Option&& o) noexcept
requires(!std::is_trivially_move_constructible_v)
{
if (o.t_.state()) t_.construct_from_none(o.t_.take_and_set_none());
}

constexpr Option(Option&& o)
requires(!std::is_move_constructible_v)
= delete;

constexpr Option& operator=(Option&& o)
requires(std::is_trivially_move_assignable_v)
= default;

constexpr Option& operator=(Option&& o) noexcept
requires(!std::is_trivially_move_assignable_v)
{
if (o.t_.state())
t_.set_some(o.t_.take_and_set_none());
else if (t_.state())
t_.set_none();
return *this;
}

constexpr Option& operator=(Option&& o)
requires(!std::is_move_constructible_v)
= delete;

__attribute__((pure)) constexpr bool has_value() const noexcept {
return t_.state();
}


[Bug tree-optimization/57650] Suboptimal code after TRUTH_AND_EXPR is changed into BIT_AND_EXPR

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57650

--- Comment #3 from Andrew Pinski  ---
Note I have seen differences like this due to just different FREE_SSANAMES
happening too.

Re: Update and Questions on CPython Extension Module -fanalyzer plugin development

2023-08-04 Thread David Malcolm via Gcc
On Fri, 2023-08-04 at 18:42 -0400, David Malcolm wrote:
> On Fri, 2023-08-04 at 16:48 -0400, Eric Feng wrote:
> > On Fri, Aug 4, 2023 at 11:39 AM David Malcolm 
> > wrote:
> > > 
> > > On Fri, 2023-08-04 at 11:02 -0400, Eric Feng wrote:
> > > > Hi Dave,
> > > > 
> > > > Tests related to our plugin which depend on Python-specific
> > > > definitions have been run by including /* { dg-options "-
> > > > fanalyzer
> > > > -I/usr/include/python3.9" } */. This is undoubtedly not ideal;
> > > > is
> > > > it
> > > > best to approach this problem by adapting a subset of relevant
> > > > definitions like in gil.h?
> > > 
> > > That might be acceptable in the very short-term, but to create a
> > > plugin
> > > that's useful to end-user (authors of CPython extension modules)
> > > we
> > > want to be testing against real Python headers.
> > > 
> > > As I understand it, https://peps.python.org/pep-0394/ allows for
> > > distributors of Python to symlink "python3-config" in the PATH to
> > > a
> > > python3.X-config script (for some X).
> > > 
> > > So on such systems running:
> > >   python3-config --includes
> > > should emit the correct -I option.  On my box it emits:
> > > 
> > > -I/usr/include/python3.8 -I/usr/include/python3.8
> > > 
> > > 
> > > It's more complicated, but I believe:
> > >   python3-config --cflags
> > > should emit the build flags that C/C++ extensions ought to use
> > > when
> > > building.  On my box this emits:
> > > 
> > > -I/usr/include/python3.8 -I/usr/include/python3.8  -Wno-unused-
> > > result -
> > > Wsign-compare  -O2 -g -pipe -Wall -Werror=format-security -Wp,-
> > > D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -
> > > fstack-
> > > protector-strong -grecord-gcc-switches   -m64 -mtune=generic -
> > > fasynchronous-unwind-tables -fstack-clash-protection -fcf-
> > > protection -
> > > D_GNU_SOURCE -fPIC -fwrapv  -DDYNAMIC_ANNOTATIONS_ENABLED=1 -
> > > DNDEBUG  -
> > > O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
> > > -
> > > Wp,-
> > > D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -
> > > grecord-
> > > gcc-switches   -m64 -mtune=generic -fasynchronous-unwind-tables -
> > > fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -
> > > fwrapv
> > > 
> > > and it's likely going to vary from distribution to distribution. 
> > > Some
> > > of those options *are* going to affect the gimple that -fanalyzer
> > > "sees".
> > > 
> > > Does your installation of Python have such a script?
> > > 
> > > So in the short term you could hack in a minimal subset of the
> > > decls/defns from Python.h, but I'd prefer it if target-
> > > supports.exp
> > > gained a DejaGnu directive that invokes python3-config, captures
> > > the
> > > result (or fails with UNSUPPORTED for systems without python3
> > > development headers), and then adds the result to the build flags
> > > of
> > > the file being tested.  The .exp files are implemented in Tcl,
> > > alas;
> > > let me know if you want help with that.
> > > 
> > > Dave
> > Sounds good; thanks! Following existing examples in
> > target-supports.exp, the following works as expected in terms of
> > extracting the build flags we are interested in.
> > 
> > In target-supports.exp:
> > proc check_python_flags { } {
> >     set result [remote_exec host "python3-config --cflags"]
> >     set status [lindex $result 0]
> >     if { $status == 0 } {
> >     return [lindex $result 1]
> >     } else {
> >     return "UNSUPPORTED"
> >     }
> > }
> > 
> > However, I'm having some trouble figuring out the specifics as to
> > how
> > we may add the build flags to our test cases. My intuition looks
> > like
> > something like the following:
> > 
> > In plugin.exp:
> > foreach plugin_test $plugin_test_list {
> >     if {[lindex $plugin_test 0] eq "analyzer_cpython_plugin.c"} {
> >     set python_flags [check_python_flags]
> >     if { $python_flags ne "UNSUPPORTED" } {
> >    // append $python_flags to build flags here
> >     }
> >     }
> > 
> > }
> > 
> > How might we do so?
> 
> Good question.
> 
> Looking at plugin.exp I see it uses plugin-test-execute, which is
> defined in gcc/testsuite/lib/plugin-support.exp.
> 
> Looking there, I see it attempts to build the plugin, and then if it
> succeeds, it calls 
>   dg-runtest $plugin_tests $plugin_enabling_flags $default_flags
> where $plugin_tests is the list of source files to be compiled using
> the plugin.  So one way to do this would be to modify that code from
> plugin.exp to pass in a different value, rather than $default_flags. 
> Though it seems hackish to special-case this.

Sorry, I think I misspoke here; that line that uses $default_flags is
from plugin-support.exp, not from plugin.exp; $default_flags is a
global variable.

So I think my 2nd approach below may be the one to try:

> 
> As another way, that avoids adding special-casing to plugin.exp,
> there's an existing directive:
>    dg-additional-options
> implemented in 

Re: Update and Questions on CPython Extension Module -fanalyzer plugin development

2023-08-04 Thread David Malcolm via Gcc
On Fri, 2023-08-04 at 16:48 -0400, Eric Feng wrote:
> On Fri, Aug 4, 2023 at 11:39 AM David Malcolm 
> wrote:
> > 
> > On Fri, 2023-08-04 at 11:02 -0400, Eric Feng wrote:
> > > Hi Dave,
> > > 
> > > Tests related to our plugin which depend on Python-specific
> > > definitions have been run by including /* { dg-options "-
> > > fanalyzer
> > > -I/usr/include/python3.9" } */. This is undoubtedly not ideal; is
> > > it
> > > best to approach this problem by adapting a subset of relevant
> > > definitions like in gil.h?
> > 
> > That might be acceptable in the very short-term, but to create a
> > plugin
> > that's useful to end-user (authors of CPython extension modules) we
> > want to be testing against real Python headers.
> > 
> > As I understand it, https://peps.python.org/pep-0394/ allows for
> > distributors of Python to symlink "python3-config" in the PATH to a
> > python3.X-config script (for some X).
> > 
> > So on such systems running:
> >   python3-config --includes
> > should emit the correct -I option.  On my box it emits:
> > 
> > -I/usr/include/python3.8 -I/usr/include/python3.8
> > 
> > 
> > It's more complicated, but I believe:
> >   python3-config --cflags
> > should emit the build flags that C/C++ extensions ought to use when
> > building.  On my box this emits:
> > 
> > -I/usr/include/python3.8 -I/usr/include/python3.8  -Wno-unused-
> > result -
> > Wsign-compare  -O2 -g -pipe -Wall -Werror=format-security -Wp,-
> > D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-
> > protector-strong -grecord-gcc-switches   -m64 -mtune=generic -
> > fasynchronous-unwind-tables -fstack-clash-protection -fcf-
> > protection -
> > D_GNU_SOURCE -fPIC -fwrapv  -DDYNAMIC_ANNOTATIONS_ENABLED=1 -
> > DNDEBUG  -
> > O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -
> > Wp,-
> > D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -
> > grecord-
> > gcc-switches   -m64 -mtune=generic -fasynchronous-unwind-tables -
> > fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv
> > 
> > and it's likely going to vary from distribution to distribution. 
> > Some
> > of those options *are* going to affect the gimple that -fanalyzer
> > "sees".
> > 
> > Does your installation of Python have such a script?
> > 
> > So in the short term you could hack in a minimal subset of the
> > decls/defns from Python.h, but I'd prefer it if target-supports.exp
> > gained a DejaGnu directive that invokes python3-config, captures
> > the
> > result (or fails with UNSUPPORTED for systems without python3
> > development headers), and then adds the result to the build flags
> > of
> > the file being tested.  The .exp files are implemented in Tcl,
> > alas;
> > let me know if you want help with that.
> > 
> > Dave
> Sounds good; thanks! Following existing examples in
> target-supports.exp, the following works as expected in terms of
> extracting the build flags we are interested in.
> 
> In target-supports.exp:
> proc check_python_flags { } {
>     set result [remote_exec host "python3-config --cflags"]
>     set status [lindex $result 0]
>     if { $status == 0 } {
>     return [lindex $result 1]
>     } else {
>     return "UNSUPPORTED"
>     }
> }
> 
> However, I'm having some trouble figuring out the specifics as to how
> we may add the build flags to our test cases. My intuition looks like
> something like the following:
> 
> In plugin.exp:
> foreach plugin_test $plugin_test_list {
>     if {[lindex $plugin_test 0] eq "analyzer_cpython_plugin.c"} {
>     set python_flags [check_python_flags]
>     if { $python_flags ne "UNSUPPORTED" } {
>    // append $python_flags to build flags here
>     }
>     }
> 
> }
> 
> How might we do so?

Good question.

Looking at plugin.exp I see it uses plugin-test-execute, which is
defined in gcc/testsuite/lib/plugin-support.exp.

Looking there, I see it attempts to build the plugin, and then if it
succeeds, it calls 
  dg-runtest $plugin_tests $plugin_enabling_flags $default_flags
where $plugin_tests is the list of source files to be compiled using
the plugin.  So one way to do this would be to modify that code from
plugin.exp to pass in a different value, rather than $default_flags. 
Though it seems hackish to special-case this.

As another way, that avoids adding special-casing to plugin.exp,
there's an existing directive:
   dg-additional-options
implemented in gcc/testsuite/lib/gcc-defs.exp which appends options to
the default options.  Unfortunately, it works via:
upvar dg-extra-tool-flags extra-tool-flags
which is a nasty Tcl hack meaning access the local variable named "dg-
extra-tool-flags" in *the frame above*, referring to it as "extra-tool-
flags".  (this is why I don't like Tcl)

So I think what could be done is to invoke your "check_python_flags"
test as a directive from the test case, so that in target-supports.exp
you'd have something like:

  proc dg-require-python-h {} {

which could do the 

[Bug tree-optimization/57650] Suboptimal code after TRUTH_AND_EXPR is changed into BIT_AND_EXPR

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57650

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-08-04

--- Comment #2 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #1)
> For aarch64-linux-gnu the same code is produced for both of these functions.

That is because of r6-6378-g078fe40a489e7a4f6255 .

Otherwise confirmed.

gcc-12-20230804 is now available

2023-08-04 Thread GCC Administrator via Gcc
Snapshot gcc-12-20230804 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/12-20230804/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 12 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-12 revision 913dc14bc63dd5d508ee7a2def1a1b3f035bc778

You'll find:

 gcc-12-20230804.tar.xz   Complete GCC

  SHA256=0289ed9f7e0b10abaae3bb99e97b702c426f05a5bc7e48d3906f1a36cc3f86a2
  SHA1=f45fecd15e488bf0209e68600943e7d288578043

Diffs from 12-20230728 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-12
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[Bug tree-optimization/53806] Missed optimization (a<=b)&&(a>=b) with trapping

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53806

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=91425
   Last reconfirmed||2023-08-04
 Status|UNCONFIRMED |NEW

--- Comment #1 from Andrew Pinski  ---
Confirmed. very much related to PR 91425.

[Bug tree-optimization/96167] fails to detect ROL pattern in simple case, but succeeds when operand goes through memcpy

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96167

Andrew Pinski  changed:

   What|Removed |Added

 CC||msebor at gcc dot gnu.org

--- Comment #6 from Andrew Pinski  ---
*** Bug 93721 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/93721] swapping adjacent scalars could be more efficient

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93721

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|NEW |RESOLVED

--- Comment #8 from Andrew Pinski  ---
Dup of bug 96167.

*** This bug has been marked as a duplicate of bug 96167 ***

[Bug target/106265] RISC-V SPEC2017 507.cactu code bloat due to address generation

2023-08-04 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106265

Vineet Gupta  changed:

   What|Removed |Added

 CC||vineetg at gcc dot gnu.org

--- Comment #11 from Vineet Gupta  ---
Revisited this with gcc-13.

The reduced test case no longer shows the extraneous LI 4096 (although the full
test still does). 

The key here is -funroll-loops which is needed for original issue to show as
well.

The was with middle-end update:

commit 19295e8607da2f743368fe6f5708146616aafa91
Author: Richard Biener 
Date:   Mon Oct 24 09:51:32 2022 +0200

tree-optimization/100756 - niter analysis and folding

niter analysis, specifically the part trying to simplify the computed
maybe_zero condition against the loop header copying condition, is
confused by us now simplifying

  _15 = n_8(D) * 4;
  if (_15 > 0)

to

  _15 = n_8(D) * 4;
  if (n_8(D) > 0)

which is perfectly sound at the point we do this transform.  One
solution might be to involve ranger in this simplification, another
is to be more aggressive when expanding expressions - the condition
we try to simplify is _15 > 0, so all we need is expanding that
to n_8(D) * 4 > 0.

[Bug c++/110905] GCC rejects constexpr code that may re-initialize union member

2023-08-04 Thread danakj at orodu dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110905

--- Comment #3 from danakj at orodu dot net ---
Repro down from 37k to under 1000 lines now: https://godbolt.org/z/enMxaqjb6

[Bug tree-optimization/97225] Failure to optimize out conditional addition of zero

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97225

Andrew Pinski  changed:

   What|Removed |Added

 CC||pinskia at gcc dot gnu.org

--- Comment #2 from Andrew Pinski  ---
I noticed this also while creating a testcase for
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626117.html .

[Bug tree-optimization/96167] fails to detect ROL pattern in simple case, but succeeds when operand goes through memcpy

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96167

Andrew Pinski  changed:

   What|Removed |Added

  Known to work||12.1.0
  Known to fail||10.5.0

--- Comment #5 from Andrew Pinski  ---
For x86_64, this was fixed in GCC 11+ at -O3 and for GCC 12+ for -O2 by the
vectorizer.

But I think we could do better at the code generation at the gimple level.
We have:
  vect__4.7_8 = MEM  [(int &)x_1(D)];
  vect_tmp_3.8_9 = VEC_PERM_EXPR ;
  MEM  [(int &)x_1(D)] = vect_tmp_3.8_9;

Which could just be:
tmp = MEM[(int&)x_1(D)];
tmp = tmp ror 32;
MEM[(int&)x_1(D)] = tmp;

Re: [PATCH 1/5] Middle-end _BitInt support [PR102989]

2023-08-04 Thread Joseph Myers
On Fri, 4 Aug 2023, Richard Biener via Gcc-patches wrote:

> > Sorry, I hoped it wouldn't take me almost 3 months and would be much shorter
> > as well, but clearly I'm not good at estimating stuff...
> 
> Well, it’s definitely feature creep with now the _Decimal and bitfield stuff …

I think feature creep would more be adding new features *outside the scope 
of the standard* (_BitInt bit-fields and conversions to/from DFP are 
within the standard, as are _BitInt atomic operations).  For example, 
features to help support type-generic operations on _BitInt, or 
type-generic versions of existing built-in functions (e.g. popcount) 
suitable for use on _BitInt - it's likely such features will be of use 
eventually, but they aren't needed for C23 (where the corresponding 
type-generic operations only support _BitInt types when they have the same 
width as some other type), so we can certainly get the standard features 
in first and think about additional features beyond that later (just as 
support for wider _BitInt can come later, not being required by the 
standard).

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] match.pd: Implement missed optimization ((x ^ y) & z) | x -> (z & y) | x [PR109938]

2023-08-04 Thread Drew Ross via Gcc-patches
Adds a simplification for ((x ^ y) & z) | x to be folded into
(z & y) | x. Merges this simplification with ((x | y) & z) | x -> (z & y) | x
to prevent duplicate pattern. Tested successfully on x86_64 and x86 targets.

PR tree-opt/109938

gcc/ChangeLog:

* match.pd ((x ^ y) & z) | x -> (z & y) | x: New simplification.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr109938.c: New test.
* gcc.dg/tree-ssa/pr109938.c: New test.
---
 gcc/match.pd  |  10 +-
 .../gcc.c-torture/execute/pr109938.c  |  33 +
 gcc/testsuite/gcc.dg/tree-ssa/pr109938.c  | 125 ++
 3 files changed, 164 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr109938.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109938.c

diff --git a/gcc/match.pd b/gcc/match.pd
index ee6cef6b09d..884dc622b25 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1946,10 +1946,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (bitop:c (rbitop:c (bit_not @0) @1) @0)
   (bitop @0 @1)))
 
-/* ((x | y) & z) | x -> (z & y) | x */
-(simplify
-  (bit_ior:c (bit_and:cs (bit_ior:cs @0 @1) @2) @0)
-  (bit_ior (bit_and @2 @1) @0))
+/* ((x |^ y) & z) | x -> (z & y) | x  */
+(for op (bit_ior bit_xor)
+ (simplify
+  (bit_ior:c (nop_convert1? (bit_and:c (nop_convert2? (op:c @0 @1)) @2)) @3)
+  (if (bitwise_equal_p (@0, @3))
+   (convert (bit_ior (bit_and @1 (convert @2)) (convert @0))
 
 /* (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2) */
 (simplify
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr109938.c 
b/gcc/testsuite/gcc.c-torture/execute/pr109938.c
new file mode 100644
index 000..a65d13b305d
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr109938.c
@@ -0,0 +1,33 @@
+/* PR tree-opt/109938 */
+
+#include "../../gcc.dg/tree-ssa/pr109938.c"
+
+int 
+main ()
+{
+  if (t1 (29789, 29477, 23942) != 30045) __builtin_abort ();
+  if (t2 (-20196, 18743, -32901) != -1729) __builtin_abort ();
+  if (t3 (2136614690L, 1136698390L, 2123767997L) != 2145003318UL) 
__builtin_abort ();
+  if (t4 (-4878, 9977, 23313) != 61171) __builtin_abort ();
+  if (t5 (127, 99, 43) != 127) __builtin_abort ();
+  if (t6 (9176690219839792930LL, 3176690219839721234LL, 5671738468274920831LL)
+  != 9177833729112616754LL) __builtin_abort ();
+  if (t7 (29789, 29477, 23942) != 30045) __builtin_abort ();
+  if (t8 (23489, 99477, 87942) != 90053) __builtin_abort ();
+  if (t9 (10489, 66477, -73313) != 10749) __builtin_abort ();
+  if (t10 (2136614690L, -1136614690L, 4136614690UL) != 4284131106UL)
+__builtin_abort ();
+  if (t11 (29789, 29477, 12345) != 29821) __builtin_abort ();
+  if (t12 (-120, 98, -73) != 170) __builtin_abort ();
+  if (t13 (9176690219839792930ULL, -3176690219839721234LL, 
5671738468274920831ULL)
+  != 9221726284835125102ULL) __builtin_abort ();
+  v4si a1 = {29789, -20196, 23489, 10489};
+  v4si a2 = {29477, 18743, 99477, 66477}; 
+  v4si a3 = {23942, -32901, 87942, -73313};
+  v4si r1 = {30045, 63807, 90053, 10749}; 
+  v4si b1 = t14 (a1, a2, a3);
+  v4si b2 = t15 (a1, a2, a3);
+  if (__builtin_memcmp (,  ,  sizeof (b1) != 0)) __builtin_abort();  
+  if (__builtin_memcmp (,  ,  sizeof (b2) != 0)) __builtin_abort();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr109938.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr109938.c
new file mode 100644
index 000..0cae55886c6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr109938.c
@@ -0,0 +1,125 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1 -Wno-psabi" } */
+
+typedef int v4si __attribute__((vector_size(4 * sizeof(int;
+
+/* Generic */
+__attribute__((noipa)) int
+t1 (int a, int b, int c)
+{
+  return ((a ^ c) & b) | a;
+}
+
+__attribute__((noipa)) unsigned int
+t2 (int a, unsigned int b, int c)
+{
+  return ((a ^ c) & b) | a;
+}
+
+__attribute__((noipa)) unsigned long
+t3 (unsigned long a, long b, unsigned long c)
+{
+  return ((a ^ c) & b) | a;
+}
+
+__attribute__((noipa)) unsigned short
+t4 (short a, unsigned short b, unsigned short c)
+{
+  return (unsigned short) ((a ^ c) & b) | a;
+}
+
+__attribute__((noipa)) unsigned char
+t5 (unsigned char a, signed char b, signed char c)
+{
+  return ((a ^ c) & b) | a;
+}
+
+__attribute__((noipa)) long long
+t6 (long long a, long long b, long long c)
+{
+  return ((a ^ c) & (unsigned long long) b) | a;
+}
+
+/* Gimple */
+__attribute__((noipa)) int
+t7 (int a, int b, int c)
+{
+  int t1 = a ^ c;
+  int t2 = t1 & b;
+  int t3 = t2 | a;
+  return t3;
+}
+
+__attribute__((noipa)) int
+t8 (int a, unsigned int b, unsigned int c)
+{
+  unsigned int t1 = a ^ c;
+  int t2 = t1 & b;
+  int t3 = t2 | a;
+  return t3;
+}
+
+__attribute__((noipa)) unsigned int
+t9 (unsigned int a, unsigned int b, int c)
+{
+  unsigned int t1 = a ^ c;
+  unsigned int t2 = t1 & b;
+  unsigned int t3 = t2 | a;
+  return t3;
+}
+
+__attribute__((noipa)) unsigned long
+t10 (unsigned long a, long b, unsigned long c)
+{
+  

[Bug c++/59883] Missed C++ front-end devirtualizations

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59883

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=110057

--- Comment #2 from Andrew Pinski  ---
The first example with arrays is PR 110057 also.

Re: [PATCH v3] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-08-04 Thread Jeff Law via Gcc-patches



On 8/1/23 19:38, Xiao Zeng wrote:

This patch recognizes Zicond patterns when the select pattern
with condition eq or neq to 0 (using eq as an example), namely:

1 rd = (rs2 == 0) ? non-imm : 0
2 rd = (rs2 == 0) ? non-imm : non-imm
3 rd = (rs2 == 0) ? reg : non-imm
4 rd = (rs2 == 0) ? reg : reg

gcc/ChangeLog:

 * config/riscv/riscv.cc (riscv_expand_conditional_move): Recognize
 Zicond patterns
 * config/riscv/riscv.md: Recognize Zicond patterns through movcc
So I've made minor adjustments to the remaining three cases.  First we 
need to check the code before optimizing the cases were one of the arms 
of the conditional move matches op0.


I slightly adjusted the case for out of range constants.  Its better to 
check SMALL_OPERAND rather than testing for specific constants.  And 
when that triggers, we can just force the value into a register and 
continue as-is rather than recursing.


The patch I'm committing fixes one comment typo (whitespace) and a bit 
of accidentally duplicated code I added in a prior commit.


Next up Raphael's patches to handle nontrival conditionals by emiting an 
scc insn :-)


Jeff

ps.  I'm deferrring the testsuite bits until we sort out the costing 
problems.  THey're definitely not forgotten and I still use them in my 
local tree.
commit 4e87c953d16377457b31b65b6c3268d932e462ab
Author: Xiao Zeng 
Date:   Fri Aug 4 17:23:56 2023 -0400

[PATCH v3] [RISC-V] Generate Zicond instruction for select pattern with 
condition eq or neq to 0

This patch recognizes Zicond patterns when the select pattern
with condition eq or neq to 0 (using eq as an example), namely:

1 rd = (rs2 == 0) ? non-imm : 0
2 rd = (rs2 == 0) ? non-imm : non-imm
3 rd = (rs2 == 0) ? reg : non-imm
4 rd = (rs2 == 0) ? reg : reg

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_conditional_move): Recognize
more Zicond patterns.  Fix whitespace typo.
(riscv_rtx_costs): Remove accidental code duplication.

Co-authored-by: Jeff Law 

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8b725610815..7728cd34569 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2522,20 +2522,6 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
  *total = COSTS_N_INSNS (1);
  return true;
}
-  else if (TARGET_ZICOND
-  && outer_code == SET
-  && ((GET_CODE (XEXP (x, 1)) == REG
-   && XEXP (x, 2) == CONST0_RTX (GET_MODE (XEXP (x, 1
-  || (GET_CODE (XEXP (x, 2)) == REG
-  && XEXP (x, 1) == CONST0_RTX (GET_MODE (XEXP (x, 2
-  || (GET_CODE (XEXP (x, 1)) == REG
-  && rtx_equal_p (XEXP (x, 1), XEXP (XEXP (x, 0), 0)))
-  || (GET_CODE (XEXP (x, 1)) == REG
-  && rtx_equal_p (XEXP (x, 2), XEXP (XEXP (x, 0), 0)
-   {
- *total = COSTS_N_INSNS (1);
- return true;
-   }
   else if (LABEL_REF_P (XEXP (x, 1)) && XEXP (x, 2) == pc_rtx)
{
  if (equality_operator (XEXP (x, 0), mode)
@@ -3583,7 +3569,7 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
cons, rtx alt)
   /* The expander is a bit loose in its specification of the true
 arm of the conditional move.  That allows us to support more
 cases for extensions which are more general than SFB.  But
-   does mean we need to force CONS into a register at this point.  */
+does mean we need to force CONS into a register at this point.  */
   cons = force_reg (GET_MODE (dest), cons);
   emit_insn (gen_rtx_SET (dest, gen_rtx_IF_THEN_ELSE (GET_MODE (dest),
  cond, cons, alt)));
@@ -3628,6 +3614,40 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
cons, rtx alt)
  riscv_emit_binary (PLUS, dest, dest, cons);
  return true;
}
+  /* imm, reg  */
+  else if (CONST_INT_P (cons) && cons != CONST0_RTX (mode) && REG_P (alt))
+   {
+ /* Optimize for register value of 0.  */
+ if (code == NE && rtx_equal_p (op0, alt) && op1 == CONST0_RTX (mode))
+   {
+ rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+ cons = force_reg (mode, cons);
+ emit_insn (gen_rtx_SET (dest,
+ gen_rtx_IF_THEN_ELSE (mode, cond,
+   cons, alt)));
+ return true;
+   }
+
+ riscv_emit_int_compare (, , , true);
+ rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+
+ rtx temp1 = gen_reg_rtx (mode);
+ rtx temp2 = gen_int_mode (-1 * INTVAL (cons), mode);
+
+ /* TEMP2 might not fit into a signed 12 bit immediate suitable
+for an addi 

[Bug analyzer/110907] New: ICE when using -fanalyzer-verbose-state-changes

2023-08-04 Thread vultkayn at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110907

Bug ID: 110907
   Summary: ICE when using -fanalyzer-verbose-state-changes
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: vultkayn at gcc dot gnu.org
  Target Milestone: ---

Running the analyzer on testcase gcc.dg/analyzer/pr99193-1.c with line 54
commented out and flag -fanalyzer-verbose-state-changes results in an ICE on
gcc versions later than 13.1 (included) and trunk, tested on target
x86_64-linux-gnu.

x86_64 12.3 on godbolt doesn't reproduce the ICE.

Reproducer:

/* { dg-additional-options "-Wno-analyzer-too-complex" } */

/* Verify absence of false positive from -Wanalyzer-mismatching-deallocation
   on realloc(3).
   Based on
https://github.com/libguestfs/libguestfs/blob/f19fd566f6387ce7e4d82409528c9dde374d25e0/daemon/command.c#L115
   which is GPLv2 or later.  */

typedef __SIZE_TYPE__ size_t;
typedef __builtin_va_list va_list;

#define NULL ((void *)0)

extern void *malloc (size_t __size)
  __attribute__ ((__nothrow__ , __leaf__))
  __attribute__ ((__malloc__))
  __attribute__ ((__alloc_size__ (1)));
extern void perror (const char *__s);
extern void *realloc (void *__ptr, size_t __size)
  __attribute__ ((__nothrow__ , __leaf__))
  __attribute__ ((__warn_unused_result__))
  __attribute__ ((__alloc_size__ (2)));

extern void guestfs_int_cleanup_free (void *ptr);
extern int commandrvf (char **stdoutput, char **stderror, unsigned flags,
   char const* const *argv);
#define CLEANUP_FREE __attribute__((cleanup(guestfs_int_cleanup_free))) 

int
commandrf (char **stdoutput, char **stderror, unsigned flags,
   const char *name, ...)
{
  va_list args;
  CLEANUP_FREE const char **argv = NULL;
  char *s;
  int i, r;

  /* Collect the command line arguments into an array. */
  i = 2;
  argv = malloc (sizeof (char *) * i);

 if (argv == NULL) {
perror ("malloc");
return -1;
  }
  argv[0] = (char *) name;
  argv[1] = NULL;

  __builtin_va_start (args, name);

  while ((s = __builtin_va_arg (args, char *)) != NULL) {
const char **p = realloc (argv, sizeof (char *) * (++i)); /* { dg-bogus
"'free'" } */
if (p == NULL) {
  perror ("realloc");
  // __builtin_va_end (args);
  return -1;
}
argv = p;
argv[i-2] = s;
argv[i-1] = NULL;
  }

  __builtin_va_end (args);

  r = commandrvf (stdoutput, stderror, flags, argv);

  return r;
}

-

gcc -fanalyzer -fanalyzer-verbose-state-changes ./pr99193-1.leak.c
during IPA pass: analyzer
:33:29: internal compiler error: Segmentation fault
   33 |   CLEANUP_FREE const char **argv = NULL;
  | ^~~~
0x216ac2e internal_error(char const*, ...)
???:0
0x218afec pp_format(pretty_printer*, text_info*)
???:0
0x1f7bfb6 make_label_text(bool, char const*, ...)
???:0
0x1f88b0d ana::state_change_event::get_desc(bool) const
???:0
0x1f86c0c ana::checker_event::prepare_for_emission(ana::checker_path*,
ana::pending_diagnostic*, diagnostic_event_id_t)
???:0
0x1fa79f5 ana::diagnostic_manager::emit_saved_diagnostic(ana::exploded_graph
const&, ana::saved_diagnostic const&)
???:0
0x1fa82a0 ana::diagnostic_manager::emit_saved_diagnostics(ana::exploded_graph
const&)
???:0
0x149fc31 ana::impl_run_checkers(ana::logger*)
???:0
0x14a0bdf ana::run_checkers()
???:0
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
Compiler returned: 1

[Bug rtl-optimization/110864] [14 Regression] ICE in combine.cc causes stage2 build failure on RISCV

2023-08-04 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110864

Patrick O'Neill  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Patrick O'Neill  ---
Resolved by PR110867 fix: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110867

[Bug middle-end/110906] __attribute__((optimize("no-math-errno"))) has no effect.

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110906

--- Comment #4 from Andrew Pinski  ---
Well "math-errno" has no effect if -fno-math-errno is supplied on the command
line either, well can produce incorrect code too.

Take:
```
__attribute__((optimize("math-errno")))
void g(double x) {
  __builtin_sqrt(x);
}
```
With `-O2`, a call happens but with `-O2 -fno-math-errno` does not happen. That
was true even in GCC 5.

[Bug middle-end/110906] __attribute__((optimize("no-math-errno"))) has no effect.

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110906

--- Comment #3 from Andrew Pinski  ---
But even:
```
__attribute__((optimize("no-math-errno")))
double g(double x) {
  return __builtin_sqrt(x);
}
```
Does not change here ...

[Bug middle-end/110906] __attribute__((optimize("no-math-errno"))) has no effect.

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110906

--- Comment #2 from Andrew Pinski  ---
Created attachment 55692
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55692=edit
Full testcase

David Faust appointed BPF backend reviewer

2023-08-04 Thread David Edelsohn via Gcc
I am pleased to announce that the GCC Steering Committee has appointed
David Faust as BPF backend reviewer.

Please join me in congratulating David on his new role.

David, please update your listings in the MAINTAINERS file.

Happy hacking!
David


[Bug middle-end/110906] __attribute__((optimize("no-math-errno"))) has no effect.

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110906

--- Comment #1 from Andrew Pinski  ---
well std::sqrt is not annotated with no-math-errno after all ...

[Bug middle-end/110906] New: __attribute__((optimize("no-math-errno"))) has no effect.

2023-08-04 Thread cassio.neri at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110906

Bug ID: 110906
   Summary: __attribute__((optimize("no-math-errno"))) has no
effect.
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cassio.neri at gmail dot com
  Target Milestone: ---

Consider this C++ code compiled with -O3:

double g(double x) {
  return std::sqrt(x);
}

Usually this does call the library function std::sqrt because x might be
negative and errno needs to be set accordingly. Moreover, with -fno-math-errno
a single sqrtsd instruction is emitted. However, annotating g with

__attribute__((optimize("no-math-errno")))

has no effect. This attribute (and #pragma GCC optimize("no-math-errno") ) used
to work up to gcc 5.5.

https://godbolt.org/z/T1nb11bv5

Re: [PATCH V2] RISC-V: Support POPCOUNT auto-vectorization

2023-08-04 Thread Jeff Law via Gcc-patches




On 8/1/23 00:47, Robin Dapp via Gcc-patches wrote:

  I'm not against continuing with the more well-known approach for now
  but we should keep in mind that might still be potential for improvement.


No. I don't think it's faster.


I did a quick check on my x86 laptop and it's roughly 25% faster there.
That's consistent with the literature.  RISC-V qemu only shows 5-10%
improvement, though.


I have no ideal. I saw ARM SVE generate:
POP_COUNT
POP_COUNT
VEC_PACK_TRUNC.


I'd strongly suspect this happens because it's converting to int.
If you change dst to uint64_t there won't be any vec_pack_trunc.


I am gonna drop this patch since it's meaningless.


But why?  It can still help even if we can improve on the sequence.
IMHO you can go ahead with it and just change int -> uint64_t in the
tests.
It'd also be interesting to see if those popcounts in deepsjeng are 
vectorizable.  We got a major boost in deepsjeng at a prior employer, 
but I can't remember if it was from getting the pcounts vectorized or 
just not doing stupid stuff with them on the scalar side.



jeff


Re: cpymem for RISCV with v extension

2023-08-04 Thread Jeff Law via Gcc-patches




On 7/17/23 22:47, Joern Rennecke wrote:

Subject:
cpymem for RISCV with v extension
From:
Joern Rennecke 
Date:
7/17/23, 22:47

To:
GCC Patches 


As discussed on last week's patch call, this patch uses either a
straight copy or an opaque pattern that emits the loop as assembly to
optimize cpymem for the 'v' extension.
I used Ju-Zhe Zhong's patch - starting in git with:

Author: zhongjuzhe<66454988+zhongju...@users.noreply.github.com>
Date:   Mon Mar 21 14:20:42 2022 +0800

   PR for RVV support using splitted small chunks (#334)

as a starting point, even though not all that much of the original code remains.

Regression tested on x86_64-pc-linux-gnu X
 riscv-sim
 
riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f
 
riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32
 
riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f
 
riscv-sim/-march=rv32imfdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32
 
riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d
 
riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zba_zbb_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d
 
riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d


cpymem-diff-20230718.txt

2023-07-12  Ju-Zhe Zhong
 Joern Rennecke

* config/riscv/riscv-protos.h (riscv_vector::expand_block_move):
Declare.
* config/riscv/riscv-v.cc (riscv_vector::expand_block_move):
New function.
* config/riscv/riscv.md (cpymemsi): Use riscv_vector::expand_block_move.
* config/riscv/vector.md (@cpymem_straight):
New define_insn patterns.
(@cpymem_loop): Likewise.
(@cpymem_loop_fast): Likewise.




diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index b4884a30872..e61110fa3ad 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -49,6 +49,7 @@
  #include "tm-constrs.h"
  #include "rtx-vector-builder.h"
  #include "targhooks.h"
+#include "predict.h"
Not sure this is needed, but I didn't scan for it explicitly.  If it's 
not needed, then remove it.





+  if (CONST_INT_P (length_in))
+{
+  HOST_WIDE_INT length = INTVAL (length_in);
+
+/* By using LMUL=8, we can copy as many bytes in one go as there
+   are bits in a vector register.  If the entire block thus fits,
+   we don't need a loop.  */
+if (length <= TARGET_MIN_VLEN)
+  {
+   need_loop = false;
+
+   /* If a single scalar load / store pair can do the job, leave it
+  to the scalar code to do that.  */
+
+   if (pow2p_hwi (length) && length <= potential_ew)
+ return false;
+  }
We could probably argue over the threshold for doing the copy on the 
scalar side, but I don't think it's necessary.  Once we start seeing V 
hardware we can revisit.




+
+  /* Find the vector mode to use.  Using the largest possible element
+size is likely to give smaller constants, and thus potentially
+reducing code size.  However, if we need a loop, we need to update
+the pointers, and that is more complicated with a larger element
+size, unless we use an immediate, which prevents us from dynamically
+using the largets transfer size that the hart supports.  And then,
+unless we know the*exact*  vector size of the hart, we'd need
+multiple vsetvli / branch statements, so it's not even a size win.
+If, in the future, we find an RISCV-V implementation that is slower
+for small element widths, we might allow larger element widths for
+loops too.  */

s/largets/largest/

And a space is missing in "the*extact*"

Note that I think the proposed glibc copier does allow larger elements 
widths for this case.



+
+ /* Unless we get an implementation that's slow for small element
+size / non-word-aligned accesses, we assume that the hardware
+handles this well, and we don't want to complicate the code
+with shifting word contents around or handling extra bytes at
+the start and/or end.  So we want the total transfer size and
+alignemnt to fit with the element size.  */

s/alignemnt/alignment/

Yes, let's not try to support every uarch we can envision and instead do 
a good job on the uarches we know about.If a uarch with slow element 
or non-word aligned accesses comes along, they can propose changes at 
that time.





+
+ // The VNx*?I modes have a factor of riscv_vector_chunks for nunits.
Comment might need updating after the recent work to adjust 

Re: Update and Questions on CPython Extension Module -fanalyzer plugin development

2023-08-04 Thread Eric Feng via Gcc
On Fri, Aug 4, 2023 at 11:39 AM David Malcolm  wrote:
>
> On Fri, 2023-08-04 at 11:02 -0400, Eric Feng wrote:
> > Hi Dave,
> >
> > Tests related to our plugin which depend on Python-specific
> > definitions have been run by including /* { dg-options "-fanalyzer
> > -I/usr/include/python3.9" } */. This is undoubtedly not ideal; is it
> > best to approach this problem by adapting a subset of relevant
> > definitions like in gil.h?
>
> That might be acceptable in the very short-term, but to create a plugin
> that's useful to end-user (authors of CPython extension modules) we
> want to be testing against real Python headers.
>
> As I understand it, https://peps.python.org/pep-0394/ allows for
> distributors of Python to symlink "python3-config" in the PATH to a
> python3.X-config script (for some X).
>
> So on such systems running:
>   python3-config --includes
> should emit the correct -I option.  On my box it emits:
>
> -I/usr/include/python3.8 -I/usr/include/python3.8
>
>
> It's more complicated, but I believe:
>   python3-config --cflags
> should emit the build flags that C/C++ extensions ought to use when
> building.  On my box this emits:
>
> -I/usr/include/python3.8 -I/usr/include/python3.8  -Wno-unused-result -
> Wsign-compare  -O2 -g -pipe -Wall -Werror=format-security -Wp,-
> D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-
> protector-strong -grecord-gcc-switches   -m64 -mtune=generic -
> fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -
> D_GNU_SOURCE -fPIC -fwrapv  -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG  -
> O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-
> D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-
> gcc-switches   -m64 -mtune=generic -fasynchronous-unwind-tables -
> fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv
>
> and it's likely going to vary from distribution to distribution.  Some
> of those options *are* going to affect the gimple that -fanalyzer
> "sees".
>
> Does your installation of Python have such a script?
>
> So in the short term you could hack in a minimal subset of the
> decls/defns from Python.h, but I'd prefer it if target-supports.exp
> gained a DejaGnu directive that invokes python3-config, captures the
> result (or fails with UNSUPPORTED for systems without python3
> development headers), and then adds the result to the build flags of
> the file being tested.  The .exp files are implemented in Tcl, alas;
> let me know if you want help with that.
>
> Dave
Sounds good; thanks! Following existing examples in
target-supports.exp, the following works as expected in terms of
extracting the build flags we are interested in.

In target-supports.exp:
proc check_python_flags { } {
set result [remote_exec host "python3-config --cflags"]
set status [lindex $result 0]
if { $status == 0 } {
return [lindex $result 1]
} else {
return "UNSUPPORTED"
}
}

However, I'm having some trouble figuring out the specifics as to how
we may add the build flags to our test cases. My intuition looks like
something like the following:

In plugin.exp:
foreach plugin_test $plugin_test_list {
if {[lindex $plugin_test 0] eq "analyzer_cpython_plugin.c"} {
set python_flags [check_python_flags]
if { $python_flags ne "UNSUPPORTED" } {
   // append $python_flags to build flags here
}
}

}

How might we do so?
>
>
> >
> > Best,
> > Eric
> >
> > On Tue, Aug 1, 2023 at 1:06 PM David Malcolm 
> > wrote:
> > >
> > > On Tue, 2023-08-01 at 09:57 -0400, Eric Feng wrote:
> > > > >
> > > > > My guess is that you were trying to do it from the
> > > > > PLUGIN_ANALYZER_INIT
> > > > > hook rather than from the plugin_init function, but it's hard
> > > > > to be
> > > > > sure without seeing the code.
> > > > >
> > > >
> > > > Thanks Dave, you are entirely right — I made the mistake of
> > > > trying to
> > > > do it from PLUGIN_ANALYZER_INIT hook and not from the plugin_init
> > > > function. After following your suggestion, the callbacks are
> > > > getting
> > > > registered as expected.
> > >
> > > Ah, good.
> > >
> > > > I submitted a patch to review for this feature
> > > > on gcc-patches; please let me know if it looks OK.
> > >
> > > Thanks Eric; I've posted a reply to your email there, so let's
> > > discuss
> > > the details there.
> > >
> > > Dave
> > >
> >
>


[Bug c++/110905] GCC rejects constexpr code that may re-initialize union member

2023-08-04 Thread danakj at orodu dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110905

--- Comment #2 from danakj at orodu dot net ---
Ah ok. Here's a big reproduction: https://godbolt.org/z/Kj7Tcd6P4

/opt/compiler-explorer/gcc-trunk-20230804/include/c++/14.0.0/bits/stl_construct.h:97:14:
  in 'constexpr' expansion of
'((sus::containers::VecIntoIter*))->sus::containers::VecIntoIter::VecIntoIter((*
& std::forward >((* & __args#0'
:32895:22: error: use of deleted function
'sus::option::__private::Storage,
false>()'
32895 | struct [[nodiscard]] VecIntoIter final
  |  ^~~
:3015:9: note:
'sus::option::__private::Storage,
false>()' is implicitly deleted because the
default definition would be ill-formed:
 3015 |   union {
  | ^
:3015:9: error: no matching function for call to
'sus::containers::VecIntoIter::VecIntoIter()'
:32953:13: note: candidate: 'constexpr
sus::containers::VecIntoIter::VecIntoIter(sus::containers::Vec&&,
sus::num::usize, sus::num::usize) [with ItemT = sus::num::i32]'
32953 |   constexpr VecIntoIter(Vec&& vec, usize front, usize back)
noexcept
  | ^~~
:32953:13: note:   candidate expects 3 arguments, 0 provided
:32951:13: note: candidate: 'constexpr
sus::containers::VecIntoIter::VecIntoIter(sus::containers::Vec&&)
[with ItemT = sus::num::i32]'
32951 |   constexpr VecIntoIter(Vec&& vec) noexcept :
vec_(::sus::move(vec)) {}
  | ^~~
:32951:13: note:   candidate expects 1 argument, 0 provided
:32895:22: note: candidate: 'constexpr
sus::containers::VecIntoIter::VecIntoIter(sus::containers::VecIntoIter&&)'
32895 | struct [[nodiscard]] VecIntoIter final
  |  ^~~
:32895:22: note:   candidate expects 1 argument, 0 provided
Compiler returned: 1


I will try to shrink it now.

[Bug tree-optimization/21998] (cond ? result1 : result2) is vectorized, where equivalent if-syntax isn't (store)

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21998

--- Comment #7 from Andrew Pinski  ---
We can vectorize test2 using mask stores 

[Bug analyzer/110426] Missing buffer overflow warning with function pointer that has the alloc_size attribute

2023-08-04 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110426

David Malcolm  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from David Malcolm  ---
Should be implemented for gcc 14 by the above patch.

[pushed] analyzer: handle function attribute "alloc_size" [PR110426]

2023-08-04 Thread David Malcolm via Gcc-patches
This patch makes -fanalyzer make use of the function attribute
"alloc_size", allowing -fanalyzer to emit -Wanalyzer-allocation-size,
-Wanalyzer-out-of-bounds, and -Wanalyzer-tainted-allocation-size on
execution paths involving allocations using such functions.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-3001-g021077b94741c9.

gcc/analyzer/ChangeLog:
PR analyzer/110426
* bounds-checking.cc (region_model::check_region_bounds): Handle
symbolic base regions.
* call-details.cc: Include "stringpool.h" and "attribs.h".
(call_details::lookup_function_attribute): New function.
* call-details.h (call_details::lookup_function_attribute): New
function decl.
* region-model-manager.cc
(region_model_manager::maybe_fold_binop): Add reference to
PR analyzer/110902.
* region-model-reachability.cc (reachable_regions::handle_sval):
Add symbolic regions for pointers that are conjured svalues for
the LHS of a stmt.
* region-model.cc (region_model::canonicalize): Purge dynamic
extents for regions that aren't referenced.
(get_result_size_in_bytes): New function.
(region_model::on_call_pre): Use get_result_size_in_bytes and
potentially set the dynamic extents of the region pointed to by
the return value.
(region_model::deref_rvalue): Add param "add_nonnull_constraint"
and use it to conditionalize adding the constraint.
(pending_diagnostic_subclass::dubious_allocation_size): Add "stmt"
param to both ctors and use it to initialize new "m_stmt" field.
(pending_diagnostic_subclass::operator==): Use m_stmt; don't use
m_lhs or m_rhs.
(pending_diagnostic_subclass::m_stmt): New field.
(region_model::check_region_size): Generalize to any kind of
pointer svalue by using deref_rvalue rather than checking for
region_svalue.  Pass stmt to dubious_allocation_size ctor.
* region-model.h (region_model::deref_rvalue): Add param
"add_nonnull_constraint".
* svalue.cc (conjured_svalue::lhs_value_p): New function.
* svalue.h (conjured_svalue::lhs_value_p): New decl.

gcc/testsuite/ChangeLog:
PR analyzer/110426
* gcc.dg/analyzer/allocation-size-1.c: Update expected message to
reflect consolidation of size and assignment into a single event.
* gcc.dg/analyzer/allocation-size-2.c: Likewise.
* gcc.dg/analyzer/allocation-size-3.c: Likewise.
* gcc.dg/analyzer/allocation-size-4.c: Likewise.
* gcc.dg/analyzer/allocation-size-multiline-1.c: Likewise.
* gcc.dg/analyzer/allocation-size-multiline-2.c: Likewise.
* gcc.dg/analyzer/allocation-size-multiline-3.c: Likewise.
* gcc.dg/analyzer/attr-alloc_size-1.c: New test.
* gcc.dg/analyzer/attr-alloc_size-2.c: New test.
* gcc.dg/analyzer/attr-alloc_size-3.c: New test.
* gcc.dg/analyzer/explode-4.c: New test.
* gcc.dg/analyzer/taint-size-1.c: Add test coverage for
__attribute__ alloc_size.
---
 gcc/analyzer/bounds-checking.cc   |  12 +-
 gcc/analyzer/call-details.cc  |  21 +++
 gcc/analyzer/call-details.h   |   2 +
 gcc/analyzer/region-model-manager.cc  |   2 +
 gcc/analyzer/region-model-reachability.cc |  21 +++
 gcc/analyzer/region-model.cc  | 109 ++--
 gcc/analyzer/region-model.h   |   3 +-
 gcc/analyzer/svalue.cc|  11 ++
 gcc/analyzer/svalue.h |   1 +
 .../gcc.dg/analyzer/allocation-size-1.c   |   3 +-
 .../gcc.dg/analyzer/allocation-size-2.c   |   3 +-
 .../gcc.dg/analyzer/allocation-size-3.c   |   9 +-
 .../gcc.dg/analyzer/allocation-size-4.c   |   6 +-
 .../analyzer/allocation-size-multiline-1.c|  12 +-
 .../analyzer/allocation-size-multiline-2.c|  15 +-
 .../analyzer/allocation-size-multiline-3.c|  10 +-
 .../gcc.dg/analyzer/attr-alloc_size-1.c   |  81 +
 .../gcc.dg/analyzer/attr-alloc_size-2.c   |  19 +++
 .../gcc.dg/analyzer/attr-alloc_size-3.c   |  14 ++
 gcc/testsuite/gcc.dg/analyzer/explode-4.c | 157 ++
 gcc/testsuite/gcc.dg/analyzer/taint-size-1.c  |  10 ++
 21 files changed, 458 insertions(+), 63 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-alloc_size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-alloc_size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-alloc_size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/explode-4.c

diff --git a/gcc/analyzer/bounds-checking.cc b/gcc/analyzer/bounds-checking.cc
index 5e8de9a7aa5..f49cf7cf2af 100644
--- a/gcc/analyzer/bounds-checking.cc
+++ b/gcc/analyzer/bounds-checking.cc
@@ -981,12 +981,6 @@ region_model::check_region_bounds (const region *reg,
   region_offset reg_offset = 

[pushed] analyzer: fix some svalue::dump_to_pp implementations

2023-08-04 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-3000-g187b213ddbe7ea.

gcc/analyzer/ChangeLog:
* svalue.cc (region_svalue::dump_to_pp): Support NULL type.
(constant_svalue::dump_to_pp): Likewise.
(initial_svalue::dump_to_pp): Likewise.
(conjured_svalue::dump_to_pp): Likewise.  Fix missing print of the
type.
---
 gcc/analyzer/svalue.cc | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/gcc/analyzer/svalue.cc b/gcc/analyzer/svalue.cc
index 4395018dbc3..5d5c80f88c6 100644
--- a/gcc/analyzer/svalue.cc
+++ b/gcc/analyzer/svalue.cc
@@ -714,8 +714,11 @@ region_svalue::dump_to_pp (pretty_printer *pp, bool 
simple) const
   else
 {
   pp_string (pp, "region_svalue(");
-  print_quoted_type (pp, get_type ());
-  pp_string (pp, ", ");
+  if (get_type ())
+   {
+ print_quoted_type (pp, get_type ());
+ pp_string (pp, ", ");
+   }
   m_reg->dump_to_pp (pp, simple);
   pp_string (pp, ")");
 }
@@ -811,8 +814,11 @@ constant_svalue::dump_to_pp (pretty_printer *pp, bool 
simple) const
   else
 {
   pp_string (pp, "constant_svalue(");
-  print_quoted_type (pp, get_type ());
-  pp_string (pp, ", ");
+  if (get_type ())
+   {
+ print_quoted_type (pp, get_type ());
+ pp_string (pp, ", ");
+   }
   dump_tree (pp, m_cst_expr);
   pp_string (pp, ")");
 }
@@ -1029,8 +1035,11 @@ initial_svalue::dump_to_pp (pretty_printer *pp, bool 
simple) const
   else
 {
   pp_string (pp, "initial_svalue(");
-  print_quoted_type (pp, get_type ());
-  pp_string (pp, ", ");
+  if (get_type ())
+   {
+ print_quoted_type (pp, get_type ());
+ pp_string (pp, ", ");
+   }
   m_reg->dump_to_pp (pp, simple);
   pp_string (pp, ")");
 }
@@ -1910,7 +1919,11 @@ conjured_svalue::dump_to_pp (pretty_printer *pp, bool 
simple) const
   else
 {
   pp_string (pp, "conjured_svalue (");
-  pp_string (pp, ", ");
+  if (get_type ())
+   {
+ print_quoted_type (pp, get_type ());
+ pp_string (pp, ", ");
+   }
   pp_gimple_stmt_1 (pp, m_stmt, 0, (dump_flags_t)0);
   pp_string (pp, ", ");
   m_id_reg->dump_to_pp (pp, simple);
-- 
2.26.3



[Bug tree-optimization/18437] vectorizer failed for matrix multiplication

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18437

--- Comment #10 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #9)
> For the original testcase in comment #0, with `-O3 -fno-vect-cost-model` GCC
> can vectorize it on aarch64 but not on x86_64.

I should say starting in GCC 6 .

[Bug analyzer/110902] Missing cast in region_model_manager::maybe_fold_binop on MULT_EXPR by 1

2023-08-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110902

--- Comment #1 from CVS Commits  ---
The master branch has been updated by David Malcolm :

https://gcc.gnu.org/g:021077b94741c9300dfff3a24e95b3ffa3f508a7

commit r14-3001-g021077b94741c9300dfff3a24e95b3ffa3f508a7
Author: David Malcolm 
Date:   Fri Aug 4 16:18:40 2023 -0400

analyzer: handle function attribute "alloc_size" [PR110426]

This patch makes -fanalyzer make use of the function attribute
"alloc_size", allowing -fanalyzer to emit -Wanalyzer-allocation-size,
-Wanalyzer-out-of-bounds, and -Wanalyzer-tainted-allocation-size on
execution paths involving allocations using such functions.

gcc/analyzer/ChangeLog:
PR analyzer/110426
* bounds-checking.cc (region_model::check_region_bounds): Handle
symbolic base regions.
* call-details.cc: Include "stringpool.h" and "attribs.h".
(call_details::lookup_function_attribute): New function.
* call-details.h (call_details::lookup_function_attribute): New
function decl.
* region-model-manager.cc
(region_model_manager::maybe_fold_binop): Add reference to
PR analyzer/110902.
* region-model-reachability.cc (reachable_regions::handle_sval):
Add symbolic regions for pointers that are conjured svalues for
the LHS of a stmt.
* region-model.cc (region_model::canonicalize): Purge dynamic
extents for regions that aren't referenced.
(get_result_size_in_bytes): New function.
(region_model::on_call_pre): Use get_result_size_in_bytes and
potentially set the dynamic extents of the region pointed to by
the return value.
(region_model::deref_rvalue): Add param "add_nonnull_constraint"
and use it to conditionalize adding the constraint.
(pending_diagnostic_subclass::dubious_allocation_size): Add "stmt"
param to both ctors and use it to initialize new "m_stmt" field.
(pending_diagnostic_subclass::operator==): Use m_stmt; don't use
m_lhs or m_rhs.
(pending_diagnostic_subclass::m_stmt): New field.
(region_model::check_region_size): Generalize to any kind of
pointer svalue by using deref_rvalue rather than checking for
region_svalue.  Pass stmt to dubious_allocation_size ctor.
* region-model.h (region_model::deref_rvalue): Add param
"add_nonnull_constraint".
* svalue.cc (conjured_svalue::lhs_value_p): New function.
* svalue.h (conjured_svalue::lhs_value_p): New decl.

gcc/testsuite/ChangeLog:
PR analyzer/110426
* gcc.dg/analyzer/allocation-size-1.c: Update expected message to
reflect consolidation of size and assignment into a single event.
* gcc.dg/analyzer/allocation-size-2.c: Likewise.
* gcc.dg/analyzer/allocation-size-3.c: Likewise.
* gcc.dg/analyzer/allocation-size-4.c: Likewise.
* gcc.dg/analyzer/allocation-size-multiline-1.c: Likewise.
* gcc.dg/analyzer/allocation-size-multiline-2.c: Likewise.
* gcc.dg/analyzer/allocation-size-multiline-3.c: Likewise.
* gcc.dg/analyzer/attr-alloc_size-1.c: New test.
* gcc.dg/analyzer/attr-alloc_size-2.c: New test.
* gcc.dg/analyzer/attr-alloc_size-3.c: New test.
* gcc.dg/analyzer/explode-4.c: New test.
* gcc.dg/analyzer/taint-size-1.c: Add test coverage for
__attribute__ alloc_size.

Signed-off-by: David Malcolm 

[Bug analyzer/110426] Missing buffer overflow warning with function pointer that has the alloc_size attribute

2023-08-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110426

--- Comment #2 from CVS Commits  ---
The master branch has been updated by David Malcolm :

https://gcc.gnu.org/g:021077b94741c9300dfff3a24e95b3ffa3f508a7

commit r14-3001-g021077b94741c9300dfff3a24e95b3ffa3f508a7
Author: David Malcolm 
Date:   Fri Aug 4 16:18:40 2023 -0400

analyzer: handle function attribute "alloc_size" [PR110426]

This patch makes -fanalyzer make use of the function attribute
"alloc_size", allowing -fanalyzer to emit -Wanalyzer-allocation-size,
-Wanalyzer-out-of-bounds, and -Wanalyzer-tainted-allocation-size on
execution paths involving allocations using such functions.

gcc/analyzer/ChangeLog:
PR analyzer/110426
* bounds-checking.cc (region_model::check_region_bounds): Handle
symbolic base regions.
* call-details.cc: Include "stringpool.h" and "attribs.h".
(call_details::lookup_function_attribute): New function.
* call-details.h (call_details::lookup_function_attribute): New
function decl.
* region-model-manager.cc
(region_model_manager::maybe_fold_binop): Add reference to
PR analyzer/110902.
* region-model-reachability.cc (reachable_regions::handle_sval):
Add symbolic regions for pointers that are conjured svalues for
the LHS of a stmt.
* region-model.cc (region_model::canonicalize): Purge dynamic
extents for regions that aren't referenced.
(get_result_size_in_bytes): New function.
(region_model::on_call_pre): Use get_result_size_in_bytes and
potentially set the dynamic extents of the region pointed to by
the return value.
(region_model::deref_rvalue): Add param "add_nonnull_constraint"
and use it to conditionalize adding the constraint.
(pending_diagnostic_subclass::dubious_allocation_size): Add "stmt"
param to both ctors and use it to initialize new "m_stmt" field.
(pending_diagnostic_subclass::operator==): Use m_stmt; don't use
m_lhs or m_rhs.
(pending_diagnostic_subclass::m_stmt): New field.
(region_model::check_region_size): Generalize to any kind of
pointer svalue by using deref_rvalue rather than checking for
region_svalue.  Pass stmt to dubious_allocation_size ctor.
* region-model.h (region_model::deref_rvalue): Add param
"add_nonnull_constraint".
* svalue.cc (conjured_svalue::lhs_value_p): New function.
* svalue.h (conjured_svalue::lhs_value_p): New decl.

gcc/testsuite/ChangeLog:
PR analyzer/110426
* gcc.dg/analyzer/allocation-size-1.c: Update expected message to
reflect consolidation of size and assignment into a single event.
* gcc.dg/analyzer/allocation-size-2.c: Likewise.
* gcc.dg/analyzer/allocation-size-3.c: Likewise.
* gcc.dg/analyzer/allocation-size-4.c: Likewise.
* gcc.dg/analyzer/allocation-size-multiline-1.c: Likewise.
* gcc.dg/analyzer/allocation-size-multiline-2.c: Likewise.
* gcc.dg/analyzer/allocation-size-multiline-3.c: Likewise.
* gcc.dg/analyzer/attr-alloc_size-1.c: New test.
* gcc.dg/analyzer/attr-alloc_size-2.c: New test.
* gcc.dg/analyzer/attr-alloc_size-3.c: New test.
* gcc.dg/analyzer/explode-4.c: New test.
* gcc.dg/analyzer/taint-size-1.c: Add test coverage for
__attribute__ alloc_size.

Signed-off-by: David Malcolm 

[Bug tree-optimization/18437] vectorizer failed for matrix multiplication

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18437

--- Comment #9 from Andrew Pinski  ---
For the original testcase in comment #0, with `-O3 -fno-vect-cost-model` GCC
can vectorize it on aarch64 but not on x86_64.

[Bug tree-optimization/49955] Fails to do partial basic-block SLP

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49955

--- Comment #4 from Andrew Pinski  ---
The testcase in comment #0 started to be vectorized in GCC 13 

[Bug tree-optimization/35224] scalar evolution analysis fails with "evolution of base is not affine"

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35224

--- Comment #1 from Andrew Pinski  ---
Created attachment 55691
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55691=edit
testcase

Re: [committed][RISC-V] Fix 20010221-1.c with zicond

2023-08-04 Thread Jeff Law via Gcc-patches




On 8/4/23 03:29, Xiao Zeng wrote:

On Thu, Aug 03, 2023 at 01:20:00 AM  Jeff Law  wrote:




In the wrong two optimization modes, I only considered the
case of satisfying the ELSE branch, but in fact, like the correct
two optimization modes, I should consider the case of satisfying
both the THAN and ELSE branches.
It happens -- we all make mistakes.  FWIW I didn't spot it during the 
review either.





By the way, I was assigned other tasks during the week and
didn't have time to reply to emails, sorry.
No worries.  I'm trying to keep this moving because we have multiple 
submissions from different authors in this space as well as bits 
internal to Ventana.  That's a recipe for a messy integration phase if 
it's not well managed.


It's also something I kept meaning to resolve and your submission just 
gave me the proper motivation to move zicond forward.  The target 
specific bits you did lined up perfectly with the community feedback on 
the original VRULL implementation as well as the direction Ventana had 
taken on our internal tree.


Jeff


[Bug tree-optimization/30049] Variable-length arrays (VLA) should be converted to normal arrays if possible

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30049

--- Comment #3 from Andrew Pinski  ---
The only difference I saw is scheduling and some small IV-OPTs difference ...

[V2][PATCH 3/3] Use the counted_by attribute information in bound sanitizer[PR108896]

2023-08-04 Thread Qing Zhao via Gcc-patches
gcc/c-family/ChangeLog:

PR C/108896
* c-ubsan.cc (ubsan_instrument_bounds): Use counted_by attribute
information.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/ubsan/flex-array-counted-by-bounds.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-2.c: New test.
---
 gcc/c-family/c-ubsan.cc   | 16 +++
 .../ubsan/flex-array-counted-by-bounds-2.c| 27 +++
 .../ubsan/flex-array-counted-by-bounds.c  | 46 +++
 3 files changed, 89 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c

diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 51aa83a378d2..a99e8433069f 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -362,6 +362,10 @@ ubsan_instrument_bounds (location_t loc, tree array, tree 
*index,
 {
   tree type = TREE_TYPE (array);
   tree domain = TYPE_DOMAIN (type);
+  /* whether the array ref is a flexible array member with valid counted_by
+ attribute.  */
+  bool fam_has_count_attr = false;
+  tree counted_by = NULL_TREE;
 
   if (domain == NULL_TREE)
 return NULL_TREE;
@@ -375,6 +379,17 @@ ubsan_instrument_bounds (location_t loc, tree array, tree 
*index,
  && COMPLETE_TYPE_P (type)
  && integer_zerop (TYPE_SIZE (type)))
bound = build_int_cst (TREE_TYPE (TYPE_MIN_VALUE (domain)), -1);
+  /* If the array ref is to flexible array member field which has
+counted_by attribute.  We can use the information from the
+attribute as the bound to instrument the reference.  */
+  else if ((counted_by = component_ref_get_counted_by (array))
+   != NULL_TREE)
+   {
+ fam_has_count_attr = true;
+ bound = fold_build2 (MINUS_EXPR, TREE_TYPE (counted_by),
+  counted_by,
+  build_int_cst (TREE_TYPE (counted_by), 1));
+   }
   else
return NULL_TREE;
 }
@@ -387,6 +402,7 @@ ubsan_instrument_bounds (location_t loc, tree array, tree 
*index,
  -fsanitize=bounds-strict.  */
   tree base = get_base_address (array);
   if (!sanitize_flags_p (SANITIZE_BOUNDS_STRICT)
+  && !fam_has_count_attr
   && TREE_CODE (array) == COMPONENT_REF
   && base && (INDIRECT_REF_P (base) || TREE_CODE (base) == MEM_REF))
 {
diff --git a/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c 
b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
new file mode 100644
index ..77ec333509d0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
@@ -0,0 +1,27 @@
+/* test the attribute counted_by and its usage in
+   bounds sanitizer combined with VLA.  */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+
+#include 
+
+void __attribute__((__noinline__)) setup_and_test_vla (int n, int m)
+{
+   struct foo {
+   int n;
+   int p[][n] __attribute__((counted_by(n)));
+   } *f;
+
+   f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n]));
+   f->n = m;
+   f->p[m][n-1]=1;
+   return;
+}
+
+int main(int argc, char *argv[])
+{
+  setup_and_test_vla (10, 11);
+  return 0;
+}
+
+/* { dg-output "17:8: runtime error: index 11 out of bounds for type" } */
diff --git a/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c 
b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c
new file mode 100644
index ..81eaeb3f2681
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c
@@ -0,0 +1,46 @@
+/* test the attribute counted_by and its usage in
+   bounds sanitizer.  */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+
+#include 
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int 
annotated_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
++ normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+ + annotated_count *  sizeof (int));
+  array_annotated->b = annotated_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test (int normal_index, int annotated_index)
+{
+  array_flex->c[normal_index] = 1;
+  array_annotated->c[annotated_index] = 2;
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10, 10);   
+  test (10, 10);
+  return 0;
+}
+
+/* { dg-output "36:21: runtime error: index 10 out of bounds for type" } */
-- 
2.31.1



[V2][PATCH 2/3] Use the counted_by atribute info in builtin object size [PR108896]

2023-08-04 Thread Qing Zhao via Gcc-patches
gcc/ChangeLog:

PR C/108896
* tree-object-size.cc (addr_object_size): Use the counted_by
attribute info.
* tree.cc (component_ref_has_counted_by_p): New function.
(component_ref_get_counted_by): New function.
* tree.h (component_ref_has_counted_by_p): New prototype.
(component_ref_get_counted_by): New prototype.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by-2.c: New test.
* gcc.dg/flex-array-counted-by-3.c: New test.
---
 .../gcc.dg/flex-array-counted-by-2.c  |  74 +++
 .../gcc.dg/flex-array-counted-by-3.c  | 197 ++
 gcc/tree-object-size.cc   |  37 +++-
 gcc/tree.cc   |  95 -
 gcc/tree.h|  10 +
 5 files changed, 405 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c

diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-2.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
new file mode 100644
index ..ec580c1f1f01
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
@@ -0,0 +1,74 @@
+/* test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size.  */ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+#define expect(p, _v) do { \
+size_t v = _v; \
+if (p == v) \
+   __builtin_printf ("ok:  %s == %zd\n", #p, p); \
+else \
+   {  \
+ __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v); \
+ FAIL (); \
+   } \
+} while (0);
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+struct nested_annotated {
+  struct {
+union {
+  int b;
+  float f; 
+};
+int n;
+  };
+  int c[] __attribute__ ((counted_by (b)));
+} *array_nested_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
++ normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+ + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  array_nested_annotated
+= (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
++ attr_count *  sizeof (int));
+  array_nested_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+expect(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+expect(__builtin_dynamic_object_size(array_annotated->c, 1),
+  array_annotated->b * sizeof (int));
+expect(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
+  array_nested_annotated->b * sizeof (int));
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10,10);   
+  test ();
+  DONE ();
+}
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
new file mode 100644
index ..22ef2af31c20
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
@@ -0,0 +1,197 @@
+/* test the attribute counted_by and its usage in
+__builtin_dynamic_object_size: what's the correct behavior when the allocaiton
+size mismatched with the value of counted_by attribute?  */
+/* { dg-do run } */
+/* { dg-options "-O -fstrict-flex-arrays=3" } */
+
+#include "builtin-object-size-common.h"
+
+struct annotated {
+  size_t foo;
+  int array[] __attribute__((counted_by (foo)));
+};
+
+#define expect(p, _v) do { \
+size_t v = _v; \
+if (p == v) \
+__builtin_printf ("ok:  %s == %zd\n", #p, p); \
+else \
+{  \
+  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v); \
+ FAIL (); \
+} \
+} while (0);
+
+#define noinline __attribute__((__noinline__))
+#define SIZE_BUMP 5
+
+/* In general, Due to type casting, the type for the pointee of a pointer
+   does not say anything about the object it points to,
+   So, __builtin_object_size can not directly use the type of the pointee
+   to decide the size of the object the pointer points to.
+
+   there are only two reliable ways:
+   A. observed allocations  (call to the allocation functions in the routine)
+   B. observed accesses (read or write access to the location of the
+ pointer points to)
+
+   that provide information about the type/existence of an object at
+   the corresponding address.
+
+   for A, we use the "alloc_size" attribute for the corresponding allocation
+   functions to determine the object size;
+
+   For B, we use the SIZE info of the TYPE attached to the corresponding 
access.
+   

[V2][PATCH 1/3] Provide counted_by attribute to flexible array member field (PR108896)

2023-08-04 Thread Qing Zhao via Gcc-patches
'counted_by (COUNT)'
 The 'counted_by' attribute may be attached to the flexible array
 member of a structure.  It indicates that the number of the
 elements of the array is given by the field named "COUNT" in the
 same structure as the flexible array member.  GCC uses this
 information to improve the results of the array bound sanitizer and
 the '__builtin_dynamic_object_size'.

 For instance, the following code:

  struct P {
size_t count;
int array[] __attribute__ ((counted_by (count)));
  } *p;

 specifies that the 'array' is a flexible array member whose number
 of elements is given by the field 'count' in the same structure.

 The field that represents the number of the elements should have an
 integer type.  An explicit 'counted_by' annotation defines a
 relationship between two objects, 'p->array' and 'p->count', that
 'p->array' has _at least_ 'p->count' number of elements available.
 This relationship must hold even after any of these related objects
 are updated.  It's the user's responsibility to make sure this
 relationship to be kept all the time.  Otherwise the results of the
 array bound sanitizer and the '__builtin_dynamic_object_size' might
 be incorrect.

 For instance, in the following example, the allocated array has
 less elements than what's specified by the 'sbuf->count', this is
 an user error.  As a result, out-of-bounds access to the array
 might not be detected.

  #define SIZE_BUMP 10
  struct P *sbuf;
  void alloc_buf (size_t nelems)
  {
sbuf = (int *) malloc (sizeof (struct P) + sizeof (int) * nelems);
sbuf->count = nelems + SIZE_BUMP;
/* This is invalid when the sbuf->array has less than sbuf->count
   elements.  */
  }

 In the following example, the 2nd update to the field 'sbuf->count'
 of the above structure will permit out-of-bounds access to the
 array 'sbuf>array' as well.

  #define SIZE_BUMP 10
  struct P *sbuf;
  void alloc_buf (size_t nelems)
  {
sbuf = (int *) malloc (sizeof (struct P)
 + sizeof (int) * (nelems + SIZE_BUMP));
sbuf->count = nelems;
/* This is valid when the sbuf->array has at least sbuf->count
   elements.  */
  }
  void use_buf (int index)
  {
sbuf->count = sbuf->count + SIZE_BUMP + 1;
/* Now the value of sbuf->count is larger than the number
   of elements of sbuf->array.  */
sbuf->array[index] = 0;
/* then the out-of-bound access to this array
   might not be detected.  */
  }

gcc/c-family/ChangeLog:

PR C/108896
* c-attribs.cc (handle_counted_by_attribute): New function.
(attribute_takes_identifier_p): Add counted_by attribute to the list.
* c-common.cc (c_flexible_array_member_type_p): ...To this.
* c-common.h (c_flexible_array_member_type_p): New prototype.

gcc/c/ChangeLog:

PR C/108896
* c-decl.cc (flexible_array_member_type_p): Renamed and moved to...
(add_flexible_array_elts_to_size): Use renamed function.
(is_flexible_array_member_p): Use renamed function.
(verify_counted_by_attribute): New function.
(finish_struct): Use renamed function and verify counted_by
attribute.

gcc/ChangeLog:

PR C/108896
* doc/extend.texi: Document attribute counted_by.
* tree.cc (get_named_field): New function.
* tree.h (get_named_field): New prototype.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by.c: New test.
---
 gcc/c-family/c-attribs.cc| 54 -
 gcc/c-family/c-common.cc | 13 
 gcc/c-family/c-common.h  |  1 +
 gcc/c/c-decl.cc  | 79 +++-
 gcc/doc/extend.texi  | 73 ++
 gcc/testsuite/gcc.dg/flex-array-counted-by.c | 40 ++
 gcc/tree.cc  | 40 ++
 gcc/tree.h   |  5 ++
 8 files changed, 287 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index e2792ca6898b..65e4f6639109 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -103,6 +103,8 @@ static tree handle_warn_if_not_aligned_attribute (tree *, 
tree, tree,
  int, bool *);
 static tree handle_strict_flex_array_attribute (tree *, tree, tree,
 int, bool *);
+static tree handle_counted_by_attribute (tree *, tree, tree,
+  

[V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-04 Thread Qing Zhao via Gcc-patches
Hi,

This is the 2nd version of the patch, per our discussion based on the
review comments for the 1st version, the major changes in this version
are:

1. change the name "element_count" to "counted_by";
2. change the parameter for the attribute from a STRING to an
Identifier;
3. Add logic and testing cases to handle anonymous structure/unions;
4. Clarify documentation to permit the situation when the allocation
size is larger than what's specified by "counted_by", at the same time,
it's user's error if allocation size is smaller than what's specified by
"counted_by";
5. Add a complete testing case for using counted_by attribute in
__builtin_dynamic_object_size when there is mismatch between the
allocation size and the value of "counted_by", the expecting behavior
for each case and the explanation on why in the comments. 

As discussed, I plan to add two more separate patch sets after this initial
patch set is approved and committed.

set 1. A new warning option and a new sanitizer option for the user error
   when the allocation size is smaller than the value of "counted_by".
set 2. An improvement to __builtin_dynamic_object_size  for the following
   case:

struct A
{
size_t foo;
int array[] __attribute__((counted_by (foo)));
};

extern struct fix * alloc_buf ();

int main ()
{
struct fix *p = alloc_buf ();
__builtin_object_size(p->array, 0) == sizeof(struct A) + p->foo * sizeof(int);
  /* with the current algorithm, it’s UNKNOWN */ 
__builtin_object_size(p->array, 2) == sizeof(struct A) + p->foo * sizeof(int);
  /* with the current algorithm, it’s UNKNOWN */
}

Bootstrapped and regression tested on both aarch64 and X86, no issue.

Please see more details on the description of this work on:

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619708.html

Okay for committing?

thanks.

Qing

Qing Zhao (3):
  Provide counted_by attribute to flexible array member field (PR108896)
  Use the counted_by atribute info in builtin object size [PR108896]
  Use the counted_by attribute information in bound sanitizer[PR108896]

 gcc/c-family/c-attribs.cc |  54 -
 gcc/c-family/c-common.cc  |  13 ++
 gcc/c-family/c-common.h   |   1 +
 gcc/c-family/c-ubsan.cc   |  16 ++
 gcc/c/c-decl.cc   |  79 +--
 gcc/doc/extend.texi   |  73 +++
 .../gcc.dg/flex-array-counted-by-2.c  |  74 +++
 .../gcc.dg/flex-array-counted-by-3.c  | 197 ++
 gcc/testsuite/gcc.dg/flex-array-counted-by.c  |  40 
 .../ubsan/flex-array-counted-by-bounds-2.c|  27 +++
 .../ubsan/flex-array-counted-by-bounds.c  |  46 
 gcc/tree-object-size.cc   |  37 +++-
 gcc/tree.cc   | 133 
 gcc/tree.h|  15 ++
 14 files changed, 780 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c

-- 
2.31.1



[Bug tree-optimization/30049] Variable-length arrays (VLA) should be converted to normal arrays if possible

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30049

--- Comment #2 from Andrew Pinski  ---
Created attachment 55690
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55690=edit
testcase

[apinski@xeond2 upstream-gcc-git]$ ~/upstream-gcc/bin/gcc  t.c -march=opteron
-ffast-math -funroll-loops -ftree-vectorize -msse3 -O3 -g
[apinski@xeond2 upstream-gcc-git]$ time ./a.out
real0m1.522s
user0m1.517s
sys 0m0.001s
[apinski@xeond2 upstream-gcc-git]$ ~/upstream-gcc/bin/gcc  t.c -march=opteron
-ffast-math -funroll-loops -ftree-vectorize -msse3 -O3 -g -DNORMAL_ARRAY
[apinski@xeond2 upstream-gcc-git]$ time ./a.out
real0m0.356s
user0m0.352s
sys 0m0.002s

[Bug tree-optimization/32806] Missing optimization to remove backward dependencies

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32806

--- Comment #2 from Andrew Pinski  ---
Created attachment 55689
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55689=edit
compilable testcase

Re: One question on the source code of tree-object-size.cc

2023-08-04 Thread Qing Zhao via Gcc-patches


> On Aug 4, 2023, at 3:09 PM, Siddhesh Poyarekar  wrote:
> 
> On 2023-08-04 15:06, Qing Zhao wrote:
>>> Yes, that's what I'm thinking.
>>> 
> so `q` must be pointing to a single element.  So you could deduce:
> 
> 1. the minimum size of the whole object that q points to.
 You mean that the TYPE will determine the minimum size of the whole 
 object?  (Does this include the size of the flexible array member, or only 
 the other part of the structure except the flexible array member?)
>>> 
>>> Only the constant sized part of the structure.
>> Okay. I see.
>> But if the “counted_by” info is available, then from p->array, we can deduce 
>> the minimum size too, as sizeof(struct A) + q->foo * sizeof(int), right?
> 
> Yes.
> 
>>> 
> Actually for minimum size we'd also need a guarantee that 
> `alloc_buf_more` returns a valid allocated object.
 Why? Please explain a little bit here.
>>> 
>>> So `alloc_buf_more` could return NULL, a valid pointer or an invalid 
>>> pointer.  So, we could end up returning a non-zero minimum size for an 
>>> invalid or NULL pointer, which is incorrect, we don't know that.
>> I see what’ s you mean now.
>> However, if we already see p->array, then the p is guaranteed a valid 
>> pointer and not a NULL, right?  (We are discussing on 
>> __builtin_dynamic_object_size (q->array, 2), we see q->array already)
> 
> Yes, you could argue that for p->array, I agree, but not for p.

Agreed. Yes, for p->array, observed access. -:)

Looks like we can improve __builtin_dynamic_object_size  for the following case:
struct A
{
 size_t foo;
 int array[] __attribute__((counted_by (foo)));
};

extern struct fix * alloc_buf ();

int main ()
{
 struct fix *p = alloc_buf ();
 __builtin_object_size(p->array, 0) == sizeof(struct A) + p->foo * sizeof(int); 
  /* with the current algorithm, it’s UNKNOWN */ 
 __builtin_object_size(p->array, 2) == sizeof(struct A) + p->foo * sizeof(int); 
  /* with the current algorithm, it’s UNKNOWN */
}

I will add this improvement to __builtin_dynamic_object_size for FAM with 
“counted_by” attribute in a later patch after the initial patch is committed.

Thanks a lot for the help.

Qing
> 
>>> 
>>> We won't need the object validity guarantee for (2) beyond, e.g. guarding 
>>> against a new NULL pointer dereference because it's a *maximum* estimate; 
>>> an invalid or NULL pointer would have 0 size.  So for such cases, __bos(q, 
>>> 0) could return
>>> 
>>> sizeof(*q) + (q ? q->foo:0)
>>> 
>>> and __bos(q->array, 0) could be
>>> 
>>> sizeof(*q) + q->foo - offsetof(q, array)
>>> 
>>> There's no need to guard against a dereference in the second case because 
>>> the q->array dereference already assumes that q is valid.
>> q->array should also guarantee that q is a valid pointer for minimum size, 
>> right? Or do I miss anything here?
> 
> Yes.
> 
> Thanks,
> Sid



[Bug other/109910] GCC prologue/epilogue saves/restores callee-saved registers that are never changed

2023-08-04 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109910

Georg-Johann Lay  changed:

   What|Removed |Added

   Last reconfirmed||2023-08-04
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

Re: One question on the source code of tree-object-size.cc

2023-08-04 Thread Siddhesh Poyarekar

On 2023-08-04 15:06, Qing Zhao wrote:

Yes, that's what I'm thinking.


so `q` must be pointing to a single element.  So you could deduce:

1. the minimum size of the whole object that q points to.

You mean that the TYPE will determine the minimum size of the whole object?  
(Does this include the size of the flexible array member, or only the other 
part of the structure except the flexible array member?)


Only the constant sized part of the structure.

Okay. I see.
But if the “counted_by” info is available, then from p->array, we can deduce the 
minimum size too, as sizeof(struct A) + q->foo * sizeof(int), right?


Yes.




Actually for minimum size we'd also need a guarantee that `alloc_buf_more` 
returns a valid allocated object.

Why? Please explain a little bit here.


So `alloc_buf_more` could return NULL, a valid pointer or an invalid pointer.  
So, we could end up returning a non-zero minimum size for an invalid or NULL 
pointer, which is incorrect, we don't know that.


I see what’ s you mean now.

However, if we already see p->array, then the p is guaranteed a valid pointer and not 
a NULL, right?  (We are discussing on __builtin_dynamic_object_size (q->array, 2), we 
see q->array already)


Yes, you could argue that for p->array, I agree, but not for p.



We won't need the object validity guarantee for (2) beyond, e.g. guarding 
against a new NULL pointer dereference because it's a *maximum* estimate; an 
invalid or NULL pointer would have 0 size.  So for such cases, __bos(q, 0) 
could return

sizeof(*q) + (q ? q->foo:0)

and __bos(q->array, 0) could be

sizeof(*q) + q->foo - offsetof(q, array)

There's no need to guard against a dereference in the second case because the 
q->array dereference already assumes that q is valid.


q->array should also guarantee that q is a valid pointer for minimum size, 
right? Or do I miss anything here?


Yes.

Thanks,
Sid


Re: [committed][RISC-V] Remove errant hunk of code

2023-08-04 Thread Andrew Pinski via Gcc-patches
On Thu, Aug 3, 2023 at 10:31 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 8/3/23 17:38, Vineet Gupta wrote:
>
> >> ;-)  Actually if you wanted to poke at zicond, the most interesting
> >> unexplored area I've come across is the COND_EXPR handling in gimple.
> >> When we expand a COND_EXPR into RTL the first approach we take is to
> >> try movcc in RTL.
> >>
> >> Unfortunately we don't create COND_EXPRs all that often in gimple.
> >> Some simple match.pd patterns would likely really help here.
> >>
> >> The problem is RTL expansion when movcc FAILs is usually poor at
> >> best.  So if we're going to add those match.pd patterns, we probably
> >> need to beef up the RTL expansion code to do a better job when the
> >> target doesn't have a movcc RTL pattern.
> >
> > Ok, I'll add that to my todo list.
> You might want to reach out to Andrew Pinski if you do poke at this.  I
> made a reference to this issue in a BZ he recently commented on.  It was
> an x86 issue with cmov generation, but the same core issue applies --
> we're not generating COND_EXPRs very aggressively in gimple.

Yes I have some ideas of producing more aggressively COND_EXPR in
either isel or in the last phiopt.
There is also a canonicalization form issue dealing with `bool * b`
representing `bool ? b : 0` where isel could select between the
COND_EXPR and multiply too.
This is the issue Jeff is talking about too.

Thanks,
Andrew

>
> jeff


Re: One question on the source code of tree-object-size.cc

2023-08-04 Thread Qing Zhao via Gcc-patches


> On Aug 4, 2023, at 12:36 PM, Siddhesh Poyarekar  wrote:
> 
> On 2023-08-04 11:27, Qing Zhao wrote:
>>> On Aug 4, 2023, at 10:40 AM, Siddhesh Poyarekar  wrote:
>>> 
>>> On 2023-08-03 13:34, Qing Zhao wrote:
 One thing I need to point out first is, currently, even for regular fixed 
 size array in the structure,
 We have this same issue, for example:
 #define LENGTH 10
 struct fix {
   size_t foo;
   int array[LENGTH];
 };
 …
 int main ()
 {
   struct fix *p;
   p = alloc_buf_more ();
   expect(__builtin_object_size(p->array, 1), LENGTH * sizeof(int));
   expect(__builtin_object_size(p->array, 0), -1);
 }
 Currently, for __builtin_object_size(p->array, 0),  GCC return UNKNOWN for 
 it.
 This is not a special issue for flexible array member.
>>> 
>>> That's fine for fixed arrays at the end of a struct because the "whole 
>>> object" size could be anything; `p` could be pointing to the beginning of 
>>> an array for all we know.  If however `array` is strictly a flex array, 
>>> i.e.:
>>> 
>>> ```
>>> struct A
>>> {
>>>  size_t foo;
>>>  int array[];
>>> };
>>> ```
>>> 
>>> then there's no way in valid C to have an array of `struct fix`,
>> Yes!!   this is exactly the place that makes difference between structures 
>> with fixed arrays and the ones with flexible arrays.
>> With such difference, I guess that using the type of the structure with 
>> flexible array member for p->array to get the size of the whole object p 
>> point to might be reasonable?
> 
> Yes, that's what I'm thinking.
> 
>>> so `q` must be pointing to a single element.  So you could deduce:
>>> 
>>> 1. the minimum size of the whole object that q points to.
>> You mean that the TYPE will determine the minimum size of the whole object?  
>> (Does this include the size of the flexible array member, or only the other 
>> part of the structure except the flexible array member?)
> 
> Only the constant sized part of the structure.
Okay. I see.
But if the “counted_by” info is available, then from p->array, we can deduce 
the minimum size too, as sizeof(struct A) + q->foo * sizeof(int), right?
> 
>>> Actually for minimum size we'd also need a guarantee that `alloc_buf_more` 
>>> returns a valid allocated object.
>> Why? Please explain a little bit here.
> 
> So `alloc_buf_more` could return NULL, a valid pointer or an invalid pointer. 
>  So, we could end up returning a non-zero minimum size for an invalid or NULL 
> pointer, which is incorrect, we don't know that.

I see what’ s you mean now.

However, if we already see p->array, then the p is guaranteed a valid pointer 
and not a NULL, right?  (We are discussing on __builtin_dynamic_object_size 
(q->array, 2), we see q->array already)

> 
> We won't need the object validity guarantee for (2) beyond, e.g. guarding 
> against a new NULL pointer dereference because it's a *maximum* estimate; an 
> invalid or NULL pointer would have 0 size.  So for such cases, __bos(q, 0) 
> could return
> 
> sizeof(*q) + (q ? q->foo:0)
> 
> and __bos(q->array, 0) could be
> 
> sizeof(*q) + q->foo - offsetof(q, array)
> 
> There's no need to guard against a dereference in the second case because the 
> q->array dereference already assumes that q is valid.

q->array should also guarantee that q is a valid pointer for minimum size, 
right? Or do I miss anything here?

thanks.

Qing
> 
>>> 
>>> and
>>> 
>>> 2. if you're able to determine the size of the flex array (through 
>>> __element_count__(foo) for example), you could even determine the maximum 
>>> size of the whole object.
>>> 
>>> For (2) though, you'd break applications that overallocate and then expect 
>>> to be able to use that overallocation despite the space not being reflected 
>>> in the __element_count__.  I think it's a bug in the application and I 
>>> can't see a way for an application to be able to do this in a valid way so 
>>> I'm inclined towards breaking it.
>> Currently, we allow the situation when the allocation size for the whole 
>> object is larger than the value reflected in the “counted_by” attribute (the 
>> old name is __element_count__). But don’t allow the other way around (i.e, 
>> when the allocation size for the whole object is smaller than the value 
>> reflected in the “counted_by” attribute.
> 
> Right, that's going to be the "break".  For underallocation __bos will only 
> end up overestimating the space available, which is not ideal, but won't end 
> up breaking compatibility.
> 
>>> 
>>> Of course, the fact that gcc allows flex arrays to be in the middle of 
>>> structs breaks the base assumption but that's something we need to get rid 
>>> of anyway since there's no way for valid C programs to use that safely.
>> Since GCC14, we started to deprecate this extension (allow flex array to be 
>> in the middle of structs).
>> https://gcc.gnu.org/pipermail/gcc-cvs/2023-June/385730.html
> 
> Yes, that's what I'm banking on.
> 
> Thanks,
> Sid



[Bug c++/110905] GCC rejects constexpr code that may re-initialize union member

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110905

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-08-04
 Status|UNCONFIRMED |WAITING
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski  ---
>In my more minimal test case the error is more terse and less clear.


The reduced testcase is a different issue and is a dup of bug 85944.

In the first testcase provided below if we move the static_assert into main
instead of the toplevel, it gets accepted.

I think you need to redo your reduction.

[Bug c++/110905] New: GCC rejects constexpr code that may re-initialize union member

2023-08-04 Thread danakj at orodu dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110905

Bug ID: 110905
   Summary: GCC rejects constexpr code that may re-initialize
union member
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: danakj at orodu dot net
  Target Milestone: ---

Godbolt: https://gcc.godbolt.org/z/v5anxqnP1

This repro contains a std::optional (which has a union) and it sets the union
in a loop. Doing so causes GCC to reject the code as not being a constant
expression. The error I was getting in my project was far more descriptive,
with it trying to call the deleted constructor of the union.

error: use of deleted function
‘sus::option::__private::Storage,
false>()’

In my more minimal test case the error is more terse and less clear.

:62:59: error: non-constant condition for static assertion
   62 | static_assert(Flatten({{1, 2, 3}, {}, {4, 5}}).sum() == 1 + 2 + 3
+ 4 + 5);
  |  
^~~~
:62:51: error: '(((const std::vector
>*)(&)) != 0)' is not a constant expression
   62 | static_assert(Flatten({{1, 2, 3}, {}, {4, 5}}).sum() == 1 + 2 + 3
+ 4 + 5);
  |   

```cpp
#include 
#include 

template 
struct VectorIter {
constexpr std::optional next() {
if (front == back) return std::optional();
T& item = v[front];
front += 1u;
return std::optional(std::move(item));
}

constexpr VectorIter(std::vector v2) : v(std::move(v2)), front(0u),
back(v.size()) {}
VectorIter(VectorIter&&) = default;
VectorIter& operator=(VectorIter&&) = default;

std::vector v;
size_t front;
size_t back;
};

template 
struct Flatten {
constexpr Flatten(std::vector> v) : vec(std::move(v)) {}

constexpr std::optional next() {
std::optional out;
while (true) {
// Take an item off front_iter_ if possible.
if (front_iter_.has_value()) {
out = front_iter_.value().next();
if (out.has_value()) return out;
front_iter_ = std::nullopt;
}
// Otherwise grab the next vector into front_iter_.
if (!vec.empty()) {
std::vector v = std::move(vec[0]);
vec.erase(vec.begin());
front_iter_.emplace([](auto&& iter) {
return VectorIter(std::move(iter));
}(std::move(v)));
}
if (!front_iter_.has_value()) break;
}
return out;
}

constexpr T sum() && {
T out = T();
while (true) {
std::optional i = next();
if (!i.has_value()) break;
out += *i;
}
return out;
}

std::vector> vec;
std::optional> front_iter_;
};

static_assert(Flatten({{1, 2, 3}, {}, {4, 5}}).sum() == 1 + 2 + 3 + 4 +
5);

int main() {}

```

When the Flatten::next() method is simplified a bit, so that it can see the
union is only initialized once, the GCC compiler no longer rejects the code.
https://gcc.godbolt.org/z/szfGsdxb7

```cpp
#include 
#include 

template 
struct VectorIter {
constexpr std::optional next() {
if (front == back) return std::optional();
T& item = v[front];
front += 1u;
return std::optional(std::move(item));
}

constexpr VectorIter(std::vector v2) : v(std::move(v2)), front(0u),
back(v.size()) {}
VectorIter(VectorIter&&) = default;
VectorIter& operator=(VectorIter&&) = default;

std::vector v;
size_t front;
size_t back;
};

template 
struct Flatten {
constexpr Flatten(std::vector v) : vec(std::move(v)) {}

constexpr std::optional next() {
std::optional out;
while (true) {
// Take an item off front_iter_ if possible.
if (front_iter_.has_value()) {
out = front_iter_.value().next();
if (out.has_value()) return out;
front_iter_ = std::nullopt;
}
// Otherwise grab the next vector into front_iter_.
if (!moved) {
std::vector v = std::move(vec);
moved = true;
front_iter_.emplace([](auto&& iter) {
return VectorIter(std::move(iter));
}(std::move(v)));
}
if (!front_iter_.has_value()) break;
}
return out;
}

constexpr T sum() && {
T out = T();
while (true) {
std::optional i = next();
if (!i.has_value()) break;
out += *i;
}
return out;
}

bool moved = false;
std::vector vec;
std::optional> front_iter_;
};

static_assert(Flatten({1, 2, 3}).sum() == 1 + 2 + 3);

int main() {}
```

Yet in the first example, the GCC 

[Bug tree-optimization/110903] [12/13/14 Regression] Dead Code Elimination Regression

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110903

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-08-04
 Status|UNCONFIRMED |NEW
   Target Milestone|--- |12.4
 Ever confirmed|0   |1
   Keywords||needs-bisection
Summary|[14 Regression] Dead Code   |[12/13/14 Regression] Dead
   |Elimination Regression  |Code Elimination Regression
   |since r14-1597-g64d90d06d2d |

--- Comment #2 from Andrew Pinski  ---
Confirmed. before r14-1597, there was a jump threading happening with respect
to:
  if (j_32 <= 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 238907556]:

   [local count: 477815112]:
  # iftmp.10_37 = PHI <_11(7), 0(8)>

But after, we change that into
iftmp.10_37 = _11 & (j_32 <= 0);

It just happens we depend on that due to:
  _43 = l_22 | _25;
  _39 = j_32 <= 0;
  _12 = ~_43;
  _44 = _12 & _39;


If we change the code to be:
```
void foo(void);
static signed char b, c;
static short e, f;
static int g = 41317;
static int(a)(int h, int i) { return h + i; }
static int(d)(int h, int i) { return i & h;}//i ? h : 0; }
short t = 10;
int main() {
{
signed char j;
short k = t;
for (; g >= 10; g = (short)g) {
_Bool l = 1;
int m = 0;
j = 8 * k;
k = j <= 0;
f = c + 3;
for (; c < 2; c = f) {
signed char n = 4073709551615;
if (!(((m) >= 0) && ((m) <= 0))) {
__builtin_unreachable();
}
if (g)
;
else {
if ((m = k, (b = a(d(l, k), e) && n) || l) < k) foo();
e = l = 0;
}
}
}
}
}
```

GCC 11 is able to remove the call to foo but GCC 12 cannot.
the IR for the part where the phiopt2 changes on the trunk is similar enough.

So this is instead a regression from GCC 11.

The gcc version used for compiling qt4.

2023-08-04 Thread amber.coles--- via Gcc
Did you ever solve this problem? I'm getting the exact same error.


[PATCH] _Decimal* to _BitInt conversion support [PR102989]

2023-08-04 Thread Jakub Jelinek via Gcc-patches
Hi!

Repost because the patch was too large.

On Fri, Jul 28, 2023 at 06:03:33PM +, Joseph Myers wrote:
> Note that representations with too-large significand are defined to be
> noncanonical representations of zero, so you need to take care of that in
> decoding BID.

Done.

> You could e.g. have a table up to 10^(N-1) for some N, and 10^N, 10^2N
> etc. up to 10^6144 (or rather up to 10^6111, which can then be multiplied
> by a 34-digit integer significand), so that only one multiplication is
> needed to get the power of 10 and then a second multiplication by the
> significand.  (Or split into three parts at the cost of an extra
> multiplication, or multiply the significand by 1, 10, 100, 1000 or 1
> as a multiplication within 128 bits and so only need to compute 10^k for k
> a multiple of 5, or any number of variations on those themes.)

So, I've used N 256 and applied a little further space optimization,
omitting least significant whole limbs full of just zeros (for 32-bit limbs
actually pairs of those limbs, otherwise it would be a nightmare).
With that I got down to 32KiB or so (32128 bytes the limb array and
560 bytes the offset array), tables generated such that they can be used
with both 32-bit and 64-bit limbs and both little and big endian ordering of
them.

The following patch implements for now just the _Decimal -> _BitInt
conversions and uses the soft-fp infrastructure to raise exceptions (not
heavily tight to that, if the bitint.h header from soft-fp is copied/tweaked
and a few typedefs are provided it could be also in libbid if it grows
usable exception support).

I'll work on _BitInt -> _Decimal next and hope to use the __bid_pow10bitint
function in there as well (guess some safe lower power of 10 divisor,
use __divmodbitint4 to divide by that power of 10 including computing
remainder, analyze that remainder (check if it is 0, exact half of the
power of 10, or something smaller or larger than that) and if guessed too
low, divide the usually much smaller quotient again to get exact answer
(+ again check remainder).

The bitintpow10.c patch is included compressed, as the single source file is
408KiB.

2023-08-04  Jakub Jelinek  

PR c/102989
gcc/
* gimple-lower-bitint.cc (bitint_large_huge::lower_float_conv_stmt):
Handle _Decimal* to _BitInt conversions.
* internal-fn.cc (expand_FLOATTOBITINT): Likewise.
gcc/testsuite/
* gcc.dg/dfp/bitint-1.c: New test.
* gcc.dg/dfp/bitint-2.c: New test.
* gcc.dg/dfp/bitint-3.c: New test.
libgcc/
* config/t-softfp (softfp_bid_list, softfp_bid_file_list): New
variables.
(LIB2ADD_ST): Add $(softfp_bid_file_list).
* soft-fp/fixsdbitint.c: New file.
* soft-fp/fixddbitint.c: New file.
* soft-fp/fixtdbitint.c: New file.
* soft-fp/bitint.h (bitint_negate): New static inline function.
(__mulbitint3, __divmodbitint4, __bid_pow10bitint): Declare.
* soft-fp/bitintpow10.c: New file.

--- gcc/gimple-lower-bitint.cc.jj   2023-08-02 17:36:15.439915237 +0200
+++ gcc/gimple-lower-bitint.cc  2023-08-04 09:40:14.271005211 +0200
@@ -3363,8 +3363,7 @@ bitint_large_huge::lower_float_conv_stmt
   tree rhs1 = gimple_assign_rhs1 (stmt);
   tree lhs = gimple_assign_lhs (stmt);
   tree_code rhs_code = gimple_assign_rhs_code (stmt);
-  if (DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (rhs1)))
-  || DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (lhs
+  if (DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (lhs
 {
   sorry_at (gimple_location (stmt),
"unsupported conversion between %<_BitInt(%d)%> and %qT",
--- gcc/internal-fn.cc.jj   2023-07-26 10:06:29.233849044 +0200
+++ gcc/internal-fn.cc  2023-08-04 09:47:58.368480546 +0200
@@ -4846,11 +4846,25 @@ expand_FLOATTOBITINT (internal_fn, gcall
   const char *mname = GET_MODE_NAME (mode);
   unsigned mname_len = strlen (mname);
   int len = 12 + mname_len;
+  if (DECIMAL_FLOAT_MODE_P (mode))
+len += 4;
   char *libfunc_name = XALLOCAVEC (char, len);
   char *p = libfunc_name;
   const char *q;
-  memcpy (p, "__fix", 5);
-  p += 5;
+  if (DECIMAL_FLOAT_MODE_P (mode))
+{
+#if ENABLE_DECIMAL_BID_FORMAT
+  memcpy (p, "__bid_fix", 9);
+#else
+  memcpy (p, "__dpd_fix", 9);
+#endif
+  p += 9;
+}
+  else
+{
+  memcpy (p, "__fix", 5);
+  p += 5;
+}
   for (q = mname; *q; q++)
 *p++ = TOLOWER (*q);
   memcpy (p, "bitint", 7);
--- gcc/testsuite/gcc.dg/dfp/bitint-1.c.jj  2023-08-04 14:30:24.615100334 
+0200
+++ gcc/testsuite/gcc.dg/dfp/bitint-1.c 2023-08-04 19:37:26.834790279 +0200
@@ -0,0 +1,98 @@
+/* PR c/102989 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-O2 -std=c2x -pedantic-errors" } */
+
+#if __BITINT_MAXWIDTH__ >= 192
+__attribute__((noipa)) _BitInt(192)
+tests192 (_Decimal64 d)
+{
+  return d;
+}
+
+__attribute__((noipa)) unsigned _BitInt(192)
+testu192 (_Decimal64 d)
+{
+  return d;
+}
+#endif
+
+#if 

[Bug c++/109680] [13 Regression] is_convertible incorrectly true

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109680

Andrew Pinski  changed:

   What|Removed |Added

 CC||nikolasklauser at berlin dot de

--- Comment #14 from Andrew Pinski  ---
*** Bug 110904 has been marked as a duplicate of this bug. ***

[Bug c++/110904] __is_convertible incorrectly reports non-referenceable function prototypes as convertible

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110904

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Andrew Pinski  ---
Dup of bug 109680.

*** This bug has been marked as a duplicate of bug 109680 ***

[Bug c++/110904] New: __is_convertible incorrectly reports non-referenceable function prototypes as convertible

2023-08-04 Thread nikolasklauser at berlin dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110904

Bug ID: 110904
   Summary: __is_convertible incorrectly reports non-referenceable
 function prototypes as convertible
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nikolasklauser at berlin dot de
  Target Milestone: ---

```
#include 

using Function = void();
using ConstFunction = void() const;

static_assert((!std::is_convertible::value), "");
static_assert((!std::is_convertible::value), ""); //
convertible
static_assert((!std::is_convertible::value), ""); //
convertible
static_assert((!std::is_convertible::value), ""); //
convertible
static_assert((!std::is_convertible::value), "");
static_assert((!std::is_convertible::value), "");
static_assert((!std::is_convertible::value), "");
static_assert((!std::is_convertible::value), "");
```
__is_convertible() claims that the cases marked above are convertible, but
AFAICT that shouldn't be true. According to the standard,
```
To test() {
  return declval();
}
```
has to be well formed, but that's never the case for `ConstFunction`.

[Bug middle-end/110857] aarch64-linux-gnu profiledbootstrap broken

2023-08-04 Thread prathamesh3492 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110857

--- Comment #6 from prathamesh3492 at gcc dot gnu.org ---
profiledbootstrap now works on aarch64-linux-gnu, thanks!

[Bug tree-optimization/110903] [14 Regression] Dead Code Elimination Regression since r14-1597-g64d90d06d2d

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110903

--- Comment #1 from Andrew Pinski  ---
Note the original testcase has some obvious use of an uninitialized variable.
Anyways here is a fixed up testcase which does not have that uninitialized
variable and GCC 13 was able to optimize away the call to foo still:
```
void foo(void);
static signed char b, c;
static short e, f;
static int g = 41317;
static int(a)(int h, int i) { return h + i; }
static int(d)(int h, int i) { return i ? h : 0; }
short t = 10;
int main() {
{
signed char j;
short k = t;
for (; g >= 10; g = (short)g) {
_Bool l = 1;
int m = 0;
j = 8 * k;
k = j <= 0;
f = c + 3;
for (; c < 2; c = f) {
signed char n = 4073709551615;
if (!(((m) >= 0) && ((m) <= 0))) {
__builtin_unreachable();
}
if (g)
;
else {
if ((m = k, (b = a(d(l, k), e) && n) || l) < k) foo();
e = l = 0;
}
}
}
}
}
```

[PATCH] Add -Wdisabled-optimization warning for not optimizing sibling calls

2023-08-04 Thread Bradley Lucier via Gcc-patches
The patch at the end adds a warning when a tail/sibling call cannot be 
optimized for various reasons.


I built and tested GCC with and without the patch with configuration

Configured with: ../../gcc-mainline/configure --enable-languages=c 
--disable-multilib --prefix=/pkgs/gcc-mainline --disable-werror


There were some changes in the test results, but I can't say that they 
look substantive:


diff -C 2 summary.log ../gcc-mainline
*** summary.log Thu Aug  3 22:56:13 2023
--- ../gcc-mainline/summary.log Thu Aug  3 19:42:33 2023
***
*** 14,22 
=== g++ Summary ===

! # of expected passes  239234
  # of unexpected failures  5
  # of expected failures2087
! # of unsupported tests10566
! /home/lucier/programs/gcc/objdirs/gcc-mainline-new/gcc/xg++  version 
14.0.0 20230802 (experimental) (GCC)


=== gcc tests ===
--- 14,22 
=== g++ Summary ===

! # of expected passes  239262
  # of unexpected failures  5
  # of expected failures2087
! # of unsupported tests10562
! /home/lucier/programs/gcc/objdirs/gcc-mainline/gcc/xg++  version 
14.0.0 20230802 (experimental) (GCC)


=== gcc tests ===
***
*** 155,164 
=== gcc Summary ===

! # of expected passes  192553
  # of unexpected failures  109
  # of unexpected successes 19
  # of expected failures1506
! # of unsupported tests2623
! /home/lucier/programs/gcc/objdirs/gcc-mainline-new/gcc/xgcc  version 
14.0.0 20230802 (experimental) (GCC)


=== libatomic tests ===
--- 155,164 
=== gcc Summary ===

! # of expected passes  192563
  # of unexpected failures  109
  # of unexpected successes 19
  # of expected failures1506
! # of unsupported tests2619
! /home/lucier/programs/gcc/objdirs/gcc-mainline/gcc/xgcc  version 
14.0.0 20230802 (experimental) (GCC)


=== libatomic tests ===

I then configured and built GCC with

 ../../gcc-mainline/configure CXX="/pkgs/gcc-mainline-new/bin/g++ 
-Wdisabled-optimization" --enable-languages=c --disable-multilib 
--prefix=/pkgs/gcc-mainline-test --disable-werror --disable-bootstrap


to test the new warning.  The warnings are of the form, e.g.,

../../../gcc-mainline/gcc/tree-vect-stmts.cc:11990:44: warning: cannot 
apply sibling-call optimization: callee required more stack slots than 
the caller [-Wdisabled-optimization]


These are the number of times this warning was triggered building stage1:

grep warning: build.log | grep sibling | sed 's/^.*://' | sort | uniq -c
259  callee required more stack slots than the caller 
[-Wdisabled-optimization]

 43  callee returns a structure [-Wdisabled-optimization]

If this patch is OK, someone else will need to commit it for me.

Brad

gcc/Changelog

* calls.cc (maybe_complain_about_tail_call) Add warning when
tail or sibling call cannot be optimized.

diff --git a/gcc/calls.cc b/gcc/calls.cc
index 1f3a6d5c450..b95c876fda8 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -1242,10 +1242,12 @@ void
 maybe_complain_about_tail_call (tree call_expr, const char *reason)
 {
   gcc_assert (TREE_CODE (call_expr) == CALL_EXPR);
-  if (!CALL_EXPR_MUST_TAIL_CALL (call_expr))
-return;
-
-  error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s", reason);
+  if (CALL_EXPR_MUST_TAIL_CALL (call_expr))
+error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s", reason);
+  else if (flag_optimize_sibling_calls)
+warning (OPT_Wdisabled_optimization,
+ "cannot apply sibling-call optimization: %s", reason);
+  return;
 }

 /* Fill in ARGS_SIZE and ARGS array based on the parameters found in




Re: [PATCH] ipa-sra: Don't consider CLOBBERS as writes preventing splitting

2023-08-04 Thread Richard Biener via Gcc-patches



> Am 04.08.2023 um 18:26 schrieb Martin Jambor :
> 
> Hello,
> 
>> On Wed, Aug 02 2023, Richard Biener wrote:
>>> On Mon, Jul 31, 2023 at 7:05 PM Martin Jambor  wrote:
>>> 
>>> Hi,
>>> 
>>> when IPA-SRA detects whether a parameter passed by reference is
>>> written to, it does not special case CLOBBERs which means it often
>>> bails out unnecessarily, especially when dealing with C++ destructors.
>>> Fixed by the obvious continue in the two relevant loops.
>>> 
>>> The (slightly) more complex testcases in the PR need surprisingly more
>>> effort but the simple one can be fixed now easily by this patch and I'll
>>> work on the others incrementally.
>>> 
>>> Bootstrapped and currently undergoing testsuite run on x86_64-linux.  OK
>>> if it passes too?
>> 
>> LGTM, btw - how are the clobbers handled during transform?
> 
> it turns out your question is spot on.  I assumed that the mini-DCE that
> I implemented into IPA-SRA transform would delete but I had a closer
> look and it is not invoked on split parameters,only on removed ones.
> What was actually happening is that the parameter got remapped to a
> default definition of a replacement VAR_DECL and we were thus
> gimple-clobbering a pointer pointing to nowhere.  The clobber then got
> DSEd and so I originally did not notice looking at the optimized dump.
> 
> Still that is of course not ideal and so I added a simple function
> removing clobbers when splitting.  I as considering adding that
> functionality to ipa_param_body_adjustments::mark_dead_statements but
> that would make the function harder to read without much gain.
> 
> So thanks again for the remark.  The following passes bootstrap and
> testing on x86_64-linux.  I am running LTO bootstrap now.  OK if it
> passes?

Ok

Richard 

> Martin
> 
> 
> 
> When IPA-SRA detects whether a parameter passed by reference is
> written to, it does not special case CLOBBERs which means it often
> bails out unnecessarily, especially when dealing with C++ destructors.
> Fixed by the obvious continue in the two relevant loops and by adding
> a simple function that marks the clobbers in the transformation code
> as statements to be removed.
> 
> gcc/ChangeLog:
> 
> 2023-08-04  Martin Jambor  
> 
>PR ipa/110378
>* ipa-param-manipulation.h (class ipa_param_body_adjustments): New
>members get_ddef_if_exists_and_is_used and mark_clobbers_dead.
>* ipa-sra.cc (isra_track_scalar_value_uses): Ignore clobbers.
>(ptr_parm_has_nonarg_uses): Likewise.
>* ipa-param-manipulation.cc
>(ipa_param_body_adjustments::get_ddef_if_exists_and_is_used): New.
>(ipa_param_body_adjustments::mark_dead_statements): Move initial
>checks to get_ddef_if_exists_and_is_used.
>(ipa_param_body_adjustments::mark_clobbers_dead): New.
>(ipa_param_body_adjustments::common_initialization): Call
>mark_clobbers_dead when splitting.
> 
> gcc/testsuite/ChangeLog:
> 
> 2023-07-31  Martin Jambor  
> 
>PR ipa/110378
>* g++.dg/ipa/pr110378-1.C: New test.
> ---
> gcc/ipa-param-manipulation.cc | 44 +---
> gcc/ipa-param-manipulation.h  |  2 ++
> gcc/ipa-sra.cc|  6 ++--
> gcc/testsuite/g++.dg/ipa/pr110378-1.C | 48 +++
> 4 files changed, 94 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/g++.dg/ipa/pr110378-1.C
> 
> diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
> index a286af7f5d9..4a185ddbdf4 100644
> --- a/gcc/ipa-param-manipulation.cc
> +++ b/gcc/ipa-param-manipulation.cc
> @@ -1072,6 +1072,20 @@ ipa_param_body_adjustments::carry_over_param (tree t)
>   return new_parm;
> }
> 
> +/* If DECL is a gimple register that has a default definition SSA name and 
> that
> +   has some uses, return the default definition, otherwise return NULL_TREE. 
>  */
> +
> +tree
> +ipa_param_body_adjustments::get_ddef_if_exists_and_is_used (tree decl)
> +{
> + if (!is_gimple_reg (decl))
> +return NULL_TREE;
> +  tree ddef = ssa_default_def (m_id->src_cfun, decl);
> +  if (!ddef || has_zero_uses (ddef))
> +return NULL_TREE;
> +  return ddef;
> +}
> +
> /* Populate m_dead_stmts given that DEAD_PARAM is going to be removed without
>any replacement or splitting.  REPL is the replacement VAR_SECL to base any
>remaining uses of a removed parameter on.  Push all removed SSA names that
> @@ -1084,10 +1098,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree 
> dead_param,
>   /* Current IPA analyses which remove unused parameters never remove a
>  non-gimple register ones which have any use except as parameters in other
>  calls, so we can safely leve them as they are.  */
> -  if (!is_gimple_reg (dead_param))
> -return;
> -  tree parm_ddef = ssa_default_def (m_id->src_cfun, dead_param);
> -  if (!parm_ddef || has_zero_uses (parm_ddef))
> +  tree parm_ddef = get_ddef_if_exists_and_is_used (dead_param);
> +  if (!parm_ddef)
> return;
> 
>   auto_vec stack;
> @@ -1169,6 

[Bug tree-optimization/110903] New: [14 Regression] Dead Code Elimination Regression since r14-1597-g64d90d06d2d

2023-08-04 Thread theodort at inf dot ethz.ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110903

Bug ID: 110903
   Summary: [14 Regression] Dead Code Elimination Regression since
r14-1597-g64d90d06d2d
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: theodort at inf dot ethz.ch
  Target Milestone: ---

https://godbolt.org/z/7of4jjM3K

Given the following code:

void foo(void);
static char b, c;
static short e, f;
static int g = 41317;
static int(a)(int h, int i) { return h + i; }
static int(d)(int h, int i) { return i ? h : 0; }
int main() {
{
char j;
short k;
for (; g >= 10; g = (short)g) {
int l = 1, m = 0;
j = 8 * k;
k = j <= 0;
f = c + 3;
for (; c < 2; c = f) {
char n = 4073709551615;
if (!(((m) >= 0) && ((m) <= 0))) {
__builtin_unreachable();
}
if (g)
;
else {
if ((m = k, (b = a(d(l, k), e) && n) || l) < k) foo();
e = l = 0;
}
}
}
}
}

gcc-trunk -O3 does not eliminate the call to foo:

main:
movlg(%rip), %edi
cmpl$9, %edi
jle .L25
pushq   %rbp
movl%edi, %ecx
movl$1, %ebp
movl$1, %esi
pushq   %rbx
movl$1, %ebx
subq$8, %rsp
movzbl  c(%rip), %edx
movsbw  %dl, %ax
addl$3, %eax
movw%ax, f(%rip)
cmpb$1, %dl
jg  .L12
.p2align 4,,10
.p2align 3
.L6:
testl   %edi, %edi
je  .L7
movb%al, c(%rip)
movsbw  %al, %dx
cmpb$1, %al
jle .L6
.L9:
movswl  %di, %ecx
movl%ecx, g(%rip)
cmpl$9, %ecx
jle .L17
addl$3, %edx
movw%dx, f(%rip)
.L12:
movswl  %cx, %eax
cmpw$9, %cx
jle .L29
.L4:
jmp .L4
.p2align 4,,10
.p2align 3
.L7:
movswl  e(%rip), %ecx
movl%ebx, %edx
andl%esi, %edx
addl%ecx, %edx
orl %esi, %edx
jne .L10
testb   %bpl, %bpl
jne .L30
.L10:
xorl%edx, %edx
movb%al, c(%rip)
movw%dx, e(%rip)
movsbw  %al, %dx
cmpb$1, %al
jg  .L9
xorl%esi, %esi
jmp .L6
.p2align 4,,10
.p2align 3
.L30:
callfoo
movzwl  f(%rip), %eax
movlg(%rip), %edi
jmp .L10
.L29:
movl%eax, g(%rip)
.L17:
addq$8, %rsp
xorl%eax, %eax
popq%rbx
popq%rbp
ret
.L25:
xorl%eax, %eax
ret

gcc-13.2.0 -O3 eliminates the call to foo:

main:
movlg(%rip), %esi
movl%esi, %ecx
cmpl$9, %esi
jle .L14
movzbl  c(%rip), %eax
movsbw  %al, %dx
addl$3, %edx
movw%dx, f(%rip)
cmpb$1, %al
jg  .L12
xorl%eax, %eax
testb   %al, %al
movl%edx, %eax
je  .L6
cmpb$1, %dl
jg  .L22
.L7:
jmp .L7
.p2align 4,,10
.p2align 3
.L22:
movb%dl, c(%rip)
.L8:
movswl  %si, %ecx
movl%ecx, g(%rip)
cmpl$9, %ecx
jle .L14
addl$3, %eax
cbtw
movw%ax, f(%rip)
.L12:
movswl  %cx, %eax
cmpw$9, %cx
jle .L23
.L4:
jmp .L4
.p2align 4,,10
.p2align 3
.L6:
movb%dl, c(%rip)
cmpw$1, %dx
jg  .L8
.p2align 4,,10
.p2align 3
.L9:
movlg(%rip), %eax
testl   %eax, %eax
jne .L9
movw$0, e(%rip)
movb%dl, c(%rip)
.L23:
movl%eax, g(%rip)
.L14:
xorl%eax, %eax
ret

Bisects to r14-1597-g64d90d06d2d

[Bug tree-optimization/92335] [11/12/13 Regression] sinking of loads happen too early which causes vectorization not to be done

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92335

--- Comment #10 from Andrew Pinski  ---
*** Bug 95084 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/95084] [11/12/13/14 Regression] code sinking prevents if-conversion

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95084

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|NEW |RESOLVED

--- Comment #6 from Andrew Pinski  ---
This was fixed by the patch which fixed PR 92335 and since that is still open
as a regression like this one I am going to close this one as a dup of bug
92335 and they are exactly the same issue even.

*** This bug has been marked as a duplicate of bug 92335 ***

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 95084, which changed state.

Bug 95084 Summary: [11/12/13/14 Regression] code sinking prevents if-conversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95084

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

[Bug middle-end/94442] [11/12/13/14 regression] Redundant loads/stores emitted at -O3

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94442

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
   Target Milestone|11.5|11.0
 Resolution|--- |FIXED

--- Comment #13 from Andrew Pinski  ---
Fixed by r11-6794-g04b472ad0e1dc93abafe .

[Bug target/95958] [meta-bug] Inefficient arm_neon.h code for AArch64

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958
Bug 95958 depends on bug 94442, which changed state.

Bug 94442 Summary: [11/12/13/14 regression] Redundant loads/stores emitted at 
-O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94442

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug ada/110898] compilation of adacl-assert-integer.ads failed

2023-08-04 Thread krischik at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110898

Martin Krischik  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #2 from Martin Krischik  ---
@(In reply to Marc Poulhiès from comment #1)

> I've checked and I also get the same errors with gcc 11.x, so that's not
> something new. I think your code should be fixed here.

Yes, those error messages make sense. Especially the „error: child of a generic
package must be a generic unit“. That is indeed a problem on my side.

Thanks for checking. What confuse me was the not at all helpful “compilation of
adacl-assert-integer.ads failed” and the proper error message is no where to be
seen.

But is probably an Alire problem. I'll close the bug.

[Bug target/110901] -march does not override -mcpu (big.little on aarch64

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110901

--- Comment #3 from Andrew Pinski  ---
With C code, these use of -march and -mcpu would normally be rejected even.

[Bug target/110901] -march does not override -mcpu on aarch64

2023-08-04 Thread raj.khem at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110901

--- Comment #2 from Khem Raj  ---
(In reply to Andrew Pinski from comment #1)
> Order matters. In this case -march is after -mcpu ...

It does not seem to be effective in this case. I tried to specify -mcpu after
-march and vice-versa, result is same 


% ../recipe-sysroot-native/usr/bin/aarch64-yoe-linux/aarch64-yoe-linux-gcc
-mbranch-protection=standard 
--sysroot=/mnt/b/yoe/master/build/tmp/work/cortexa72-cortexa53-crypto-yoe-linux/glibc/2.38-r0/recipe-sysroot
-fuse-ld=bfd -c -march=armv8.2-a+sve -mcpu=cortex-a72.cortex-a53  a.s
a.s: Assembler messages:
a.s:2: Error: selected processor does not support `ptrue p0.b'

../recipe-sysroot-native/usr/bin/aarch64-yoe-linux/aarch64-yoe-linux-gcc
-mbranch-protection=standard 
--sysroot=/mnt/b/yoe/master/build/tmp/work/cortexa72-cortexa53-crypto-yoe-linux/glibc/2.38-r0/recipe-sysroot
-fuse-ld=bfd -c -mcpu=cortex-a72.cortex-a53 -march=armv8.2-a+sve  a.s
a.s: Assembler messages:
a.s:2: Error: selected processor does not support `ptrue p0.b'

[Bug target/110901] -march does not override -mcpu on aarch64

2023-08-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110901

--- Comment #1 from Andrew Pinski  ---
Order matters. In this case -march is after -mcpu ...

[Bug analyzer/110902] New: Missing cast in region_model_manager::maybe_fold_binop on MULT_EXPR by 1

2023-08-04 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110902

Bug ID: 110902
   Summary: Missing cast in region_model_manager::maybe_fold_binop
on MULT_EXPR by 1
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: dmalcolm at gcc dot gnu.org
  Target Milestone: ---

Whilst trying to fix PR analyzer/110426, I noticed that
region_model_manager::maybe_fold_binop doesn't always return the correct type;
specifically, it fails to cast to TYPE when folding (VAL * 1) -> VAL:

diff --git a/gcc/analyzer/region-model-manager.cc
b/gcc/analyzer/region-model-manager.cc
index 46d271a295c..010906f1ec0 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -654,7 +654,7 @@ region_model_manager::maybe_fold_binop (tree type, enum
tree_code op,
return get_or_create_constant_svalue (build_int_cst (type, 0));
   /* (VAL * 1) -> VAL.  */
   if (cst1 && integer_onep (cst1))
-   return arg0;
+   return get_or_create_cast (type, arg0);
   break;
 case BIT_AND_EXPR:
   if (cst1)

However, on adding the above cast, various bounds-checking tests fail,
seemingly due to confusion about ptrdiff_t vs size_t, and how to compare such
values:

FAIL: gcc.dg/analyzer/flexible-array-member-1.c  (test for warnings, line 96)

With -m64:
FAIL: gcc.dg/analyzer/out-of-bounds-diagram-3.c  (test for warnings, line 19)
FAIL: gcc.dg/analyzer/out-of-bounds-diagram-3.c  (test for warnings, line 24)
FAIL: gcc.dg/analyzer/out-of-bounds-diagram-3.c expected multiline pattern
lines 29-44

  1   2   3   >