[Bug tree-optimization/51030] PHI opt does not handle value-replacement with a transfer function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51030 --- Comment #5 from Andrew Pinski --- Created attachment 56317 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56317&action=edit First set of patches. Note the last patch is still being worked on. The first patch is just a small speedup and makes the last patch easier to write. The middle 2 patches move some of the optimizations that value_replacement does over to match. I think I still need the `a == 1` cases.
[Bug tree-optimization/51030] PHI opt does not handle value-replacement with a transfer function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51030 Andrew Pinski changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org --- Comment #4 from Andrew Pinski --- I am rewriting value-replacement in phi-opt to use match-and-simplify, which should simplify much of this.
[Bug tree-optimization/111972] [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972 --- Comment #7 from Hongtao.liu --- (In reply to Andrew Pinski from comment #3)
> First off does this even make sense to vectorize but rather do some kind of
> scalar reduction with respect to j = j^1 here . Filed PR 112104 for that.
>
> Basically vectorizing this loop is a waste compared to that.
Yes, it's always zero; it would be nice if the middle end could optimize the whole loop away. So this PR is more about the missed optimization of the redundant loop (better to finalize the induction variable with a simple assignment), not about vectorization.
[Bug tree-optimization/111972] [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972 --- Comment #6 from Hongtao.liu --- (In reply to Andrew Pinski from comment #5) > Oh this is the original code: > https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/src/whets.c > Yes, it's from unixbench.
[Bug libstdc++/82366] std::regex constructor called from shared library throws std::bad_cast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82366 Alibek Omarov changed: What|Removed |Added CC||a1ba.omarov at gmail dot com --- Comment #7 from Alibek Omarov --- Can confirm, it still happens with GCC/libstdc++ 13.0. However, in my case it's in a static initializer.
[Bug tree-optimization/112104] loop of ^1 should just be reduced to ^(n&1)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112104 --- Comment #1 from Andrew Pinski --- This shows up in a really really bad benchmark: https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/src/whets.c
[Bug tree-optimization/111972] [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972 --- Comment #5 from Andrew Pinski --- Oh, this is the original code: https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/src/whets.c
Basically, after optimizing:
  _9 = j_19 != 1;
  _14 = (long int) _9;
over to:
  _14 = j_19 ^ 1;
we could optimize this whole loop out. Note this is a bad benchmark.
[Bug tree-optimization/111972] [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972 --- Comment #4 from Andrew Pinski --- Is there a non-reduced testcase here? Or does the loop really just do j = j^1 ?
[Bug tree-optimization/111972] [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972 Andrew Pinski changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112104 --- Comment #3 from Andrew Pinski --- First off, does this even make sense to vectorize, rather than doing some kind of scalar reduction with respect to j = j^1 here? Filed PR 112104 for that. Basically, vectorizing this loop is a waste compared to that.
[Bug tree-optimization/112104] New: loop of ^1 should just be reduced to ^(n&1)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112104 Bug ID: 112104 Summary: loop of ^1 should just be reduced to ^(n&1) Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: pinskia at gcc dot gnu.org Target Milestone: --- Take:
```
int foo(long n3)
{
  int j = 0;
  for(int i = 0; i < n3; i++)
    j = j ^ 1;
  return j;
}
```
This should just be reduced to j = j ^ (n3&1). We should figure out in SCCP pattern matching that j as the result just alternates between 0 and 1 (VRP figures out that this is the range). I noticed this while looking into PR 111972.
[Bug tree-optimization/111820] [13 Regression] Compiler time hog in the vectorizer with `-O3 -fno-tree-vrp`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820 --- Comment #17 from Andrew Pinski --- *** Bug 111833 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/111833] [13/14 Regression] GCC: 14: hangs on a simple for loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111833 Andrew Pinski changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #6 from Andrew Pinski --- Dup. *** This bug has been marked as a duplicate of bug 111820 ***
[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861 --- Comment #15 from CVS Commits --- The master branch has been updated by hongtao Liu : https://gcc.gnu.org/g:7eed861e8ca3f533e56dea6348573caa09f16f5e commit r14-4964-g7eed861e8ca3f533e56dea6348573caa09f16f5e Author: liuhongt Date: Mon Oct 23 13:40:10 2023 +0800 Support vec_cmpmn/vcondmn for v2hf/v4hf. gcc/ChangeLog: PR target/103861 * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle V2HF/V2BF/V4HF/V4BFmode. * config/i386/i386.cc (ix86_get_mask_mode): Return QImode when data_mode is V4HF/V2HFmode. * config/i386/mmx.md (vec_cmpv4hfqi): New expander. (vcond_mask_v4hi): Ditto. (vcond_mask_qi): Ditto. (vec_cmpv2hfqi): Ditto. (vcond_mask_v2hi): Ditto. (mmx_plendvb_): Add 2 combine splitters after the patterns. (mmx_pblendvb_v8qi): Ditto. (v2hi3): Add a combine splitter after the pattern. (3): Ditto. (v8qi3): Ditto. (3): Ditto. * config/i386/sse.md (vcond): Merge this with .. (vcond): .. this into .. (vcond): .. this, and extend to V8BF/V16BF/V32BFmode. gcc/testsuite/ChangeLog: * g++.target/i386/part-vect-vcondhf.C: New test. * gcc.target/i386/part-vect-vec_cmphf.c: New test.
[Bug target/111318] RISC-V: Redundant vsetvl instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111318 Lehua Ding changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #2 from Lehua Ding --- Confirmed.
[Bug tree-optimization/111833] [13/14 Regression] GCC: 14: hangs on a simple for loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111833 --- Comment #5 from Hongtao.liu --- It's the same issue as PR111820, thus should be fixed.
[Bug tree-optimization/111820] [13 Regression] Compiler time hog in the vectorizer with `-O3 -fno-tree-vrp`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820 Andrew Pinski changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #16 from Andrew Pinski --- .
[Bug tree-optimization/111820] [13 Regression] Compiler time hog in the vectorizer with `-O3 -fno-tree-vrp`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820 --- Comment #15 from Hongtao.liu --- (In reply to Richard Biener from comment #13) > (In reply to Hongtao.liu from comment #12) > > Fixed in GCC14, not sure if we want to backport the patch. > > If so, the patch needs to be adjusted since GCC13 doesn't support auto_mpz. > > Yes, we want to backport. Also fixed in GCC13.
[Bug tree-optimization/111820] [13 Regression] Compiler time hog in the vectorizer with `-O3 -fno-tree-vrp`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820 --- Comment #14 from CVS Commits --- The releases/gcc-13 branch has been updated by hongtao Liu : https://gcc.gnu.org/g:82919cf4cb232166fed03d84a91fefd07feef6bb commit r13-7988-g82919cf4cb232166fed03d84a91fefd07feef6bb Author: liuhongt Date: Wed Oct 18 10:08:24 2023 +0800 Avoid compile time hog on vect_peel_nonlinear_iv_init for nonlinear induction vec_step_op_mul when iteration count is too big. There's loop in vect_peel_nonlinear_iv_init to get init_expr * pow (step_expr, skip_niters). When skipn_iters is too big, compile time hogs. To avoid that, optimize init_expr * pow (step_expr, skip_niters) to init_expr << (exact_log2 (step_expr) * skip_niters) when step_expr is pow of 2, otherwise give up vectorization when skip_niters >= TYPE_PRECISION (TREE_TYPE (init_expr)). Also give up vectorization when niters_skip is negative which will be used for fully masked loop. gcc/ChangeLog: PR tree-optimization/111820 PR tree-optimization/111833 * tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Give up vectorization for nonlinear iv vect_step_op_mul when step_expr is not exact_log2 and niters is greater than TYPE_PRECISION (TREE_TYPE (step_expr)). Also don't vectorize for nagative niters_skip which will be used by fully masked loop. (vect_can_advance_ivs_p): Pass whole phi_info to vect_can_peel_nonlinear_iv_p. * tree-vect-loop.cc (vect_peel_nonlinear_iv_init): Optimize init_expr * pow (step_expr, skipn) to init_expr << (log2 (step_expr) * skipn) when step_expr is exact_log2. gcc/testsuite/ChangeLog: * gcc.target/i386/pr111820-1.c: New test. * gcc.target/i386/pr111820-2.c: New test. * gcc.target/i386/pr111820-3.c: New test. * gcc.target/i386/pr103144-mul-1.c: Adjust testcase. * gcc.target/i386/pr103144-mul-2.c: Adjust testcase.
[Bug tree-optimization/111833] [13/14 Regression] GCC: 14: hangs on a simple for loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111833 --- Comment #4 from CVS Commits --- The releases/gcc-13 branch has been updated by hongtao Liu : https://gcc.gnu.org/g:82919cf4cb232166fed03d84a91fefd07feef6bb commit r13-7988-g82919cf4cb232166fed03d84a91fefd07feef6bb Author: liuhongt Date: Wed Oct 18 10:08:24 2023 +0800 Avoid compile time hog on vect_peel_nonlinear_iv_init for nonlinear induction vec_step_op_mul when iteration count is too big. There's loop in vect_peel_nonlinear_iv_init to get init_expr * pow (step_expr, skip_niters). When skipn_iters is too big, compile time hogs. To avoid that, optimize init_expr * pow (step_expr, skip_niters) to init_expr << (exact_log2 (step_expr) * skip_niters) when step_expr is pow of 2, otherwise give up vectorization when skip_niters >= TYPE_PRECISION (TREE_TYPE (init_expr)). Also give up vectorization when niters_skip is negative which will be used for fully masked loop. gcc/ChangeLog: PR tree-optimization/111820 PR tree-optimization/111833 * tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Give up vectorization for nonlinear iv vect_step_op_mul when step_expr is not exact_log2 and niters is greater than TYPE_PRECISION (TREE_TYPE (step_expr)). Also don't vectorize for nagative niters_skip which will be used by fully masked loop. (vect_can_advance_ivs_p): Pass whole phi_info to vect_can_peel_nonlinear_iv_p. * tree-vect-loop.cc (vect_peel_nonlinear_iv_init): Optimize init_expr * pow (step_expr, skipn) to init_expr << (log2 (step_expr) * skipn) when step_expr is exact_log2. gcc/testsuite/ChangeLog: * gcc.target/i386/pr111820-1.c: New test. * gcc.target/i386/pr111820-2.c: New test. * gcc.target/i386/pr111820-3.c: New test. * gcc.target/i386/pr103144-mul-1.c: Adjust testcase. * gcc.target/i386/pr103144-mul-2.c: Adjust testcase.
[Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c and vsetvlmax-8.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092 --- Comment #9 from JuzheZhong --- (In reply to Maciej W. Rozycki from comment #7)
> Thank you for all your explanations. I think I'm still missing something
> here, so I'll write it differently (and let's ignore the tail-agnostic vs
> tail-undisturbed choice for the purpose of this consideration).
>
> Why is the `vl' value determined by hardware from `avl' by an explicit
> request (!) of the programmer who inserted the vsetvl intrinsics ignored?
> Is the compiler able to prove the use of `avl' in place of `vl' does not
> affect the operation of the VLE32.V and VSE32.V instructions in any way?
> What is the purpose of these intrinsics if they can be freely ignored?
>
> Please forgive me if my questions seem to you obvious to answer or
> irrelevant, I'm still rather new to this RVV stuff.

As long as the ratio of a user vsetvl intrinsic is the same as that of the following RVV instruction, the compiler is free to optimize it. For example:

  vl = __riscv_vsetvl_e32m1 (avl)
  __riscv_vadd_vv_i32m1 (..., vl)

A naive way to insert vsetvls:

  vsetvl VL, AVL e32 m1
  vsetvl zero, VL e32 m1
  vadd.vv

However, since they have the same ratio, we can do:

  vsetvl zero, AVL e32 m1
  vadd.vv

This is absolutely correct, independent of the hardware. However, with different ratios:

  vl = __riscv_vsetvl_e32m1 (avl)
  __riscv_vadd_vv_i64m1 (..., vl)

  vsetvl VL, AVL e32 m1
  vsetvl zero, VL e64 m1
  vadd.vv

we can't optimize it. This is the only correct codegen. Thanks.
[Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c and vsetvlmax-8.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092 --- Comment #8 from JuzheZhong --- (In reply to Maciej W. Rozycki from comment #7)
> Thank you for all your explanations. I think I'm still missing something
> here, so I'll write it differently (and let's ignore the tail-agnostic vs
> tail-undisturbed choice for the purpose of this consideration).
>
> Let me paste the whole assembly code produced here (sans decorations):
>
>         beq a5,zero,.L2
>         vsetvli zero,a6,e32,m1,tu,ma
> .L3:
>         beq a4,zero,.L7
>         li a5,0
> .L5:
>         vle32.v v1,0(a0)
>         vle32.v v1,0(a1)
>         vle32.v v1,0(a2)
>         vse32.v v1,0(a3)
>         addi a5,a5,1
>         bne a4,a5,.L5
> .L7:
>         ret
> .L2:
>         vsetvli zero,a6,e32,m1,tu,ma
>         j .L3
>
> This seems to me to correspond to this source code:
>
>   if (cond)
>     __riscv_vsetvl_e32m1(avl);
>   else
>     __riscv_vsetvl_e16mf2(avl);
>   for (size_t i = 0; i < n; i += 1) {
>     vint32m1_t a = __riscv_vle32_v_i32m1(in1, avl);
>     vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, avl);
>     vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, avl);
>     __riscv_vse32_v_i32m1(out, c, avl);
>   }
>
> And in that case I'd expect the conditional to be optimised away, as its
> result is ignored (along with the intrinsics) and does not affect actual
> code executed except for the different execution path, i.e.:
>
>         beq a4,zero,.L7
>         vsetvli zero,a6,e32,m1,tu,ma
>         li a5,0
> .L5:
>         vle32.v v1,0(a0)
>         vle32.v v1,0(a1)
>         vle32.v v1,0(a2)
>         vse32.v v1,0(a3)
>         addi a5,a5,1
>         bne a4,a5,.L5
> .L7:
>         ret
>
Good catch! I think we have a missed optimization here, and I agree this is the correct and optimal codegen for this case. We have a close-to-optimal (not optimal enough) codegen for now. And this optimization should not be done by the VSETVL PASS. After VSETVL PASS fusion, both the e16mf2 and e32m1 user vsetvl intrinsics are fused into e32m1, tu. They are totally the same, so it's meaningless to separate them into different blocks (they should be in the same single block).
The reason why we missed an optimization here is that we expand the user vsetvls __riscv_vsetvl_e32m1 and __riscv_vsetvl_e16mf2 into 2 different RTL expressions. The earlier passes (before VSETVL) don't know they are equivalent, so they separate them into different blocks. If you change the code as follows:

  if (cond)
    vl = __riscv_vsetvl_e32m1(avl);
  else
    vl = __riscv_vsetvl_e32m1(avl);

I am sure the codegen will be as you said above (a single vsetvl e32m1 tu in a single block). To optimize it, one alternative approach is to expand all user vsetvl intrinsics into the same RTL expression (as long as they have the same ratio). Meaning, expand __riscv_vsetvl_e64m1, __riscv_vsetvl_e32m1, __riscv_vsetvl_e16mf2, and __riscv_vsetvl_e8mf8 into the same RTL expression, since their VL outputs are definitely the same. I don't see that it would cause any problems here. But different ratios like e32m1 and e32mf2 should be different RTL expressions. I am not sure kito agrees with this idea. Another alternative approach is to enhance the bb_reorder PASS. The VSETVL PASS runs before the bb_reorder PASS, and the current bb_reorder PASS is unable to fuse these 2 vsetvls (e32m1 tu) into the same block because we split them into "real" vsetvls, i.e. RTL patterns with side effects. The "real" vsetvl patterns which generate assembly should have side effects, since vsetvl does change the global VL/VTYPE status and also sets a general register. No matter which approach we take to optimize it, I won't do it in GCC 14 since stage 1 is soon to close. We have a few more features (which are much more important) that we are planning and working to support in GCC 14. I am confident that our current RVV GCC VSETVL PASS is optimal and fancy enough. After stage 1 closes, we won't do any more optimizations; we will only run full coverage testing (for example, using different LMUL and different -march to run the whole gcc testsuite) and fix bugs.
[Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c and vsetvlmax-8.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092 --- Comment #7 from Maciej W. Rozycki --- Thank you for all your explanations. I think I'm still missing something here, so I'll write it differently (and let's ignore the tail-agnostic vs tail-undisturbed choice for the purpose of this consideration).

Let me paste the whole assembly code produced here (sans decorations):

        beq a5,zero,.L2
        vsetvli zero,a6,e32,m1,tu,ma
.L3:
        beq a4,zero,.L7
        li a5,0
.L5:
        vle32.v v1,0(a0)
        vle32.v v1,0(a1)
        vle32.v v1,0(a2)
        vse32.v v1,0(a3)
        addi a5,a5,1
        bne a4,a5,.L5
.L7:
        ret
.L2:
        vsetvli zero,a6,e32,m1,tu,ma
        j .L3

This seems to me to correspond to this source code:

  if (cond)
    __riscv_vsetvl_e32m1(avl);
  else
    __riscv_vsetvl_e16mf2(avl);
  for (size_t i = 0; i < n; i += 1) {
    vint32m1_t a = __riscv_vle32_v_i32m1(in1, avl);
    vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, avl);
    vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, avl);
    __riscv_vse32_v_i32m1(out, c, avl);
  }

And in that case I'd expect the conditional to be optimised away, as its result is ignored (along with the intrinsics) and does not affect actual code executed except for the different execution path, i.e.:

        beq a4,zero,.L7
        vsetvli zero,a6,e32,m1,tu,ma
        li a5,0
.L5:
        vle32.v v1,0(a0)
        vle32.v v1,0(a1)
        vle32.v v1,0(a2)
        vse32.v v1,0(a3)
        addi a5,a5,1
        bne a4,a5,.L5
.L7:
        ret

However, actual source code is as follows:

  size_t vl;
  if (cond)
    vl = __riscv_vsetvl_e32m1(avl);
  else
    vl = __riscv_vsetvl_e16mf2(avl);
  for (size_t i = 0; i < n; i += 1) {
    vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
    vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl);
    vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl);
    __riscv_vse32_v_i32m1(out, c, vl);
  }

Based on what you write I'd expect code like this instead:

        beq a5,zero,.L2
        vsetvli a6,a6,e16,mf2,ta,ma
.L3:
        beq a4,zero,.L7
        vsetvli zero,a6,e32,m1,tu,ma
        li a5,0
.L5:
        vle32.v v1,0(a0)
        vle32.v v1,0(a1)
        vle32.v v1,0(a2)
        vse32.v v1,0(a3)
        addi a5,a5,1
        bne a4,a5,.L5
.L7:
        ret
.L2:
        vsetvli a6,a6,e32,m1,ta,ma
        j .L3

which is roughly what you say LLVM produces.

Why is the `vl' value determined by hardware from `avl' by an explicit request (!) of the programmer who inserted the vsetvl intrinsics ignored? Is the compiler able to prove the use of `avl' in place of `vl' does not affect the operation of the VLE32.V and VSE32.V instructions in any way? What is the purpose of these intrinsics if they can be freely ignored?

Please forgive me if my questions seem to you obvious to answer or irrelevant, I'm still rather new to this RVV stuff.
[Bug target/111888] RISC-V: Horrible redundant number vsetvl instructions in vectorized codes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111888 JuzheZhong changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #2 from JuzheZhong --- Fixed
[Bug target/111888] RISC-V: Horrible redundant number vsetvl instructions in vectorized codes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111888 --- Comment #1 from CVS Commits --- The master branch has been updated by Pan Li : https://gcc.gnu.org/g:e37bc2cf00671e3bc4d82f2627330c0f885a6f29 commit r14-4961-ge37bc2cf00671e3bc4d82f2627330c0f885a6f29 Author: Juzhe-Zhong Date: Thu Oct 26 16:13:51 2023 +0800 RISC-V: Add AVL propagation PASS for RVV auto-vectorization This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization which is a known issue for a long time and I finally find the time to address it. Consider a simple vector addition operation: https://godbolt.org/z/7hfGfEjW3 void foo (int *__restrict a, int *__restrict b, int *__restrict n) { for (int i = 0; i < n; i++) a[i] = a[i] + b[i]; } Optimized IR: Loop body: _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]); -> vsetvli a5,a2,e8,mf4,ta,ma ... vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0); -> vle32.v v2,0(a0) vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0); -> vle32.v v1,0(a1) vect__7.12_19 = vect__6.11_20 + vect__4.8_27; -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2 .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19); -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4) We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling. The AVL/VL toggling is because we are missing LEN information in simple PLUS_EXPR GIMPLE assignment: vect__7.12_19 = vect__6.11_20 + vect__4.8_27; GCC apply partial predicate load/store and un-predicated full vector operation on partial vectorization. Such flow are used by all other targets like ARM SVE (RVV also uses such flow): ARM SVE: .L3: ld1wz30.s, p7/z, [x0, x3, lsl 2] -> predicated load ld1wz31.s, p7/z, [x1, x3, lsl 2] -> predicated load add z31.s, z31.s, z30.s-> un-predicated add st1wz31.s, p7, [x0, x3, lsl 2] -> predicated store Such vectorization flow causes AVL/VL toggling on RVV so we need AVL propagation PASS for it. 
Also, it's very unlikely that we can apply predicated operations to all vectorization, for the following reasons: 1. It's a very heavy workload to support them on all vectorization, and we don't see any benefits if we can handle that in the target's backend. 2. Changing the loop vectorizer for it will make the code base ugly and hard to maintain. 3. We will need so many patterns for all operations. Not only COND_LEN_ADD and COND_LEN_SUB, we also need COND_LEN_EXTEND, COND_LEN_CEIL, ... over 100+ patterns, an unreasonable number of patterns. To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS to elide the redundant vsetvls due to AVL/VL toggling. The second question is why we add a separate PASS called AVL propagation. Why not optimize it in the VSETVL PASS (we definitely can optimize AVL in the VSETVL PASS)? Frankly, I was planning to address this issue in the VSETVL PASS; that's why we recently refactored the VSETVL PASS. However, I changed my mind recently after several experiments and tries. The reasons are as follows: 1. For code base management and maintainability. The current VSETVL PASS is complicated enough and already has enough aggressive and fancy optimizations, which turn out to generate optimal codegen in most cases. It's not a good idea to keep adding more features into the VSETVL PASS, making it heavier and heavier, so that we will need to refactor it again in the future. Actually, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we should not change the VSETVL PASS any more except for minor fixes. 2. vsetvl insertion (which the VSETVL PASS does) and AVL propagation are 2 different things; I don't think we should fuse them into the same PASS. 3. The VSETVL PASS is a post-RA PASS, whereas AVL propagation should be done before RA, which can reduce register allocation pressure. 4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations. 
This patch's codes are only hundreds lines which is very managable and can be very easily extended features and enhancements. We can easily extend and enhance more AVL propagation in a clean and separate PASS in the future. (If we do it on VSETVL PASS, we will complicate VSETVL PASS again which is already so complicated.) Here is an example to demonstrate more: https://godbolt.org/z/bE86sv3q5 void foo2 (int *__restrict a, int *__restrict b, int *__restrict c, int *__restrict a2, int *__restrict b2,
[Bug target/111318] RISC-V: Redundant vsetvl instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111318 --- Comment #1 from CVS Commits --- The master branch has been updated by Pan Li : https://gcc.gnu.org/g:e37bc2cf00671e3bc4d82f2627330c0f885a6f29 commit r14-4961-ge37bc2cf00671e3bc4d82f2627330c0f885a6f29 Author: Juzhe-Zhong Date: Thu Oct 26 16:13:51 2023 +0800 RISC-V: Add AVL propagation PASS for RVV auto-vectorization This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization which is a known issue for a long time and I finally find the time to address it. Consider a simple vector addition operation: https://godbolt.org/z/7hfGfEjW3 void foo (int *__restrict a, int *__restrict b, int *__restrict n) { for (int i = 0; i < n; i++) a[i] = a[i] + b[i]; } Optimized IR: Loop body: _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]); -> vsetvli a5,a2,e8,mf4,ta,ma ... vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0); -> vle32.v v2,0(a0) vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0); -> vle32.v v1,0(a1) vect__7.12_19 = vect__6.11_20 + vect__4.8_27; -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2 .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19); -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4) We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling. The AVL/VL toggling is because we are missing LEN information in simple PLUS_EXPR GIMPLE assignment: vect__7.12_19 = vect__6.11_20 + vect__4.8_27; GCC apply partial predicate load/store and un-predicated full vector operation on partial vectorization. Such flow are used by all other targets like ARM SVE (RVV also uses such flow): ARM SVE: .L3: ld1wz30.s, p7/z, [x0, x3, lsl 2] -> predicated load ld1wz31.s, p7/z, [x1, x3, lsl 2] -> predicated load add z31.s, z31.s, z30.s-> un-predicated add st1wz31.s, p7, [x0, x3, lsl 2] -> predicated store Such vectorization flow causes AVL/VL toggling on RVV so we need AVL propagation PASS for it. 
Also, it's very unlikely that we can apply predicated operations to all vectorization, for the following reasons: 1. It's a very heavy workload to support them on all vectorization, and we don't see any benefits if we can handle that in the target's backend. 2. Changing the loop vectorizer for it will make the code base ugly and hard to maintain. 3. We will need so many patterns for all operations. Not only COND_LEN_ADD and COND_LEN_SUB, we also need COND_LEN_EXTEND, COND_LEN_CEIL, ... over 100+ patterns, an unreasonable number of patterns. To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS to elide the redundant vsetvls due to AVL/VL toggling. The second question is why we add a separate PASS called AVL propagation. Why not optimize it in the VSETVL PASS (we definitely can optimize AVL in the VSETVL PASS)? Frankly, I was planning to address this issue in the VSETVL PASS; that's why we recently refactored the VSETVL PASS. However, I changed my mind recently after several experiments and tries. The reasons are as follows: 1. For code base management and maintainability. The current VSETVL PASS is complicated enough and already has enough aggressive and fancy optimizations, which turn out to generate optimal codegen in most cases. It's not a good idea to keep adding more features into the VSETVL PASS, making it heavier and heavier, so that we will need to refactor it again in the future. Actually, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we should not change the VSETVL PASS any more except for minor fixes. 2. vsetvl insertion (which the VSETVL PASS does) and AVL propagation are 2 different things; I don't think we should fuse them into the same PASS. 3. The VSETVL PASS is a post-RA PASS, whereas AVL propagation should be done before RA, which can reduce register allocation pressure. 4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations. 
This patch's codes are only hundreds lines which is very managable and can be very easily extended features and enhancements. We can easily extend and enhance more AVL propagation in a clean and separate PASS in the future. (If we do it on VSETVL PASS, we will complicate VSETVL PASS again which is already so complicated.) Here is an example to demonstrate more: https://godbolt.org/z/bE86sv3q5 void foo2 (int *__restrict a, int *__restrict b, int *__restrict c, int *__restrict a2, int *__restrict b2,
[Bug target/112103] [14 regression] gcc.target/powerpc/rlwinm-0.c fails after r14-4941-gd1bb9569d70304
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112103 Roger Sayle changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW CC||roger at nextmovesoftware dot com Last reconfirmed||2023-10-26
[Bug fortran/104649] ICE in gfc_match_formal_arglist, at fortran/decl.cc:6733 since r6-1958-g4668d6f9c00d4767
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104649 anlauf at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |anlauf at gcc dot gnu.org --- Comment #5 from anlauf at gcc dot gnu.org --- Submitted: https://gcc.gnu.org/pipermail/fortran/2023-October/059872.html
[Bug libstdc++/112089] std::shared_lock::unlock should throw operation_not_permitted instead resource_deadlock_would_occur
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112089 Jonathan Wakely changed: What|Removed |Added Target Milestone|--- |11.5 --- Comment #3 from Jonathan Wakely --- Fixed on trunk so far.
[Bug tree-optimization/112096] `(a || b) ? a : b` should be simplified to a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112096 --- Comment #3 from Andrew Pinski --- (In reply to Andrew Pinski from comment #2)
> ```
> int t01(int x, int y)
> {
>   bool t = x == 5 && y == 5;
>   if (t) return 5;
>   return y;
> } // y
> ```
> Is able to be detected in phiopt2. Just not the == 0/!=0 case.
r0-125639-gc9ef86a1717dd6 added that code. https://inbox.sourceware.org/gcc-patches/01cebdbf$a3155310$e93ff930$@arm.com/ I really have not read this code in over 10 years and I forgot I even reviewed it slightly. Anyway, I am thinking about ways of improving this even further, and maybe even rewriting part of it since it has become way too complex.
[Bug libstdc++/112100] ubsan: misses UB when modifying std::string's trailing \0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112100 --- Comment #4 from Jonathan Wakely --- (In reply to Jonathan Wakely from comment #2) > It would need a completely new category of "memory location that you can > read and write to but nothing else" That was supposed to say "read and write zero to but nothing else".
[Bug libstdc++/112100] ubsan: misses UB when modifying std::string's trailing \0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112100 --- Comment #3 from Jonathan Wakely --- (In reply to Andrew Pinski from comment #1) > Maybe some how libstdc++ debug mode can catch this > https://gcc.gnu.org/onlinedocs/gcc-13.2.0/libstdc++/manual/manual/ > debug_mode_using.html#debug_mode.using.mode > -D_GLIBCXX_DEBUG Only by adding a "past-the-end character is still null" check to std::string member functions (which ones, all of them? Just accessors that would let you read the null, like c_str, operator[], data etc.?) That would be doable, but sounds pretty expensive.
[Bug libstdc++/112100] ubsan: misses UB when modifying std::string's trailing \0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112100 --- Comment #2 from Jonathan Wakely --- (In reply to Jan Engelhardt from comment #0) > ==55843==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xsomething How would that even be possible? The terminating nul clearly has to be in allocated memory, because you are allowed to read it. So asan can't treat it as overflow. It's valid memory. Not only that, it's valid *writable* memory. You are allowed to store '\0' there. It would need a completely new category of "memory location that you can read and write to but nothing else". That's not an asan or ubsan check. > https://eel.is/c++draft/string.access specifies the modification of the NUL > char's position to values other than \0 is UB, so it should warn about this. There are hundreds of things the standard says are undefined that asan and ubsan can never detect. It's unreasonable to expect it IMHO.
[Bug tree-optimization/112096] `(a || b) ? a : b` should be simplified to a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112096 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2023-10-26 Ever confirmed|0 |1 Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org --- Comment #2 from Andrew Pinski ---
```
int t01(int x, int y)
{
  bool t = x == 5 && y == 5;
  if (t) return 5;
  return y;
}
// y
```
is able to be detected in phiopt2, just not the == 0/!= 0 case. Nor:
```
int t1(int x, int y)
{
  bool t = x != 5 || y != 5;
  if (t) return x;
  return 5;
}
// x
```
I have to look at where phiopt is able to detect this and improve it for these 2 cases ...
[Bug libstdc++/112089] std::shared_lock::unlock should throw operation_not_permitted instead resource_deadlock_would_occur
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112089 --- Comment #2 from CVS Commits --- The master branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:0c305f3dec9a992dd775a3b9607b7b1e8c051859 commit r14-4960-g0c305f3dec9a992dd775a3b9607b7b1e8c051859 Author: Jonathan Wakely Date: Thu Oct 26 16:51:30 2023 +0100 libstdc++: Fix exception thrown by std::shared_lock::unlock() [PR112089] The incorrect errc constant here looks like a copy error. libstdc++-v3/ChangeLog: PR libstdc++/112089 * include/std/shared_mutex (shared_lock::unlock): Change errc constant to operation_not_permitted. * testsuite/30_threads/shared_lock/locking/112089.cc: New test.
[Bug c/112101] feature request: typeof_arg for extracting the type of a function's (or function pointer's) arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112101 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug middle-end/112098] suboptimal optimization of inverted bit extraction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112098 --- Comment #3 from Andrew Pinski ---
Trying 6, 7, 8 -> 9:
    6: {r105:SI=r108:SI 0>>0x9;clobber flags:CC;}
      REG_DEAD r108:SI
      REG_UNUSED flags:CC
    7: {r106:SI=r105:SI&0x1;clobber flags:CC;}
      REG_DEAD r105:SI
      REG_UNUSED flags:CC
    8: {r107:SI=r106:SI^0x1;clobber flags:CC;}
      REG_DEAD r106:SI
      REG_UNUSED flags:CC
    9: {r103:SI=r107:SI<<0x4;clobber flags:CC;}
      REG_DEAD r107:SI
      REG_UNUSED flags:CC
Failed to match this instruction:
(parallel [
        (set (reg:SI 103)
            (and:SI (lshiftrt:SI (xor:SI (reg:SI 108)
                        (const_int 512 [0x200]))
                    (const_int 5 [0x5]))
                (const_int 16 [0x10])))
        (clobber (reg:CC 17 flags))
    ])
The xor here maybe should have been a not. But I can't remember if we allow 4->3 combining or just 4->2.
[Bug modula2/111530] Unable to build GM2 standard library on BSD due to a `getopt_long_only' GNU extension dependency
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111530 Gaius Mulley changed: What|Removed |Added CC||gaius at gcc dot gnu.org --- Comment #2 from Gaius Mulley --- Created attachment 56316 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56316=edit Proposed fix Here is a proposed patch which is currently undergoing bootstrap testing. I thought I'd post the proposed patch for testing and potential comments. It uses the libiberty getopt long functions (wrapped up inside libgm2/libm2pim/cgetopt.cc) and only enables this implementation if configure detects no getopt_long and friends on the target.
[Bug libstdc++/112100] ubsan: misses UB when modifying std::string's trailing \0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112100 Andrew Pinski changed: What|Removed |Added Component|sanitizer |libstdc++ --- Comment #1 from Andrew Pinski --- Maybe somehow the libstdc++ debug mode can catch this: https://gcc.gnu.org/onlinedocs/gcc-13.2.0/libstdc++/manual/manual/debug_mode_using.html#debug_mode.using.mode -D_GLIBCXX_DEBUG
[Bug target/112102] Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 --- Comment #8 from Kaze Emanuar --- This code is just an example, but I have seen this issue appear in many of my collision functions. I agree it's not a huge issue in my use case, but it'd still be cool to have this work well. I can work around it with inline assembly if this is not deemed an important enough issue to address.
[Bug tree-optimization/111957] `a ? abs(a) : 0` is not simplified to just abs(a)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111957 Andrew Pinski changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED Target Milestone|--- |14.0 --- Comment #6 from Andrew Pinski --- fixed.
[Bug tree-optimization/111957] `a ? abs(a) : 0` is not simplified to just abs(a)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111957 --- Comment #5 from CVS Commits --- The trunk branch has been updated by Andrew Pinski : https://gcc.gnu.org/g:662655e22dddf5392d9aa67fce45beee980e5454 commit r14-4955-g662655e22dddf5392d9aa67fce45beee980e5454 Author: Andrew Pinski Date: Tue Oct 24 23:13:18 2023 + match: Simplify `a != C1 ? abs(a) : C2` when C2 == abs(C1) [PR111957] This adds a match pattern for `a != C1 ? abs(a) : C2` which gets simplified to `abs(a)`. if C1 was originally *_MIN then change it over to use absu instead of abs. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/111957 gcc/ChangeLog: * match.pd (`a != C1 ? abs(a) : C2`): New pattern. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-40.c: New test.
[Bug target/112102] Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 --- Comment #7 from Andrew Pinski --- Also, is this function from real code or just an example to show the issue? I suspect that in real code you either have 2 extra nops or a scheduling bubble. The nops might not make a huge difference ...
[Bug testsuite/111969] RISC-V rv32gcv: 12 grouped flaky failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111969 --- Comment #6 from Patrick O'Neill --- Mixed up my hashes when copy/pasting. r14-4875-g9cf2e7441ee passes locally/CI
[Bug target/112102] Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 --- Comment #6 from Andrew Pinski --- It just happened that the scheduler didn't schedule it that way. Scheduling is an NP-complete problem, too.
[Bug target/112102] Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 --- Comment #5 from Andrew Pinski ---
/* True if mflo and mfhi can be immediately followed by instructions
   which write to the HI and LO registers.

   According to MIPS specifications, MIPS ISAs I, II, and III need
   (at least) two instructions between the reads of HI/LO and
   instructions which write them, and later ISAs do not.

   Contradicting the MIPS specifications, some MIPS IV processor user
   manuals (e.g. the UM for the NEC Vr5000) document needing the
   instructions between HI/LO reads and writes, as well.  Therefore, we
   declare only MIPS32, MIPS64 and later ISAs to have the interlocks,
   plus any specific earlier-ISA CPUs for which CPU documentation
   declares that the instructions are really interlocked.  */
#define ISA_HAS_HILO_INTERLOCKS (mips_isa_rev >= 1 \
                                 || TARGET_MIPS5500 \
                                 || TARGET_MIPS5900 \
                                 || TARGET_LOONGSON_2EF)
So the question becomes: what are you compiling for?
[Bug target/112102] Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 --- Comment #4 from Kaze Emanuar --- I'm using the VR4300 (Nintendo 64). It does have the hazard between mult and mflo: MULT can't be within 2 instructions of the MFLO. This shouldn't be an issue here, though, since there were 3 instructions available to fill the 2 NOP slots that the MULT<>MFLO clash caused.
[Bug target/112102] Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 --- Comment #3 from Andrew Pinski --- Which MIPS arch are you really trying to compile for? MIPS 1, 2, 4, or mips32 (r1-r5 or r6)? There are many different ones, and mips32 (and above) does not have any delay slots/hazards for the mult instruction.
[Bug target/112102] Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 --- Comment #2 from Andrew Pinski --- -march=mips32r2 removes the nops. IIRC there was a hazard between the mflo and mult instructions for older architectures.
[Bug rtl-optimization/111971] [12/13/14 regression] ICE: maximum number of generated reload insns per insn achieved (90) since r12-6803-g85419ac59724b7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111971 --- Comment #6 from Vladimir Makarov --- (In reply to Andrew Pinski from comment #4) > But r1 is the argument register. It is even worse: r1 is a stack pointer. Still, the compilation should not end with an LRA failure. I've just started to work on this problem. I hope a patch fixing this will be committed this week or at the beginning of next week.
[Bug middle-end/111632] gcc fails to bootstrap when using libc++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111632 --- Comment #6 from Sam James --- Try replying to the patch with 'ping'. I'm not a reviewer, but it LGTM and we're using it in Gentoo with no reported problems.
[Bug middle-end/111632] gcc fails to bootstrap when using libc++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111632 --- Comment #5 from Dimitry Andric --- Is there any further action required to get this patch in? :)
[Bug target/112103] New: [14 regression] gcc.target/powerpc/rlwinm-0.c fails after r14-4941-gd1bb9569d70304
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112103 Bug ID: 112103 Summary: [14 regression] gcc.target/powerpc/rlwinm-0.c fails after r14-4941-gd1bb9569d70304 Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: seurer at gcc dot gnu.org Target Milestone: ---

g:d1bb9569d7030490fe7bb35af432f934560d689d, r14-4941-gd1bb9569d70304

make -k check-gcc RUNTESTFLAGS="powerpc.exp=gcc.target/powerpc/rlwinm-0.c"
FAIL: gcc.target/powerpc/rlwinm-0.c scan-assembler-times (?n)^\\s+rldicl 3081
FAIL: gcc.target/powerpc/rlwinm-0.c scan-assembler-times (?n)^\\s+rlwinm 3093
# of expected passes 5
# of unexpected failures 2

These changes in code output are OK as neither the original rlwinm nor the rldicl actually have any effect. So in the short term the test case just needs to update its instruction counts. We are tracking something to get rid of the extraneous ops later.

seurer@ltcden2-lp1:~/gcc/git/build/gcc-test$ diff rlwinm-0.s.r14-4940 rlwinm-0.s.r14-4941
5371c5371
< rlwinm 3,3,0,0x
---
> rldicl 3,3,0,32
6089c6089
< rlwinm 3,3,0,0xff
---
> rldicl 3,3,0,32
8959c8959
< rlwinm 3,3,0,0x
---
> rldicl 3,3,0,32
9677c9677
< rlwinm 3,3,0,0xff
---
> rldicl 3,3,0,32
12546c12546
< rlwinm 3,3,0,0x
---
> rldicl 3,3,0,32
13264c13264
< rlwinm 3,3,0,0xff
---
> rldicl 3,3,0,32
16131c16131
< rlwinm 3,3,0,0x
---
> rldicl 3,3,0,32
19715c19715
< rlwinm 3,3,0,0x
---
> rldicl 3,3,0,32
23298c23298
< rlwinm 3,3,0,0x
---
> rldicl 3,3,0,32

commit d1bb9569d7030490fe7bb35af432f934560d689d (HEAD)
Author: Roger Sayle
Date: Thu Oct 26 10:06:59 2023 +0100

    PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.
[Bug c++/100470] std::is_nothrow_move_constructible incorrect behavior for explicitly defaulted members
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100470 Johel Ernesto Guerrero Peña changed: What|Removed |Added CC||johelegp at gmail dot com --- Comment #6 from Johel Ernesto Guerrero Peña --- Is this a duplicate of Bug 96090?
[Bug testsuite/109951] [14 Regression] libgomp, testsuite: non-native multilib c++ tests fail on Darwin.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109951 --- Comment #14 from CVS Commits --- The master branch has been updated by Thomas Schwinge : https://gcc.gnu.org/g:d8ff4b96b4be3bb4346c045bd0a7337079eabf90 commit r14-4949-gd8ff4b96b4be3bb4346c045bd0a7337079eabf90 Author: Thomas Schwinge Date: Mon Sep 11 11:36:31 2023 +0200 libatomic: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951] Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7 "libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]", this is commit 5ff06d762a88077aff0fb637c931c64e6f47f93d "libatomic/test: Fix compilation for build sysroot" done differently, avoiding build-tree testing use of any random gunk that may appear in build-time 'CC'. PR testsuite/109951 libatomic/ * configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'. * Makefile.in: Regenerate. * configure: Likewise. * testsuite/Makefile.in: Likewise. * testsuite/lib/libatomic.exp (libatomic_init): If '--with-build-sysroot=[...]' was specified, use it for build-tree testing. * testsuite/libatomic-site-extra.exp.in (GCC_UNDER_TEST): Don't set. (SYSROOT_CFLAGS_FOR_TARGET): Set.
[Bug testsuite/109951] [14 Regression] libgomp, testsuite: non-native multilib c++ tests fail on Darwin.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109951 --- Comment #13 from CVS Commits --- The master branch has been updated by Thomas Schwinge : https://gcc.gnu.org/g:967d4171b2eb0557e86ba28996423353f0f1b141 commit r14-4948-g967d4171b2eb0557e86ba28996423353f0f1b141 Author: Thomas Schwinge Date: Mon Sep 11 10:50:00 2023 +0200 libffi: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951] Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7 "libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]", this is commit a0b48358cb1e70e161a87ec5deb7a4b25defba6b "libffi/test: Fix compilation for build sysroot" done differently, avoiding build-tree testing use of any random gunk that may appear in build-time 'CC', 'CXX'. PR testsuite/109951 libffi/ * configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'. : Don't set 'CC_FOR_TARGET', 'CXX_FOR_TARGET', instead set 'SYSROOT_CFLAGS_FOR_TARGET'. * Makefile.in: Regenerate. * configure: Likewise. * include/Makefile.in: Likewise. * man/Makefile.in: Likewise. * testsuite/Makefile.in: Likewise. * testsuite/lib/libffi.exp (libffi_target_compile): If '--with-build-sysroot=[...]' was specified, use it for build-tree testing.
[Bug c++/112099] GCC doesn't recognize matching friend operator!= to resolve ambiguity in operator==
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112099 Daniel Krügler changed: What|Removed |Added CC||daniel.kruegler@googlemail. ||com --- Comment #1 from Daniel Krügler --- This could be related to https://cplusplus.github.io/CWG/issues/2804.html
[Bug c/112102] Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 --- Comment #1 from Kaze Emanuar --- Ignore the line about cycle counts. That was only applicable to my use case before I realized GCC does this for all MIPS architectures. Sorry!
[Bug c++/101631] gcc allows for the changing of an union active member to be changed via a reference
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101631 Patrick Palka changed: What|Removed |Added Resolution|--- |FIXED CC||ppalka at gcc dot gnu.org Status|UNCONFIRMED |RESOLVED Target Milestone|--- |14.0 --- Comment #7 from Patrick Palka --- Marking this fixed for GCC 14 then, thanks!
[Bug c/112102] New: Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 Bug ID: 112102 Summary: Inefficient Integer multiplication on MIPS processors Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: kazeemanuar at googlemail dot com Target Milestone: ---

Running integer multiplication with the -Os flag enabled can generate 2 unnecessary NOP instructions. This increases the cost of integer multiplication from 7 to 9 cycles in most cases.

Example code:
int test(int a, int b, int c, int d) {
    return 788*a + 789*b + 187 + c + d;
}

output:
        li      $2,788
        mult    $4,$2
        li      $2,789     <--- could be moved down into one of the NOPs
        mflo    $4
        nop
        nop
        mult    $5,$2
        mflo    $5
        addu    $4,$4,$5
        addiu   $4,$4,187  <--- could be moved up into one of the NOPs
        addu    $4,$4,$6   <--- could be moved up into one of the NOPs
        jr      $31
        addu    $2,$4,$7

This happens on all GCC versions as far as I can tell. Compiler explorer link: https://godbolt.org/z/M3x3s3KhM
[Bug c/112101] feature request: typeof_arg for extracting the type of a function's (or function pointer's) arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112101 --- Comment #1 from Abdulmalek Almkainzi --- Correction for the gurantee_type macro:

#define gurantee_type(exp, type) \
    _Generic(exp, type: exp, default: (type){0})
[Bug c/112101] New: feature request: typeof_arg for extracting the type of a function's (or function pointer's) arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112101 Bug ID: 112101 Summary: feature request: typeof_arg for extracting the type of a function's (or function pointer's) arguments Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: malekwryyy at gmail dot com Target Milestone: ---

C23 will add typeof (although gcc has had it as an extension for a while), which gives the type of an expression or a type. By using it, it is possible to get the return type of a function like so:
```
int func();
typeof(func()) x; // int x;
```
But there's no way to extract the type of the argument of a function:
```
void func(int);
?? x;
```
I think something like 'typeof_arg' would be a good addition. It takes 2 operands: the first is a function or function pointer, and the second is an integer constant for the index of the argument, which must be within [0, arg_count). For example:
```
#define print_func(f) \
    printf(#f \
           "(" \
           _Generic( (__typeof_arg(f, 0)){0}, \
                     int: "int", \
                     long: "long", \
                     float: "float", \
                     char*: "char*", \
                     default: "other ") \
           ")")
```
This would print a single-argument function's name and arg type like this: "puts(char*)". Another example:
```
#define gurantee_type(exp, type) \
    _Generic(exp, type: exp, default: (typeof(exp)){0})

#define call_with_empty(f) \
    _Generic( (__typeof_arg(f, 0)){0}, \
              char*: gurantee_type(f, void(*)(char*))(""), \
              default: f( (__typeof_arg(f, 0)){0} ) \
    )
```
which calls the function 'f' with an empty string if it takes char*, or 0 of the correct type otherwise. This wouldn't work for variadic functions, so __typeof_arg(printf, 1) would be an error. I think a feature like this would be really helpful for generic programming in C.
[Bug sanitizer/112100] New: ubsan: misses UB when modifying std::string's trailing \0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112100 Bug ID: 112100 Summary: ubsan: misses UB when modifying std::string's trailing \0 Product: gcc Version: 13.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: sanitizer Assignee: unassigned at gcc dot gnu.org Reporter: jengelh at inai dot de CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org, jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at gcc dot gnu.org Target Milestone: ---

Input:

#include <string>
int main() {
    std::string s = "fooo";
    s[s.size()] = 0xff;
}

Observed:

$ g++ x.cpp -v -Wall -ggdb3 -fsanitize=undefined,address && ./a.out
gcc version 13.2.1 20230912 [revision b96e66fd4ef3e36983969fb8cdd1956f551a074b] (SUSE Linux)
(no runtime output by executable)

Expected:

==55843==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xsomething

https://eel.is/c++draft/string.access specifies that modifying the NUL char's position to values other than \0 is UB, so it should warn about this.
[Bug c++/112099] New: GCC doesn't recognize matching friend operator!= to resolve ambiguity in operator==
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112099 Bug ID: 112099 Summary: GCC doesn't recognize matching friend operator!= to resolve ambiguity in operator== Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: usaxena95 at gmail dot com Target Milestone: ---

https://godbolt.org/z/13T5vhETK
```cpp
struct S {
  operator int();
  friend bool operator==(const S &, int);
  friend bool operator!=(const S &, int);
};
struct A : S {};
struct B : S {};
bool x = A{} == B{}; // ambiguous!!
```
Adding a decl for `operator!=` to the **namespace scope** makes it work fine: https://godbolt.org/z/zzGWxb9zG
```cpp
struct S {
  operator int();
  friend bool operator==(const S &, int);
  friend bool operator!=(const S &, int);
};
bool operator!=(const S &, int);
struct A : S {};
struct B : S {};
bool x = A{} == B{};
```
According to [P2468R2 - The Equality Operator You Are Looking For](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2468r2.html):
"""
A non-template function or function template F named operator== is a rewrite target with first operand o unless a search for the name operator!= in the scope S from the instantiation context of the operator expression finds a function or function template that would correspond ([basic.scope.scope]) to F if its name were operator==, **where S is the scope of the class type of o if F is a class member, and the namespace scope of which F is a member otherwise**. A function template specialization named operator== is a rewrite target if its function template is a rewrite target.
"""
It feels like for `friend` functions (which are not class members), `S` is the namespace scope. A lookup in the namespace scope does not find a matching `operator!=` unless it is declared outside the class scope. The need to add a re-declaration of the friend outside of class scope looks unreasonable to me. This looks to me like an oversight in this paper OR this is a compiler bug and namespace lookup should actually find the `friend operator!=` in class scope and resolve the ambiguity.
[Bug middle-end/112098] suboptimal optimization of inverted bit extraction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112098 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org Last reconfirmed||2023-10-26 Ever confirmed|0 |1 --- Comment #2 from Andrew Pinski --- Mine. The problem reduces to something even simpler than that: we recognize `(A & C) != 0 ? D : 0`, but not `(A & C) == 0 ? D : 0`. Also the order of matching causes issues for:

unsigned int foo_ (unsigned int x)
{
  int t = x & 0x200;
  if (t) return 0x10;
  return 0;
}

(the /* A few simplifications of "a ? CST1 : CST2". */ section of match.pd) And a few other issues too.
[Bug fortran/67740] Wrong association status of allocatable character pointer in derived types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67740 --- Comment #14 from Bálint Aradi --- Thanks a lot for fixing it!
[Bug tree-optimization/111520] [14 Regression] ICE: verify_flow_info failed (error: probability of edge 3->8 not initialized) with -O -fsignaling-nans -fharden-compares -fnon-call-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111520 Alexandre Oliva changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #4 from Alexandre Oliva --- Fixed.
[Bug tree-optimization/111943] ICE in gimple_split_edge, at tree-cfg.cc:3019 on 20050510-1.c with new -fharden-control-flow-redundancy with computed gotos
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111943 Alexandre Oliva changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2023-10-26 Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |aoliva at gcc dot gnu.org --- Comment #1 from Alexandre Oliva --- Created attachment 56315 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56315=edit candidate patch under test Mine. Thanks for the report, testing a fix.
[Bug middle-end/112098] suboptimal optimization of inverted bit extraction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112098 --- Comment #1 from Bruno Haible --- The code that gets executed inside gcc is maybe the one mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109907#c2 .
[Bug middle-end/112098] New: suboptimal optimization of inverted bit extraction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112098 Bug ID: 112098 Summary: suboptimal optimization of inverted bit extraction Product: gcc Version: 13.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: bruno at clisp dot org Target Milestone: ---

gcc optimizes quite well a bit extraction such as

-- foo.c --
unsigned int foo (unsigned int x)
{
  return (x & 0x200 ? 0x10 : 0);
}
---

$ gcc -O2 -S foo.c && cat foo.s
...
        shrl    $5, %eax
        andl    $16, %eax
...

That is perfect: 2 arithmetic instructions. However, for the inverted bit extraction

== foo.c ==
unsigned int foo (unsigned int x)
{
  return (x & 0x200 ? 0 : 0x10);
}
===

the resulting code has 4 arithmetic instructions:

$ gcc -O2 -S foo.c && cat foo.s
...
        shrl    $9, %eax
        xorl    $1, %eax
        andl    $1, %eax
        sall    $4, %eax
...

Very clearly, the last shift instruction could be saved by transforming this code to

...
        shrl    $5, %eax
        xorl    $16, %eax
        andl    $16, %eax
...

clang 16 even replaces the "xorl $16, %eax" instruction with a "notl %eax". So, the optimal instruction sequence is one of

...
        shrl    $5, %eax
        notl    %eax
        andl    $16, %eax
...

or

...
        notl    %eax
        shrl    $5, %eax
        andl    $16, %eax
...

$ gcc --version
gcc (GCC) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.

This is for x86_64. But similar optimization opportunities exist for other CPUs as well. For example, arm:

...
        lsr     r0, r0, #9
        eor     r0, r0, #1
        and     r0, r0, #1
        lsl     r0, r0, #4
...

which can be optimized to

...
        lsr     r0, r0, #5
        eor     r0, r0, #16
        and     r0, r0, #16
...

Or for sparc64:

...
        and     %o0, 512, %o0
        cmp     %g0, %o0
        subx    %g0, -1, %o0
        sll     %o0, 4, %o0
        jmp     %o7+8
         srl    %o0, 0, %o0
...

which can be optimized to

...
        xnor    %g0, %o0, %o0
        srl     %o0, 5, %o0
        jmp     %o7+8
         and    %o0, 16, %o0
...
[Bug libstdc++/112097] _PSTL_EARLYEXIT_PRESENT macro doesn't correctly identify intel compilers.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112097 Jonathan Wakely changed: What|Removed |Added Last reconfirmed||2023-10-26 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #1 from Jonathan Wakely --- This is set in <pstl/pstl_config.h>, which does:

#if defined(__INTEL_COMPILER) && __INTEL_COMPILER >= 1800
# define _PSTL_EARLYEXIT_PRESENT
# define _PSTL_MONOTONIC_PRESENT
#endif

That was written by Intel, but maybe before icc was replaced by icx. The relevant macros for icx are:

#define __INTEL_CLANG_COMPILER 20230200
#define __INTEL_LLVM_COMPILER 20230200
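One possible shape of a fix, purely my sketch rather than a committed patch, would be to also accept the icx identification macro alongside the classic icc one in that config block:

```cpp
// Hypothetical adjustment: also recognize the LLVM-based Intel compiler
// (icx), which defines __INTEL_LLVM_COMPILER instead of __INTEL_COMPILER,
// so that the early-exit SIMD paths are enabled for it as well.
#if (defined(__INTEL_COMPILER) && __INTEL_COMPILER >= 1800) \
    || defined(__INTEL_LLVM_COMPILER)
# define _PSTL_EARLYEXIT_PRESENT
# define _PSTL_MONOTONIC_PRESENT
#endif
```

Whether icx supports `#pragma omp simd early_exit` in all configurations would need to be confirmed before such a change.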
[Bug fortran/67740] Wrong association status of allocatable character pointer in derived types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67740 Paul Thomas changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #13 from Paul Thomas --- Fixed on 13-branch and trunk. Thanks for the report Paul
[Bug fortran/67740] Wrong association status of allocatable character pointer in derived types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67740 --- Comment #12 from CVS Commits --- The releases/gcc-13 branch has been updated by Paul Thomas : https://gcc.gnu.org/g:6fb12d3a0456a3503a670d95803aef10549f0134 commit r13-7986-g6fb12d3a0456a3503a670d95803aef10549f0134 Author: Paul Thomas Date: Thu Oct 12 07:26:59 2023 +0100 Fortran: Set hidden string length for pointer components [PR67740]. 2023-10-11 Paul Thomas gcc/fortran PR fortran/67740 * trans-expr.cc (gfc_trans_pointer_assignment): Set the hidden string length component for pointer assignment to character pointer components. gcc/testsuite/ PR fortran/67740 * gfortran.dg/pr67740.f90: New test (cherry picked from commit 701363d827d45d3e3601735fa42f95644fda8b64)
[Bug rtl-optimization/91865] Combine misses opportunity to remove (sign_extend (zero_extend)) before searching for insn patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91865 --- Comment #6 from CVS Commits --- The master branch has been updated by Roger Sayle : https://gcc.gnu.org/g:d1bb9569d7030490fe7bb35af432f934560d689d

commit r14-4941-gd1bb9569d7030490fe7bb35af432f934560d689d
Author: Roger Sayle
Date: Thu Oct 26 10:06:59 2023 +0100

    PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.

    This patch is my proposed solution to PR rtl-optimization/91865.
    Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
    to a single ZERO_EXTEND, but as shown in this PR it is possible for
    combine's make_compound_operation to unintentionally generate a
    non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
    matched by the backend.

    For the new test case:

    const int table[2] = {1, 2};
    int foo (char i) { return table[i]; }

    compiling with -O2 -mlarge on msp430 we currently see:

    Trying 2 -> 7:
        2: r25:HI=zero_extend(R12:QI)
          REG_DEAD R12:QI
        7: r28:PSI=sign_extend(r25:HI)#0
          REG_DEAD r25:HI
    Failed to match this instruction:
    (set (reg:PSI 28 [ iD.1772 ])
        (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

    which results in the following code:

    foo:    AND     #0xff, R12
            RLAM.A #4, R12 { RRAM.A #4, R12
            RLAM.A  #1, R12
            MOVX.W  table(R12), R12
            RETA

    With this patch, we now see:

    Trying 2 -> 7:
        2: r25:HI=zero_extend(R12:QI)
          REG_DEAD R12:QI
        7: r28:PSI=sign_extend(r25:HI)#0
          REG_DEAD r25:HI
    Successfully matched this instruction:
    (set (reg:PSI 28 [ iD.1772 ])
        (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
    allowing combination of insns 2 and 7
    original costs 4 + 8 = 12
    replacement cost 8

    foo:    MOV.B   R12, R12
            RLAM.A  #1, R12
            MOVX.W  table(R12), R12
            RETA

    2023-10-26  Roger Sayle
                Richard Biener

    gcc/ChangeLog
        PR rtl-optimization/91865
        * combine.cc (make_compound_operation): Avoid creating a
        ZERO_EXTEND of a ZERO_EXTEND.

    gcc/testsuite/ChangeLog
        PR rtl-optimization/91865
        * gcc.target/msp430/pr91865.c: New test case.
[Bug fortran/104625] ICE in fixup_array_ref, at fortran/resolve.cc:9275 since r10-2912-g70570ec192745095
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104625 --- Comment #8 from Paul Thomas --- (In reply to anlauf from comment #6) > Steve Lionel of Intel confirmed that the code is valid, and that if X is > polymorphic, so is (X): > > community.intel.com/t5/Intel-Fortran-Compiler/SELECT-TYPE-statement-and- > parenthesized-selector/m-p/1537256#M168843 Indeed: "R1105: selector is expr or variable" is totally unambiguous. I will post the latest version of the patch at end-of-play today. I have dealt with nested parentheses but find that array references to 'z' in the original testcase are generating "unclassifiable statement" errors. I have also corrected the error message generated when 'z' is put in an assignment context from, "‘z’ at (1) associated to vector-indexed target cannot be used in a variable definition context (assignment)" to "‘z’ at (1) associated to expression cannot be used in a variable definition context (assignment)", simply by checking for vector-indexing at expr.cc:6477. Cheers Paul
[Bug libstdc++/112097] New: _PSTL_EARLYEXIT_PRESENT macro doesn't correctly identify intel compilers.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112097 Bug ID: 112097 Summary: _PSTL_EARLYEXIT_PRESENT macro doesn't correctly identify intel compilers. Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: denis.yaroshevskij at gmail dot com Target Milestone: --- simd first code: https://github.com/gcc-mirror/gcc/blob/be34a8b538c0f04b11a428bd1a9340eb19dec13f/libstdc%2B%2B-v3/include/pstl/unseq_backend_simd.h#L164C13-L164C36 find_if(std::unseq on icx goes to the block = 8 part https://godbolt.org/z/6fdT4j4cz despite it supporting the `#pragma omp simd early_exit` https://godbolt.org/z/Yre19vxdG
[Bug target/111828] rs6000: Parse inline asm string to figure out it requires HTM feature or not.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111828

Kewen Lin changed:

           What    |Removed     |Added
----------------------------------------------------------------
             Status|UNCONFIRMED |ASSIGNED
     Ever confirmed|0           |1
   Last reconfirmed|            |2023-10-26
[Bug middle-end/111942] ICE in rtl_split_edge, at cfgrtl.cc:1943 on pr98096.c with new -fharden-control-flow-redundancy with asm goto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111942

Alexandre Oliva changed:

           What    |Removed     |Added
----------------------------------------------------------------
             Status|UNCONFIRMED |NEW
   Last reconfirmed|            |2023-10-26
     Ever confirmed|0           |1

--- Comment #1 from Alexandre Oliva ---
Thanks for the report.

This latent bug is independent of -fharden-control-flow-redundancy. The issue is that volatile asm stmts are considered throw points when -fnon-call-exceptions is enabled. I'm not sure where C++ wires non-call exceptions to enclosing handlers, but it appears that it doesn't: even with a handler added around f's body, no EH edges are added. However, inlining f() into another function with a handler for the call will wire all escaping exceptions to that handler, creating the same arrangement of 3 outgoing edges (fallthrough, asm jmp, and EH) that the RTL edge splitter barfs at.

/* compile with -fnon-call-exceptions */
int i, j;
int f(void) {
  asm goto ("# %0 %2" : "+r" (i) ::: jmp);
  i += 2;
  asm goto ("# %0 %1 %l[jmp]" : "+r" (i), "+r" (j) ::: jmp);
 jmp:
  return i;
}
int inline __attribute__ ((__always_inline__)) f(void);
int g(void) {
  try {
    return f();
  } catch (...) {
    i++;
    throw;
  }
}

./xgcc -B./ pr98096.cc -fno-harden-control-flow-redundancy -fnon-call-exceptions
during RTL pass: expand
pr98096.cc: In function ‘int g()’:
pr98096.cc:19:1: internal compiler error: in rtl_split_edge, at cfgrtl.cc:1943
   19 | }
[Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c and vsetvlmax-8.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092

--- Comment #6 from JuzheZhong ---
> I have troubles chasing one down and the source code is so
> convoluted with macros I can't even find the implementation.

I am sorry for causing confusion here. Because the RVV fusion rules are so complicated, we define them in riscv-vsetvl.def. To understand the code, I suggest reading riscv-vsetvl.def directly; we define all the compatible, fusion, and available rules there.

For example, vle16.v (e16, m1) is compatible with vadd.vv (e32, mf2). In this case, the two adjacent instructions "vle16.v" (e16, m1) and "vadd.vv" (e32, mf2) can share the same vsetvl (vsetvl e32, mf2). Whereas vsub.vv (e16, m1) and vadd.vv (e32, mf2) are not compatible.
[Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c and vsetvlmax-8.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092

--- Comment #5 from JuzheZhong ---
Yes, I agree that some arches prefer agnostic over undisturbed even at the cost of more vsetvls. That's why I have posted an issue asking whether we can have an option like -mprefer-agnostic:
https://github.com/riscv-non-isa/riscv-toolchain-conventions/issues/37

But I think Maciej is worrying about why GCC fuses the vsetvl and changes the e16mf2 vsetvl into e32m1. For example, https://godbolt.org/z/6G9G7Pbe9 -- no 'TU' is involved there. I think the LLVM codegen looks more reasonable:

        beqz    a5, .LBB0_4
        vsetvli a1, a6, e32, m1, ta, ma
        beqz    a4, .LBB0_3
.LBB0_2:                        # =>This Inner Loop Header: Depth=1
        vsetvli zero, a1, e32, m1, ta, ma
        vle32.v v8, (a0)
        vadd.vv v8, v8, v8
        addi    a4, a4, -1
        vse32.v v8, (a3)
        bnez    a4, .LBB0_2
.LBB0_3:
        ret
.LBB0_4:
        srai    a1, a6, 2
        vsetvli a1, a1, e16, mf2, ta, ma
        bnez    a4, .LBB0_2
        j       .LBB0_3

But GCC is correct with its optimizations:

foo(int*, int*, int*, int*, unsigned long, int, int):
        beq     a5,zero,.L2
        vsetvli a5,a6,e32,m1,ta,ma
.L3:
        beq     a4,zero,.L10
        li      a2,0
.L5:
        vle32.v v1,0(a0)
        addi    a2,a2,1
        vadd.vv v1,v1,v1
        vse32.v v1,0(a3)
        bne     a4,a2,.L5
.L10:
        ret
.L2:
        sraiw   a5,a6,2
        vsetvli zero,a5,e32,m1,ta,ma
        j       .L3
[Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c and vsetvlmax-8.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092

Kito Cheng changed:

           What    |Removed     |Added
----------------------------------------------------------------
                 CC|            |kito at gcc dot gnu.org

--- Comment #4 from Kito Cheng ---
The testcase itself looks tricky but is correct; this kind of fusion can typically be used to optimize mixed-width (mixed-SEW) operations. You can refer to the EEW material in the V spec [1]: most loads and stores encode a static EEW, so this vsetvli fusion optimization can apply to them.

[1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#52-vector-operands

To give a (more) practical example:

```c
#include "riscv_vector.h"

void foo(int32_t *in1, int16_t *in2, int16_t *in3, int32_t *out,
         size_t n, int cond, int avl) {
  size_t vl = __riscv_vsetvl_e16mf2(avl);
  vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
  vint16mf2_t b = __riscv_vle16_v_i16mf2(in2, vl);
  vint16mf2_t c = __riscv_vle16_v_i16mf2(in3, vl);
  vint32m1_t x = __riscv_vwmacc_vv_i32m1(a, b, c, vl);
  __riscv_vse32_v_i32m1(out, x, vl);
}
```

> Is it guaranteed by the RVV specification that the value of `vl' produced
> (which is then supplied as an argument to `__riscv_vle32_v_i32m1', etc.;
> I presume implicitly via the VL CSR as I can't see it in actual assembly
> produced) is going to be the same for all microarchitectures for both:
>
>   vsetvli zero,a6,e32,m1,tu,ma
>
> and:
>
>   vsetvli zero,a6,e16,mf2,ta,ma

This is the other trick in this case: tail agnostic vs. tail undisturbed. Tail undisturbed has stronger semantics than tail agnostic, so using tail undisturbed in place of tail agnostic is always safe and satisfies the semantics; the same goes for mask agnostic vs. mask undisturbed.

Performance is another story, though. As far as I know, some uArches implement agnostic as undisturbed, in which case agnostic vs. undisturbed makes no real difference, so fusing those two vsetvlis becomes a kind of optimization.
However, as you can imagine, that also means some uArch may implement agnostic the other way: agnostic MAY have better performance than undisturbed, and we should not fuse those vsetvlis IF we are targeting such a uArch. Anyway, our cost model for RVV is still in an initial state, so personally I am fine with this for now, but I guess we need to add some more stuff to -mtune to handle those differences.
[Bug tree-optimization/111520] [14 Regression] ICE: verify_flow_info failed (error: probability of edge 3->8 not initialized) with -O -fsignaling-nans -fharden-compares -fnon-call-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111520

--- Comment #3 from CVS Commits ---
The master branch has been updated by Alexandre Oliva :

https://gcc.gnu.org/g:33d38b431cced81e575b1d17d36cb9e43d64b02b

commit r14-4936-g33d38b431cced81e575b1d17d36cb9e43d64b02b
Author: Alexandre Oliva
Date:   Thu Oct 26 03:06:09 2023 -0300

    set hardcmp eh probs

    Set execution count of EH blocks, and probability of EH edges.

    for gcc/ChangeLog

            PR tree-optimization/111520
            * gimple-harden-conditionals.cc
            (pass_harden_compares::execute): Set EH edge probability and
            EH block execution count.

    for gcc/testsuite/ChangeLog

            PR tree-optimization/111520
            * g++.dg/torture/harden-comp-pr111520.cc: New.