Re: LRA for avr: Handling hard regs set directly at expand

2023-08-02 Thread SenthilKumar.Selvaraj--- via Gcc
On Wed, 2023-08-02 at 12:54 -0400, Vladimir Makarov wrote:
> On 7/17/23 07:33, senthilkumar.selva...@microchip.com wrote:
> > Hi,
> > 
> >The avr target has a bunch of patterns that directly set hard regs at 
> > expand time, like so
> > 
> > (define_expand "cpymemhi"
> >   [(parallel [(set (match_operand:BLK 0 "memory_operand" "")
> >                    (match_operand:BLK 1 "memory_operand" ""))
> >               (use (match_operand:HI 2 "const_int_operand" ""))
> >               (use (match_operand:HI 3 "const_int_operand" ""))])]
> >   ""
> >   {
> >     if (avr_emit_cpymemhi (operands))
> >       DONE;
> > 
> >     FAIL;
> >   })
> > 
> > where avr_emit_cpymemhi generates
> > 
> > (insn 14 13 15 4 (set (reg:HI 30 r30)
> >  (reg:HI 48 [ ivtmp.10 ])) "pr53505.c":21:22 -1
> >   (nil))
> > (insn 15 14 16 4 (set (reg:HI 26 r26)
> >  (reg/f:HI 38 virtual-stack-vars)) "pr53505.c":21:22 -1
> >   (nil))
> > (insn 16 15 17 4 (parallel [
> >  (set (mem:BLK (reg:HI 26 r26) [0  A8])
> >  (mem:BLK (reg:HI 30 r30) [0  A8]))
> >  (unspec [
> >  (const_int 0 [0])
> >  ] UNSPEC_CPYMEM)
> >  (use (reg:QI 52))
> >  (clobber (reg:HI 26 r26))
> >  (clobber (reg:HI 30 r30))
> >  (clobber (reg:QI 0 r0))
> >  (clobber (reg:QI 52))
> >  ]) "pr53505.c":21:22 -1
> >   (nil))
> > 
> > Classic reload knows about these: find_reg masks out bad_spill_regs, and
> > bad_spill_regs, when ORed with chain->live_throughout in
> > order_regs_for_reload, picks up r30.
> > 
> > LRA, however, appears not to consider that, and proceeds to use such regs
> > as reload regs.
> > For the same source, it generates
> > 
> >   Choosing alt 0 in insn 15:  (0) =r  (1) r {*movhi_split}
> >Creating newreg=70, assigning class GENERAL_REGS to r70
> > 15: r26:HI=r70:HI
> >REG_EQUAL r28:HI+0x1
> >  Inserting insn reload before:
> > 58: r70:HI=r28:HI+0x1
> > 
> > Choosing alt 3 in insn 58:  (0) d  (1) 0  (2) nYnn {*addhi3_split}
> >Creating newreg=71 from oldreg=70, assigning class LD_REGS to r71
> > 58: r71:HI=r71:HI+0x1
> >  Inserting insn reload before:
> > 59: r71:HI=r28:HI
> >  Inserting insn reload after:
> > 60: r70:HI=r71:HI
> > 
> > ** Assignment #1: **
> > 
> >Assigning to 71 (cl=LD_REGS, orig=70, freq=3000, tfirst=71, 
> > tfreq=3000)...
> >  Assign 30 to reload r71 (freq=3000)
> >   Hard reg 26 is preferable by r70 with profit 1000
> >   Hard reg 30 is preferable by r70 with profit 1000
> >Assigning to 70 (cl=GENERAL_REGS, orig=70, freq=2000, tfirst=70, 
> > tfreq=2000)...
> >  Assign 30 to reload r70 (freq=2000)
> > 
> > 
> > (insn 14 13 59 3 (set (reg:HI 30 r30)
> >  (reg:HI 18 r18 [orig:48 ivtmp.10 ] [48])) "pr53505.c":21:22 101 
> > {*movhi_split}
> >   (nil))
> > (insn 59 14 58 3 (set (reg:HI 30 r30 [70])
> >  (reg/f:HI 28 r28)) "pr53505.c":21:22 101 {*movhi_split}
> >   (nil))
> > (insn 58 59 15 3 (set (reg:HI 30 r30 [70])
> >  (plus:HI (reg:HI 30 r30 [70])
> >  (const_int 1 [0x1]))) "pr53505.c":21:22 165 {*addhi3_split}
> >   (nil))
> > (insn 15 58 16 3 (set (reg:HI 26 r26)
> >  (reg:HI 30 r30 [70])) "pr53505.c":21:22 101 {*movhi_split}
> >   (expr_list:REG_EQUAL (plus:HI (reg/f:HI 28 r28)
> >  (const_int 1 [0x1]))
> >  (nil)))
> > (insn 16 15 17 3 (parallel [
> >  (set (mem:BLK (reg:HI 26 r26) [0  A8])
> >  (mem:BLK (reg:HI 30 r30) [0  A8]))
> >  (unspec [
> >  (const_int 0 [0])
> >  ] UNSPEC_CPYMEM)
> >  (use (reg:QI 22 r22 [52]))
> >  (clobber (reg:HI 26 r26))
> >  (clobber (reg:HI 30 r30))
> >  (clobber (reg:QI 0 r0))
> >  (clobber (reg:QI 22 r22 [52]))
> >  ]) "pr53505.c":21:22 132 {cpymem_qi}
> >   (nil))
> > 
> > LRA generates insn 59 that clobbers r30 set in insn 14, causing an execution
> > failure down the line.
> > 
> > How should the avr backend deal with this?
> > 
> Sorry for the big delay with the answer.  I was on vacation.
> 
> There are probably some ways to fix it by changing patterns as other
> people suggested but I'd like to see the current patterns work for LRA
> as well.
> 
> Could you send me a test case with which I can reproduce the problem
> and work on implementing such functionality?
> 
> 
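
The selection rule described in the quoted message (classic reload masks out
registers that are live across the insn, while LRA currently does not) can be
modelled with a bitmask filter. This is only a simplified sketch: the type
hard_reg_mask, the helpers pick_reload_reg and live_across_insn_15, and the
way the preference is handled are hypothetical names for illustration, not
GCC's actual reload or LRA code.

```c
#include <stdint.h>

/* Simplified model, one bit per hard register (hypothetical, not GCC's
   HARD_REG_SET).  */
typedef uint32_t hard_reg_mask;

#define REG_BIT(r) ((hard_reg_mask) 1 << (r))

/* Return PREFERRED if it is in CLASS_REGS and not in EXCLUDED, otherwise
   the lowest-numbered allowed register, or -1 if none is left.  */
int
pick_reload_reg (hard_reg_mask class_regs, hard_reg_mask excluded,
                 int preferred)
{
  hard_reg_mask allowed = class_regs & ~excluded;

  if (preferred >= 0 && preferred < 32 && (allowed & REG_BIT (preferred)))
    return preferred;
  for (int r = 0; r < 32; r++)
    if (allowed & REG_BIT (r))
      return r;
  return -1;
}

/* r30/r31 hold the source pointer set by insn 14 and stay live up to
   insn 16, so they must not be handed out as a reload register.  */
hard_reg_mask
live_across_insn_15 (void)
{
  return REG_BIT (30) | REG_BIT (31);
}
```

With an empty exclusion mask the preferred r30 is chosen, which corresponds to
the assignment in the LRA dump; passing live_across_insn_15 () as the
exclusion mask forces a different register, which is the effect
bad_spill_regs has in classic reload.
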
Thanks for taking the time to look at this.

To reproduce the behavior, apply the patch below to master:

diff --git gcc/config/avr/avr.cc gcc/config/avr/avr.cc
index 25f3f4c22e0..a9ab8259339 100644
--- gcc/config/avr/avr.cc
+++ gcc/config/avr/avr.cc
@@ -1574,6 +1574,9 @@ avr_allocate_stack_slots_for_args 

Re: [x86 PATCH] PR target/110792: Early clobber issues with rot32di2_doubleword.

2023-08-02 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 3, 2023 at 12:18 AM Roger Sayle  wrote:
>
>
> This patch is a conservative fix for PR target/110792, a wrong-code
> regression affecting doubleword rotations by BITS_PER_WORD, which
> effectively swaps the highpart and lowpart words, when the source to be
> rotated resides in memory. The issue is that if the register used to
> hold the lowpart of the destination is mentioned in the address of
> the memory operand, the current define_insn_and_split unintentionally
> clobbers it before reading the highpart.
>
> Hence, for the testcase, the incorrectly generated code looks like:
>
> salq    $4, %rdi                 // calculate address
> movq    WHIRL_S+8(%rdi), %rdi    // accidentally clobber addr
> movq    WHIRL_S(%rdi), %rbp      // load (wrong) lowpart
>
> Traditionally, the textbook way to fix this would be to add an
> explicit early clobber to the instruction's constraints.
>
>  (define_insn_and_split "<insn>32di2_doubleword"
> - [(set (match_operand:DI 0 "register_operand" "=r,r,r")
> + [(set (match_operand:DI 0 "register_operand" "=r,r,&r")
>         (any_rotate:DI (match_operand:DI 1 "nonimmediate_operand" "0,r,o")
>                        (const_int 32)))]
>
> but unfortunately this currently generates significantly worse code,
> due to a strange choice of reloads (effectively memcpy), which ends up
> looking like:
>
> salq    $4, %rdi                 // calculate address
> movdqa  WHIRL_S(%rdi), %xmm0     // load the double word in SSE reg.
> movaps  %xmm0, -16(%rsp)         // store the SSE reg back to the stack
> movq    -8(%rsp), %rdi           // load highpart
> movq    -16(%rsp), %rbp          // load lowpart
>
> Note that reload's "&" doesn't distinguish between the memory being
> early clobbered, vs the registers used in an addressing mode being
> early clobbered.
>
> The fix proposed in this patch is to remove the third alternative, which
> allowed offsettable memory as an operand, forcing reload to place the
> operand into a register before the rotation.  This results in:
>
> salq    $4, %rdi
> movq    WHIRL_S(%rdi), %rax
> movq    WHIRL_S+8(%rdi), %rdi
> movq    %rax, %rbp
>
> I believe there's a more advanced solution, by swapping the order of
> the loads (if first destination register is mentioned in the address),
> or inserting a lea insn (if both destination registers are mentioned
> in the address), but this fix is a minimal "safe" solution, that
> should hopefully be suitable for backporting.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-08-02  Roger Sayle  
>
> gcc/ChangeLog
> PR target/110792
> * config/i386/i386.md (ti3): For rotations by 64 bits
> place operand in a register before gen_64ti2_doubleword.
> (di3): Likewise, for rotations by 32 bits, place
> operand in a register before gen_32di2_doubleword.
> (32di2_doubleword): Constrain operand to be in register.
> (64ti2_doubleword): Likewise.
>
> gcc/testsuite/ChangeLog
> PR target/110792
> * g++.target/i386/pr110792.C: New 32-bit C++ test case.
> * gcc.target/i386/pr110792.c: New 64-bit C test case.

OK.

Thanks,
Uros.
>
>
> Thanks in advance,
> Roger
> --
>
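
The clobber discussed in this thread can be replayed in plain C by treating
the registers as variables. This is a rough sketch, not the compiler's code:
the regs struct, demo_mem and both function names are made up for
illustration, and memory is indexed by word instead of by byte address (the
modulo only keeps the model in bounds).

```c
#include <stdint.h>

#define MEM_WORDS 4

/* The two destination registers from the assembly in the quoted message.  */
typedef struct { uint64_t rdi, rbp; } regs;

/* Word 0 is the lowpart and word 1 the highpart of the value at index 0.  */
static const uint64_t demo_mem[MEM_WORDS] = { 111, 2, 999, 0 };

/* Miscompiled order: rdi is overwritten while it still holds the address.  */
regs
load_swapped_buggy (const uint64_t *mem, uint64_t addr)
{
  regs r;
  r.rdi = addr;
  r.rdi = mem[(r.rdi + 1) % MEM_WORDS];  /* movq WHIRL_S+8(%rdi), %rdi */
  r.rbp = mem[r.rdi % MEM_WORDS];        /* movq WHIRL_S(%rdi), %rbp   */
  return r;
}

/* Order after the fix: the lowpart goes through a scratch register.  */
regs
load_swapped_fixed (const uint64_t *mem, uint64_t addr)
{
  regs r;
  uint64_t rax;
  r.rdi = addr;
  rax = mem[r.rdi % MEM_WORDS];          /* movq WHIRL_S(%rdi), %rax   */
  r.rdi = mem[(r.rdi + 1) % MEM_WORDS];  /* movq WHIRL_S+8(%rdi), %rdi */
  r.rbp = rax;                           /* movq %rax, %rbp            */
  return r;
}
```

With demo_mem and addr 0, the fixed order yields rbp == 111, while the buggy
order uses the freshly loaded highpart (2) as an address and reads 999.
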


[Bug tree-optimization/51049] A regression caused by "Improve handling of conditional-branches on targets with high branch costs"

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51049

--- Comment #2 from Andrew Pinski  ---
So at -O1 we get what we expect:
```
  _1 = *i_4(D);
  _7 = j_5(D) != 2;
  _8 = _1 != 0;
  _9 = _7 & _8;
  if (_9 != 0)
    goto <bb 3>; [43.56%]
  else
    goto <bb 4>; [56.44%]

  <bb 3> [local count: 467721933]:
  _6 = (int) _1;

  <bb 4> [local count: 1073741824]:
  # _2 = PHI <_6(3), j_5(D)(2)>
```
But at -O2 we get:
```
  if (_1 != 0)
    goto <bb 3>; [66.00%]
  else
    goto <bb 5>; [34.00%]

  <bb 3> [local count: 708669599]:
  if (j_5(D) != 2)
    goto <bb 4>; [66.00%]
  else
    goto <bb 5>; [34.00%]

  <bb 4> [local count: 467721933]:
  _6 = (int) _1;

  <bb 5> [local count: 1073741824]:
  # _2 = PHI <_6(4), j_5(D)(3), j_5(D)(2)>
```

This happens because VRP comes along and replaces j_5(D) with 2 along the edge
`3->5`, and ifcombine does not handle that, as the PHI entries are different
(though they could perhaps be proved to be the same).
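
To make the equivalence concrete, here is a small C sketch of the two shapes.
It assumes `*i` is a narrow signed integer (a `signed char` here), which the
dumps do not confirm, and the function names are invented for illustration;
this is not the testcase from the bug report.

```c
/* Shape of the -O1 GIMPLE: both tests combined into one condition.  */
int
select_combined (signed char v, int j)
{
  int both = (j != 2) & (v != 0);
  return both ? (int) v : j;
}

/* Shape of the -O2 GIMPLE after VRP: two branches, and on the path
   where j == 2 the PHI argument has become the constant 2.  */
int
select_split (signed char v, int j)
{
  if (v != 0)
    {
      if (j != 2)
        return (int) v;
      return 2;   /* j is known to be 2 on this edge */
    }
  return j;
}

/* Return 1 if both shapes agree on a small grid of inputs.  */
int
shapes_agree (void)
{
  for (int v = -3; v <= 3; v++)
    for (int j = -3; j <= 3; j++)
      if (select_combined ((signed char) v, j)
          != select_split ((signed char) v, j))
        return 0;
  return 1;
}
```

The constant 2 and `j` are the same value on that path, so the PHI arguments
differ only syntactically; that is the case ifcombine would have to prove.
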

Re: [PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-08-02 Thread Jeff Law via Gcc-patches




On 7/29/23 03:14, Xiao Zeng wrote:

> 1. Thank you for Jeff's code review comments. I have made the modifications
> and submitted the V2-patch[3/5].
Yea.  I'm adjusting my tree based on those updates.  For testing I've 
actually got my compiler generating zicond by default and qemu allowing 
zicond by default.  I can then run the execute.exp tests which validate 
code correctness to a reasonable degree.

> 2. For the calculation method of cost, I hope to submit a separate patch[cost]
> after the V2-patch[3/5] is merged into master, which will focus on explaining
> the reasons for calculating cost in the same way as in patch[4/5].
I think the costing problem is going to require its own little 
subproject.  GCC's approach to costing is a bit crazy with multiple APIs 
that behave differently and in some cases do some rather surprising 
things.  It's a long standing design flaw.

The point being that I think we'll probably move forward with the 
functional bits, perhaps initially without the basic functionality 
tests.  That allows folks to start utilizing the core functionality 
while we audit and likely adjust the risc-v cost hook implementation.

> 4. In V2-patch[3/5], Zicond's cost calculation is not involved; therefore, all
> test cases are skipped with "-O0" and "-Os". I will remove the "-Os" constraint
> from the test case in patch[cost].
We may need to skip -Og as well.  I've got that change here 
locally, but I wanted to go back and review that as well.


jeff


Re: [RFC] Combine zero_extract and sign_extend for TARGET_TRULY_NOOP_TRUNCATION

2023-08-02 Thread YunQiang Su via Gcc-patches
YunQiang Su wrote on Thu, Aug 3, 2023 at 11:18:
>
> PR #104914
>
> On platforms where TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode) is true,
> the result of a zero_extract (SI, SI) is already sign-extended.  So a
> zero_extract (DI, DI) followed by a sign_extend (SI to DI) can be merged
> into a single zero_extract (SI, SI).
>

The RTL is like:

(insn 10 49 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
(const_int 8 [0x8])
(const_int 0 [0]))
(subreg:DI (reg:QI 202 [ *buf_8(D) ]) 0)) "xx.c":4:29 281 {*insvdi}
 (expr_list:REG_DEAD (reg:QI 202 [ *buf_8(D) ])
(nil)))
(insn 11 10 12 2 (set (reg/v:DI 200 [ val ])
(sign_extend:DI (subreg:SI (reg/v:DI 200 [ val ]) 0)))
"xx.c":4:29 238 {extendsidi2}
 (nil))

--->

(note 10 49 11 2 NOTE_INSN_DELETED)
(insn 11 10 12 2 (set (zero_extract:SI (subreg:SI (reg/v:DI 200 [ val ]) 0)
(const_int 8 [0x8])
(const_int 0 [0]))
(subreg:SI (reg:QI 202 [ *buf_8(D) ]) 0)) "xx.c":4:29 280 {*insvsi}
 (expr_list:REG_DEAD (reg:QI 202 [ *buf_8(D) ])
(nil)))


This is another method to solve #104914.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104914

Another method is here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624856.html
That is, when generating the RTL for zero_extract, we can determine whether
it is SImode, so we can generate the correct zero_extract in the first place,
in the expand pass.

Any ideas about which method is better?
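
The arithmetic behind the combination can be checked in portable C. This is
only a sketch of the value computation, not of the combine pass: both
function names are invented, and the narrowing casts to int32_t rely on the
wrap-around behaviour that GCC documents for out-of-range conversions.

```c
#include <stdint.h>

/* Two-insn form: insert 8 bits at bit 0 in DImode, then sign-extend
   the low 32 bits (zero_extract:DI followed by sign_extend:DI).  */
int64_t
insert_then_extend (int64_t val, uint8_t byte)
{
  uint64_t v = ((uint64_t) val & ~UINT64_C (0xff)) | byte;
  return (int64_t) (int32_t) (uint32_t) v;
}

/* One-insn form: insert in SImode; the final cast stands for the sign
   extension that a 32-bit operation performs implicitly on MIPS64.  */
int64_t
insert_narrow (int64_t val, uint8_t byte)
{
  uint32_t lo = ((uint32_t) val & ~UINT32_C (0xff)) | byte;
  return (int64_t) (int32_t) lo;
}
```

Both functions return the same value for every input, because the bits above
bit 31 of the wide insert are discarded by the following sign extension.
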

> gcc/ChangeLog:
> PR: 104914.
> * combine.cc (try_combine): Combine zero_extract (DI, DI) and
>   following sign_extend (DI, SI) for
>   TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true.
>   (subst): Allow replacing reg(DI) with subreg(SI (reg DI))
>   if to is SImode and from is DImode for
>   TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true.
>
> gcc/testsuite/ChangeLog:
> PR: 104914.
> * gcc.target/mips/pr104914.c: New testcase.
> ---
>  gcc/combine.cc   | 88 
>  gcc/testsuite/gcc.target/mips/pr104914.c | 17 +
>  2 files changed, 90 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/mips/pr104914.c
>
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index e46d202d0a7..701b7c33b17 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -3294,15 +3294,64 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn 
> *i1, rtx_insn *i0,
>n_occurrences = 0;   /* `subst' counts here */
>subst_low_luid = DF_INSN_LUID (i2);
>
> -  /* If I1 feeds into I2 and I1DEST is in I1SRC, we need to make a unique
> -copy of I2SRC each time we substitute it, in order to avoid creating
> -self-referential RTL when we will be substituting I1SRC for I1DEST
> -later.  Likewise if I0 feeds into I2, either directly or indirectly
> -through I1, and I0DEST is in I0SRC.  */
> -  newpat = subst (PATTERN (i3), i2dest, i2src, false, false,
> - (i1_feeds_i2_n && i1dest_in_i1src)
> - || ((i0_feeds_i2_n || (i0_feeds_i1_n && i1_feeds_i2_n))
> - && i0dest_in_i0src));
> +  /* Try to combine zero_extract (DImode) and sign_extend (SImode to 
> DImode)
> +for TARGET_TRULY_NOOP_TRUNCATION.  The RTL may look like:
> +
> +(insn 10 49 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
> +   (const_int 8 [0x8])
> +   (const_int 0 [0]))
> +(subreg:DI (reg:QI 202 [ *buf_8(D) ]) 0)) "xx.c":4:29 278 
> {*insvdi}
> +(expr_list:REG_DEAD (reg:QI 202 [ *buf_8(D) ]) (nil)))
> +(insn 11 10 12 2 (set (reg/v:DI 200 [ val ])
> +
> +(sign_extend:DI (subreg:SI (reg/v:DI 200 [ val ]) 0))) 238 
> {extendsidi2}
> +(nil))
> +
> +On these architectures (MIPS64 as an example), the 32bit operation
> +instructions will sign-extend the result to 64bit.  The result can 
> be:
> +
> +(insn 10 49 11 2 (set (zero_extract:SI (subreg:SI (reg/v:DI 200 [ 
> val ]) 0)
> +  (const_int 8 [0x8])
> +  (const_int 0 [0]))
> +(subreg:SI (reg:QI 202 [ *buf_8(D) ]) 0)) "xx.c":4:29 280 
> {*insvsi}
> +(expr_list:REG_DEAD (reg:QI 202 [ *buf_8(D) ]) (nil)))
> +   */
> +  if (i0 == 0 && i1 == 0 && i3 != 0 && i2 != 0 && GET_CODE (i2) == INSN
> + && GET_CODE (i3) == INSN && GET_CODE (PATTERN (i2)) == SET
> + && GET_CODE (PATTERN (i3)) == SET
> + && GET_CODE (SET_DEST (single_set (i2))) == ZERO_EXTRACT
> + && GET_CODE (SET_SRC (single_set (i3))) == SIGN_EXTEND
> + && SUBREG_P (XEXP (SET_SRC (single_set (i3)), 0))
> + && REGNO (SUBREG_REG (XEXP (SET_SRC (single_set (i3)), 0)))
> +== REGNO (SET_DEST (single_set (i3)))
> + && REGNO (XEXP (SET_DEST (single_set (i2)), 0))
> +== REGNO (SET_DEST (single_set (i3)))
> + && 

Re: [PATCH v1] RISC-V: Support RVV VFDIV and VFRDIV rounding mode intrinsic API

2023-08-02 Thread juzhe.zh...@rivai.ai
I am considering whether it would be better to have multiple macro definitions for FRM.

like:

DECLARE_FRM_FUNCTION_BASE (NAME)\
  extern const function_base *const NAME;
  extern const function_base *const NAME##_frm;

DECLARE_FRM_FUNCTION (NAME, )\
  DEF_RVV_FUNCTION (NAME##_frm, alu, );
  DEF_RVV_FUNCTION (NAME##_frm, alu_frm,);


I am not sure. I would rather wait for more comments from Kito.
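
One possible reading of the macro idea above, written out as compilable C.
Everything here is an assumption for illustration: struct rvv_entry and the
simplified DEF_RVV_FUNCTION are stand-ins (the real macro in
riscv-vector-builtins-functions.def does more than record two strings), and
the variadic form of DECLARE_FRM_FUNCTION is a guess at the intent, not code
from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for one row produced by DEF_RVV_FUNCTION.  */
struct rvv_entry
{
  const char *name;
  const char *shape;
};

#define DEF_RVV_FUNCTION(NAME, SHAPE, ...) { #NAME, #SHAPE },

/* Hypothetical wrapper: one line registers NAME and NAME_frm.  */
#define DECLARE_FRM_FUNCTION(NAME, ...)                   \
  DEF_RVV_FUNCTION (NAME, alu, __VA_ARGS__)               \
  DEF_RVV_FUNCTION (NAME##_frm, alu_frm, __VA_ARGS__)

static const struct rvv_entry rvv_table[] = {
  DECLARE_FRM_FUNCTION (vfdiv, full_preds, f_vvv_ops)
  DECLARE_FRM_FUNCTION (vfrdiv, full_preds, f_vvf_ops)
};

/* Return the shape recorded for NAME, or NULL if it is not in the table.  */
const char *
rvv_shape_of (const char *name)
{
  for (size_t i = 0; i < sizeof rvv_table / sizeof rvv_table[0]; i++)
    if (strcmp (rvv_table[i].name, name) == 0)
      return rvv_table[i].shape;
  return NULL;
}
```

Each DECLARE_FRM_FUNCTION line expands to two table rows, so adding a new
rounding-mode variant would take one line instead of two parallel edits.
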


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-08-03 11:29
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFDIV and VFRDIV rounding mode 
intrinsic API
From: Pan Li 
 
This patch adds rounding mode API support for VFDIV and VFRDIV,
as in the samples below.
 
* __riscv_vfdiv_vv_f32m1_rm
* __riscv_vfdiv_vv_f32m1_rm_m
* __riscv_vfdiv_vf_f32m1_rm
* __riscv_vfdiv_vf_f32m1_rm_m
* __riscv_vfrdiv_vf_f32m1_rm
* __riscv_vfrdiv_vf_f32m1_rm_m
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc
(binop_frm): New declaration.
(reverse_binop_frm): Likewise.
(BASE): Likewise.
* config/riscv/riscv-vector-builtins-bases.h:
(vfdiv_frm): New extern declaration.
(vfrdiv_frm): Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfdiv_frm): New function definition.
(vfrdiv_frm): Likewise.
* config/riscv/vector.md: Add vfdiv to frm_mode.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-single-div.c: New test.
* gcc.target/riscv/rvv/base/float-point-single-rdiv.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  6 +++
.../riscv/riscv-vector-builtins-bases.h   |  2 +
.../riscv/riscv-vector-builtins-functions.def |  3 ++
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/base/float-point-single-div.c   | 44 +++
.../riscv/rvv/base/float-point-single-rdiv.c  | 33 ++
6 files changed, 89 insertions(+), 1 deletion(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-div.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-rdiv.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 3adc11138a3..95ec9ccb481 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -278,6 +278,7 @@ public:
/* Implements below instructions for now.
- vfadd
- vfmul
+   - vfdiv
*/
template
class binop_frm : public function_base
@@ -301,6 +302,7 @@ public:
/* Implements below instructions for frm
- vfrsub
+   - vfrdiv
*/
template
class reverse_binop_frm : public function_base
@@ -2106,7 +2108,9 @@ static CONSTEXPR const widen_binop_frm 
vfwsub_frm_obj;
static CONSTEXPR const binop vfmul_obj;
static CONSTEXPR const binop_frm vfmul_frm_obj;
static CONSTEXPR const binop vfdiv_obj;
+static CONSTEXPR const binop_frm vfdiv_frm_obj;
static CONSTEXPR const reverse_binop vfrdiv_obj;
+static CONSTEXPR const reverse_binop_frm vfrdiv_frm_obj;
static CONSTEXPR const widen_binop vfwmul_obj;
static CONSTEXPR const vfmacc vfmacc_obj;
static CONSTEXPR const vfnmsac vfnmsac_obj;
@@ -2338,7 +2342,9 @@ BASE (vfwsub_frm)
BASE (vfmul)
BASE (vfmul_frm)
BASE (vfdiv)
+BASE (vfdiv_frm)
BASE (vfrdiv)
+BASE (vfrdiv_frm)
BASE (vfwmul)
BASE (vfmacc)
BASE (vfnmsac)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 9c12a6b4e8f..f35fd3d27cf 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -154,7 +154,9 @@ extern const function_base *const vfwsub_frm;
extern const function_base *const vfmul;
extern const function_base *const vfmul_frm;
extern const function_base *const vfdiv;
+extern const function_base *const vfdiv_frm;
extern const function_base *const vfrdiv;
+extern const function_base *const vfrdiv_frm;
extern const function_base *const vfwmul;
extern const function_base *const vfmacc;
extern const function_base *const vfnmsac;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 35a83ef239c..e7e6c7d8ed8 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -321,6 +321,9 @@ DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfrdiv, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvv_ops)
DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfdiv_frm, alu_frm, full_preds, f_vvv_ops)
+DEF_RVV_FUNCTION (vfdiv_frm, alu_frm, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfrdiv_frm, alu_frm, full_preds, f_vvf_ops)
// 13.5. Vector Widening Floating-Point Multiply
DEF_RVV_FUNCTION (vfwmul, alu, full_preds, f_wvv_ops)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 5d3e4256cd5..4b6c3859947 100644
--- a/gcc/config/riscv/vector.md
+++ 

[PATCH v1] RISC-V: Support RVV VFDIV and VFRDIV rounding mode intrinsic API

2023-08-02 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch adds rounding mode API support for VFDIV and VFRDIV,
as in the samples below.

* __riscv_vfdiv_vv_f32m1_rm
* __riscv_vfdiv_vv_f32m1_rm_m
* __riscv_vfdiv_vf_f32m1_rm
* __riscv_vfdiv_vf_f32m1_rm_m
* __riscv_vfrdiv_vf_f32m1_rm
* __riscv_vfrdiv_vf_f32m1_rm_m

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(binop_frm): New declaration.
(reverse_binop_frm): Likewise.
(BASE): Likewise.
* config/riscv/riscv-vector-builtins-bases.h:
(vfdiv_frm): New extern declaration.
(vfrdiv_frm): Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfdiv_frm): New function definition.
(vfrdiv_frm): Likewise.
* config/riscv/vector.md: Add vfdiv to frm_mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-single-div.c: New test.
* gcc.target/riscv/rvv/base/float-point-single-rdiv.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  |  6 +++
 .../riscv/riscv-vector-builtins-bases.h   |  2 +
 .../riscv/riscv-vector-builtins-functions.def |  3 ++
 gcc/config/riscv/vector.md|  2 +-
 .../riscv/rvv/base/float-point-single-div.c   | 44 +++
 .../riscv/rvv/base/float-point-single-rdiv.c  | 33 ++
 6 files changed, 89 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-div.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-rdiv.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 3adc11138a3..95ec9ccb481 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -278,6 +278,7 @@ public:
 /* Implements below instructions for now.
- vfadd
- vfmul
+   - vfdiv
 */
 template
 class binop_frm : public function_base
@@ -301,6 +302,7 @@ public:
 
 /* Implements below instructions for frm
- vfrsub
+   - vfrdiv
 */
 template
 class reverse_binop_frm : public function_base
@@ -2106,7 +2108,9 @@ static CONSTEXPR const widen_binop_frm 
vfwsub_frm_obj;
 static CONSTEXPR const binop vfmul_obj;
 static CONSTEXPR const binop_frm vfmul_frm_obj;
 static CONSTEXPR const binop vfdiv_obj;
+static CONSTEXPR const binop_frm vfdiv_frm_obj;
 static CONSTEXPR const reverse_binop vfrdiv_obj;
+static CONSTEXPR const reverse_binop_frm vfrdiv_frm_obj;
 static CONSTEXPR const widen_binop vfwmul_obj;
 static CONSTEXPR const vfmacc vfmacc_obj;
 static CONSTEXPR const vfnmsac vfnmsac_obj;
@@ -2338,7 +2342,9 @@ BASE (vfwsub_frm)
 BASE (vfmul)
 BASE (vfmul_frm)
 BASE (vfdiv)
+BASE (vfdiv_frm)
 BASE (vfrdiv)
+BASE (vfrdiv_frm)
 BASE (vfwmul)
 BASE (vfmacc)
 BASE (vfnmsac)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 9c12a6b4e8f..f35fd3d27cf 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -154,7 +154,9 @@ extern const function_base *const vfwsub_frm;
 extern const function_base *const vfmul;
 extern const function_base *const vfmul_frm;
 extern const function_base *const vfdiv;
+extern const function_base *const vfdiv_frm;
 extern const function_base *const vfrdiv;
+extern const function_base *const vfrdiv_frm;
 extern const function_base *const vfwmul;
 extern const function_base *const vfmacc;
 extern const function_base *const vfnmsac;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 35a83ef239c..e7e6c7d8ed8 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -321,6 +321,9 @@ DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvf_ops)
 DEF_RVV_FUNCTION (vfrdiv, alu, full_preds, f_vvf_ops)
 DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvv_ops)
 DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfdiv_frm, alu_frm, full_preds, f_vvv_ops)
+DEF_RVV_FUNCTION (vfdiv_frm, alu_frm, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfrdiv_frm, alu_frm, full_preds, f_vvf_ops)
 
 // 13.5. Vector Widening Floating-Point Multiply
 DEF_RVV_FUNCTION (vfwmul, alu, full_preds, f_wvv_ops)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 5d3e4256cd5..4b6c3859947 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -866,7 +866,7 @@ (define_attr "vxrm_mode" "rnu,rne,rdn,rod,none"
 
 ;; Defines rounding mode of an floating-point operation.
 (define_attr "frm_mode" "rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none"
-  (cond [(eq_attr "type" "vfalu,vfwalu,vfmul")
+  (cond [(eq_attr "type" "vfalu,vfwalu,vfmul,vfdiv")
  (cond
   [(match_test "INTVAL (operands[9]) == riscv_vector::FRM_RNE")
(const_string "rne")
diff --git 

[Bug tree-optimization/110875] [14 Regression] Dead Code Elimination Regression since r14-2501-g285c9d042e9

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110875

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-08-03
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski  ---
Confirmed, though I have no real idea how to fix this.
The first major change to the IR happens in thread2, where we now decide to
perform a jump thread that we did not do before.

In GCC 13 we had:
```
   [local count: 282631250]:
  # a.8_39 = PHI <_12(23), 0(3)>
  # f_lsm.17_20 = PHI 
  # f_lsm_flag.18_22 = PHI 
  # b_lsm.19_45 = PHI <0(23), b_lsm.19_53(3)>
  # b_lsm_flag.20_47 = PHI <1(23), 0(3)>
  # a_lsm.21_49 = PHI <_12(23), _55(D)(3)>
  _1 = a.8_39 != 0;
  _2 = (int) _1;
  if (_2 != a.8_39)
goto ; [41.79%]
```

On the trunk we get:
```
   [local count: 339987332]:
  # a.8_38 = PHI <_10(24), 0(3)>
  # f_lsm.17_18 = PHI 
  # f_lsm_flag.18_20 = PHI 
  # b_lsm.19_44 = PHI <0(24), b_lsm.19_52(3)>
  # b_lsm_flag.20_46 = PHI <1(24), 0(3)>
  # a_lsm.21_48 = PHI <_10(24), _54(D)(3)>
  _13 = (unsigned int) a.8_38;
  if (_13 > 1)
goto ; [34.74%]
  else
goto ; [65.26%]
```
We duplicate bb4 for bb3 as we can figure that _13>1 will be false. This was
not done for the IR in GCC 13.

I am super confused about VRP's ranges.
We have the following ranges that get exported, and their relationships:
Global Exported: a.8_105 = [irange] int [-2, 0]
  _10 = a.8_105 + -1;
Global Exported: _10 = [irange] int [-INF, -6][-3, -1][1, 2147483645]
  _103 = (unsigned int) _10;
Global Exported: _103 = [irange] unsigned int [1, 2147483645][2147483648,
4294967290][4294967294, +INF]
Simplified relational if (_103 > 1)
 into if (_103 != 1)


Shouldn't the range of _10 just be [-3, -1]?
If so, _103 can't be 0 or 1. And if VRP gets that right, the call to
foo will go away.
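
The expected range can be checked by hand with a tiny interval helper. This
is a sketch under stated assumptions: int_range and both function names are
invented here, `int` is taken to be 32 bits wide, and the addition is assumed
not to overflow; it is not how GCC's irange is implemented.

```c
typedef struct { int lo, hi; } int_range;   /* closed interval [lo, hi] */

/* Range of x + c for x in [lo, hi]; assumes the sums do not overflow.  */
int_range
range_plus_const (int lo, int hi, int c)
{
  int_range r = { lo + c, hi + c };
  return r;
}

/* Return 1 if (unsigned int) x == value for some x in R.  */
int
range_contains_as_unsigned (int_range r, unsigned int value)
{
  for (long long x = r.lo; x <= r.hi; x++)
    if ((unsigned int) (int) x == value)
      return 1;
  return 0;
}
```

Starting from a.8_105 in [-2, 0], adding -1 gives [-3, -1]; converted to
unsigned int that is [4294967293, 4294967295], which contains neither 0 nor 1.
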

[RFC] Combine zero_extract and sign_extend for TARGET_TRULY_NOOP_TRUNCATION

2023-08-02 Thread YunQiang Su
PR #104914

On platforms where TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode) is true,
the result of a zero_extract (SI, SI) is already sign-extended.  So a
zero_extract (DI, DI) followed by a sign_extend (SI to DI) can be merged
into a single zero_extract (SI, SI).

gcc/ChangeLog:
PR: 104914.
* combine.cc (try_combine): Combine zero_extract (DI, DI) and
  following sign_extend (DI, SI) for
  TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true.
  (subst): Allow replacing reg(DI) with subreg(SI (reg DI))
  if to is SImode and from is DImode for
  TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true.

gcc/testsuite/ChangeLog:
PR: 104914.
* gcc.target/mips/pr104914.c: New testcase.
---
 gcc/combine.cc   | 88 
 gcc/testsuite/gcc.target/mips/pr104914.c | 17 +
 2 files changed, 90 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/pr104914.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index e46d202d0a7..701b7c33b17 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -3294,15 +3294,64 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   n_occurrences = 0;   /* `subst' counts here */
   subst_low_luid = DF_INSN_LUID (i2);
 
-  /* If I1 feeds into I2 and I1DEST is in I1SRC, we need to make a unique
-copy of I2SRC each time we substitute it, in order to avoid creating
-self-referential RTL when we will be substituting I1SRC for I1DEST
-later.  Likewise if I0 feeds into I2, either directly or indirectly
-through I1, and I0DEST is in I0SRC.  */
-  newpat = subst (PATTERN (i3), i2dest, i2src, false, false,
- (i1_feeds_i2_n && i1dest_in_i1src)
- || ((i0_feeds_i2_n || (i0_feeds_i1_n && i1_feeds_i2_n))
- && i0dest_in_i0src));
+  /* Try to combine zero_extract (DImode) and sign_extend (SImode to 
DImode)
+for TARGET_TRULY_NOOP_TRUNCATION.  The RTL may look like:
+
+(insn 10 49 11 2 (set (zero_extract:DI (reg/v:DI 200 [ val ])
+   (const_int 8 [0x8])
+   (const_int 0 [0]))
+(subreg:DI (reg:QI 202 [ *buf_8(D) ]) 0)) "xx.c":4:29 278 {*insvdi}
+(expr_list:REG_DEAD (reg:QI 202 [ *buf_8(D) ]) (nil)))
+(insn 11 10 12 2 (set (reg/v:DI 200 [ val ])
+
+(sign_extend:DI (subreg:SI (reg/v:DI 200 [ val ]) 0))) 238 
{extendsidi2}
+(nil))
+
+On these architectures (MIPS64 as an example), the 32bit operation
+instructions will sign-extend the result to 64bit.  The result can be:
+
+(insn 10 49 11 2 (set (zero_extract:SI (subreg:SI (reg/v:DI 200 [ val 
]) 0)
+  (const_int 8 [0x8])
+  (const_int 0 [0]))
+(subreg:SI (reg:QI 202 [ *buf_8(D) ]) 0)) "xx.c":4:29 280 {*insvsi}
+(expr_list:REG_DEAD (reg:QI 202 [ *buf_8(D) ]) (nil)))
+   */
+  if (i0 == 0 && i1 == 0 && i3 != 0 && i2 != 0 && GET_CODE (i2) == INSN
+ && GET_CODE (i3) == INSN && GET_CODE (PATTERN (i2)) == SET
+ && GET_CODE (PATTERN (i3)) == SET
+ && GET_CODE (SET_DEST (single_set (i2))) == ZERO_EXTRACT
+ && GET_CODE (SET_SRC (single_set (i3))) == SIGN_EXTEND
+ && SUBREG_P (XEXP (SET_SRC (single_set (i3)), 0))
+ && REGNO (SUBREG_REG (XEXP (SET_SRC (single_set (i3)), 0)))
+== REGNO (SET_DEST (single_set (i3)))
+ && REGNO (XEXP (SET_DEST (single_set (i2)), 0))
+== REGNO (SET_DEST (single_set (i3)))
+ && GET_MODE (SET_DEST (single_set (i2))) == DImode
+ && GET_MODE (SET_DEST (single_set (i3))) == DImode
+ && GET_MODE (XEXP (SET_SRC (single_set (i3)), 0)) == SImode
+ && TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode))
+   {
+ newpat = copy_rtx (PATTERN (i2));
+ PUT_MODE (SET_DEST (newpat), SImode);
+ PUT_MODE (SET_SRC (newpat), SImode);
+
+ rtx i2dest_r = XEXP (SET_DEST (newpat), 0);
+ rtx i3src_r = XEXP (SET_SRC (single_set (i3)), 0);
+ newpat = subst (newpat, i2dest_r, i3src_r, false, false, false);
+   }
+  else
+   {
+ /* If I1 feeds into I2 and I1DEST is in I1SRC, we need to make a
+unique copy of I2SRC each time we substitute it, in order to
+avoid creating self-referential RTL when we will be substituting
+I1SRC for I1DEST later.  Likewise if I0 feeds into I2, either
+directly or indirectly through I1, and I0DEST is in I0SRC.  */
+ newpat = subst (
+ PATTERN (i3), i2dest, i2src, false, false,
+ (i1_feeds_i2_n && i1dest_in_i1src)
+ || ((i0_feeds_i2_n || (i0_feeds_i1_n && i1_feeds_i2_n))
+ && i0dest_in_i0src));
+   }
   substed_i2 = true;
 
   /* Record whether I2's body now appears within I3's body.  */
@@ 

RE: [PATCH v2] RISC-V: Support RVV VFMUL rounding mode intrinsic API

2023-08-02 Thread Li, Pan2 via Gcc-patches
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, August 3, 2023 10:36 AM
To: Li, Pan2 ; gcc-patches 
Cc: Kito.cheng ; Li, Pan2 ; Wang, 
Yanzhang 
Subject: Re: [PATCH v2] RISC-V: Support RVV VFMUL rounding mode intrinsic API

LGTM


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-08-03 10:32
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v2] RISC-V: Support RVV VFMUL rounding mode intrinsic API
From: Pan Li <pan2...@intel.com>

Update in v2:

* Sync with upstream for the vfmul duplicated declaration.

Original log:

This patch adds rounding mode API support for VFMUL,
as in the samples below.

* __riscv_vfmul_vv_f32m1_rm
* __riscv_vfmul_vv_f32m1_rm_m
* __riscv_vfmul_vf_f32m1_rm
* __riscv_vfmul_vf_f32m1_rm_m

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfmul_frm_obj): New declaration.
(Base): Likewise.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfmul_frm): New function definition.
* config/riscv/vector.md: Add vfmul to frm_mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-single-mul.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  3 ++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  2 +
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/base/float-point-single-mul.c   | 44 +++
5 files changed, 51 insertions(+), 1 deletion(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index ddf694c771c..3adc11138a3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -277,6 +277,7 @@ public:
/* Implements below instructions for now.
- vfadd
+   - vfmul
*/
template
class binop_frm : public function_base
@@ -2103,6 +2104,7 @@ static CONSTEXPR const widen_binop_frm 
vfwadd_frm_obj;
static CONSTEXPR const widen_binop vfwsub_obj;
static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
static CONSTEXPR const binop vfmul_obj;
+static CONSTEXPR const binop_frm vfmul_frm_obj;
static CONSTEXPR const binop vfdiv_obj;
static CONSTEXPR const reverse_binop vfrdiv_obj;
static CONSTEXPR const widen_binop vfwmul_obj;
@@ -2334,6 +2336,7 @@ BASE (vfwadd_frm)
BASE (vfwsub)
BASE (vfwsub_frm)
BASE (vfmul)
+BASE (vfmul_frm)
BASE (vfdiv)
BASE (vfrdiv)
BASE (vfwmul)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index f40b022239d..9c12a6b4e8f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,6 +152,7 @@ extern const function_base *const vfwadd_frm;
extern const function_base *const vfwsub;
extern const function_base *const vfwsub_frm;
extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;
extern const function_base *const vfdiv;
extern const function_base *const vfrdiv;
extern const function_base *const vfwmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 58a7224fe0c..35a83ef239c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -319,6 +319,8 @@ DEF_RVV_FUNCTION (vfmul, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvv_ops)
DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfrdiv, alu, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvv_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvf_ops)
// 13.5. Vector Widening Floating-Point Multiply
DEF_RVV_FUNCTION (vfwmul, alu, full_preds, f_wvv_ops)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 65f36744f54..5d3e4256cd5 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -866,7 +866,7 @@ (define_attr "vxrm_mode" "rnu,rne,rdn,rod,none"
;; Defines rounding mode of an floating-point operation.
(define_attr "frm_mode" "rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none"
-  (cond [(eq_attr "type" "vfalu,vfwalu")
+  (cond [(eq_attr "type" "vfalu,vfwalu,vfmul")
  (cond
   [(match_test "INTVAL (operands[9]) == riscv_vector::FRM_RNE")
(const_string "rne")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
new file mode 100644
index 000..e6410ea3a37
--- 

[Bug tree-optimization/110873] [14 Regression] Dead Code Elimination Regression at -O2 since r14-376-g47a76439911

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110873

--- Comment #2 from Andrew Pinski  ---
Note there are other missed optimizations later on too even in GCC 13.

[Bug tree-optimization/110875] [14 Regression] Dead Code Elimination Regression since r14-2501-g285c9d042e9

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110875

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||missed-optimization
   Target Milestone|--- |14.0

Re: [PATCH 0/5] Recognize Zicond extension

2023-08-02 Thread Jeff Law via Gcc-patches




On 7/28/23 00:34, Xiao Zeng wrote:





What I like about yours is it keeps all the logic in riscv.cc rather
than scattering it across riscv.cc and riscv.md.


Yes, when I use enough test cases, I cannot find a concise way to optimize
all test cases. When I enumerated all possible cases in the movcc
function of the RISC-V backend, I found a method that satisfied me, which
is the method in patch [3/5].

I continue to work with the riscv_expand_conditional_move improvements.

Given the deeper problems we have with costing, I'm considering starting 
to push some of the riscv_expand_conditional_move work you've done 
without the testcases since those testcases depend on fixing the costing 
problems.


The expansion changes still have value without the costing changes. 
When we expand a COND_EXPR from gimple, we will attempt to use the 
conditional move pattern first, without regard for costing.




If it's just for the Zicond instruction set, is it necessary to make judgments
outside of eq/ne? After all, it does not support comparison actions other
than eq/ne. Of course, it is also possible to use a special technique to use
Zicond in non eq/ne comparisons.
It's not necessary, but it's certainly helpful to utilize sCC insns in 
conjunction with zicond to if-convert other conditional branches.  It's 
conceptually pretty simple.


If the incoming code is not EQ/NE or we're not comparing a register 
against 0, then we can emit an scc insn to get the comparison result 
into a temporary, then use the standard zicond expansions.
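
As a rough illustration of that idea (a sketch, not code from the patch
series), the two Zicond instructions can be modelled in C and combined with
an sCC-style comparison result.  The helper names czero_eqz, czero_nez and
select_lt are made up for this example; only the instruction semantics are
taken from the Zicond specification.

```c
#include <stdint.h>

/* Scalar model of czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1.  */
static inline int64_t
czero_eqz (int64_t rs1, int64_t rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

/* Scalar model of czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1.  */
static inline int64_t
czero_nez (int64_t rs1, int64_t rs2)
{
  return rs2 != 0 ? 0 : rs1;
}

/* Hypothetical helper: computes (a < b ? x : y) without a branch.
   The comparison is not EQ/NE against zero, so it is first reduced
   to a 0/1 temporary (what an slt insn would produce), and that
   temporary then drives the usual czero/czero/or sequence.  */
int64_t
select_lt (int64_t a, int64_t b, int64_t x, int64_t y)
{
  int64_t cond = a < b;              /* sCC result: 1 or 0 */
  return czero_eqz (x, cond)         /* x when cond != 0, else 0 */
         | czero_nez (y, cond);      /* y when cond == 0, else 0 */
}
```

Exactly one of the two czero results is nonzero-capable for any given cond,
so the final OR simply picks the selected value.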


Jeff


[Bug tree-optimization/110873] [14 Regression] Dead Code Elimination Regression at -O2 since r14-376-g47a76439911

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110873

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-08-03
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski  ---
GCC 13 had:
```
 Registering value_relation (h_29 == h_13) (bb10) at h_29 = PHI 
   Loops range found for h_29: [irange] unsigned char [5, 5] NONZERO 0x5 and
calculated range :[irange] unsigned char [0, 5] NONZERO 0x7
Global Exported: h_29 = [irange] unsigned char [5, 5] NONZERO 0x5
Folding PHI node: h_29 = PHI 
Queued PHI for removal.  Folds to: 5
Folding statement: if (h_29 <= 4)
gimple_simplified to if (0 != 0)
gimple_simplified to if (0 != 0)
Folded into: if (0 != 0)
```
While trunk does:
```
Predicate evaluates to: DON'T KNOW
Not folded
 Registering value_relation (h_29 == h_13) (bb10) at h_29 = PHI 
Global Exported: h_29 = [irange] unsigned char [0, 5] MASK 0xfe VALUE 0xe7
Folding PHI node: h_29 = PHI 
No folding possible
Folding statement: if (h_29 <= 4)

Visiting conditional with predicate: if (h_29 <= 4)

With known ranges
h_29: [irange] unsigned char [0, 5] MASK 0xfe VALUE 0xe7

Predicate evaluates to: DON'T KNOW
Simplified relational if (h_29 <= 4)
 into if (h_29 != 5)

Folded into: if (h_29 != 5)
```
Which totally misses that h_29 was just 5.

[Bug middle-end/110874] [14 Regression] ice with -O2 with recent gcc

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110874

Andrew Pinski  changed:

   What|Removed |Added

URL||https://gcc.gnu.org/piperma
   ||il/gcc-patches/2023-August/
   ||626131.html
   Keywords||patch

--- Comment #11 from Andrew Pinski  ---
Patch submitted:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626131.html

Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-08-02 Thread juzhe.zh...@rivai.ai
Hi, Richi.

I have fully tested the V4 patch, with gcc_unreachable () added, on the RISC-V port:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626133.html 

Bootstrap and regression on X86 passed.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-08-02 16:33
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; gcc-patches
Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
On Wed, 2 Aug 2023, juzhe.zh...@rivai.ai wrote:
 
> Thanks Richard so much.
> 
> Forgive me asking question again :)
> 
> Is this following code correct for you ?
 
Well, I wonder what kind of testcase runs into the reduc_idx >= 0 case.
The point is I don't _know_ whether the code is correct, in fact it looked
suspicious ;)
 
> +  if (len_loop_p)
> +{
> +  if (len_opno >= 0)
> + {
> +   ifn = cond_len_fn;
> +   /* COND_* -> COND_LEN_* takes 2 extra arguments:LEN,BIAS.  */
> +   vect_nargs += 2;
> + }
> +  else if (reduc_idx >= 0)
> + gcc_unreachable ();
> +}
> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-02 15:49
> To: ???
> CC: richard.sandiford; gcc-patches
> Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> On Mon, 31 Jul 2023, ??? wrote:
>  
> > Oh, Thanks a lot.
> > I can test it in RISC-V backend now.
> > 
> > But I have another questions:
> > >> I'm a bit confused (but also by the existing mask code), whether
> > >>vect_nargs needs adjustment depends on the IFN in the IL we analyze.
> > >>If if-conversion recognizes a .COND_ADD then we need to add nothing
> > >>for masking (that is, ifn == cond_fn already).  In your code above
> > >>you either use cond_len_fn or get_len_internal_fn (cond_fn) but
> > >>isn't that the very same?!  So how come you in one case add two
> > >>and in the other add four args?
> > >>Please make sure to place gcc_unreachable () in each arm and check
> > >>you have test coverage.  I believe that the else arm is unreachable
> > >>but when you vectorize .FMA you will need to add 4 and when you
> > >>vectorize .COND_FMA you will need to add two arguments (as said,
> > >>no idea why we special case reduc_idx >= 0 at the moment).
> > 
> > Do you mean I add gcc_unreachable in else like this:
> > 
> >   if (len_loop_p)
> > {
> >   if (len_opno >= 0)
> > {
> >   ifn = cond_len_fn;
> >   /* COND_* -> COND_LEN_* takes 2 extra arguments:LEN,BIAS.  */
> >   vect_nargs += 2;
> > }
> >   else if (reduc_idx >= 0)
> > {
> >   /* FMA -> COND_LEN_FMA takes 4 extra 
> > arguments:MASK,ELSE,LEN,BIAS.  */
> >   ifn = get_len_internal_fn (cond_fn);
> >   vect_nargs += 4;
>  
> no, a gcc_unreachable () here.  That is, make sure you have test coverage
> for the above two cases (to me the len_opno >= 0 case is obvious)
>  
> > }
> > else
> > gcc_unreachable ();
> > }
> > 
> > Thanks.
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-07-31 21:58
> > To: ???
> > CC: richard.sandiford; gcc-patches
> > Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> > On Mon, 31 Jul 2023, ??? wrote:
> >  
> > > Yeah. I have tried this case too.
> > > 
> > > But this case doesn't need to be vectorized as COND_FMA, am I right?
> >  
> > Only when you enable loop masking.  Alternatively use
> >  
> > double foo (double *a, double *b, double *c)
> > {
> >   double result = 0.0;
> >   for (int i = 0; i < 1024; ++i)
> > result += i & 1 ? __builtin_fma (a[i], b[i], c[i]) : 0.0;
> >   return result;
> > }
> >  
> > but then for me if-conversion produces
> >  
> >   iftmp.0_18 = __builtin_fma (_8, _10, _5);
> >   _ifc__43 = _26 ? iftmp.0_18 : 0.0;
> >  
> > with -ffast-math (probably rightfully so).  I then get .FMAs
> > vectorized and .COND_FMA folded.
> >  
> > > What I wonder is whether this condition:
> > > 
> > > if  (mask_opno >= 0 && reduc_idx >= 0)
> > > 
> > > or similar as len
> > > if  (len_opno >= 0 && reduc_idx >= 0)
> > > 
> > > Are these checks redundant in vectorizable_call?
> > > 
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-07-31 21:33
> > > To: juzhe.zh...@rivai.ai
> > > CC: richard.sandiford; gcc-patches
> > > Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for 
> > > COND_LEN_*
> > > On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:
> > >  
> > > > Hi, Richi.
> > > > 
> > > > >> I think you need to use fma from math.h together with -ffast-math
> > > > >>to get fma.
> > > > 
> > > > As you said, this is one of the case I tried:
> > > > https://godbolt.org/z/xMzrrv5dT 
> > > > GCC failed to vectorize.
> > > > 
> > > > Could you help me with this?
> > >  
> > > double foo (double *a, double *b, double *c)
> > > {
> > >   double result = 0.0;
> > >   for (int i = 0; i < 1024; ++i)
> > > result += __builtin_fma (a[i], b[i], c[i]);
> > >   return result;
> > > }
> > >  
> > > with -mavx2 -mfma -Ofast this is 

[PATCH V4] VECT: Support CALL vectorization for COND_LEN_*

2023-08-02 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.

Based on the suggestions from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html

This patch chooses approach (1) that Richard proposed, meaning:

RVV implements cond_* optabs as expanders.  RVV therefore supports
both IFN_COND_ADD and IFN_COND_LEN_ADD.  No dummy length arguments
are needed at the gimple level.

This approach makes the code much cleaner and more reasonable.

Consider the following case:
void foo (float * __restrict a, float * __restrict b, int * __restrict cond, 
int n)
{
  for (int i = 0; i < n; i++)
if (cond[i])
  a[i] = b[i] + a[i];
}


Output of RISC-V (32-bits) gcc (trunk) (Compiler #3)
:5:21: missed: couldn't vectorize loop
:5:21: missed: not vectorized: control flow in loop.

ARM SVE:

...
mask__27.10_51 = vect__4.9_49 != { 0, ... };
...
vec_mask_and_55 = loop_mask_49 & mask__27.10_51;
...
vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, 
vect__6.13_56);

For RVV, we want IR as follows:

...
_68 = .SELECT_VL (ivtmp_66, POLY_INT_CST [4, 4]);
...
mask__27.10_51 = vect__4.9_49 != { 0, ... };
...
vect__9.17_60 = .COND_LEN_ADD (mask__27.10_51, vect__6.13_55, vect__8.16_59, 
vect__6.13_55, _68, 0);
...

Both the len and the mask of COND_LEN_ADD are real, not dummy.
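
For readers less familiar with the COND_LEN_* functions, here is a scalar
reference sketch of what a single .COND_LEN_ADD computes, following the
per-lane rule documented for the cond_len_add optab.  The function name
cond_len_add_ref and its flat-array interface are assumptions made for this
illustration; they are not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar reference for one .COND_LEN_ADD over NUNITS lanes:
     dst[i] = (i < len + bias && mask[i]) ? a[i] + b[i] : else_val[i]
   A lane is active only if it is below the length bound AND its mask bit
   is set; every other lane takes the "else" value.  */
void
cond_len_add_ref (float *dst, const bool *mask, const float *a,
                  const float *b, const float *else_val,
                  size_t nunits, long len, long bias)
{
  for (size_t i = 0; i < nunits; i++)
    dst[i] = ((long) i < len + bias && mask[i]) ? a[i] + b[i]
                                                : else_val[i];
}
```

In the RVV IR above, the else operand is the same vector as the first addend
(vect__6.13_55), so inactive lanes simply keep that value.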

This patch has been fully tested on the RISC-V port, which supports both
COND_* and COND_LEN_*.

Bootstrap and regression testing on x86 also passed.

OK for trunk?

gcc/ChangeLog:

* internal-fn.cc (get_len_internal_fn): New function.
(DEF_INTERNAL_COND_FN): Ditto.
(DEF_INTERNAL_SIGNED_COND_FN): Ditto.
* internal-fn.h (get_len_internal_fn): Ditto.
* tree-vect-stmts.cc (vectorizable_call): Add CALL auto-vectorization.

---
 gcc/internal-fn.cc | 24 +
 gcc/internal-fn.h  |  1 +
 gcc/tree-vect-stmts.cc | 58 ++
 3 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 8e294286388..7f5ede00c02 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4443,6 +4443,30 @@ get_conditional_internal_fn (internal_fn fn)
 }
 }
 
+/* If there exists an internal function like IFN that operates on vectors,
+   but with additional length and bias parameters, return the internal_fn
+   for that function, otherwise return IFN_LAST.  */
+internal_fn
+get_len_internal_fn (internal_fn fn)
+{
+  switch (fn)
+{
+#undef DEF_INTERNAL_COND_FN
+#undef DEF_INTERNAL_SIGNED_COND_FN
+#define DEF_INTERNAL_COND_FN(NAME, ...)                                        \
+  case IFN_COND_##NAME:                                                        \
+    return IFN_COND_LEN_##NAME;
+#define DEF_INTERNAL_SIGNED_COND_FN(NAME, ...)                                 \
+  case IFN_COND_##NAME:                                                        \
+    return IFN_COND_LEN_##NAME;
+#include "internal-fn.def"
+#undef DEF_INTERNAL_COND_FN
+#undef DEF_INTERNAL_SIGNED_COND_FN
+default:
+  return IFN_LAST;
+}
+}
+
 /* If IFN implements the conditional form of an unconditional internal
function, return that unconditional function, otherwise return IFN_LAST.  */
 
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index a5c3f4765ff..410c1b623d6 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -224,6 +224,7 @@ extern bool set_edom_supported_p (void);
 
 extern internal_fn get_conditional_internal_fn (tree_code);
 extern internal_fn get_conditional_internal_fn (internal_fn);
+extern internal_fn get_len_internal_fn (internal_fn);
 extern internal_fn get_conditional_len_internal_fn (tree_code);
 extern tree_code conditional_internal_fn_code (internal_fn);
 extern internal_fn get_unconditional_internal_fn (internal_fn);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 6a4e8fce126..76b1c83f41e 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3540,7 +3540,10 @@ vectorizable_call (vec_info *vinfo,
 
   int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
   internal_fn cond_fn = get_conditional_internal_fn (ifn);
+  internal_fn cond_len_fn = get_len_internal_fn (ifn);
+  int len_opno = internal_fn_len_index (cond_len_fn);
   vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL);
+  vec_loop_lens *lens = (loop_vinfo ? &LOOP_VINFO_LENS (loop_vinfo) : NULL);
   if (!vec_stmt) /* transformation not required.  */
 {
   if (slp_node)
@@ -3569,6 +3572,9 @@ vectorizable_call (vec_info *vinfo,
  if (reduc_idx >= 0
  && (cond_fn == IFN_LAST
  || !direct_internal_fn_supported_p (cond_fn, vectype_out,
+ OPTIMIZE_FOR_SPEED))
+ && (cond_len_fn == IFN_LAST
+ || !direct_internal_fn_supported_p (cond_len_fn, vectype_out,
  OPTIMIZE_FOR_SPEED)))
{
  if (dump_enabled_p ())
@@ -3586,8 

Re: [PATCH v2] RISC-V: Support RVV VFMUL rounding mode intrinsic API

2023-08-02 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-08-03 10:32
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v2] RISC-V: Support RVV VFMUL rounding mode intrinsic API
From: Pan Li 
 
Update in v2:
 
* Sync with upstream for the vfmul duplicated declaration.
 
Original log:
 
This patch would like to support the rounding mode API for the VFMUL
for the below samples.
 
* __riscv_vfmul_vv_f32m1_rm
* __riscv_vfmul_vv_f32m1_rm_m
* __riscv_vfmul_vf_f32m1_rm
* __riscv_vfmul_vf_f32m1_rm_m
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc
(vfmul_frm_obj): New declaration.
(Base): Likewise.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfmul_frm): New function definition.
* config/riscv/vector.md: Add vfmul to frm_mode.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-single-mul.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  3 ++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  2 +
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/base/float-point-single-mul.c   | 44 +++
5 files changed, 51 insertions(+), 1 deletion(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index ddf694c771c..3adc11138a3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -277,6 +277,7 @@ public:
/* Implements below instructions for now.
- vfadd
+   - vfmul
*/
template
class binop_frm : public function_base
@@ -2103,6 +2104,7 @@ static CONSTEXPR const widen_binop_frm 
vfwadd_frm_obj;
static CONSTEXPR const widen_binop vfwsub_obj;
static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
static CONSTEXPR const binop vfmul_obj;
+static CONSTEXPR const binop_frm vfmul_frm_obj;
static CONSTEXPR const binop vfdiv_obj;
static CONSTEXPR const reverse_binop vfrdiv_obj;
static CONSTEXPR const widen_binop vfwmul_obj;
@@ -2334,6 +2336,7 @@ BASE (vfwadd_frm)
BASE (vfwsub)
BASE (vfwsub_frm)
BASE (vfmul)
+BASE (vfmul_frm)
BASE (vfdiv)
BASE (vfrdiv)
BASE (vfwmul)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index f40b022239d..9c12a6b4e8f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,6 +152,7 @@ extern const function_base *const vfwadd_frm;
extern const function_base *const vfwsub;
extern const function_base *const vfwsub_frm;
extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;
extern const function_base *const vfdiv;
extern const function_base *const vfrdiv;
extern const function_base *const vfwmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 58a7224fe0c..35a83ef239c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -319,6 +319,8 @@ DEF_RVV_FUNCTION (vfmul, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvv_ops)
DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfrdiv, alu, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvv_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvf_ops)
// 13.5. Vector Widening Floating-Point Multiply
DEF_RVV_FUNCTION (vfwmul, alu, full_preds, f_wvv_ops)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 65f36744f54..5d3e4256cd5 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -866,7 +866,7 @@ (define_attr "vxrm_mode" "rnu,rne,rdn,rod,none"
;; Defines rounding mode of an floating-point operation.
(define_attr "frm_mode" "rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none"
-  (cond [(eq_attr "type" "vfalu,vfwalu")
+  (cond [(eq_attr "type" "vfalu,vfwalu,vfmul")
  (cond
   [(match_test "INTVAL (operands[9]) == riscv_vector::FRM_RNE")
(const_string "rne")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
new file mode 100644
index 000..e6410ea3a37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t
+test_riscv_vfmul_vv_f32m1_rm (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfmul_vv_f32m1_rm (op1, op2, 0, vl);
+}
+
+vfloat32m1_t
+test_vfmul_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
+   size_t vl) {
+  return 

[PATCH] Fix PR 110874: infinite loop in gimple_bitwise_inverted_equal_p with fre

2023-08-02 Thread Andrew Pinski via Gcc-patches
I didn't expect valueization to make gimple_nop_convert iterate between
two different SSA names, causing an infinite loop in
gimple_bitwise_inverted_equal_p.
So we should put a bound on gimple_bitwise_inverted_equal_p calling
gimple_nop_convert and look through only one conversion rather than always.
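
The bounding idea can be sketched outside of GCC with a toy model.  Everything
below (struct node, nop_source, inverted_equal_p, inverted_equal_toy) is
hypothetical and only mimics the shape of the fix: the recursive call passes
again = false, so each top-level query looks through at most one no-op
conversion per operand, even if the conversions form a cycle.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an SSA name.  */
struct node
{
  int value;                /* value used for the bitwise comparison */
  struct node *nop_source;  /* what a no-op conversion strips to, or NULL */
};

/* Returns true if A is the bitwise inverse of B, looking through at
   most one no-op conversion on either side when AGAIN is true.  */
static bool
inverted_equal_p (const struct node *a, const struct node *b, bool again)
{
  if (a == b)
    return false;
  if ((a->value ^ b->value) == ~0)
    return true;

  /* Look through one conversion on A, but never recurse further.  */
  if (again && a->nop_source != NULL && a->nop_source != a
      && inverted_equal_p (a->nop_source, b, false))
    return true;

  /* Likewise for B.  */
  if (again && b->nop_source != NULL && b->nop_source != b
      && inverted_equal_p (a, b->nop_source, false))
    return true;

  return false;
}

/* Entry point with the default of again = true.  */
bool
inverted_equal_toy (const struct node *a, const struct node *b)
{
  return inverted_equal_p (a, b, true);
}
```

With two nodes whose nop_source fields point at each other, an unbounded
version would recurse forever; this one returns after a single look-through.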

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/110874
* gimple-match-head.cc (gimple_bitwise_inverted_equal_p):
Add new argument, again with default value of true.
Don't try gimple_nop_convert if again is false.
Update call to gimple_bitwise_inverted_equal_p for
new argument.

gcc/testsuite/ChangeLog:

PR tree-optimization/110874
* gcc.c-torture/compile/pr110874-a.c: New test.
---
 gcc/gimple-match-head.cc| 14 +-
 .../gcc.c-torture/compile/pr110874-a.c  | 17 +
 2 files changed, 26 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr110874-a.c

diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index b1e96304d7c..e91aaab86dd 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -273,7 +273,7 @@ gimple_bitwise_equal_p (tree expr1, tree expr2, tree 
(*valueize) (tree))
 /* Helper function for bitwise_equal_p macro.  */
 
 static inline bool
-gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, tree (*valueize) 
(tree))
+gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, tree (*valueize) 
(tree), bool again = true)
 {
   if (expr1 == expr2)
 return false;
@@ -285,12 +285,16 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, 
tree (*valueize) (tree)
 return false;
 
   tree other;
-  if (gimple_nop_convert (expr1, &other, valueize)
-  && gimple_bitwise_inverted_equal_p (other, expr2, valueize))
+  if (again
+  && gimple_nop_convert (expr1, &other, valueize)
+  && other != expr1
+  && gimple_bitwise_inverted_equal_p (other, expr2, valueize, false))
 return true;
 
-  if (gimple_nop_convert (expr2, &other, valueize)
-  && gimple_bitwise_inverted_equal_p (expr1, other, valueize))
+  if (again
+  && gimple_nop_convert (expr2, &other, valueize)
+  && other != expr2
+  && gimple_bitwise_inverted_equal_p (expr1, other, valueize, false))
 return true;
 
   if (TREE_CODE (expr1) != SSA_NAME
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr110874-a.c 
b/gcc/testsuite/gcc.c-torture/compile/pr110874-a.c
new file mode 100644
index 000..b314410a892
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr110874-a.c
@@ -0,0 +1,17 @@
+struct S1 {
+  unsigned f0;
+};
+static int g_161;
+void func_109(unsigned g_227, unsigned t) {
+  struct S1 l_178;
+  int l_160 = 0x1FAE99D5L;
+  int *l_230[] = {&l_160};
+  if (l_160) {
+for (l_178.f0 = -7; l_178.f0;) {
+  ++g_227;
+  break;
+}
+(g_161) = g_227;
+  }
+  (g_161) &= t;
+}
-- 
2.31.1



[PATCH v2] RISC-V: Support RVV VFMUL rounding mode intrinsic API

2023-08-02 Thread Pan Li via Gcc-patches
From: Pan Li 

Update in v2:

* Sync with upstream for the vfmul duplicated declaration.

Original log:

This patch would like to support the rounding mode API for the VFMUL
for the below samples.

* __riscv_vfmul_vv_f32m1_rm
* __riscv_vfmul_vv_f32m1_rm_m
* __riscv_vfmul_vf_f32m1_rm
* __riscv_vfmul_vf_f32m1_rm_m

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfmul_frm_obj): New declaration.
(Base): Likewise.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfmul_frm): New function definition.
* config/riscv/vector.md: Add vfmul to frm_mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-single-mul.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  |  3 ++
 .../riscv/riscv-vector-builtins-bases.h   |  1 +
 .../riscv/riscv-vector-builtins-functions.def |  2 +
 gcc/config/riscv/vector.md|  2 +-
 .../riscv/rvv/base/float-point-single-mul.c   | 44 +++
 5 files changed, 51 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index ddf694c771c..3adc11138a3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -277,6 +277,7 @@ public:
 
 /* Implements below instructions for now.
- vfadd
+   - vfmul
 */
 template
 class binop_frm : public function_base
@@ -2103,6 +2104,7 @@ static CONSTEXPR const widen_binop_frm 
vfwadd_frm_obj;
 static CONSTEXPR const widen_binop vfwsub_obj;
 static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
 static CONSTEXPR const binop vfmul_obj;
+static CONSTEXPR const binop_frm vfmul_frm_obj;
 static CONSTEXPR const binop vfdiv_obj;
 static CONSTEXPR const reverse_binop vfrdiv_obj;
 static CONSTEXPR const widen_binop vfwmul_obj;
@@ -2334,6 +2336,7 @@ BASE (vfwadd_frm)
 BASE (vfwsub)
 BASE (vfwsub_frm)
 BASE (vfmul)
+BASE (vfmul_frm)
 BASE (vfdiv)
 BASE (vfrdiv)
 BASE (vfwmul)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index f40b022239d..9c12a6b4e8f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,6 +152,7 @@ extern const function_base *const vfwadd_frm;
 extern const function_base *const vfwsub;
 extern const function_base *const vfwsub_frm;
 extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;
 extern const function_base *const vfdiv;
 extern const function_base *const vfrdiv;
 extern const function_base *const vfwmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 58a7224fe0c..35a83ef239c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -319,6 +319,8 @@ DEF_RVV_FUNCTION (vfmul, alu, full_preds, f_vvf_ops)
 DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvv_ops)
 DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvf_ops)
 DEF_RVV_FUNCTION (vfrdiv, alu, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvv_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvf_ops)
 
 // 13.5. Vector Widening Floating-Point Multiply
 DEF_RVV_FUNCTION (vfwmul, alu, full_preds, f_wvv_ops)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 65f36744f54..5d3e4256cd5 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -866,7 +866,7 @@ (define_attr "vxrm_mode" "rnu,rne,rdn,rod,none"
 
 ;; Defines rounding mode of an floating-point operation.
 (define_attr "frm_mode" "rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none"
-  (cond [(eq_attr "type" "vfalu,vfwalu")
+  (cond [(eq_attr "type" "vfalu,vfwalu,vfmul")
  (cond
   [(match_test "INTVAL (operands[9]) == riscv_vector::FRM_RNE")
(const_string "rne")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
new file mode 100644
index 000..e6410ea3a37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t
+test_riscv_vfmul_vv_f32m1_rm (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfmul_vv_f32m1_rm (op1, op2, 0, vl);
+}
+
+vfloat32m1_t
+test_vfmul_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
+ size_t vl) {
+  return __riscv_vfmul_vv_f32m1_rm_m (mask, op1, op2, 1, vl);
+}
+
+vfloat32m1_t

[Bug gcov-profile/110883] internal compiler error: in ipa_profile_write_edge_summary

2023-08-02 Thread hndxvon at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110883

--- Comment #2 from 海山  ---
(In reply to Andrew Pinski from comment #1)
> autofdo is not well supported and fixes for it only made it into GCC 12 and
> not GCC 10.x.
> 
> Please try GCC 12.x or newer and report back.
> 
> Also since you are using a redhat supplied GCC, you should have reported it
> to them as mentioned in the message that outputs:
> 
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See  for instructions.

Thank you.
Could you list the files modified by the autofdo fixes, not only ipa-profile.cc?

[PATCH] Fix `~X & X` and `~X | X` patterns

2023-08-02 Thread Andrew Pinski via Gcc-patches
As Jakub noticed in 
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626039.html
what I did was not totally correct because it sometimes chose the wrong type.
So to get back to what the original code did while keeping the use of 
bitwise_inverted_equal_p,
we just need to check that the two captures have the same type.

Also adds a testcase for the problem Jakub found.

Committed as obvious after a bootstrap and test.
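
A small sketch of why the type check matters, assuming a 32-bit int and a
64-bit long long.  The function name xor_mixed is made up here; its body has
the same shape as bar () in the new test.

```c
/* X is unsigned int, Y is int.  When both are widened to long long,
   X is zero-extended but Y is sign-extended, so the upper 32 bits of
   the xor depend on the sign of Y and the result is not a single
   constant.  */
long long
xor_mixed (unsigned int x)
{
  int y = x;   /* out-of-range values wrap modulo 2^32 under GCC */
  y = ~y;
  return ((long long) x) ^ y;
}
```

Under those assumptions, xor_mixed (0) yields -1 while
xor_mixed (0x80000000u) yields 4294967295, so folding the expression to one
all-ones constant built from a single operand's type cannot be right for
both inputs.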

gcc/ChangeLog:

* match.pd (`~X & X`): Check that the types match.
(`~x | x`, `~x ^ x`): Likewise.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/20230802-1.c: New test.
---
 gcc/match.pd  |  6 +-
 .../gcc.c-torture/execute/20230802-1.c| 68 +++
 2 files changed, 72 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/20230802-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index c62f205c13c..53e622bf28f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1158,7 +1158,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Simplify ~X & X as zero.  */
 (simplify
  (bit_and (convert? @0) (convert? @1))
- (if (bitwise_inverted_equal_p (@0, @1))
+ (if (types_match (TREE_TYPE (@0), TREE_TYPE (@1))
+  && bitwise_inverted_equal_p (@0, @1))
   { build_zero_cst (type); }))
 
 /* PR71636: Transform x & ((1U << b) - 1) -> x & ~(~0U << b);  */
@@ -1397,7 +1398,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (for op (bit_ior bit_xor)
  (simplify
   (op (convert? @0) (convert? @1))
-  (if (bitwise_inverted_equal_p (@0, @1))
+  (if (types_match (TREE_TYPE (@0), TREE_TYPE (@1))
+   && bitwise_inverted_equal_p (@0, @1))
(convert { build_all_ones_cst (TREE_TYPE (@0)); }
 
 /* x ^ x -> 0 */
diff --git a/gcc/testsuite/gcc.c-torture/execute/20230802-1.c 
b/gcc/testsuite/gcc.c-torture/execute/20230802-1.c
new file mode 100644
index 000..8802ffa8238
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/20230802-1.c
@@ -0,0 +1,68 @@
+/*  We used to simplify these incorrectly.  */
+__attribute__((noipa))
+long long
+foo (unsigned int x)
+{
+  int y = x;
+  y = ~y;
+  return ((long long) x) & y;
+}
+
+__attribute__((noipa))
+long long
+foo_v (volatile unsigned int x)
+{
+  volatile int y = x;
+  y = ~y;
+  return ((long long) x) & y;
+}
+
+__attribute__((noipa))
+long long
+bar (unsigned int x)
+{
+  int y = x;
+  y = ~y;
+  return ((long long) x) ^ y;
+}
+
+__attribute__((noipa))
+long long
+bar_v (volatile unsigned int x)
+{
+  volatile int y = x;
+  y = ~y;
+  return ((long long) x) ^ y;
+}
+
+__attribute__((noipa))
+long long
+baz (unsigned int x)
+{
+  int y = x;
+  y = ~y;
+  return y ^ ((long long) x);
+}
+
+__attribute__((noipa))
+long long
+baz_v (volatile unsigned int x)
+{
+  volatile int y = x;
+  y = ~y;
+  return y ^ ((long long) x);
+}
+
+
+int main()
+{
+  for(int t = -1; t <= 1; t++)
+    {
+      if (foo(t) != foo_v(t))
+        __builtin_abort ();
+      if (bar(t) != bar_v(t))
+        __builtin_abort ();
+      if (baz(t) != baz_v(t))
+        __builtin_abort ();
+    }
+}
-- 
2.31.1
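
To see why the type check matters, here is a small standalone sketch (not
part of the patch; the names xor_mixed and and_mixed are made up for
illustration). It evaluates the bar and foo expressions from the testcase
by hand, assuming a 32-bit int, a 64-bit long long, and GCC's wrapping
unsigned-to-signed conversion.

```c
/* Illustrative only: mirrors bar() from the testcase in the patch.  */
long long
xor_mixed (unsigned int x)
{
  int y = x;   /* same bits as x, now in a signed 32-bit type */
  y = ~y;      /* bitwise inverse, still 32 bits wide */
  /* x is zero-extended to 64 bits while y is sign-extended, so the
     upper 32 bits of the result depend on the sign of y.  */
  return ((long long) x) ^ y;
}

/* Illustrative only: mirrors foo() from the testcase in the patch.  */
long long
and_mixed (unsigned int x)
{
  int y = x;
  y = ~y;
  return ((long long) x) & y;
}
```

Under those assumptions xor_mixed (0) is -1 but xor_mixed (0xFFFFFFFFu) is
4294967295, so no single all-ones constant is right for every input once the
two operands have different types. That is the situation the added
types_match check is meant to exclude.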



[Bug testsuite/110858] [14 Regression] gcc.dg/unroll-1.c UNRESOLVED

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110858

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #5 from Andrew Pinski  ---
Fixed by r14-2922-gb278d3080ef23835438ec625b984 .

Re: [PATCH V5 1/2] Add overflow API for plus minus mult on range

2023-08-02 Thread Jiufu Guo via Gcc-patches


Hi,

I would like to ping this patch.

BR,
Jeff (Jiufu Guo)


Jiufu Guo  writes:

> Hi,
>
> As discussed in previous reviews, adding overflow APIs to range-op
> would be useful. These APIs help check whether overflow can happen
> when an operation such as plus, minus, or mult is applied to two ranges.
>
> Previous discussions are here:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624067.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624701.html
>
> Bootstrap & regtest pass on ppc64{,le} and x86_64.
> Is this patch ok for trunk?
>
> BR,
> Jeff (Jiufu Guo)
>
> gcc/ChangeLog:
>
>   * range-op-mixed.h (operator_plus::overflow_free_p): New declare.
>   (operator_minus::overflow_free_p): New declare.
>   (operator_mult::overflow_free_p): New declare.
>   * range-op.cc (range_op_handler::overflow_free_p): New function.
>   (range_operator::overflow_free_p): New default function.
>   (operator_plus::overflow_free_p): New function.
>   (operator_minus::overflow_free_p): New function.
>   (operator_mult::overflow_free_p): New function.
>   * range-op.h (range_op_handler::overflow_free_p): New declare.
>   (range_operator::overflow_free_p): New declare.
>   * value-range.cc (irange::nonnegative_p): New function.
>   (irange::nonpositive_p): New function.
>   * value-range.h (irange::nonnegative_p): New declare.
>   (irange::nonpositive_p): New declare.
>
> ---
>  gcc/range-op-mixed.h |  11 
>  gcc/range-op.cc  | 124 +++
>  gcc/range-op.h   |   5 ++
>  gcc/value-range.cc   |  12 +
>  gcc/value-range.h|   2 +
>  5 files changed, 154 insertions(+)
>
> diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
> index 6944742ecbc..42157ed9061 100644
> --- a/gcc/range-op-mixed.h
> +++ b/gcc/range-op-mixed.h
> @@ -383,6 +383,10 @@ public:
> relation_kind rel) const final override;
>void update_bitmask (irange &r, const irange &lh,
>  const irange &rh) const final override;
> +
> +  virtual bool overflow_free_p (const irange &lh, const irange &rh,
> + relation_trio = TRIO_VARYING) const;
> +
>  private:
>void wi_fold (irange &r, tree type, const wide_int &lh_lb,
>   const wide_int &lh_ub, const wide_int &rh_lb,
> @@ -446,6 +450,10 @@ public:
>   relation_kind rel) const final override;
>void update_bitmask (irange &r, const irange &lh,
>  const irange &rh) const final override;
> +
> +  virtual bool overflow_free_p (const irange &lh, const irange &rh,
> + relation_trio = TRIO_VARYING) const;
> +
>  private:
>void wi_fold (irange &r, tree type, const wide_int &lh_lb,
>   const wide_int &lh_ub, const wide_int &rh_lb,
> @@ -525,6 +533,9 @@ public:
>   const REAL_VALUE_TYPE &lh_lb, const REAL_VALUE_TYPE &lh_ub,
>   const REAL_VALUE_TYPE &rh_lb, const REAL_VALUE_TYPE &rh_ub,
>   relation_kind kind) const final override;
> +  virtual bool overflow_free_p (const irange &lh, const irange &rh,
> + relation_trio = TRIO_VARYING) const;
> +
>  };
>  
>  class operator_addr_expr : public range_operator
> diff --git a/gcc/range-op.cc b/gcc/range-op.cc
> index cb584314f4c..632b044331b 100644
> --- a/gcc/range-op.cc
> +++ b/gcc/range-op.cc
> @@ -366,6 +366,22 @@ range_op_handler::op1_op2_relation (const vrange &lhs) const
>  }
>  }
>  
> +bool
> +range_op_handler::overflow_free_p (const vrange &lh,
> +const vrange &rh,
> +relation_trio rel) const
> +{
> +  gcc_checking_assert (m_operator);
> +  switch (dispatch_kind (lh, lh, rh))
> +{
> +  case RO_III:
> + return m_operator->overflow_free_p(as_a <irange> (lh),
> +as_a <irange> (rh),
> +rel);
> +  default:
> + return false;
> +}
> +}
>  
>  // Convert irange bitmasks into a VALUE MASK pair suitable for calling CCP.
>  
> @@ -688,6 +704,13 @@ range_operator::op1_op2_relation_effect (irange &lhs_range ATTRIBUTE_UNUSED,
>return false;
>  }
>  
> +bool
> +range_operator::overflow_free_p (const irange &, const irange &,
> +  relation_trio) const
> +{
> +  return false;
> +}
> +
>  // Apply any known bitmask updates based on this operator.
>  
>  void
> @@ -4311,6 +4334,107 @@ range_op_table::initialize_integral_ops ()
>  
>  }
>  
> +bool
> +operator_plus::overflow_free_p (const irange &lh, const irange &rh,
> + relation_trio) const
> +{
> +  if (lh.undefined_p () || rh.undefined_p ())
> +return false;
> +
> +  tree type = lh.type ();
> +  if (TYPE_OVERFLOW_UNDEFINED (type))
> +return true;
> +
> +  wi::overflow_type ovf;
> +  signop sgn = TYPE_SIGN (type);
> +  wide_int wmax0 = lh.upper_bound ();
> +  wide_int wmax1 = rh.upper_bound ();
> +  wi::add (wmax0, 
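
The quoted patch is cut off here in the archive. As a rough illustration of
the idea only, the sketch below checks whether adding two closed ranges of
32-bit signed values can overflow. It is not the patch's code: the type
int_range and the function range_add_overflow_free_p are hypothetical, and it
uses the GCC/Clang builtin __builtin_add_overflow on fixed-width integers
where the patch works on wide_int bounds with wi::add.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical closed interval [lo, hi] of 32-bit signed values.  */
struct int_range
{
  int32_t lo;
  int32_t hi;
};

/* Return true if a + b cannot overflow int32_t for any a in LH and any
   b in RH.  Addition is monotonic in both operands, so it is enough to
   check the sum of the upper bounds and the sum of the lower bounds.  */
bool
range_add_overflow_free_p (struct int_range lh, struct int_range rh)
{
  int32_t tmp;

  if (__builtin_add_overflow (lh.hi, rh.hi, &tmp))
    return false;
  if (__builtin_add_overflow (lh.lo, rh.lo, &tmp))
    return false;
  return true;
}
```

For example, [0, 10] + [0, 10] is overflow free, while
[INT32_MAX - 1, INT32_MAX] + [1, 1] is not, because the upper bounds already
overflow when added.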

[Bug gcov-profile/110883] internal compiler error: in ipa_profile_write_edge_summary

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110883

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71672
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2023-08-03

--- Comment #1 from Andrew Pinski  ---
autofdo is not well supported and fixes for it only made it into GCC 12 and not
GCC 10.x.

Please try GCC 12.x or newer and report back.

Also since you are using a redhat supplied GCC, you should have reported it to
them as mentioned in the message that outputs:

Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.

RE: [PATCH v1] RISC-V: Remove redudant extern declaration in function base

2023-08-02 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

From: Kito Cheng 
Sent: Thursday, August 3, 2023 10:12 AM
To: Li, Pan2 
Cc: GCC Patches ; 钟居哲 ; Wang, 
Yanzhang 
Subject: Re: [PATCH v1] RISC-V: Remove redudant extern declaration in function 
base

LGTM

<pan2...@intel.com> wrote on Thu, Aug 3, 2023 at 10:11:
From: Pan Li <pan2...@intel.com>

This patch removes the redundant declaration.

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.h: Remove
redundant declaration.
---
 gcc/config/riscv/riscv-vector-builtins-bases.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5800fca0169..f40b022239d 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,7 +152,6 @@ extern const function_base *const vfwadd_frm;
 extern const function_base *const vfwsub;
 extern const function_base *const vfwsub_frm;
 extern const function_base *const vfmul;
-extern const function_base *const vfmul;
 extern const function_base *const vfdiv;
 extern const function_base *const vfrdiv;
 extern const function_base *const vfwmul;
--
2.34.1


Re: [PATCH v1] RISC-V: Remove redudant extern declaration in function base

2023-08-02 Thread Kito Cheng via Gcc-patches
LGTM

 wrote on Thu, Aug 3, 2023 at 10:11:

> From: Pan Li 
>
> This patch removes the redundant declaration.
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.h: Remove
> redundant declaration.
> ---
>  gcc/config/riscv/riscv-vector-builtins-bases.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h
> b/gcc/config/riscv/riscv-vector-builtins-bases.h
> index 5800fca0169..f40b022239d 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.h
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
> @@ -152,7 +152,6 @@ extern const function_base *const vfwadd_frm;
>  extern const function_base *const vfwsub;
>  extern const function_base *const vfwsub_frm;
>  extern const function_base *const vfmul;
> -extern const function_base *const vfmul;
>  extern const function_base *const vfdiv;
>  extern const function_base *const vfrdiv;
>  extern const function_base *const vfwmul;
> --
> 2.34.1
>
>


[PATCH v1] RISC-V: Remove redudant extern declaration in function base

2023-08-02 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch removes the redundant declaration.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.h: Remove
redundant declaration.
---
 gcc/config/riscv/riscv-vector-builtins-bases.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5800fca0169..f40b022239d 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,7 +152,6 @@ extern const function_base *const vfwadd_frm;
 extern const function_base *const vfwsub;
 extern const function_base *const vfwsub_frm;
 extern const function_base *const vfmul;
-extern const function_base *const vfmul;
 extern const function_base *const vfdiv;
 extern const function_base *const vfrdiv;
 extern const function_base *const vfwmul;
-- 
2.34.1
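
For context, the removed line is harmless to drop because repeating an
identical extern declaration is valid but adds nothing. A minimal sketch of
that, using made-up names (vfmul_like, vfmul_value, read_vfmul_like) and a
plain int in place of function_base:

```c
/* Hypothetical stand-in for the function_base pointers in the header.  */
extern const int *const vfmul_like;
extern const int *const vfmul_like;   /* redundant, but still valid */

/* The single definition that both declarations refer to.  */
static const int vfmul_value = 7;
const int *const vfmul_like = &vfmul_value;

int
read_vfmul_like (void)
{
  return *vfmul_like;
}
```

Both declarations name the same object, so deleting one of them leaves every
user of the header unaffected.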



RE: RE: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API

2023-08-02 Thread Li, Pan2 via Gcc-patches
Sure thing, will prepare it on the double.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, August 3, 2023 10:02 AM
To: Li, Pan2 ; gcc-patches 
Cc: Wang, Yanzhang ; kito.cheng 
Subject: Re: RE: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic 
API

Could you split it into 2 patches?

One is a cleanup patch that removes the redundant declaration.

The other adds the VFMUL API support.


juzhe.zh...@rivai.ai

From: Li, Pan2
Date: 2023-08-03 09:44
To: juzhe.zh...@rivai.ai; 
gcc-patches
CC: Wang, Yanzhang; 
kito.cheng
Subject: RE: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API
Yes, it looks like I missed some after the last cleanup. I will double-check 
after the rounding API support.

Pan

From: juzhe.zh...@rivai.ai <juzhe.zh...@rivai.ai>
Sent: Thursday, August 3, 2023 9:40 AM
To: Li, Pan2 <pan2...@intel.com>; gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Li, Pan2 <pan2...@intel.com>; Wang, Yanzhang <yanzhang.w...@intel.com>; 
kito.cheng <kito.ch...@gmail.com>
Subject: Re: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API

extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;

It seems that there is a redundant declaration in the original code?
extern const function_base *const vfmul;
-extern const function_base *const vfmul;



juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-08-03 09:38
To: gcc-patches
CC: juzhe.zhong; 
pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API
From: Pan Li <pan2...@intel.com>

This patch adds rounding mode API support for VFMUL, covering the
intrinsics below.

* __riscv_vfmul_vv_f32m1_rm
* __riscv_vfmul_vv_f32m1_rm_m
* __riscv_vfmul_vf_f32m1_rm
* __riscv_vfmul_vf_f32m1_rm_m

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfmul_frm_obj): New declaration.
(Base): Likewise.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfmul_frm): New function definition.
* config/riscv/vector.md: Add vfmul to frm_mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-single-mul.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  3 ++
.../riscv/riscv-vector-builtins-bases.h   |  2 +-
.../riscv/riscv-vector-builtins-functions.def |  2 +
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/base/float-point-single-mul.c   | 44 +++
5 files changed, 51 insertions(+), 2 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index ddf694c771c..3adc11138a3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -277,6 +277,7 @@ public:
/* Implements below instructions for now.
- vfadd
+   - vfmul
*/
template
class binop_frm : public function_base
@@ -2103,6 +2104,7 @@ static CONSTEXPR const widen_binop_frm 
vfwadd_frm_obj;
static CONSTEXPR const widen_binop vfwsub_obj;
static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
static CONSTEXPR const binop vfmul_obj;
+static CONSTEXPR const binop_frm vfmul_frm_obj;
static CONSTEXPR const binop vfdiv_obj;
static CONSTEXPR const reverse_binop vfrdiv_obj;
static CONSTEXPR const widen_binop vfwmul_obj;
@@ -2334,6 +2336,7 @@ BASE (vfwadd_frm)
BASE (vfwsub)
BASE (vfwsub_frm)
BASE (vfmul)
+BASE (vfmul_frm)
BASE (vfdiv)
BASE (vfrdiv)
BASE (vfwmul)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5800fca0169..9c12a6b4e8f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,7 +152,7 @@ extern const function_base *const vfwadd_frm;
extern const function_base *const vfwsub;
extern const function_base *const vfwsub_frm;
extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;
extern const function_base *const vfdiv;
extern const function_base *const vfrdiv;
extern const function_base *const vfwmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 58a7224fe0c..35a83ef239c 100644
--- 

Re: RE: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API

2023-08-02 Thread juzhe.zh...@rivai.ai
Could you split it into 2 patches?

One is a cleanup patch that removes the redundant declaration.

The other adds the VFMUL API support.



juzhe.zh...@rivai.ai
 
From: Li, Pan2
Date: 2023-08-03 09:44
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Wang, Yanzhang; kito.cheng
Subject: RE: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API
Yes, it looks like I missed some after the last cleanup. I will double-check 
after the rounding API support.
 
Pan
 
From: juzhe.zh...@rivai.ai  
Sent: Thursday, August 3, 2023 9:40 AM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API
 
extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;
 
It seems that there is a redundant declaration in the original code?
extern const function_base *const vfmul;
-extern const function_base *const vfmul;
 
 


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-08-03 09:38
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API
From: Pan Li 
 
This patch adds rounding mode API support for VFMUL, covering the
intrinsics below.
 
* __riscv_vfmul_vv_f32m1_rm
* __riscv_vfmul_vv_f32m1_rm_m
* __riscv_vfmul_vf_f32m1_rm
* __riscv_vfmul_vf_f32m1_rm_m
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc
(vfmul_frm_obj): New declaration.
(Base): Likewise.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfmul_frm): New function definition.
* config/riscv/vector.md: Add vfmul to frm_mode.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-single-mul.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  3 ++
.../riscv/riscv-vector-builtins-bases.h   |  2 +-
.../riscv/riscv-vector-builtins-functions.def |  2 +
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/base/float-point-single-mul.c   | 44 +++
5 files changed, 51 insertions(+), 2 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index ddf694c771c..3adc11138a3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -277,6 +277,7 @@ public:
/* Implements below instructions for now.
- vfadd
+   - vfmul
*/
template
class binop_frm : public function_base
@@ -2103,6 +2104,7 @@ static CONSTEXPR const widen_binop_frm 
vfwadd_frm_obj;
static CONSTEXPR const widen_binop vfwsub_obj;
static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
static CONSTEXPR const binop vfmul_obj;
+static CONSTEXPR const binop_frm vfmul_frm_obj;
static CONSTEXPR const binop vfdiv_obj;
static CONSTEXPR const reverse_binop vfrdiv_obj;
static CONSTEXPR const widen_binop vfwmul_obj;
@@ -2334,6 +2336,7 @@ BASE (vfwadd_frm)
BASE (vfwsub)
BASE (vfwsub_frm)
BASE (vfmul)
+BASE (vfmul_frm)
BASE (vfdiv)
BASE (vfrdiv)
BASE (vfwmul)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5800fca0169..9c12a6b4e8f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,7 +152,7 @@ extern const function_base *const vfwadd_frm;
extern const function_base *const vfwsub;
extern const function_base *const vfwsub_frm;
extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;
extern const function_base *const vfdiv;
extern const function_base *const vfrdiv;
extern const function_base *const vfwmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 58a7224fe0c..35a83ef239c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -319,6 +319,8 @@ DEF_RVV_FUNCTION (vfmul, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvv_ops)
DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfrdiv, alu, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvv_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvf_ops)
// 13.5. Vector Widening Floating-Point Multiply
DEF_RVV_FUNCTION (vfwmul, alu, full_preds, f_wvv_ops)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 65f36744f54..5d3e4256cd5 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -866,7 +866,7 @@ (define_attr "vxrm_mode" "rnu,rne,rdn,rod,none"
;; Defines rounding mode of an floating-point operation.
(define_attr "frm_mode" "rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none"
-  (cond 

[Bug other/110883] New: internal compiler error: in ipa_profile_write_edge_summary

2023-08-02 Thread hndxvon at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110883

Bug ID: 110883
   Summary: internal compiler error: in
ipa_profile_write_edge_summary
   Product: gcc
   Version: og10 (devel/omp/gcc-10)
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hndxvon at 163 dot com
  Target Milestone: ---

Hi,
When compiling mysql with gcc-10.2.1 using autofdo and LTO, it reports
"internal compiler error: in ipa_profile_write_edge_summary, at
ipa-profile.c:355
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
"
The cflags are '-fauto-profile=myfbdata.gcov -flto'. If I retry the build
incrementally several times, it may succeed.

$gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/10/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap
--enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --prefix=/usr
--mandir=/usr/share/man --infodir=/usr/share/info
--with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared
--enable-threads=posix --enable-checking=release --disable-multilib
--with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions
--enable-gnu-unique-object --enable-linker-build-id
--with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin
--enable-initfini-array --without-isl --enable-gnu-indirect-function
--enable-cet --with-tune=generic --with-arch_32=x86-64
--build=x86_64-redhat-linux
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.1 20200825 (Redhat 10.2.1-3 2.17) (GCC)

Moreover, gcc-9.2.1 builds successfully on the first attempt.

RE: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API

2023-08-02 Thread Li, Pan2 via Gcc-patches
Yes, it looks like I missed some after the last cleanup. I will double-check 
after the rounding API support.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, August 3, 2023 9:40 AM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API

extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;

It seems that there is a redundant declaration in the original code?
extern const function_base *const vfmul;
-extern const function_base *const vfmul;



juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-08-03 09:38
To: gcc-patches
CC: juzhe.zhong; 
pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API
From: Pan Li <pan2...@intel.com>

This patch adds rounding mode API support for VFMUL, covering the
intrinsics below.

* __riscv_vfmul_vv_f32m1_rm
* __riscv_vfmul_vv_f32m1_rm_m
* __riscv_vfmul_vf_f32m1_rm
* __riscv_vfmul_vf_f32m1_rm_m

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfmul_frm_obj): New declaration.
(Base): Likewise.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfmul_frm): New function definition.
* config/riscv/vector.md: Add vfmul to frm_mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-single-mul.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  3 ++
.../riscv/riscv-vector-builtins-bases.h   |  2 +-
.../riscv/riscv-vector-builtins-functions.def |  2 +
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/base/float-point-single-mul.c   | 44 +++
5 files changed, 51 insertions(+), 2 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index ddf694c771c..3adc11138a3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -277,6 +277,7 @@ public:
/* Implements below instructions for now.
- vfadd
+   - vfmul
*/
template
class binop_frm : public function_base
@@ -2103,6 +2104,7 @@ static CONSTEXPR const widen_binop_frm 
vfwadd_frm_obj;
static CONSTEXPR const widen_binop vfwsub_obj;
static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
static CONSTEXPR const binop vfmul_obj;
+static CONSTEXPR const binop_frm vfmul_frm_obj;
static CONSTEXPR const binop vfdiv_obj;
static CONSTEXPR const reverse_binop vfrdiv_obj;
static CONSTEXPR const widen_binop vfwmul_obj;
@@ -2334,6 +2336,7 @@ BASE (vfwadd_frm)
BASE (vfwsub)
BASE (vfwsub_frm)
BASE (vfmul)
+BASE (vfmul_frm)
BASE (vfdiv)
BASE (vfrdiv)
BASE (vfwmul)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5800fca0169..9c12a6b4e8f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,7 +152,7 @@ extern const function_base *const vfwadd_frm;
extern const function_base *const vfwsub;
extern const function_base *const vfwsub_frm;
extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;
extern const function_base *const vfdiv;
extern const function_base *const vfrdiv;
extern const function_base *const vfwmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 58a7224fe0c..35a83ef239c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -319,6 +319,8 @@ DEF_RVV_FUNCTION (vfmul, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvv_ops)
DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfrdiv, alu, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvv_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvf_ops)
// 13.5. Vector Widening Floating-Point Multiply
DEF_RVV_FUNCTION (vfwmul, alu, full_preds, f_wvv_ops)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 65f36744f54..5d3e4256cd5 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -866,7 +866,7 @@ (define_attr "vxrm_mode" "rnu,rne,rdn,rod,none"
;; Defines rounding mode of an floating-point operation.
(define_attr "frm_mode" "rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none"
-  (cond [(eq_attr "type" "vfalu,vfwalu")
+  (cond [(eq_attr "type" 

Re: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API

2023-08-02 Thread juzhe.zh...@rivai.ai
extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;

It seems that there is a redundant declaration in the original code?
extern const function_base *const vfmul;
-extern const function_base *const vfmul;




juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-08-03 09:38
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API
From: Pan Li 
 
This patch adds rounding mode API support for VFMUL, covering the
intrinsics below.
 
* __riscv_vfmul_vv_f32m1_rm
* __riscv_vfmul_vv_f32m1_rm_m
* __riscv_vfmul_vf_f32m1_rm
* __riscv_vfmul_vf_f32m1_rm_m
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc
(vfmul_frm_obj): New declaration.
(Base): Likewise.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfmul_frm): New function definition.
* config/riscv/vector.md: Add vfmul to frm_mode.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-single-mul.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  3 ++
.../riscv/riscv-vector-builtins-bases.h   |  2 +-
.../riscv/riscv-vector-builtins-functions.def |  2 +
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/base/float-point-single-mul.c   | 44 +++
5 files changed, 51 insertions(+), 2 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index ddf694c771c..3adc11138a3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -277,6 +277,7 @@ public:
/* Implements below instructions for now.
- vfadd
+   - vfmul
*/
template
class binop_frm : public function_base
@@ -2103,6 +2104,7 @@ static CONSTEXPR const widen_binop_frm 
vfwadd_frm_obj;
static CONSTEXPR const widen_binop vfwsub_obj;
static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
static CONSTEXPR const binop vfmul_obj;
+static CONSTEXPR const binop_frm vfmul_frm_obj;
static CONSTEXPR const binop vfdiv_obj;
static CONSTEXPR const reverse_binop vfrdiv_obj;
static CONSTEXPR const widen_binop vfwmul_obj;
@@ -2334,6 +2336,7 @@ BASE (vfwadd_frm)
BASE (vfwsub)
BASE (vfwsub_frm)
BASE (vfmul)
+BASE (vfmul_frm)
BASE (vfdiv)
BASE (vfrdiv)
BASE (vfwmul)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5800fca0169..9c12a6b4e8f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,7 +152,7 @@ extern const function_base *const vfwadd_frm;
extern const function_base *const vfwsub;
extern const function_base *const vfwsub_frm;
extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;
extern const function_base *const vfdiv;
extern const function_base *const vfrdiv;
extern const function_base *const vfwmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 58a7224fe0c..35a83ef239c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -319,6 +319,8 @@ DEF_RVV_FUNCTION (vfmul, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvv_ops)
DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfrdiv, alu, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvv_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvf_ops)
// 13.5. Vector Widening Floating-Point Multiply
DEF_RVV_FUNCTION (vfwmul, alu, full_preds, f_wvv_ops)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 65f36744f54..5d3e4256cd5 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -866,7 +866,7 @@ (define_attr "vxrm_mode" "rnu,rne,rdn,rod,none"
;; Defines rounding mode of an floating-point operation.
(define_attr "frm_mode" "rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none"
-  (cond [(eq_attr "type" "vfalu,vfwalu")
+  (cond [(eq_attr "type" "vfalu,vfwalu,vfmul")
  (cond
   [(match_test "INTVAL (operands[9]) == riscv_vector::FRM_RNE")
(const_string "rne")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
new file mode 100644
index 000..e6410ea3a37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t
+test_riscv_vfmul_vv_f32m1_rm (vfloat32m1_t 

[PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API

2023-08-02 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch adds support for the rounding mode API for VFMUL,
for the samples below.

* __riscv_vfmul_vv_f32m1_rm
* __riscv_vfmul_vv_f32m1_rm_m
* __riscv_vfmul_vf_f32m1_rm
* __riscv_vfmul_vf_f32m1_rm_m

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfmul_frm_obj): New declaration.
(Base): Likewise.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfmul_frm): New function definition.
* config/riscv/vector.md: Add vfmul to frm_mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-single-mul.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  |  3 ++
 .../riscv/riscv-vector-builtins-bases.h   |  2 +-
 .../riscv/riscv-vector-builtins-functions.def |  2 +
 gcc/config/riscv/vector.md|  2 +-
 .../riscv/rvv/base/float-point-single-mul.c   | 44 +++
 5 files changed, 51 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index ddf694c771c..3adc11138a3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -277,6 +277,7 @@ public:
 
 /* Implements below instructions for now.
- vfadd
+   - vfmul
 */
 template
 class binop_frm : public function_base
@@ -2103,6 +2104,7 @@ static CONSTEXPR const widen_binop_frm 
vfwadd_frm_obj;
 static CONSTEXPR const widen_binop vfwsub_obj;
 static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
 static CONSTEXPR const binop vfmul_obj;
+static CONSTEXPR const binop_frm vfmul_frm_obj;
 static CONSTEXPR const binop vfdiv_obj;
 static CONSTEXPR const reverse_binop vfrdiv_obj;
 static CONSTEXPR const widen_binop vfwmul_obj;
@@ -2334,6 +2336,7 @@ BASE (vfwadd_frm)
 BASE (vfwsub)
 BASE (vfwsub_frm)
 BASE (vfmul)
+BASE (vfmul_frm)
 BASE (vfdiv)
 BASE (vfrdiv)
 BASE (vfwmul)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5800fca0169..9c12a6b4e8f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,7 +152,7 @@ extern const function_base *const vfwadd_frm;
 extern const function_base *const vfwsub;
 extern const function_base *const vfwsub_frm;
 extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;
 extern const function_base *const vfdiv;
 extern const function_base *const vfrdiv;
 extern const function_base *const vfwmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 58a7224fe0c..35a83ef239c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -319,6 +319,8 @@ DEF_RVV_FUNCTION (vfmul, alu, full_preds, f_vvf_ops)
 DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvv_ops)
 DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvf_ops)
 DEF_RVV_FUNCTION (vfrdiv, alu, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvv_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvf_ops)
 
 // 13.5. Vector Widening Floating-Point Multiply
 DEF_RVV_FUNCTION (vfwmul, alu, full_preds, f_wvv_ops)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 65f36744f54..5d3e4256cd5 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -866,7 +866,7 @@ (define_attr "vxrm_mode" "rnu,rne,rdn,rod,none"
 
 ;; Defines rounding mode of an floating-point operation.
 (define_attr "frm_mode" "rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none"
-  (cond [(eq_attr "type" "vfalu,vfwalu")
+  (cond [(eq_attr "type" "vfalu,vfwalu,vfmul")
  (cond
   [(match_test "INTVAL (operands[9]) == riscv_vector::FRM_RNE")
(const_string "rne")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
new file mode 100644
index 000..e6410ea3a37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t
+test_riscv_vfmul_vv_f32m1_rm (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfmul_vv_f32m1_rm (op1, op2, 0, vl);
+}
+
+vfloat32m1_t
+test_vfmul_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
+ size_t vl) {
+  return __riscv_vfmul_vv_f32m1_rm_m (mask, op1, op2, 1, vl);
+}
+
+vfloat32m1_t
+test_vfmul_vf_f32m1_rm (vfloat32m1_t op1, float32_t op2, size_t 

RE: [PATCH v1] RISC-V: Support RVV VFWSUB rounding mode intrinsic API

2023-08-02 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

From: Kito Cheng 
Sent: Wednesday, August 2, 2023 9:48 PM
To: Li, Pan2 
Cc: GCC Patches ; 钟居哲 ; Wang, 
Yanzhang 
Subject: Re: [PATCH v1] RISC-V: Support RVV VFWSUB rounding mode intrinsic API

LGTM, thanks:)

On Wed, Aug 2, 2023 at 18:19, Pan Li via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
From: Pan Li <pan2...@intel.com>

This patch adds support for the rounding mode API for VFWSUB,
for the samples below.

* __riscv_vfwsub_vv_f64m2_rm
* __riscv_vfwsub_vv_f64m2_rm_m
* __riscv_vfwsub_vf_f64m2_rm
* __riscv_vfwsub_vf_f64m2_rm_m
* __riscv_vfwsub_wv_f64m2_rm
* __riscv_vfwsub_wv_f64m2_rm_m
* __riscv_vfwsub_wf_f64m2_rm
* __riscv_vfwsub_wf_f64m2_rm_m

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc (BASE): Add
vfwsub frm.
* config/riscv/riscv-vector-builtins-bases.h: Add declaration.
* config/riscv/riscv-vector-builtins-functions.def (vfwsub_frm):
Add vfwsub function definitions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-widening-sub.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  |  3 +
 .../riscv/riscv-vector-builtins-bases.h   |  1 +
 .../riscv/riscv-vector-builtins-functions.def |  4 ++
 .../riscv/rvv/base/float-point-widening-sub.c | 66 +++
 4 files changed, 74 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-sub.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 981a4a7ede8..ddf694c771c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -317,6 +317,7 @@ public:

 /* Implements below instructions for frm
- vfwadd
+   - vfwsub
 */
 template
 class widen_binop_frm : public function_base
@@ -2100,6 +2101,7 @@ static CONSTEXPR const reverse_binop_frm 
vfrsub_frm_obj;
 static CONSTEXPR const widen_binop vfwadd_obj;
 static CONSTEXPR const widen_binop_frm vfwadd_frm_obj;
 static CONSTEXPR const widen_binop vfwsub_obj;
+static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
 static CONSTEXPR const binop vfmul_obj;
 static CONSTEXPR const binop vfdiv_obj;
 static CONSTEXPR const reverse_binop vfrdiv_obj;
@@ -2330,6 +2332,7 @@ BASE (vfrsub_frm)
 BASE (vfwadd)
 BASE (vfwadd_frm)
 BASE (vfwsub)
+BASE (vfwsub_frm)
 BASE (vfmul)
 BASE (vfdiv)
 BASE (vfrdiv)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index f9e1df5fe75..5800fca0169 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -150,6 +150,7 @@ extern const function_base *const vfrsub_frm;
 extern const function_base *const vfwadd;
 extern const function_base *const vfwadd_frm;
 extern const function_base *const vfwsub;
+extern const function_base *const vfwsub_frm;
 extern const function_base *const vfmul;
 extern const function_base *const vfmul;
 extern const function_base *const vfdiv;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 743205a9b97..58a7224fe0c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -306,8 +306,12 @@ DEF_RVV_FUNCTION (vfwsub, widen_alu, full_preds, f_wwv_ops)
 DEF_RVV_FUNCTION (vfwsub, widen_alu, full_preds, f_wwf_ops)
 DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wvv_ops)
 DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wvf_ops)
+DEF_RVV_FUNCTION (vfwsub_frm, widen_alu_frm, full_preds, f_wvv_ops)
+DEF_RVV_FUNCTION (vfwsub_frm, widen_alu_frm, full_preds, f_wvf_ops)
 DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wwv_ops)
 DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wwf_ops)
+DEF_RVV_FUNCTION (vfwsub_frm, widen_alu_frm, full_preds, f_wwv_ops)
+DEF_RVV_FUNCTION (vfwsub_frm, widen_alu_frm, full_preds, f_wwf_ops)

 // 13.4. Vector Single-Width Floating-Point Multiply/Divide Instructions
 DEF_RVV_FUNCTION (vfmul, alu, full_preds, f_vvv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-sub.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-sub.c
new file mode 100644
index 000..4325cc510a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-sub.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat64m2_t
+test_vfwsub_vv_f32m1_rm (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfwsub_vv_f64m2_rm (op1, op2, 0, vl);
+}
+
+vfloat64m2_t
+test_vfwsub_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
+  size_t vl) {
+  return 

[Bug sanitizer/110876] AddressSanitizer: false positive bad-free

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110876

--- Comment #3 from Andrew Pinski  ---
I looked into the code and I think this is a Boost issue: it specifically
catches the abort signal and recovers, and then exit is called and things go
wrong.

[Bug sanitizer/110876] AddressSanitizer: false positive bad-free

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110876

--- Comment #2 from Andrew Pinski  ---
clang trunk has the same failure.
clang 11.0.0 has the same failure.
clang 10.0.0 references https://github.com/google/sanitizers/issues/189 .

I think this is a Boost issue because both clang and GCC produce the
error.

[PATCH] MATCH: first of the value replacement moving from phiopt

2023-08-02 Thread Andrew Pinski via Gcc-patches
This moves a few simple patterns that are handled by value replacement
in phiopt over to match.pd. Only the simple ones, which might show up
in other code, are moved.

This allows some optimizations to happen without depending on sinking,
and in some cases where phiopt is not invoked (cond-1.c is an example
of that).
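
As a rough illustration of why these replacements are valid (a sketch in
plain C, not part of the patch; all function names below are made up for
this example): when `a` is zero, `b + a` and `b - a` already equal `b`, so
the select on `a == 0` carries no information and can be dropped.

```c
#include <assert.h>

/* Hypothetical helpers, named only for this illustration. */

/* The shape value replacement looks for: a == 0 ? b : b + a. */
unsigned select_add(unsigned a, unsigned b)
{
    return a == 0 ? b : b + a;
}

/* What the commutative pattern reduces it to. */
unsigned plain_add(unsigned a, unsigned b)
{
    return b + a;
}

/* Same idea for the non-commutative group: a == 0 ? b : b - a. */
unsigned select_sub(unsigned a, unsigned b)
{
    return a == 0 ? b : b - a;
}

unsigned plain_sub(unsigned a, unsigned b)
{
    return b - a;
}

/* Compare both forms exhaustively on a small range of inputs. */
void check_value_replacement(void)
{
    for (unsigned a = 0; a < 64; a++)
        for (unsigned b = 0; b < 64; b++) {
            assert(select_add(a, b) == plain_add(a, b));
            assert(select_sub(a, b) == plain_sub(a, b));
        }
}
```

Calling check_value_replacement() from a test driver exercises both groups;
unsigned operands are used here only to keep the sketch free of signed
overflow.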

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd (`a == 0 ? b : b + a`,
`a == 0 ? b : b - a`): New patterns.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/cond-1.c: New test.
* gcc.dg/tree-ssa/phi-opt-33.c: New test.
* gcc.dg/tree-ssa/phi-opt-34.c: New test.
---
 gcc/match.pd   | 14 ++
 gcc/testsuite/gcc.dg/tree-ssa/cond-1.c | 17 +
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c | 19 +++
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-34.c | 17 +
 4 files changed, 67 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cond-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-34.c

diff --git a/gcc/match.pd b/gcc/match.pd
index c62f205c13c..1ceff9691a0 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3832,6 +3832,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& (INTEGRAL_TYPE_P (TREE_TYPE (@0
(op (mult (convert:type @0) @2) @1
 
+/* ?: Value replacement. */
+/* a == 0 ? b : b + a  -> b + a */
+(for op (plus bit_ior bit_xor)
+ (simplify
+  (cond (eq @0 integer_zerop) @1 (op:c@2 @1 @0))
+   @2))
+/* a == 0 ? b : b - a  -> b - a */
+/* a == 0 ? b : b ptr+ a  -> b ptr+ a */
+/* a == 0 ? b : b shift/rotate a -> b shift/rotate a */
+(for op (lrotate rrotate lshift rshift minus pointer_plus)
+ (simplify
+  (cond (eq @0 integer_zerop) @1 (op@2 @1 @0))
+   @2))
+
 /* Simplifications of shift and rotates.  */
 
 (for rotate (lrotate rrotate)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cond-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/cond-1.c
new file mode 100644
index 000..478a818b206
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cond-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized-raw" } */
+
+int sub(int a, int b, int c, int d) {
+  int e = (a == 0);
+  int f = !e;
+  c = b;
+  d = b - a ;
+  return ((-e & c) | (-f & d));
+}
+
+/* In the end we end up with `(a == 0) ? (b - a) : b`
+   which then can be optimized to just `(b - a)`. */
+
+/* { dg-final { scan-tree-dump-not "cond_expr," "optimized" } } */
+/* { dg-final { scan-tree-dump-not "eq_expr," "optimized" } } */
+/* { dg-final { scan-tree-dump-times "minus_expr," 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c
new file mode 100644
index 000..809ccfe1479
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* Phi-OPT should be able to optimize this without sinking being invoked. */
+/* { dg-options "-O -fdump-tree-phiopt2 -fdump-tree-optimized -fno-tree-sink" 
} */
+
+int f(int a, int b, int c) {
+  int d = a + b;
+  if (c > 5) return c;
+  if (a == 0) return b;
+  return d;
+}
+
+unsigned rot(unsigned x, int n) {
+  const int bits = __CHAR_BIT__ * __SIZEOF_INT__;
+  int t = ((x << n) | (x >> (bits - n)));
+  return (n == 0) ? x : t;
+}
+
+/* { dg-final { scan-tree-dump-times "goto" 2 "phiopt2" } } */
+/* { dg-final { scan-tree-dump-times "goto" 2 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-34.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-34.c
new file mode 100644
index 000..a90de8926c6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-34.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* Phi-OPT should be able to optimize this without sinking being invoked. */
+/* { dg-options "-O -fdump-tree-phiopt2 -fdump-tree-optimized -fno-tree-sink" 
} */
+
+char *f(char *a, __SIZE_TYPE__ b) {
+  char *d = a + b;
+  if (b == 0) return a;
+  return d;
+}
+int sub(int a, int b, int c) {
+  int d = a - b;
+  if (b == 0) return a;
+  return d;
+}
+
+/* { dg-final { scan-tree-dump-not "goto" "phiopt2" } } */
+/* { dg-final { scan-tree-dump-not "goto" "optimized" } } */
-- 
2.31.1



[Bug analyzer/110882] ICE with -fanalyzer on zero-sized array

2023-08-02 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110882

David Malcolm  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-08-02

--- Comment #2 from David Malcolm  ---
Reduced from downstream bug:
https://bugzilla.redhat.com/show_bug.cgi?id=2228600

[Bug analyzer/110882] ICE with -fanalyzer on zero-sized array

2023-08-02 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110882

--- Comment #1 from David Malcolm  ---
It's failing this assertion:

#1  0x016e2295 in ana::binding_key::make (mgr=0x7fff91d8,
r=0x3275340) at ../../src/gcc/analyzer/store.cc:132
132   gcc_assert (bit_size > 0);
(gdb) list
127 {
128   bit_size_t bit_size;
129   if (r->get_bit_size (_size))
130 {
131   /* Must be non-empty.  */
132   gcc_assert (bit_size > 0);
133   return mgr->get_concrete_binding (offset.get_bit_offset (),
134 bit_size);
135 }
136   else
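
For context, a small sketch of why a zero-sized array could reach that
assertion. It only shows that, with the GNU C zero-length array extension,
the `columns` member from the reproducer has size 0; that the analyzer's
region for it reports the same size is an assumption based on the
backtrace, not something verified here. The helper names are made up.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Same layout as the reproducer; a zero-length array is a GNU C extension. */
struct csv_row {
    char *columns[0];
};

/* Size in bits of the `columns` member.  sizeof does not evaluate its
   operand, so the null pointer is never dereferenced. */
size_t columns_bit_size(void)
{
    return sizeof(((struct csv_row *) 0)->columns) * CHAR_BIT;
}

/* With GCC (and compilers that follow its extension) the member is empty. */
void check_zero_sized(void)
{
    assert(columns_bit_size() == 0);
}
```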

[Bug analyzer/110882] New: ICE with -fanalyzer on zero-sized array

2023-08-02 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110882

Bug ID: 110882
   Summary: ICE with -fanalyzer on zero-sized array
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: dmalcolm at gcc dot gnu.org
  Target Milestone: ---

ICE seen with -fanalyzer on this code:

-

struct csv_row {
  char *columns[0];
};

void
parse_csv_line(int n_columns,
   const char *columns[])
{
  for (int n = 0; n < n_columns; n++) {
  columns[n] = ((void *)0);
  }
}

void parse_csv_data(int n_columns,
struct csv_row *entry)
{
  parse_csv_line(n_columns, (const char **)entry->columns);
}

-

ICE happens on gcc 13 onwards; specifically, affects:
- trunk: https://godbolt.org/z/To7c1r8ME
- gcc 13.2: https://godbolt.org/z/a5zr5Ga4b

gcc 12.3 is not affected

[x86 PATCH] PR target/110792: Early clobber issues with rot32di2_doubleword.

2023-08-02 Thread Roger Sayle

This patch is a conservative fix for PR target/110792, a wrong-code
regression affecting doubleword rotations by BITS_PER_WORD, which
effectively swaps the highpart and lowpart words, when the source to be
rotated resides in memory. The issue is that if the register used to
hold the lowpart of the destination is mentioned in the address of
the memory operand, the current define_insn_and_split unintentionally
clobbers it before reading the highpart.

Hence, for the testcase, the incorrectly generated code looks like:

        salq    $4, %rdi                // calculate address
        movq    WHIRL_S+8(%rdi), %rdi   // accidentally clobber addr
        movq    WHIRL_S(%rdi), %rbp     // load (wrong) lowpart

Traditionally, the textbook way to fix this would be to add an
explicit early clobber to the instruction's constraints.

 (define_insn_and_split "32di2_doubleword"
- [(set (match_operand:DI 0 "register_operand" "=r,r,r")
+ [(set (match_operand:DI 0 "register_operand" "=r,r,&r")
(any_rotate:DI (match_operand:DI 1 "nonimmediate_operand" "0,r,o")
   (const_int 32)))]

but unfortunately this currently generates significantly worse code,
due to a strange choice of reloads (effectively memcpy), which ends up
looking like:

        salq    $4, %rdi                // calculate address
        movdqa  WHIRL_S(%rdi), %xmm0    // load the double word in SSE reg.
        movaps  %xmm0, -16(%rsp)        // store the SSE reg back to the stack
        movq    -8(%rsp), %rdi          // load highpart
        movq    -16(%rsp), %rbp         // load lowpart

Note that reload's "&" doesn't distinguish between the memory being
early clobbered, vs the registers used in an addressing mode being
early clobbered.

The fix proposed in this patch is to remove the third alternative, which
allowed offsettable memory as an operand, forcing reload to place the
operand into a register before the rotation.  This results in:

        salq    $4, %rdi
        movq    WHIRL_S(%rdi), %rax
        movq    WHIRL_S+8(%rdi), %rdi
        movq    %rax, %rbp

I believe there's a more advanced solution, by swapping the order of
the loads (if first destination register is mentioned in the address),
or inserting a lea insn (if both destination registers are mentioned
in the address), but this fix is a minimal "safe" solution, that
should hopefully be suitable for backporting.
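
The hazard can be modelled in plain C. This is only a sketch under stated
assumptions, not GCC code: `mem` stands for memory holding words, `r0` for
the register that both holds the address and receives the low part of the
result, and every name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of rotating the doubleword at mem[r0], mem[r0 + 1] by one word. */
struct dword {
    uint32_t lo, hi;
};

/* Faulty order: the first load overwrites r0, so the second load goes
   through a clobbered address. */
struct dword rotate_clobbering(const uint32_t *mem, size_t len, uint32_t r0)
{
    struct dword out;
    r0 = mem[r0 + 1];                 /* new low part = old high part */
    out.lo = r0;
    out.hi = r0 < len ? mem[r0] : 0;  /* wrong address: r0 was clobbered */
    return out;
}

/* Safe order: read the old low part into a scratch register while the
   address is still intact, then overwrite r0. */
struct dword rotate_safe(const uint32_t *mem, uint32_t r0)
{
    struct dword out;
    uint32_t scratch = mem[r0];       /* old low part */
    r0 = mem[r0 + 1];                 /* old high part */
    out.lo = r0;
    out.hi = scratch;
    return out;
}

void check_rotate(void)
{
    const uint32_t mem[4] = { 10, 2, 30, 40 };
    struct dword good = rotate_safe(mem, 0);
    struct dword bad = rotate_clobbering(mem, 4, 0);

    assert(good.lo == 2 && good.hi == 10);  /* halves swapped */
    assert(bad.lo == 2 && bad.hi == 30);    /* high part read from mem[2] */
}
```

The safe variant mirrors the instruction order shown above, where the first
load goes into a separate register before %rdi is overwritten.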

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-08-02  Roger Sayle  

gcc/ChangeLog
PR target/110792
* config/i386/i386.md (ti3): For rotations by 64 bits
place operand in a register before gen_64ti2_doubleword.
(di3): Likewise, for rotations by 32 bits, place
operand in a register before gen_32di2_doubleword.
(32di2_doubleword): Constrain operand to be in register.
(64ti2_doubleword): Likewise.

gcc/testsuite/ChangeLog
PR target/110792
* g++.target/i386/pr110792.C: New 32-bit C++ test case.
* gcc.target/i386/pr110792.c: New 64-bit C test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4db210c..849e1de 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15340,7 +15340,10 @@
 emit_insn (gen_ix86_ti3_doubleword
(operands[0], operands[1], operands[2]));
   else if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 64)
-emit_insn (gen_64ti2_doubleword (operands[0], operands[1]));
+{
+  operands[1] = force_reg (TImode, operands[1]);
+  emit_insn (gen_64ti2_doubleword (operands[0], operands[1]));
+}
   else
 {
   rtx amount = force_reg (QImode, operands[2]);
@@ -15375,7 +15378,10 @@
 emit_insn (gen_ix86_di3_doubleword
(operands[0], operands[1], operands[2]));
   else if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 32)
-emit_insn (gen_32di2_doubleword (operands[0], operands[1]));
+{
+  operands[1] = force_reg (DImode, operands[1]);
+  emit_insn (gen_32di2_doubleword (operands[0], operands[1]));
+}
   else
 FAIL;
 
@@ -15543,8 +15549,8 @@
 })
 
 (define_insn_and_split "32di2_doubleword"
- [(set (match_operand:DI 0 "register_operand" "=r,r,r")
-   (any_rotate:DI (match_operand:DI 1 "nonimmediate_operand" "0,r,o")
+ [(set (match_operand:DI 0 "register_operand" "=r,r")
+   (any_rotate:DI (match_operand:DI 1 "register_operand" "0,r")
   (const_int 32)))]
  "!TARGET_64BIT"
  "#"
@@ -15561,8 +15567,8 @@
 })
 
 (define_insn_and_split "64ti2_doubleword"
- [(set (match_operand:TI 0 "register_operand" "=r,r,r")
-   (any_rotate:TI (match_operand:TI 1 "nonimmediate_operand" "0,r,o")
+ [(set (match_operand:TI 0 "register_operand" "=r,r")
+   (any_rotate:TI (match_operand:TI 1 "register_operand" "0,r")
   (const_int 64)))]
  

Re: [PATCH 1/2] Move `~X & X` and `~X | X` over to use bitwise_inverted_equal_p

2023-08-02 Thread Andrew Pinski via Gcc-patches
On Wed, Aug 2, 2023 at 1:25 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Wed, Aug 02, 2023 at 10:04:26AM +0200, Richard Biener via Gcc-patches 
> wrote:
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -1157,8 +1157,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >
> > >  /* Simplify ~X & X as zero.  */
> > >  (simplify
> > > - (bit_and:c (convert? @0) (convert? (bit_not @0)))
> > > -  { build_zero_cst (type); })
> > > + (bit_and (convert? @0) (convert? @1))
> > > + (if (bitwise_inverted_equal_p (@0, @1))
> > > +  { build_zero_cst (type); }))
>
> I wonder if the above isn't incorrect.
> Without the possibility of widening converts it would be ok,
> but for widening conversions it is significant not just that
> the bits of @0 and @1 are inverted, but also that they are either
> both signed or both unsigned and so the MS bit (which is guaranteed
> to be different) extends to 0s in one case and to all 1s in the other
> one, so that even the upper bits are inverted.
> But that isn't the case here.  Something like (untested):
> long long
> foo (unsigned int x)
> {
>   int y = x;
>   y = ~y;
>   return ((long long) x) & y;
> }
> Actually maybe for this pattern it happens to be ok, because while
> the upper bits in this case might not be inverted between the extended
> operands (if x has msb set), it will be 0 & 0 in the upper bits.
>
> > >
> > >  /* PR71636: Transform x & ((1U << b) - 1) -> x & ~(~0U << b);  */
> > >  (simplify
> > > @@ -1395,8 +1396,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >  /* ~x ^ x -> -1 */
> > >  (for op (bit_ior bit_xor)
> > >   (simplify
> > > -  (op:c (convert? @0) (convert? (bit_not @0)))
> > > -  (convert { build_all_ones_cst (TREE_TYPE (@0)); })))
> > > +  (op (convert? @0) (convert? @1))
> > > +  (if (bitwise_inverted_equal_p (@0, @1))
> > > +   (convert { build_all_ones_cst (TREE_TYPE (@0)); }))))
>
> But not here.
> long long
> bar (unsigned int x)
> {
>   int y = x;
>   y = ~y;
>   return ((long long) x) ^ y;
> }
>
> long long
> baz (unsigned int x)
> {
>   int y = x;
>   y = ~y;
>   return y ^ ((long long) x);
> }
> You pick TREE_TYPE (@0), but that is a random signedness if the two
> operands have different signedness.

Oh, you are correct. I am testing a patch which adds a check to make
sure the types of @0 and @1 match, which brings us back to basically
what was done beforehand and still provides the benefit of using
bitwise_inverted_equal_p for the comparisons.
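
For reference, the quoted examples can be checked directly. The sketch below
reuses Jakub's functions under other names (and_case and xor_case are made
up here) and assumes a 32-bit int, a 64-bit long long, and GCC's documented
modulo behaviour for the implementation-defined unsigned-to-int conversion.

```c
#include <assert.h>

/* The `&` case: the upper bits end up as 0 & 0, so the result is 0. */
long long and_case(unsigned int x)
{
    int y = x;      /* implementation-defined above INT_MAX; GCC wraps */
    y = ~y;
    return ((long long) x) & y;
}

/* The `^` case: x is zero-extended while y is sign-extended, so the
   upper bits are not guaranteed to be inverted. */
long long xor_case(unsigned int x)
{
    int y = x;
    y = ~y;
    return ((long long) x) ^ y;
}

void check_signedness(void)
{
    assert(and_case(1u) == 0);
    assert(and_case(0x80000000u) == 0);
    assert(xor_case(1u) == -1);                    /* msb clear: all ones */
    assert(xor_case(0x80000000u) == 0xffffffffLL); /* msb set: not -1 */
}
```

The last assertion is the problematic input: folding the xor to all ones
would give -1 there, whereas the expression evaluates to 0xffffffff.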

Thanks,
Andrew

>
> Jakub
>


Re: Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.

2023-08-02 Thread 钟居哲
Please put your test cases into:

# widening operation only test on LMUL < 8
set AUTOVEC_TEST_OPTS [list \
  {-ftree-vectorize -O3 --param riscv-autovec-lmul=m1} \
  {-ftree-vectorize -O3 --param riscv-autovec-lmul=m2} \
  {-ftree-vectorize -O3 --param riscv-autovec-lmul=m4} \
  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m1} \
  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m2} \
  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m4} ]
foreach op $AUTOVEC_TEST_OPTS {
  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/widen/*.\[cS\]]] \
"" "$op"
}

You could either simply put them into the "widen" directory or create a new 
directory.
Either way, make sure you have fully tested it with LMUL = 1/2/4.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-03 02:49
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.
> 1. How do you model round to +Inf (avg_floor) and round to -Inf (avg_ceil) ?
 
That's just specified by the +1 or the lack of it in the original pattern.
Actually the IFN is just a detour because we would create perfect code
if not for the fallback.  But as there is currently no way to check for
the existence of a narrowing shift we cannot circumvent the fallback.
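
A scalar sketch of the two forms, to make the "+1" concrete. The function
names are invented for this example and uint8_t is an arbitrary choice of
element type; the point is only the widen, add, optionally add one, shift
right, narrow sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Floor average: widen, add, shift right by one, narrow. */
uint8_t avg_floor_u8(uint8_t a, uint8_t b)
{
    return (uint8_t) (((uint16_t) a + (uint16_t) b) >> 1);
}

/* Ceiling average: the same, with +1 before the shift. */
uint8_t avg_ceil_u8(uint8_t a, uint8_t b)
{
    return (uint8_t) (((uint16_t) a + (uint16_t) b + 1) >> 1);
}

void check_avg(void)
{
    assert(avg_floor_u8(0, 1) == 0);
    assert(avg_ceil_u8(0, 1) == 1);
    assert(avg_floor_u8(255, 254) == 254);  /* no overflow in the wide type */
    assert(avg_ceil_u8(255, 254) == 255);
    assert(avg_floor_u8(255, 255) == 255);
}
```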
 
> 2. Is it possible we could use vaadd[u] to model avg ?
In principle yes (I first read it wrong that overflow must not happen but the
specs actually say that it does not happen).
However, we would need to set a rounding mode before vaadd or check its current
value and provide a fallback.  Offhand I can't imagine a workaround like
two vaadds or so.
 
Regards
Robin
 


Re: Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.

2023-08-02 Thread 钟居哲
I just checked LLVM:
https://godbolt.org/z/nMa6qnEeT 

This patch generally is reasonable so LGTM.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-03 02:49
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.
> 1. How do you model round to +Inf (avg_floor) and round to -Inf (avg_ceil) ?
 
That's just specified by the +1 or the lack of it in the original pattern.
Actually the IFN is just a detour because we would create perfect code
if not for the fallback.  But as there is currently no way to check for
the existence of a narrowing shift we cannot circumvent the fallback.
 
> 2. Is it possible we could use vaadd[u] to model avg ?
In principle yes (I first read it wrong that overflow must not happen but the
specs actually say that it does not happen).
However, we would need to set a rounding mode before vaadd or check its current
value and provide a fallback.  Offhand I can't imagine a workaround like
two vaadds or so.
 
Regards
Robin
 


Re: [COMMITTEDv3] tree-optimization: [PR100864] `(a&!b) | b` is not opimized to `a | b` for comparisons

2023-08-02 Thread Andrew Pinski via Gcc-patches
On Wed, Aug 2, 2023 at 10:14 AM Andrew Pinski  wrote:
>
> On Wed, Aug 2, 2023 at 10:13 AM Prathamesh Kulkarni via Gcc-patches
>  wrote:
> >
> > On Mon, 31 Jul 2023 at 22:39, Andrew Pinski via Gcc-patches
> >  wrote:
> > >
> > > This is a new version of the patch.
> > > Instead of doing the matching of inversion comparison directly inside
> > > match, creating a new function (bitwise_inverted_equal_p) to do it.
> > > It is very similar to bitwise_equal_p that was added in 
> > > r14-2751-g2a3556376c69a1fb
> > > but instead it says `expr1 == ~expr2`. A follow on patch, will
> > > use this function in other patterns where we try to match `@0` and 
> > > `(bit_not @0)`.
> > >
> > > Changed the name bitwise_not_equal_p to bitwise_inverted_equal_p.
> > >
> > > Committed as approved after a Bootstrapped and test on x86_64-linux-gnu 
> > > with no regressions.
> > Hi Andrew,
> > Unfortunately, this patch (committed in
> > 2bae476b511dc441bf61da8a49cca655575e7dd6) causes
> > segmentation fault for pr33133.c on aarch64-linux-gnu because of
> > infinite recursion.
>
> A similar issue is recorded as PR 110874 which I am debugging right now.

Yes the issue is the same and is solved by the same patch.

Thanks,
Andrew

>
> Thanks,
> Andrew
>
> >
> > Running the test under gdb shows:
> > Program received signal SIGSEGV, Segmentation fault.
> > operand_compare::operand_equal_p (this=0x29dc680
> > , arg0=0xf7789a68, arg1=0xf7789f30,
> > flags=16) at ../../gcc/gcc/fold-const.cc:3088
> > 3088{
> > (gdb) bt
> > #0  operand_compare::operand_equal_p (this=0x29dc680
> > , arg0=0xf7789a68, arg1=0xf7789f30,
> > flags=16) at ../../gcc/gcc/fold-const.cc:3088
> > #1  0x00a90394 in operand_compare::verify_hash_value
> > (this=this@entry=0x29dc680 ,
> > arg0=arg0@entry=0xf7789a68, arg1=arg1@entry=0xf7789f30,
> > flags=flags@entry=0, ret=ret@entry=0xfc000157)
> > at ../../gcc/gcc/fold-const.cc:4074
> > #2  0x00a9351c in operand_compare::verify_hash_value
> > (ret=0xfc000157, flags=0, arg1=0xf7789f30,
> > arg0=0xf7789a68, this=0x29dc680 ) at
> > ../../gcc/gcc/fold-const.cc:4072
> > #3  operand_compare::operand_equal_p (this=this@entry=0x29dc680
> > , arg0=arg0@entry=0xf7789a68,
> > arg1=arg1@entry=0xf7789f30, flags=flags@entry=0) at
> > ../../gcc/gcc/fold-const.cc:3090
> > #4  0x00a9791c in operand_equal_p
> > (arg0=arg0@entry=0xf7789a68, arg1=arg1@entry=0xf7789f30,
> > flags=flags@entry=0) at ../../gcc/gcc/fold-const.cc:4105
> > #5  0x01d38dd0 in gimple_bitwise_inverted_equal_p
> > (expr1=0xf7789a68, expr2=0xf7789f30, valueize=
> > 0x112d698 ) at
> > ../../gcc/gcc/gimple-match-head.cc:284
> > #6  0x01d38e80 in gimple_bitwise_inverted_equal_p
> > (expr1=0xf7789a68, expr2=0xf77d0240,
> > valueize=0x112d698 ) at
> > ../../gcc/gcc/gimple-match-head.cc:296
> > #7  0x01d38e80 in gimple_bitwise_inverted_equal_p
> > (expr1=0xf7789a68, expr2=0xf7789f30,
> > valueize=0x112d698 ) at
> > ../../gcc/gcc/gimple-match-head.cc:296
> > #8  0x01d38e80 in gimple_bitwise_inverted_equal_p
> > (expr1=0xf7789a68, expr2=0xf77d0240,
> > ...
> >
> > It seems to recurse cyclically with expr2=0xf7789f30 ->
> > expr2=0xf77d0240 eventually leading to segfault.
> > while expr1=0xf7789a68 remains same throughout the stack frames.
> >
> > Thanks,
> > Prathamesh
> > >
> > > PR tree-optimization/100864
> > >
> > > gcc/ChangeLog:
> > >
> > > * generic-match-head.cc (bitwise_inverted_equal_p): New function.
> > > * gimple-match-head.cc (bitwise_inverted_equal_p): New macro.
> > > (gimple_bitwise_inverted_equal_p): New function.
> > > * match.pd ((~x | y) & x): Use bitwise_inverted_equal_p
> > > instead of direct matching bit_not.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/tree-ssa/bitops-3.c: New test.
> > > ---
> > >  gcc/generic-match-head.cc| 42 ++
> > >  gcc/gimple-match-head.cc | 71 
> > >  gcc/match.pd |  5 +-
> > >  gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c | 67 ++
> > >  4 files changed, 183 insertions(+), 2 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
> > >
> > > diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> > > index a71c0727b0b..ddaf22f2179 100644
> > > --- a/gcc/generic-match-head.cc
> > > +++ b/gcc/generic-match-head.cc
> > > @@ -121,3 +121,45 @@ bitwise_equal_p (tree expr1, tree expr2)
> > >  return wi::to_wide (expr1) == wi::to_wide (expr2);
> > >return operand_equal_p (expr1, expr2, 0);
> > >  }
> > > +
> > > +/* Return true if EXPR1 and EXPR2 have the bitwise opposite value,
> > > +   but not necessarily same type.
> > > +   The types can differ through nop conversions.  */
> > > +
> > > +static inline bool
> > > 

Re: Where to place warning about non-optimized tail and sibling calls

2023-08-02 Thread David Malcolm via Gcc
On Wed, 2023-08-02 at 13:16 -0400, Bradley Lucier wrote:
> On 8/1/23 6:08 PM, David Malcolm wrote:
> > FWIW I added it to support Scheme from libgccjit;
> 
> Do you know of any Scheme using libgccjit?

I don't.

It's not Scheme, but in case it's relevant, Emacs is doing ahead-of-
time compilation of its Emacs Lisp using libgccjit; see:
  https://akrl.sdf.org/gccemacs.html


> 
> BTW, I tried to build mainline with --enable-coverage to see which
> code 
> is executed with -foptimize-sibling-calls, but bootstrap fails with
> 
[...snip...]

Sorry, I don't have any special knowledge of this build failure.

Dave



[Bug middle-end/110874] [14 Regression] ice with -O2 with recent gcc

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110874

--- Comment #10 from Andrew Pinski  ---
I am not happy with the patch I came up with, but it does reduce the amount of
iterating and makes sure the iteration is 100% bounded, so it will work.

Re: [PATCH v2] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread David Malcolm via Gcc-patches
On Wed, 2023-08-02 at 14:46 -0400, Eric Feng wrote:
> On Wed, Aug 2, 2023 at 1:20 PM Marek Polacek 
> wrote:
> > 
> > On Wed, Aug 02, 2023 at 12:59:28PM -0400, David Malcolm wrote:
> > > On Wed, 2023-08-02 at 12:20 -0400, Eric Feng wrote:
> > > 

[Dropping Joseph and Marek from the CC]

[...snip...]

> 
> 
> Thank you, everyone. I've submitted a new patch with the described
> changes. 

Thanks.

> As I do not yet have write access, could someone please help
> me commit it?

I've pushed the v3 patch to trunk, as r14-2933-gfafe2d18f791c6; you can
see it at [1], so you're now officially a GCC contributor,
congratulations!

FWIW I had to do a little whitespace fixing on the ChangeLog entries
before the server-side hooks.commit-extra-checker would pass, as they
were indented with spaces, rather than tabs, so it complained thusly:

remote: *** The following commit was rejected by your 
hooks.commit-extra-checker script (status: 1)
remote: *** commit: 0a4a2dc7dad1dfe22be0b48fe0d8c50d216c8349
remote: *** ChangeLog format failed:
remote: *** ERR: line should start with a tab: "PR analyzer/107646"
remote: *** ERR: line should start with a tab: "* analyzer-language.cc 
(run_callbacks): New function."
remote: *** ERR: line should start with a tab: "
(on_finish_translation_unit): New function."
remote: *** ERR: line should start with a tab: "* analyzer-language.h 
(GCC_ANALYZER_LANGUAGE_H): New include."
remote: *** ERR: line should start with a tab: "(class 
translation_unit): New vfuncs."
remote: *** ERR: line should start with a tab: "PR analyzer/107646"
remote: *** ERR: line should start with a tab: "* c-parser.cc: New 
functions on stashing values for the"
remote: *** ERR: line should start with a tab: "  analyzer."
remote: *** ERR: line should start with a tab: "PR analyzer/107646"
remote: *** ERR: line should start with a tab: "* 
gcc.dg/plugin/plugin.exp: Add new plugin and test."
remote: *** ERR: line should start with a tab: "* 
gcc.dg/plugin/analyzer_cpython_plugin.c: New plugin."
remote: *** ERR: line should start with a tab: "* 
gcc.dg/plugin/cpython-plugin-test-1.c: New test."
remote: *** ERR: PR 107646 in subject but not in changelog: "analyzer: stash 
values for CPython plugin [PR107646]"
remote: *** 
remote: *** Please see: https://gcc.gnu.org/codingconventions.html#ChangeLogs
remote: *** 
remote: error: hook declined to update refs/heads/master
To git+ssh://gcc.gnu.org/git/gcc.git
 ! [remote rejected] master -> master (hook declined)
error: failed to push some refs to 'git+ssh://dmalc...@gcc.gnu.org/git/gcc.git'

...but this was a trivial fix.  You can test that patches are properly
formatted by running:

  ./contrib/gcc-changelog/git_check_commit.py HEAD

locally.
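
As a rough illustration of the tab rule that the hook enforced above, here is
a minimal C sketch. It is not the logic of git_check_commit.py, which checks
much more than this; changelog_line_ok is a hypothetical helper, and treating
blank lines and ".../ChangeLog:" headers as exempt is an assumption made only
for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not part of GCC's scripts): returns true if LINE
   would pass a simplified "line should start with a tab" rule.
   Assumption: blank lines and "<dir>/ChangeLog:" headers are exempt.  */
static bool
changelog_line_ok (const char *line)
{
  static const char header_suffix[] = "ChangeLog:";
  size_t suffix_len = sizeof header_suffix - 1;
  size_t len;

  assert (line != NULL);
  len = strlen (line);

  /* Blank separator lines are fine.  */
  if (len == 0)
    return true;

  /* Header lines such as "gcc/analyzer/ChangeLog:" are not indented.  */
  if (len >= suffix_len
      && strcmp (line + len - suffix_len, header_suffix) == 0)
    return true;

  /* Everything else must be indented with a tab, not spaces.  */
  return line[0] == '\t';
}
```

Under these assumptions, an entry such as "PR analyzer/107646" indented with
spaces is rejected, while the same text indented with a single tab passes.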


>  Otherwise, please let me know if I should request write
> access first (the GettingStarted page suggested requesting someone
> commit the patch for the first few patches before requesting write
> access).

Please go ahead and request write access now; we should have done this
in the "community bonding" phase of GSoC; sorry for not catching this.

Thanks again for the patch.  How's the followup work?  Are you close to
being able to post one or more of the simpler known_function
subclasses?

Dave

[1] 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=fafe2d18f791c6b97b49af7c84b1b5703681c3af



[Bug middle-end/110874] [14 Regression] ice with -O2 with recent gcc

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110874

--- Comment #9 from Andrew Pinski  ---
(gdb) p debug_tree(expr1)
 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set 1 canonical-type
0x778215e8 precision:32 min  max

pointer_to_this >
visited
def_stmt g_227.1_3 = (int) g_227_8;
version:3>
$3 = void
(gdb) up -1
#0  gimple_bitwise_inverted_equal_p (expr1=0x779db048,
expr2=0x779db3a8, valueize=0x11ba180 ) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/gimple-match-head.cc:278
278   if (expr1 == expr2)
(gdb) p debug_tree(expr1)
 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x77821690 precision:32 min  max 
pointer_to_this >
visited
def_stmt g_161.3_5 = (unsigned int) g_161.2_4;
version:5>
$4 = void



Value numbering stmt = g_161.2_4 = g_161;
Setting value number of g_161.2_4 to g_227.1_3 (changed)
g_227.1_3 is available for g_227.1_3
Value numbering stmt = g_161.3_5 = (unsigned int) g_161.2_4;
g_227.1_3 is available for g_227.1_3
Applying pattern match.pd:4207, gimple-match-4.cc:4064
Applying pattern match.pd:4299, gimple-match-4.cc:4122
Match-and-simplified (unsigned int) g_161.2_4 to g_227_8
RHS (unsigned int) g_161.2_4 simplified to g_227_8
Setting value number of g_161.3_5 to g_227_15 (changed)
Making available beyond BB6 g_161.3_5 for value g_227_15
Value numbering stmt = _6 = g_161.3_5 & t_17(D);



So when we handle gimple_nop_convert here, we loop between the two.
I think we just need to limit gimple_bitwise_inverted_equal_p to a single
look-through ...
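
To make the idea of limiting the look-through concrete, here is a small C
sketch on a toy data structure. It is not GCC code: struct node, its fields,
and inverted_equal_p are hypothetical stand-ins, and the budget parameter is
just one possible way to bound the recursion. A budget of 1 corresponds to
looking through a single conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an SSA name (hypothetical, not a GCC type).  */
struct node
{
  const struct node *not_of;   /* value is ~not_of, or NULL */
  const struct node *conv_of;  /* value is a no-op conversion of conv_of,
                                  or NULL */
};

/* Return true if A and B are known to be bitwise inverses of each other,
   looking through at most BUDGET conversions in total on any one path.
   Without the budget, a cycle such as x -> y -> x in CONV_OF would make
   this recurse forever.  */
static bool
inverted_equal_p (const struct node *a, const struct node *b, int budget)
{
  assert (a != NULL && b != NULL && budget >= 0);

  if (a == b)
    return false;
  if (a->not_of == b || b->not_of == a)
    return true;

  /* Out of budget: give up conservatively instead of recursing.  */
  if (budget == 0)
    return false;

  if (a->conv_of && inverted_equal_p (a->conv_of, b, budget - 1))
    return true;
  if (b->conv_of && inverted_equal_p (a, b->conv_of, budget - 1))
    return true;
  return false;
}
```

Answering "false" when the budget runs out is the conservative choice here:
the only cost is a missed simplification, never a wrong one.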

[Bug analyzer/107646] RFE: can we reimplement gcc-python-plugin's cpychecker as a -fanalyzer plugin?

2023-08-02 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107646

--- Comment #2 from CVS Commits  ---
The master branch has been updated by David Malcolm :

https://gcc.gnu.org/g:fafe2d18f791c6b97b49af7c84b1b5703681c3af

commit r14-2933-gfafe2d18f791c6b97b49af7c84b1b5703681c3af
Author: Eric Feng 
Date:   Wed Aug 2 16:54:55 2023 -0400

analyzer: stash values for CPython plugin [PR107646]

This patch adds a hook to the end of ana::on_finish_translation_unit
which calls relevant stashing-related callbacks registered during plugin
initialization. This feature is used to stash named types and global
variables for a CPython analyzer plugin [PR107646].

gcc/analyzer/ChangeLog:
PR analyzer/107646
* analyzer-language.cc (run_callbacks): New function.
(on_finish_translation_unit): New function.
* analyzer-language.h (GCC_ANALYZER_LANGUAGE_H): New include.
(class translation_unit): New vfuncs.

gcc/c/ChangeLog:
PR analyzer/107646
* c-parser.cc: New functions on stashing values for the
analyzer.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/plugin.exp: Add new plugin and test.
* gcc.dg/plugin/analyzer_cpython_plugin.c: New plugin.
* gcc.dg/plugin/cpython-plugin-test-1.c: New test.

Signed-off-by: Eric Feng 

[Bug target/109977] [14 Regression] ICE: output_operand: incompatible floating point / vector register operand for '%d' at -Og

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109977

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |slyfox at gcc dot gnu.org

--- Comment #3 from Andrew Pinski  ---
*** Bug 110880 has been marked as a duplicate of this bug. ***

[Bug target/110880] [14 Regression] aarch64 ICE on highway-1.0.5: internal compiler error: output_operand: incompatible floating point / vector register operand for '%s'

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110880

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #3 from Andrew Pinski  ---
Yes it is a dup of bug 109977:
(insn:TI 14 11 21 (set (mem/c:V2SF (plus:DI (reg/f:DI 31 sp)
(const_int 24 [0x18])) [1 __trans_tmp_1.raw_+0 S8 A64])
(vec_duplicate:V2SF (reg:SF 2 x2 [orig:94 _7 ] [94])))
"/app/example.cpp":18:29 1391 {aarch64_simd_stpv2sf}
 (nil))

Here we have V2SF and in that one we had V2DF but the problem is the same. The
use of the `vw` iterator in this pattern.

*** This bug has been marked as a duplicate of bug 109977 ***

[Bug target/110880] [14 Regression] aarch64 ICE on highway-1.0.5: internal compiler error: output_operand: incompatible floating point / vector register operand for '%s'

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110880

--- Comment #2 from Andrew Pinski  ---
Yes it is a dup of bug 109977:
(insn:TI 14 11 21 (set (mem/c:V2SF (plus:DI (reg/f:DI 31 sp)
(const_int 24 [0x18])) [1 __trans_tmp_1.raw_+0 S8 A64])
(vec_duplicate:V2SF (reg:SF 2 x2 [orig:94 _7 ] [94])))
"/app/example.cpp":18:29 1391 {aarch64_simd_stpv2sf}
 (nil))

Here we have V2SF and in that one we had V2DF but the problem is the same. The
use of the `vw` iterator in this pattern.

[Bug c++/110881] New: Feature request: an attribute for enum members that would skip the -Wswitch-enum warning

2023-08-02 Thread valentyn.pavliuchenko at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110881

Bug ID: 110881
   Summary: Feature request: an attribute for enum members that
would skip the -Wswitch-enum warning
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: valentyn.pavliuchenko at gmail dot com
  Target Milestone: ---

Sometimes enum members are added only for static_assert() checks, but not for
actual use. The problem is that it triggers the -Wswitch-enum warning if these
members are not explicitly specified in a switch() statement.

The request is to add an attribute like [[unused]] for such enum members that
would not cause a -Wswitch-enum warning when the member is omitted in a
switch().

Example case that demonstrates the problem:

// build with -Wall
#include <iostream>

enum MyEnum
{
eMyEnum_One,
eMyEnum_Two,
eMyEnum_Three,
eMyEnum_Four,

eMyEnum_Count, // not a real enum value, but rather a marker for
// static_assert() statements like in the bar() function below
};

void doEven();

void bar(MyEnum inValue)
{
static_assert(eMyEnum_Count == 4, "review the code below"); // would
// trigger if new members are added to the enum
if (inValue == eMyEnum_Two || inValue == eMyEnum_Four)
doEven();
}

void foo(MyEnum inValue)
{
switch (inValue) // warning: enumeration value 'eMyEnum_Count' not handled
// in switch [-Wswitch]
{
case eMyEnum_One: std::cout << "1\n";
case eMyEnum_Two: std::cout << "2\n";
case eMyEnum_Three: std::cout << "3\n";
case eMyEnum_Four: std::cout << "4\n";
}
}

Possible code after a fix:


enum MyEnum
{
eMyEnum_One,
eMyEnum_Two,
eMyEnum_Three,
eMyEnum_Four,

[[unused]]eMyEnum_Count,
};
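
Until such an attribute exists, one possible workaround is to keep the count
out of the enumeration itself. The C sketch below reuses the names from the
report; placing eMyEnum_Count in a separate anonymous enum and the helper
MyEnumName are assumptions of this example, not something the report
proposes.

```c
#include <assert.h>

/* Only real values are enumerators of MyEnum.  */
enum MyEnum
{
  eMyEnum_One,
  eMyEnum_Two,
  eMyEnum_Three,
  eMyEnum_Four
};

/* Assumption for this sketch: the marker lives in a separate anonymous
   enum, so it is not an enumerator of MyEnum and a switch over MyEnum
   does not need to mention it.  */
enum { eMyEnum_Count = eMyEnum_Four + 1 };

static_assert (eMyEnum_Count == 4, "review the code below");

/* Hypothetical helper: every enumerator of MyEnum is handled, so the
   switch warnings have nothing to report.  */
static const char *
MyEnumName (enum MyEnum value)
{
  switch (value)
    {
    case eMyEnum_One:   return "1";
    case eMyEnum_Two:   return "2";
    case eMyEnum_Three: return "3";
    case eMyEnum_Four:  return "4";
    }
  return "?";
}
```

The cost is that the definition of eMyEnum_Count has to be updated by hand
when a new last enumerator is added, which is exactly the kind of
bookkeeping the requested attribute would avoid.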

[Bug target/110880] [14 Regression] aarch64 ICE on highway-1.0.5: internal compiler error: output_operand: incompatible floating point / vector register operand for '%s'

2023-08-02 Thread slyfox at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110880

--- Comment #1 from Sergei Trofimovich  ---
Compiler details:

$ aarch64-unknown-linux-gnu-g++ -v

Using built-in specs.
COLLECT_GCC=/<>/aarch64-unknown-linux-gnu-stage-final-gcc-14.0.0/bin/aarch64-unknown-linux-gnu-g++
COLLECT_LTO_WRAPPER=/<>/aarch64-unknown-linux-gnu-stage-final-gcc-14.0.0/libexec/gcc/aarch64-unknown-linux-gnu/14.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../source/configure
--prefix=/<>/aarch64-unknown-linux-gnu-stage-final-gcc-14.0.0
--with-gmp-include=/<>/gmp-with-cxx-6.2.1-dev/include
--with-gmp-lib=/<>/gmp-with-cxx-6.2.1/lib
--with-mpfr-include=/<>/mpfr-4.2.0-dev/include
--with-mpfr-lib=/<>/mpfr-4.2.0/lib --with-mpc=/<>/libmpc-1.3.1
--with-native-system-header-dir=/<>/glibc-aarch64-unknown-linux-gnu-2.38-dev/include
--with-build-sysroot=/ --program-prefix=aarch64-unknown-linux-gnu- --enable-lto
--disable-libstdcxx-pch --without-included-gettext --with-system-zlib
--enable-checking=release --enable-static --enable-languages=c,c++
--disable-multilib --enable-plugin --with-isl=/<>/isl-0.20
--with-arch=armv8-a
--with-as=/<>/aarch64-unknown-linux-gnu-binutils-wrapper-2.41/bin/aarch64-unknown-linux-gnu-as
--with-ld=/<>/aarch64-unknown-linux-gnu-binutils-wrapper-2.41/bin/aarch64-unknown-linux-gnu-ld
--with-headers=/<>/glibc-aarch64-unknown-linux-gnu-2.38-dev/include
--enable-__cxa_atexit --enable-long-long --enable-threads=posix --enable-nls
--disable-bootstrap --build=x86_64-unknown-linux-gnu
--host=x86_64-unknown-linux-gnu --target=aarch64-unknown-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0  (experimental) (GCC)

[Bug target/110880] New: [14 Regression] aarch64 ICE on highway-1.0.5: internal compiler error: output_operand: incompatible floating point / vector register operand for '%s'

2023-08-02 Thread slyfox at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110880

Bug ID: 110880
   Summary: [14 Regression] aarch64 ICE on highway-1.0.5: internal
compiler error: output_operand: incompatible floating
point / vector register operand for '%s'
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: slyfox at gcc dot gnu.org
  Target Milestone: ---

Don't know if a duplicate of PR109977 or not. Filing just in case. Initially
observed ICE on `highway-1.0.5` against gcc r14-2930-g0460c122162793.

Extracted small example as:

// $ cat math_test.cc.cc
void CopyBytes(bool *from, float *to) { __builtin_memcpy(to, from, 4); }
int TestMath_d;
typedef __Float32x2_t float32x2_t;
struct Vec128 {
  Vec128(float32x2_t raw) : raw_(raw) {}
  float32x2_t raw_;
};
Vec128 NativeSet(float t) {
  float32x2_t __trans_tmp_2{t, t};
  return __trans_tmp_2;
}
Vec128 Set(float t) { return NativeSet(t); }
float BitCast_out;
bool TestMath_in[4];
void TestMath(int fxN(int, Vec128 &)) {
  CopyBytes(TestMath_in, &BitCast_out);
  Vec128 __trans_tmp_1 = Set(BitCast_out);
  fxN(TestMath_d, __trans_tmp_1);
}

$
aarch64-unknown-linux-gnu-stage-final-gcc-wrapper-14.0.0/bin/aarch64-unknown-linux-gnu-g++
-O2  -c math_test.cc.cc -o bug.o -O1

during RTL pass: final
math_test.cc.cc: In function 'void TestMath(int (*)(int, Vec128&))':
math_test.cc.cc:19:1: internal compiler error: output_operand: incompatible
floating point / vector register operand for '%s'
   19 | }
  | ^
0x1c13274 diagnostic_impl(rich_location*, diagnostic_metadata const*, int, char
const*, __va_list_tag (*) [1], diagnostic_t)
???:0
0x1c137e7 internal_error(char const*, ...)
???:0
0xb8b410 output_operand_lossage(char const*, ...)
???:0
0xb8b5d1 output_operand(rtx_def*, int)
???:0
0xb8c23d output_asm_insn(char const*, rtx_def**) [clone .part.0]
???:0
0xb8d564 final_scan_insn_1(rtx_insn*, _IO_FILE*, int, int, int*) [clone
.isra.0]
???:0
0xb8d99b final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
???:0
0xb8dc37 final_1(rtx_insn*, _IO_FILE*, int, int)
???:0
0xb8e496 (anonymous namespace)::pass_final::execute(function*)
???:0

[Bug middle-end/110874] [14 Regression] ice with -O2 with recent gcc

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110874

--- Comment #8 from Andrew Pinski  ---
Oh, the difference between -O1 and -O2 is due to this in FRE:
  /* At -O[1g] use the cheap non-iterating mode.  */
  bool iterate_p = may_iterate && (optimize > 1);

[Bug middle-end/110874] [14 Regression] ice with -O2 with recent gcc

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110874

--- Comment #7 from Andrew Pinski  ---
(In reply to David Binderman from comment #6)
> (In reply to Andrew Pinski from comment #4)
> > There is a stack overflow while executing the FRE pass.
> 
> I can confirm over 100,000 stack frames.
> 
> Which -f flag causes the FRE pass to be executed ?
> I assume it is in -O2, but not in -O1.

FRE is enabled at -O1, but the IR changes before FRE. I have not yet looked
into which IR changes cause the difference between -O1 and -O2.

[Bug middle-end/110874] [14 Regression] ice with -O2 with recent gcc

2023-08-02 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110874

--- Comment #6 from David Binderman  ---
(In reply to Andrew Pinski from comment #4)
> There is a stack overflow while executing the FRE pass.

I can confirm over 100,000 stack frames.

Which -f flag causes the FRE pass to be executed ?
I assume it is in -O2, but not in -O1.

[Bug middle-end/110874] [14 Regression] ice with -O2 with recent gcc

2023-08-02 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110874

--- Comment #5 from David Binderman  ---
Created attachment 55679
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55679&action=edit
C source code

Another test case.

[Bug c/110878] -Wstringop-overflow incorrectly warns about arguments to functions with static array parameter declarations

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110878

--- Comment #4 from Andrew Pinski  ---
(In reply to Taylor R Campbell from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > This is basically a dup of bug 108154 I think.
> 
> That one appears to be different: it trips -Wstringop-overread, not
> -Wstringop-overflow.

The infrastructure for both is the same for these static array parameters,
though. So

[Bug c/110878] -Wstringop-overflow incorrectly warns about arguments to functions with static array parameter declarations

2023-08-02 Thread campbell+gcc-bugzilla at mumble dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110878

--- Comment #3 from Taylor R Campbell  
---
(In reply to Andrew Pinski from comment #1)
> There is another bug report dealing with this. But IIRC this is an expected
> warning as foo is being passed an array which is size 16 but then passed to
> bar as size 128 which would be undefined.

There is nothing undefined here.

The caller's requirement as noted in the comment (which is not formally
expressible in C, as far as I know, but is obviously extremely
widespread practice) is that for foo(p, n) or bar(p, n), p must point
to the first element of an array of at least n elements.

foo additionally imposes the requirement that p have at least 16
elements.  bar additionally imposes the requirement that p have at
least 128 elements.

When the caller meets foo's contract, foo meets bar's contract.  So
there is nothing undefined.

From C11, Sec. 6.7.6.3 `Function declarators (including prototypes)',
paragraph 7, p. 133:

> A declaration of a parameter as ``array of type'' shall be adjusted
> to ``qualified pointer to type'', where the type qualifiers (if any)
> are those specified within the [ and ] of the array type derivation.
> If the keyword static also appears within the [ and ] of the array
> type derivation, then for each call to the function, the value of the
> corresponding actual argument shall provide access to the first
> element of an array with at least as many elements as specified by
> the size expression.

Here, as required, the value of the corresponding actual argument does
provide access to the first element of an array with at least as many
elements as specified by the size expression.

In other words, this states a requirement about run-time values, which
the code meets, not about compile-time parameter declarations, which is
what GCC appears to object to.

(In reply to Andrew Pinski from comment #2)
> This is basically a dup of bug 108154 I think.

That one appears to be different: it trips -Wstringop-overread, not
-Wstringop-overflow.
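
For readers without the original test case at hand, the situation described
above can be sketched roughly as follows. This is a hypothetical
reconstruction, not the reporter's code: the bodies of foo and bar, the
element type, and the n >= 128 guard are assumptions. GCC may still emit the
disputed warning when compiling it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee: P must point to at least 128 elements, and to at
   least N elements.  Sums the first N elements.  */
static unsigned
bar (const unsigned char p[static 128], size_t n)
{
  unsigned sum = 0;

  assert (p != NULL);
  for (size_t i = 0; i < n; i++)
    sum += p[i];
  return sum;
}

/* Hypothetical caller: P must point to at least 16 elements, and to at
   least N elements.  */
static unsigned
foo (const unsigned char p[static 16], size_t n)
{
  unsigned sum = 0;

  assert (p != NULL);

  /* If N >= 128, foo's own contract already guarantees that P has at
     least 128 elements, so bar's contract is met at run time.  */
  if (n >= 128)
    return bar (p, n);

  for (size_t i = 0; i < n; i++)
    sum += p[i];
  return sum;
}
```

The point of the sketch is that the [static 128] requirement is about the
array actually passed at run time, which the guard ensures, rather than
about the bound written in foo's own parameter declaration.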

Re: [PATCH] tree-optimization/110838 - vectorization of widened shifts

2023-08-02 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> [...]
>> >> in vect_determine_precisions_from_range.  Maybe we should drop
>> >> the shift handling from there and instead rely on
>> >> vect_determine_precisions_from_users, extending:
>> >> 
>> >>   if (TREE_CODE (shift) != INTEGER_CST
>> >>   || !wi::ltu_p (wi::to_widest (shift), precision))
>> >> return;
>> >> 
>> >> to handle ranges where the max is known to be < precision.
>> >> 
>> >> There again, if masking is enough for right shifts and right rotates,
>> >> maybe we should keep the current handling for then (with your fix)
>> >> and skip the types_compatible_p check for those cases.
>> >
>> > I think it should be enough for left-shifts as well?  If we lshift
>> > out like 0x100 << 9 so the lhs range is [0,0] the input range from
>> > op0 will still make us use HImode.  I think we only ever get overly
>> > conservative answers for left-shifts from this function?
>> 
>> But if we have:
>> 
>>   short x, y;
>>   int z = (int) x << (int) y;
>> 
>> and at runtime, x == 1, y == 16, (short) z should be 0 (no UB),
>> whereas x << y would invoke UB and x << (y & 15) would be 1.
>
> True, but we start with the range of the LHS which in this case
> would be of type 'int' and thus 1 << 16 and not zero.  You
> might call that a failure of vect_determine_precisions_from_range
> of course, since it makes it not exactly a forward propagation ...

Ah, right, sorry.  I should have done more checking.
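
The x == 1, y == 16 case quoted above can be checked with a small C sketch.
It deliberately uses unsigned fixed-width types instead of short and int, to
sidestep signed-shift and narrowing-conversion subtleties; that swap and the
function names are assumptions of this example, not part of the patch under
discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Shift performed in the wide (32-bit) type, then truncated to 16 bits.
   This models the original "widened" computation.  */
static uint16_t
shift_wide_then_truncate (uint16_t x, unsigned y)
{
  assert (y < 32);
  uint32_t z = (uint32_t) x << y;
  return (uint16_t) z;
}

/* Shift whose amount is masked to 0..15, modelling a 16-bit shift that
   only looks at the low bits of the shift amount.  */
static uint16_t
shift_narrow_masked (uint16_t x, unsigned y)
{
  return (uint16_t) ((uint32_t) x << (y & 15));
}
```

For shift amounts below 16 the two functions agree; for x == 1 and y == 16
the first yields 0 and the second yields 1, which is the mismatch being
discussed.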

> [...]
>> > Originally I completely disabled shift support but that regressed
>> > the over-widen testcases a lot which at least have widened shifts
>> > by constants a lot.
>> >
>> > x86 has vector rotates only for AMD XOP (which is dead) plus
>> > some for V1TImode AFAICS, but I think we pattern-match rotates
>> > to shifts, so maybe the precision stuff is interesting for the
>> > case where we match the pattern rotate sequence for widenings?
>> >
>> > So for the types_compatible_p issue something along
>> > the following?  We could also exempt the shift operand from
>> > being covered by min_precision so the consumer would have
>> > to make sure it can be represented (I think that's never going
>> > to be an issue in practice until we get 256bit integers vectorized).
>> > It will have to fixup the shift operands anyway.
>> >
>> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
>> > index e4ab8c2d65b..cdeeaf98a47 100644
>> > --- a/gcc/tree-vect-patterns.cc
>> > +++ b/gcc/tree-vect-patterns.cc
>> > @@ -6378,16 +6378,26 @@ vect_determine_precisions_from_range 
>> > (stmt_vec_info stmt_info, gassign *stmt)
>> >   }
>> > else if (TREE_CODE (op) == SSA_NAME)
>> >   {
>> > -   /* Ignore codes that don't take uniform arguments.  */
>> > -   if (!types_compatible_p (TREE_TYPE (op), type))
>> > +   /* Ignore codes that don't take uniform arguments.  For shifts
>> > +  the shift amount is known to be in-range.  */
>> 
>> I guess it's more "we can assume that the amount is in range"?
>
> Yes.
>
>> > +   if (code == LSHIFT_EXPR
>> > +   || code == RSHIFT_EXPR
>> > +   || code == LROTATE_EXPR
>> > +   || code == RROTATE_EXPR)
>> > + {
>> > +   min_value = wi::min (min_value, 0, sign);
>> > +   max_value = wi::max (max_value, TYPE_PRECISION (type), 
>> > sign);
>> 
>> LGTM for shifts right.  Because of the above lshift thing, I think we
>> need something like:
>> 
>>   if (code == LSHIFT_EXPR || code == LROTATE_EXPR)
>> {
>>   wide_int op_min_value, op_max_value;
>>   if (!vect_get_range_info (op, &op_min_value, &op_max_value))
>> return;
>> 
>>   /* We can ignore left shifts by negative amounts, which are UB.  */
>>   min_value = wi::min (min_value, 0, sign);
>> 
>>   /* Make sure the highest non-UB shift amount doesn't become UB.  */
>>   op_max_value = wi::umin (op_max_value, TYPE_PRECISION (type));
>>   auto mask = wi::mask (TYPE_PRECISION (type), false,
>>  op_max_value.to_uhwi ());
>>   max_value = wi::max (max_value, mask, sign);
>> }
>> 
>> Does that look right?
>
> As said it looks overly conservative to me?  For example with my patch
> for
>
> void foo (signed char *v, int s)
> {
>   if (s < 1 || s > 7)
> return;
>   for (int i = 0; i < 1024; ++i)
> v[i] = v[i] << s;
> }
>
> I get
>
> t.c:5:21: note:   _7 has range [0xffffc000, 0x3f80]
> t.c:5:21: note:   can narrow to signed:15 without loss of precision: _7 = 
> _6 << s_12(D);
> t.c:5:21: note:   only the low 15 bits of _6 are significant
> t.c:5:21: note:   _6 has range [0xffffff80, 0x7f]
> ...
> t.c:5:21: note:   vect_recog_over_widening_pattern: detected: _7 = _6 << 
> s_12(D);
> t.c:5:21: note:   demoting int to signed short
> t.c:5:21: note:   Splitting statement: _6 = (int) _5;
> t.c:5:21: note:   into pattern statements: patt_24 = (signed short) _5;
> t.c:5:21: note:   and: patt_23 = (int) 

[Bug rtl-optimization/110867] [14 Regression] ICE in combine after 7cdd0860949c6c3232e6cff1d7ca37bb5234074c

2023-08-02 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110867

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Stefan Schulze Frielinghaus
:

https://gcc.gnu.org/g:41ef5a34161356817807be3a2e51fbdbe575ae85

commit r14-2932-g41ef5a34161356817807be3a2e51fbdbe575ae85
Author: Stefan Schulze Frielinghaus 
Date:   Wed Aug 2 21:43:22 2023 +0200

rtl-optimization/110867 Fix narrow comparison of memory and constant

In certain cases a constant may not fit into the mode used to perform a
comparison.  This may be the case for sign-extended constants which are
used during an unsigned comparison as e.g. in

(set (reg:CC 100 cc)
(compare:CC (mem:SI (reg/v/f:SI 115 [ a ]) [1 *a_4(D)+0 S4 A64])
(const_int -2147483648 [0x8000])))

Fixed by ensuring that the constant fits into comparison mode.

Furthermore, on some targets as e.g. sparc the constant used in a
comparison is chopped off before combine which leads to failing test
cases (see PR 110869).  Fixed by not requiring that the source mode has
to be DImode, and excluding sparc from the last two test cases entirely
since there the constant cannot be further reduced.

gcc/ChangeLog:

PR rtl-optimization/110867
* combine.cc (simplify_compare_const): Try the optimization only
in case the constant fits into the comparison mode.

gcc/testsuite/ChangeLog:

PR rtl-optimization/110869
* gcc.dg/cmp-mem-const-1.c: Relax mode for constant.
* gcc.dg/cmp-mem-const-2.c: Relax mode for constant.
* gcc.dg/cmp-mem-const-3.c: Relax mode for constant.
* gcc.dg/cmp-mem-const-4.c: Relax mode for constant.
* gcc.dg/cmp-mem-const-5.c: Exclude sparc since here the
constant is already reduced.
* gcc.dg/cmp-mem-const-6.c: Exclude sparc since here the
constant is already reduced.
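
One way to read "the constant fits into the comparison mode" is that
truncating the constant to the mode's width and sign-extending it back
leaves it unchanged. The C sketch below models that reading on plain 64-bit
integers. It is not the code in combine.cc; trunc_for_width and fits_width
are hypothetical helpers, and the bit width stands in for a machine mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Keep the low BITS bits of C and sign-extend them back to 64 bits.  */
static int64_t
trunc_for_width (int64_t c, unsigned bits)
{
  assert (bits >= 1 && bits <= 64);
  if (bits == 64)
    return c;

  uint64_t mask = (UINT64_C (1) << bits) - 1;
  uint64_t low = (uint64_t) c & mask;
  uint64_t sign_bit = UINT64_C (1) << (bits - 1);

  /* If the sign bit of the narrow value is set, subtract 2^BITS.  */
  if (low & sign_bit)
    return (int64_t) low - (int64_t) mask - 1;
  return (int64_t) low;
}

/* True if C is already in sign-extended form for a BITS-wide value.  */
static bool
fits_width (int64_t c, unsigned bits)
{
  return trunc_for_width (c, bits) == c;
}
```

With these definitions, -2147483648 fits a 32-bit width but not a 16-bit
one, and +2147483648 does not fit a 32-bit width because it is not in
sign-extended form there.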

[Bug middle-end/110869] [14 regression] ICE in decompose, at rtl.h:2297

2023-08-02 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110869

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Stefan Schulze Frielinghaus
:

https://gcc.gnu.org/g:41ef5a34161356817807be3a2e51fbdbe575ae85

commit r14-2932-g41ef5a34161356817807be3a2e51fbdbe575ae85
Author: Stefan Schulze Frielinghaus 
Date:   Wed Aug 2 21:43:22 2023 +0200

rtl-optimization/110867 Fix narrow comparison of memory and constant

In certain cases a constant may not fit into the mode used to perform a
comparison.  This may be the case for sign-extended constants which are
used during an unsigned comparison as e.g. in

(set (reg:CC 100 cc)
(compare:CC (mem:SI (reg/v/f:SI 115 [ a ]) [1 *a_4(D)+0 S4 A64])
(const_int -2147483648 [0xffffffff80000000])))

Fixed by ensuring that the constant fits into comparison mode.

Furthermore, on some targets as e.g. sparc the constant used in a
comparison is chopped off before combine which leads to failing test
cases (see PR 110869).  Fixed by not requiring that the source mode has
to be DImode, and excluding sparc from the last two test cases entirely
since there the constant cannot be further reduced.

gcc/ChangeLog:

PR rtl-optimization/110867
* combine.cc (simplify_compare_const): Try the optimization only
in case the constant fits into the comparison mode.

gcc/testsuite/ChangeLog:

PR rtl-optimization/110869
* gcc.dg/cmp-mem-const-1.c: Relax mode for constant.
* gcc.dg/cmp-mem-const-2.c: Relax mode for constant.
* gcc.dg/cmp-mem-const-3.c: Relax mode for constant.
* gcc.dg/cmp-mem-const-4.c: Relax mode for constant.
* gcc.dg/cmp-mem-const-5.c: Exclude sparc since here the
constant is already reduced.
* gcc.dg/cmp-mem-const-6.c: Exclude sparc since here the
constant is already reduced.

[Bug c/110878] -Wstringop-overflow incorrectly warns about arguments to functions with static array parameter declarations

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110878

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |108154

--- Comment #2 from Andrew Pinski  ---
This is basically a dup of bug 108154 I think.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108154
[Bug 108154] Inappropriate -Wstringop-overread in the C99 [static n] func param
decl

[Bug c/110878] -Wstringop-overflow incorrectly warns about arguments to functions with static array parameter declarations

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110878

--- Comment #1 from Andrew Pinski  ---
There is another bug report dealing with this. But IIRC this is an expected
warning as foo is being passed an array which is size 16 but then passed to bar
as size 128 which would be undefined.

[Bug c++/110879] New: Unnecessary reread from memory in a loop

2023-08-02 Thread palevichva at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110879

Bug ID: 110879
   Summary: Unnecessary reread from memory in a loop
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: palevichva at gmail dot com
  Target Milestone: ---

Created attachment 55678
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55678&action=edit
preprocessed file by g++ from revision dd2eb972a

I've found a strange regression in optimization. The trunk version of g++
produces less optimal assembly: it rereads the same memory location in every
iteration of a loop. More specifically, it rereads the fields _M_finish and
_M_end_of_storage of a vector from memory on every push_back call, although
that is not necessary. The released version 13.2 doesn't do that and just uses
the values held in registers.

I'm compiling the following code:

#include <vector>

std::vector f(std::size_t n) {
std::vector res;
res.reserve(n);
for (std::size_t i = 0; i < n; ++i) {
res.push_back(i*i);
}
return res;
}

The main body of the loop looks like this:
~/.local/gcc/bin/g++ -S -fverbose-asm -O3 -std=c++20 pb.cpp

>.L41:
># /home/scaiper/.local/gcc/include/c++/14.0.0/bits/stl_construct.h:97: { return ::new((void*)__location) _Tp(std::forward<_Args>(__args)...); }
>        movl    %r15d, (%rbx)   # _3, *prephitmp_51
># /home/scaiper/.local/gcc/include/c++/14.0.0/bits/vector.tcc:119: ++this->_M_impl._M_finish;
>        addq    $4, %rbx        #, tmp135
>        movq    %rbx, 8(%rbp)   # tmp135, res_8(D)->D.35756._M_impl.D.35067._M_finish
>.L8:
># pb.cpp:6: for (std::size_t i = 0; i < n; ++i) {
>        addq    $1, %r13        #, i
># pb.cpp:6: for (std::size_t i = 0; i < n; ++i) {
>        cmpq    %r13, %r12      # i, n
>        je      .L1     #,
># /home/scaiper/.local/gcc/include/c++/14.0.0/bits/vector.tcc:114: if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
>        movq    8(%rbp), %rbx   # res_8(D)->D.35756._M_impl.D.35067._M_finish, prephitmp_51
># /home/scaiper/.local/gcc/include/c++/14.0.0/bits/vector.tcc:114: if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
>        movq    16(%rbp), %rax  # res_8(D)->D.35756._M_impl.D.35067._M_end_of_storage, pretmp_52
>.L16:
># pb.cpp:7: res.push_back(i*i);
>        movl    %r13d, %r15d    # i, _3
>        imull   %r13d, %r15d    # i, _3
># /home/scaiper/.local/gcc/include/c++/14.0.0/bits/vector.tcc:114: if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
>        cmpq    %rax, %rbx      # pretmp_52, prephitmp_51
>        jne     .L41    #,

Same loop as produced by 13.2:
~/.local/gcc-13.2/bin/g++ -v -S -fverbose-asm -O3 -std=c++20 pb.cpp

>.L43:
># /home/scaiper/.local/gcc-13.2/include/c++/13.2.0/bits/stl_construct.h:97: { return ::new((void*)__location) _Tp(std::forward<_Args>(__args)...); }
>        movl    %r12d, (%rcx)   # _3, *prephitmp_4
># /home/scaiper/.local/gcc-13.2/include/c++/13.2.0/bits/vector.tcc:119: ++this->_M_impl._M_finish;
>        addq    $4, %rcx        #, prephitmp_4
>        movq    %rcx, 8(%rbp)   # prephitmp_4, res_8(D)->D.35699._M_impl.D.35010._M_finish
>.L8:
># pb.cpp:6: for (std::size_t i = 0; i < n; ++i) {
>        addq    $1, %rbx        #, i
># pb.cpp:6: for (std::size_t i = 0; i < n; ++i) {
>        cmpq    %rbx, %r13      # i, n
>        je      .L1     #,
>.L18:
># pb.cpp:7: res.push_back(i*i);
>        movl    %ebx, %r12d     # i, _3
>        imull   %ebx, %r12d     # i, _3
># /home/scaiper/.local/gcc-13.2/include/c++/13.2.0/bits/vector.tcc:114: if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
>        cmpq    %r8, %rcx       # prephitmp_74, prephitmp_4
>        jne     .L43    #,

Notice these extra instructions in the first snippet:
movq 8(%rbp), %rbx
movq 16(%rbp), %rax

I've bisected this problem to the commit dd2eb972a (libstdc++: Use RAII in
std::vector::_M_realloc_insert)
(https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=dd2eb972a5b063e10c83878d5c9336a818fa8291).
The commit itself doesn't look like the problem; the code looks pretty much
equivalent. But for some reason the compiler produces a different result.

I'm using a version built from the aforementioned commit dd2eb972a:
Target: x86_64-pc-linux-gnu
Configured with: ../gcc/configure --enable-languages=c++ --disable-multilib
--prefix=/home/scaiper/.local/gcc
gcc version 14.0.0 20230623 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-v' '-S' '-fverbose-asm' '-O3' '-std=c++20'
'-shared-libgcc' '-mtune=generic' '-march=x86-64'

Comparing with 13.2:
Target: x86_64-pc-linux-gnu
Configured with: ../gcc/configure --enable-languages=c++ --disable-multilib
--prefix=/home/scaiper/.local/gcc-13.2
gcc version 13.2.0 (GCC)
COLLECT_GCC_OPTIONS='-v' '-S' '-fverbose-asm' '-O3' '-std=c++20'
'-shared-libgcc' '-mtune=generic' 

[Bug c/110878] New: -Wstringop-overread incorrectly warns about arguments to functions with static array parameter declarations

2023-08-02 Thread campbell+gcc-bugzilla at mumble dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110878

Bug ID: 110878
   Summary: -Wstringop-overread incorrectly warns about arguments
to functions with static array parameter declarations
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: campbell+gcc-bugzilla at mumble dot net
  Target Milestone: ---

Isolated from code passing a pointer into an array and the length of the array
as separate arguments, where each function has the minimum length of the array
encoded in its parameter declaration, and uses runtime conditionals to
guarantee the minimum is met:

// bar(p, n) may access p[0], p[1], ..., p[n-1], and requires n >= 128
void bar(unsigned char[static 128], unsigned);

// foo(p, n) may access p[0], p[1], ..., p[n-1], and requires n >= 16
void
foo(unsigned char p[static 16], unsigned n)
{

        if (n % 128)
                n -= n % 128;
        if (n)
                bar(p, n);
}

<source>: In function 'foo':
<source>:12:17: error: 'bar' accessing 128 bytes in a region of size 16 [-Werror=stringop-overflow=]
   12 |                 bar(p, n);
      |                 ^
<source>:12:17: note: referencing argument 1 of type 'unsigned char[128]'
<source>:2:6: note: in a call to function 'bar'
    2 | void bar(unsigned char[static 128], unsigned n);
      |      ^~~
cc1: all warnings being treated as errors
Compiler returned: 1

Reproduced in GCC 10.5, 11.4, and 12.3.  Not reproduced in any earlier versions
of GCC.

Using `if (n >= 128)' doesn't change anything, presumably because GCC doesn't
know the connection between p and n.
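
To illustrate the guard described above, here is a minimal C sketch. The helper name `round_down_128` is hypothetical and not part of the report; it only shows that after `n -= n % 128` the length is either 0 or at least 128, which is the property `bar` relies on.

```c
#include <assert.h>

/* Hypothetical helper mirroring the guard in foo(): round n down to a
   multiple of 128.  The result is either 0 or at least 128. */
unsigned round_down_128(unsigned n)
{
    unsigned r = n - n % 128;
    assert(r == 0 || r >= 128);
    return r;
}
```

So `bar` is only reached with a nonzero multiple of 128, but that fact is carried by `n`, not by the declared minimum size of `p`.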

GCC support for extensions from later standards

2023-08-02 Thread Nikolas Klauser
Hi everyone!

I'm working on libc++ and we are currently discussing using language extensions 
from later standards 
(https://discourse.llvm.org/t/rfc-use-language-extensions-from-future-standards-in-libc/71898/4).
 By that I mean things like using `if constexpr` with `-std=c++11`. GCC has 
quite a lot of these kinds of conforming extensions, but doesn't document them 
AFAICT. While discussing using these extensions, the question came up what GCC's 
support policy for these is. Aaron was kind enough to answer these questions 
for us on the Clang side. Since I couldn't find anything in the documentation, 
I thought I'd ask here.

So, here are my questions:

Do you expect that these extensions will ever be removed for some reason? If 
yes, what could those reasons be?

Would you be interested in documenting them?

Aaron noted that we should ask the Clang folks before using them, so they can 
evaluate whether the extension makes sense, since they might not be aware of 
them, and some might be broken. So I'd be interested whether you would also 
like us to ask whether you want to actually support these extensions.

Thanks,
Nikolas

Re: [PATCH] gcc-14/changes.html: Deprecate a GCC C extension on flexible array members.

2023-08-02 Thread Qing Zhao via Gcc-patches
Ping…

thanks.

Qing

> On Jul 10, 2023, at 3:11 PM, Qing Zhao  wrote:
> 
> Hi,
> 
> This is the change for the GCC14 release notes on the deprecation of a C
> extension about flexible array members.
> 
> Okay for committing?
> 
> thanks.
> 
> Qing
> 
> 
> 
> *htdocs/gcc-14/changes.html (Caveats): Add notice about deprecating a C
> extension about flexible array members.
> ---
> htdocs/gcc-14/changes.html | 10 +-
> 1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index 3f797642..c7f2ce4d 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -30,7 +30,15 @@ a work-in-progress.
> 
> Caveats
> 
> -  ...
> +  C:
> +  Support for the GCC extension, a structure containing a C99 flexible 
> array
> +  member, or a union containing such a structure, is not the last field 
> of
> +  another structure, is deprecated. Refer to
> +  https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html;>
> +  Zero Length Arrays.
> +  Any code relying on this extension should be modifed to ensure that
> +  C99 flexible array members only end up at the ends of structures.
> +  
> 
> 
> 
> -- 
> 2.31.1
> 
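
For illustration, a minimal C sketch of the layout the note recommends: the flexible array member sits at the very end of the structure that is actually allocated. The names `struct packet` and `packet_new` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The flexible array member is the last member of the structure. */
struct packet {
    size_t len;
    unsigned char payload[];
};

/* Allocate a packet with room for len payload bytes and copy data in.
   Returns NULL on allocation failure; the caller frees the result. */
struct packet *packet_new(const void *data, size_t len)
{
    assert(data != NULL || len == 0);
    struct packet *p = malloc(sizeof *p + len);
    if (p == NULL)
        return NULL;
    p->len = len;
    if (len != 0)
        memcpy(p->payload, data, len);
    return p;
}
```

The deprecated form would instead embed `struct packet` as a member of another structure and place further fields after it.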



Re: [PATCH] gcc-13/changes.html: Add and fix URL to -fstrict-flex-array option.

2023-08-02 Thread Qing Zhao via Gcc-patches
Ping.

This is a very simple patch to correct a URL address in GCC13’s changes.html.
Currently, it’s pointing to a wrong address.

Okay for committing? 

> On Jul 21, 2023, at 3:02 PM, Qing Zhao  wrote:
> 
> Hi,
> 
> In the current GCC13 release note, the URL to the option -fstrict-flex-array
> is wrong (pointing to -Wstrict-flex-array).
> This is the change to correct the URL and also add the URL in another place
> where -fstrict-flex-array is mentioned.
> 
> I have checked the resulting HTML file, works well.
> 
> Okay for committing?
> 
> thanks.
> 
> Qing
> ---
> htdocs/gcc-13/changes.html | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> index 68e8c5cc..39b63a84 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -46,7 +46,7 @@ You may also want to check out our
>   will no longer issue warnings for out of
>   bounds accesses to trailing struct members of one-element array type
>   anymore. Instead it diagnoses accesses to trailing arrays according to
> -  -fstrict-flex-arrays. 
> +   href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/C-Dialect-Options.html#index-fstrict-flex-arrays;>-fstrict-flex-arrays.
>  
>  href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Static-Analyzer-Options.html;>-fanalyzer
>   is still only suitable for analyzing C code.
>   In particular, using it on C++ is unlikely to give meaningful 
> output.
> @@ -213,7 +213,7 @@ You may also want to check out our
>  flexible array member for the purpose of accessing the elements of such
>  an array. By default, all trailing arrays in aggregates are treated as
>  flexible array members. Use the new command-line option
> -  href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Warning-Options.html#index-Wstrict-flex-arrays;>-fstrict-flex-arrays
> +  href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/C-Dialect-Options.html#index-fstrict-flex-arrays;>-fstrict-flex-arrays
>  to control which array members are treated as flexible arrays.
>  
> 
> -- 
> 2.31.1
> 



Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.

2023-08-02 Thread Robin Dapp via Gcc-patches
> 1. How do you model round to +Inf (avg_floor) and round to -Inf (avg_ceil) ?

That's just specified by the +1 or the lack of it in the original pattern.
Actually the IFN is just a detour because we would create perfect code
if not for the fallback.  But as there is currently no way to check for
the existence of a narrowing shift we cannot circumvent the fallback.
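
As a scalar illustration (a sketch only; the function names are made up here and this is not the vectorizer pattern itself), the two variants differ just in the +1 before the narrowing shift:

```c
#include <assert.h>
#include <stdint.h>

/* Floor average: the sum is formed in a wider type so it cannot overflow. */
uint8_t avg_floor_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;   /* at most 510 */
    return (uint8_t)(sum >> 1);
}

/* Ceiling average: the same, with the extra +1 before the shift. */
uint8_t avg_ceil_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b + 1u;   /* at most 511 */
    uint8_t r = (uint8_t)(sum >> 1);
    assert(r >= avg_floor_u8(a, b));   /* ceil is never below floor */
    return r;
}
```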

> 2. Is it possible we could use vaadd[u] to model avg ?
In principle yes (I first read it wrong that overflow must not happen but the
specs actually say that it does not happen).
However, we would need to set a rounding mode before vaadd or check its current
value and provide a fallback.  Offhand I can't imagine a workaround like
two vaadds or so.

Regards
 Robin


Re: [PATCH v2] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread Eric Feng via Gcc-patches
On Wed, Aug 2, 2023 at 1:20 PM Marek Polacek  wrote:
>
> On Wed, Aug 02, 2023 at 12:59:28PM -0400, David Malcolm wrote:
> > On Wed, 2023-08-02 at 12:20 -0400, Eric Feng wrote:
> >
> > Hi Eric, thanks for the updated patch.
> >
> > Overall, looks good to me, although I'd drop the "Exited." from the
> > "sorry" message (and thus from the dg-message directive), since the
> > compiler is not exiting, it's just the particular plugin that's giving
> > up (but let's not hold up the patch with a "bikeshed" discussion on the
> > precise wording).
> >
> > If Joseph or Marek approves the C parts of the patch, this will be OK
> > to push to trunk.
>
Sounds good. Revised.
>
> > > index cf82b0306d1..617111b0f0a 100644
> > > --- a/gcc/c/c-parser.cc
> > > +++ b/gcc/c/c-parser.cc
> > > @@ -1695,6 +1695,32 @@ public:
> > >  return NULL_TREE;
> > >}
> > >
> > > +  tree
> > > +  lookup_type_by_id (tree id) const final override
> > > +  {
> > > +if (tree type_decl = lookup_name (id))
> > > +  {
> > > +   if (TREE_CODE (type_decl) == TYPE_DECL)
> > > + {
> > > +   tree record_type = TREE_TYPE (type_decl);
> > > +   if (TREE_CODE (record_type) == RECORD_TYPE)
> > > + return record_type;
> > > + }
> > > +  }
>
> I'd drop this set of { }, like below.  OK with that adjusted, thanks.
Sounds good — fixed.
>
> > > +
> > > +return NULL_TREE;
> > > +  }
> > > +
> > > +  tree
> > > +  lookup_global_var_by_id (tree id) const final override
> > > +  {
> > > +if (tree var_decl = lookup_name (id))
> > > +  if (TREE_CODE (var_decl) == VAR_DECL)
> > > +   return var_decl;
> > > +
> > > +return NULL_TREE;
> > > +  }
> > > +
> > >  private:
> > >/* Attempt to get an INTEGER_CST from MACRO.
> > >   Only handle the simplest cases: where MACRO's definition is a
>
> Marek
>

Thank you, everyone. I've submitted a new patch with the described
changes. As I do not yet have write access, could someone please help
me commit it? Otherwise, please let me know if I should request write
access first (the GettingStarted page suggested requesting someone
commit the patch for the first few patches before requesting write
access).

Best,
Eric


[PATCH v3] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread Eric Feng via Gcc-patches
Revised:
-- Remove superfluous { }
-- Reword diagnostic

---

This patch adds a hook to the end of ana::on_finish_translation_unit
which calls relevant stashing-related callbacks registered during plugin
initialization. This feature is used to stash named types and global
variables for a CPython analyzer plugin [PR107646].
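
The registration scheme can be sketched in plain C as follows. This is only an illustration of the callback pattern under simplifying assumptions: `struct logger`, `struct translation_unit`, the fixed-size table and all function names below are hypothetical stand-ins, not the analyzer's real types or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the analyzer's logger and translation_unit. */
struct logger { int messages; };
struct translation_unit { const char *name; };

typedef void (*finish_tu_callback)(struct logger *,
                                   const struct translation_unit *);

enum { MAX_CALLBACKS = 8 };
static finish_tu_callback callbacks[MAX_CALLBACKS];
static size_t num_callbacks;

/* Called by a plugin during its initialization. */
void register_finish_tu_callback(finish_tu_callback cb)
{
    assert(cb != NULL && num_callbacks < MAX_CALLBACKS);
    callbacks[num_callbacks++] = cb;
}

/* Called once the frontend has finished the translation unit. */
void run_callbacks(struct logger *lgr, const struct translation_unit *tu)
{
    for (size_t i = 0; i < num_callbacks; ++i)
        callbacks[i](lgr, tu);
}

/* Example callback: just record that it ran. */
void count_callback(struct logger *lgr, const struct translation_unit *tu)
{
    (void)tu;
    lgr->messages++;
}
```

Callbacks run in registration order, after the built-in stashing of named constants in the real patch.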

gcc/analyzer/ChangeLog:
PR analyzer/107646
* analyzer-language.cc (run_callbacks): New function.
(on_finish_translation_unit): New function.
* analyzer-language.h (GCC_ANALYZER_LANGUAGE_H): New include.
(class translation_unit): New vfuncs.

gcc/c/ChangeLog:
PR analyzer/107646
* c-parser.cc: New functions on stashing values for the
  analyzer.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/plugin.exp: Add new plugin and test.
* gcc.dg/plugin/analyzer_cpython_plugin.c: New plugin.
* gcc.dg/plugin/cpython-plugin-test-1.c: New test.

Signed-off-by: Eric Feng 
---
 gcc/analyzer/analyzer-language.cc |  22 ++
 gcc/analyzer/analyzer-language.h  |   9 +
 gcc/c/c-parser.cc |  24 ++
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 230 ++
 .../gcc.dg/plugin/cpython-plugin-test-1.c |   8 +
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   2 +
 6 files changed, 295 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c

diff --git a/gcc/analyzer/analyzer-language.cc 
b/gcc/analyzer/analyzer-language.cc
index 2c8910906ee..85400288a93 100644
--- a/gcc/analyzer/analyzer-language.cc
+++ b/gcc/analyzer/analyzer-language.cc
@@ -35,6 +35,26 @@ static GTY (()) hash_map  
*analyzer_stashed_constants;
 #if ENABLE_ANALYZER
 
 namespace ana {
+static vec<finish_translation_unit_callback>
+*finish_translation_unit_callbacks;
+
+void
+register_finish_translation_unit_callback (
+finish_translation_unit_callback callback)
+{
+  if (!finish_translation_unit_callbacks)
+vec_alloc (finish_translation_unit_callbacks, 1);
+  finish_translation_unit_callbacks->safe_push (callback);
+}
+
+static void
+run_callbacks (logger *logger, const translation_unit &tu)
+{
+  for (auto const &cb : finish_translation_unit_callbacks)
+{
+  cb (logger, tu);
+}
+}
 
 /* Call into TU to try to find a value for NAME.
If found, stash its value within analyzer_stashed_constants.  */
@@ -102,6 +122,8 @@ on_finish_translation_unit (const translation_unit &tu)
 the_logger.set_logger (new logger (logfile, 0, 0,
   *global_dc->printer));
   stash_named_constants (the_logger.get_logger (), tu);
+
+  run_callbacks (the_logger.get_logger (), tu);
 }
 
 /* Lookup NAME in the named constants stashed when the frontend TU finished.
diff --git a/gcc/analyzer/analyzer-language.h b/gcc/analyzer/analyzer-language.h
index 00f85aba041..8deea52d627 100644
--- a/gcc/analyzer/analyzer-language.h
+++ b/gcc/analyzer/analyzer-language.h
@@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_ANALYZER_LANGUAGE_H
 #define GCC_ANALYZER_LANGUAGE_H
 
+#include "analyzer/analyzer-logging.h"
+
 #if ENABLE_ANALYZER
 
 namespace ana {
@@ -35,8 +37,15 @@ class translation_unit
  have been seen).  If it is defined and an integer (e.g. either as a
  macro or enum), return the INTEGER_CST value, otherwise return NULL.  */
   virtual tree lookup_constant_by_id (tree id) const = 0;
+  virtual tree lookup_type_by_id (tree id) const = 0;
+  virtual tree lookup_global_var_by_id (tree id) const = 0;
 };
 
+typedef void (*finish_translation_unit_callback)
+   (logger *, const translation_unit &);
+void register_finish_translation_unit_callback (
+finish_translation_unit_callback callback);
+
 /* Analyzer hook for frontends to call at the end of the TU.  */
 
 void on_finish_translation_unit (const translation_unit &tu);
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index cf82b0306d1..a3f216d90f8 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1695,6 +1695,30 @@ public:
 return NULL_TREE;
   }
 
+  tree
+  lookup_type_by_id (tree id) const final override
+  {
+if (tree type_decl = lookup_name (id))
+   if (TREE_CODE (type_decl) == TYPE_DECL)
+ {
+   tree record_type = TREE_TYPE (type_decl);
+   if (TREE_CODE (record_type) == RECORD_TYPE)
+ return record_type;
+ }
+
+return NULL_TREE;
+  }
+
+  tree
+  lookup_global_var_by_id (tree id) const final override
+  {
+if (tree var_decl = lookup_name (id))
+  if (TREE_CODE (var_decl) == VAR_DECL)
+   return var_decl;
+
+return NULL_TREE;
+  }
+
 private:
   /* Attempt to get an INTEGER_CST from MACRO.
  Only handle the simplest cases: where MACRO's definition is a single
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c 

Re: [C PATCH]: Add Walloc-type to warn about insufficient size in allocations

2023-08-02 Thread Martin Uecker via Gcc-patches
On Wednesday, 02.08.2023 at 16:45 +, Qing Zhao wrote:
> 
> > On Aug 1, 2023, at 10:31 AM, Martin Uecker  wrote:
> > 
> > On Tuesday, 01.08.2023 at 13:27 +, Qing Zhao wrote:
> > > 
> > > > On Aug 1, 2023, at 3:51 AM, Martin Uecker via Gcc-patches 
> > > >  wrote:
> > > > 
> > 
> > 
> > > > > Hi Martin,
> > > > > Just wondering if it'd be a good idea perhaps to warn if alloc size is
> > > > > not a multiple of TYPE_SIZE_UNIT instead of just less-than ?
> > > > > So it can catch cases like:
> > > > > int *p = malloc (sizeof (int) + 2); // probably intended malloc
> > > > > (sizeof (int) * 2)
> > > > > 
> > > > > FWIW, this is caught using -fanalyzer:
> > > > > f.c: In function 'f':
> > > > > f.c:3:12: warning: allocated buffer size is not a multiple of the
> > > > > pointee's size [CWE-131] [-Wanalyzer-allocation-size]
> > > > >3 |   int *p = __builtin_malloc (sizeof(int) + 2);
> > > > >  |^~
> > > > > 
> > > > > Thanks,
> > > > > Prathamesh
> > > > 
> > > > Yes, this is probably a good idea.  It might need special
> > > > logic for flexible array members then...
> > > 
> > > Why special logic for FAM on such warning? (Not a multiple of 
> > > TYPE_SIZE_UNIT for the element).
> > > 
> > 
> > For
> > 
> > struct { int n; char buf[]; } *p = malloc(sizeof *p + n);
> > p->n = n;
> > 
> > the size would not be a multiple.
> 
> But n is still a multiple of sizeof (char), right? Am I missing anything here?

Right, for a struct with FAM we could check that it is
sizeof () plus a multiple of the element size of the FAM.
Still special logic... 
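
A rough C sketch of such a check, purely as an illustration: the predicate `alloc_size_plausible` and the two example structs are hypothetical, and this is not how the warning is (or would be) implemented in the compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: is alloc_size equal to base_size plus a whole
   number of elements of elem_size bytes?  For a plain pointee type use
   base_size = 0; for a struct with a flexible array member use
   base_size = sizeof (struct) and elem_size = size of one FAM element. */
bool alloc_size_plausible(size_t alloc_size, size_t base_size,
                          size_t elem_size)
{
    assert(elem_size != 0);
    return alloc_size >= base_size
           && (alloc_size - base_size) % elem_size == 0;
}

struct cbuf { int n; char buf[]; };   /* FAM of char */
struct ibuf { int n; int vals[]; };   /* FAM of int */
```

With a `char` FAM every trailing size passes, so the check only becomes selective for wider element types.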

Martin


> Qing
> > 
> > Martin
> > 
> > 
> > 
> > 
> 

-- 
Univ.-Prof. Dr. rer. nat. Martin Uecker
Graz University of Technology
Institute of Biomedical Imaging




RE: [PATCH 2/2][frontend]: Add novector C pragma

2023-08-02 Thread Joseph Myers
On Wed, 2 Aug 2023, Tamar Christina via Gcc-patches wrote:

> Ping.
> 
> > -Original Message-
> > From: Tamar Christina 
> > Sent: Wednesday, July 26, 2023 8:35 PM
> > To: Tamar Christina ; gcc-patches@gcc.gnu.org
> > Cc: nd ; jos...@codesourcery.com
> > Subject: RE: [PATCH 2/2][frontend]: Add novector C pragma
> > 
> > Hi, This is a respin of the patch taking in the feedback received from the 
> > C++
> > part.
> > 
> > Simultaneously it's also a ping 

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] tree-optimization/110838 - vectorization of widened shifts

2023-08-02 Thread Richard Biener via Gcc-patches
On Wed, 2 Aug 2023, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Tue, 1 Aug 2023, Richard Sandiford wrote:
> >
> >> Richard Sandiford  writes:
> >> > Richard Biener via Gcc-patches  writes:
> >> >> The following makes sure to limit the shift operand when vectorizing
> >> >> (short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
> >> >> operand otherwise invokes undefined behavior.  When we determine
> >> >> whether we can demote the operand we know we at most shift in the
> >> >> sign bit so we can adjust the shift amount.
> >> >>
> >> >> Note this has the possibility of un-CSEing common shift operands
> >> >> as there's no good way to share pattern stmts between patterns.
> >> >> We'd have to separately pattern recognize the definition.
> >> >>
> >> >> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >> >>
> >> >> Not sure about LSHIFT_EXPR, it probably has the same issue but
> >> >> the fallback optimistic zero for out-of-range shifts is at least
> >> >> "correct".  Not sure we ever try to demote rotates (probably not).
> >> >
> >> > I guess you mean "correct" for x86?  But that's just a quirk of x86.
> >> > IMO the behaviour is equally wrong for LSHIFT_EXPR.
> >
> > I meant "correct" for the constant folding that evaluates out-of-bound
> > shifts as zero.
> >
> >> Sorry for the multiple messages.  Wanted to get something out quickly
> >> because I wasn't sure how long it would take me to write this...
> >> 
> >> On rotates, for:
> >> 
> >> void
> >> foo (unsigned short *restrict ptr)
> >> {
> >>   for (int i = 0; i < 200; ++i)
> >> {
> >>   unsigned int x = ptr[i] & 0xff0;
> >>   ptr[i] = (x << 1) | (x >> 31);
> >> }
> >> }
> >> 
> >> we do get:
> >> 
> >> can narrow to unsigned:13 without loss of precision: _5 = x_12 r>> 31;
> >> 
> >> although aarch64 doesn't provide rrotate patterns, so nothing actually
> >> comes of it.
> >
> > I think it's still correct that we only need unsigned:13 for the input,
> > we know other bits are zero.  But of course when actually applying
> > this as documented
> >
> > /* Record that STMT_INFO could be changed from operating on TYPE to
> >operating on a type with the precision and sign given by PRECISION
> >and SIGN respectively.
> >
> > the operation itself has to be altered (the above doesn't suggest
> > promoting/demoting the operands to TYPE is the only thing to do).
> >
> > So it seems the burden is on the consumers of the information?
> 
> Yeah, textually that seems fair.  Not sure I was thinking of it in
> those terms at the time though. :)
> 
> >> I think the handling of variable shifts is flawed for other reasons.  
> >> Given:
> >> 
> >> void
> >> uu (unsigned short *restrict ptr1, unsigned short *restrict ptr2)
> >> {
> >>   for (int i = 0; i < 200; ++i)
> >> ptr1[i] = ptr1[i] >> ptr2[i];
> >> }
> >> 
> >> void
> >> us (unsigned short *restrict ptr1, short *restrict ptr2)
> >> {
> >>   for (int i = 0; i < 200; ++i)
> >> ptr1[i] = ptr1[i] >> ptr2[i];
> >> }
> >> 
> >> void
> >> su (short *restrict ptr1, unsigned short *restrict ptr2)
> >> {
> >>   for (int i = 0; i < 200; ++i)
> >> ptr1[i] = ptr1[i] >> ptr2[i];
> >> }
> >> 
> >> void
> >> ss (short *restrict ptr1, short *restrict ptr2)
> >> {
> >>   for (int i = 0; i < 200; ++i)
> >> ptr1[i] = ptr1[i] >> ptr2[i];
> >> }
> >> 
> >> we only narrow uu and ss, due to:
> >> 
> >>/* Ignore codes that don't take uniform arguments.  */
> >>if (!types_compatible_p (TREE_TYPE (op), type))
> >>  return;
> >
> > I suppose that's because we care about the shift operand at all here.
> > We could possibly use [0 .. precision-1] as known range for it
> > and only if that doesn't fit 'type' give up (and otherwise simply
> > ignore the input range of the shift operands here).
> >
> >> in vect_determine_precisions_from_range.  Maybe we should drop
> >> the shift handling from there and instead rely on
> >> vect_determine_precisions_from_users, extending:
> >> 
> >>if (TREE_CODE (shift) != INTEGER_CST
> >>|| !wi::ltu_p (wi::to_widest (shift), precision))
> >>  return;
> >> 
> >> to handle ranges where the max is known to be < precision.
> >> 
> >> There again, if masking is enough for right shifts and right rotates,
> >> maybe we should keep the current handling for them (with your fix)
> >> and skip the types_compatible_p check for those cases.
> >
> > I think it should be enough for left-shifts as well?  If we lshift
> > out like 0x100 << 9 so the lhs range is [0,0] the input range from
> > op0 will still make us use HImode.  I think we only ever get overly
> > conservative answers for left-shifts from this function?
> 
> But if we have:
> 
>   short x, y;
>   int z = (int) x << (int) y;
> 
> and at runtime, x == 1, y == 16, (short) z should be 0 (no UB),
> whereas x << y would invoke UB and x << (y & 15) would be 1.
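
The x == 1, y == 16 case can be reproduced with a small C sketch. It is an illustration only: the function names are hypothetical, and unsigned fixed-width types are used in place of short/int so that every step is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* The original computation: shift in a 32-bit type, then truncate the
   result to 16 bits. */
uint16_t shl_widened(uint16_t x, uint16_t y)
{
    assert(y < 32);   /* shift amount is in range for the wide type */
    return (uint16_t)((uint32_t)x << y);
}

/* A narrowed 16-bit shift whose amount is masked to 0..15. */
uint16_t shl_narrow_masked(uint16_t x, uint16_t y)
{
    return (uint16_t)((uint32_t)x << (y & 15));
}
```

For shift amounts below 16 the two agree; for 16 the widened form yields 0 while the masked form yields the unshifted value.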

True, but we start with the range of the LHS which in this case
would be of type 

[Bug fortran/110877] New: Incorrect copy of allocatable component in polymorphic assignment from array dummy argument

2023-08-02 Thread townsend at astro dot wisc.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110877

Bug ID: 110877
   Summary: Incorrect copy of allocatable component in polymorphic
assignment from array dummy argument
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: townsend at astro dot wisc.edu
  Target Milestone: ---

I've run into a problem that's demonstrated by the following code:

--
module avs_m

   type :: foo_t
   end type foo_t

   type, extends(foo_t) :: bar_t
      real, allocatable :: a
   end type bar_t

end module avs_m

program assign_vs_source

   use avs_m

   implicit none

   class(foo_t), allocatable :: foo(:)

   allocate(bar_t::foo(1))
   select type(foo)
   class is (bar_t)
      allocate(foo(1)%a)
   end select

   call check_assign(foo)

contains

   subroutine check_assign(f)

      class(foo_t), intent(in)  :: f(:)
      class(foo_t), allocatable :: g(:)

      g = f

      select type(g)
      class is (bar_t)
         print *,'is allocated?', allocated(g(1)%a)
      end select

      deallocate(g)
      allocate(g, SOURCE=f)

      select type(g)
      class is (bar_t)
         print *,'is allocated?', allocated(g(1)%a)
      end select

   end subroutine check_assign

end program assign_vs_source
--

Expected output is 

 is allocated? T
 is allocated? T

but instead I get (gfortran 13.1.0, MacOS 13.4 x86_64):

 is allocated? F
 is allocated? T

It seems that the polymorphic assignment g=f is not correctly allocating the %a
component -- but the sourced allocation is. The problem seems to go away if (1)
I use scalars for foo, f and g, or (2) if I move the code from the check_assign
subroutine to the main program.

cheers,

Rich

[committed][RISC-V] Fix 20010221-1.c with zicond

2023-08-02 Thread Jeff Law via Gcc-patches



So we're being a bit too aggressive with the .opt zicond patterns.



(define_insn "*czero.eqz..opt1"
  [(set (match_operand:GPR 0 "register_operand"   "=r")
(if_then_else:GPR (eq (match_operand:X 1 "register_operand" "r")
  (const_int 0))
  (match_operand:GPR 2 "register_operand" "1")
  (match_operand:GPR 3 "register_operand" "r")))]
  "(TARGET_ZICOND || 1) && rtx_equal_p (operands[1], operands[2])"
  "czero.eqz\t%0,%3,%1"
)

The RTL semantics here are op0 = (op1 == 0) ? op1 : op2.  That maps 
directly to czero.eqz.  ie, we select op1 when we know it's zero, op2 
otherwise.  So this pattern is fine.
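
The .opt1 reasoning can be checked with a small C model. This is a sketch under stated assumptions: `czero_eqz` models the Zicond definition (rd = 0 if rs2 == 0, else rs1), `rtl_opt1` models the if_then_else with operand 2 tied to operand 1, and both function names are invented for the example. Note that the "op2" of the prose is operand 3 of the pattern.

```c
#include <assert.h>
#include <stdint.h>

/* czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1. */
uint64_t czero_eqz(uint64_t rs1, uint64_t rs2)
{
    return rs2 == 0 ? 0 : rs1;
}

/* RTL of .opt1 with operand 2 equal to operand 1:
   op0 = (op1 == 0) ? op1 : op3. */
uint64_t rtl_opt1(uint64_t op1, uint64_t op3)
{
    return op1 == 0 ? op1 : op3;
}

/* Compare the RTL against "czero.eqz %0,%3,%1" on a few sample values. */
void check_opt1(void)
{
    const uint64_t vals[] = { 0, 1, 2, 42, UINT64_MAX };
    const int n = (int)(sizeof vals / sizeof vals[0]);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            assert(rtl_opt1(vals[i], vals[j]) == czero_eqz(vals[j], vals[i]));
}
```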





(define_insn "*czero.eqz..opt2"
  [(set (match_operand:GPR 0 "register_operand"   "=r")
(if_then_else:GPR (eq (match_operand:X 1 "register_operand" "r")
  (const_int 0))
  (match_operand:GPR 2 "register_operand" "r")
  (match_operand:GPR 3 "register_operand" "1")))]
  "(TARGET_ZICOND || 1) && rtx_equal_p (operands[1],  operands[3])"
  "czero.nez\t%0,%2,%1"
)


The RTL semantics of this pattern are: op0 = (op1 == 0) ? op2 : op1;

That's not something that can be expressed by the zicond extension as it 
selects op1 if and only if op1 is not equal to zero.
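
A small C model makes the mismatch concrete. It is only a sketch: `czero_nez` models the Zicond definition (rd = 0 if rs2 != 0, else rs1), `rtl_old_opt2` models the if_then_else above with operand 3 tied to operand 1, and the function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1. */
uint64_t czero_nez(uint64_t rs1, uint64_t rs2)
{
    return rs2 != 0 ? 0 : rs1;
}

/* RTL of the old .opt2 pattern: op0 = (op1 == 0) ? op2 : op1. */
uint64_t rtl_old_opt2(uint64_t op1, uint64_t op2)
{
    return op1 == 0 ? op2 : op1;
}

/* The template "czero.nez %0,%2,%1" agrees with the RTL when op1 is
   zero, but yields 0 instead of op1 when op1 is nonzero. */
void check_old_opt2(void)
{
    assert(rtl_old_opt2(0, 9) == czero_nez(9, 0));   /* both give 9 */
    assert(rtl_old_opt2(5, 9) == 5);                 /* RTL selects op1 */
    assert(czero_nez(9, 5) == 0);                    /* insn gives zero */
}
```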





(define_insn "*czero.nez..opt3"
  [(set (match_operand:GPR 0 "register_operand"   "=r")
(if_then_else:GPR (ne (match_operand:X 1 "register_operand" "r")
  (const_int 0))
  (match_operand:GPR 2 "register_operand" "r")
  (match_operand:GPR 3 "register_operand" "1")))]
  "(TARGET_ZICOND || 1) && rtx_equal_p (operands[1], operands[3])"
  "czero.eqz\t%0,%2,%1"
)
The RTL semantics of this pattern are op0 = (op1 != 0) ? op2 : op1. 
That maps to czero.nez.  But the output template uses czero.eqz.  Oops.



(define_insn "*czero.nez..opt4"
  [(set (match_operand:GPR 0 "register_operand"   "=r")
(if_then_else:GPR (ne (match_operand:X 1 "register_operand" "r")
  (const_int 0))
  (match_operand:GPR 2 "register_operand" "1")
  (match_operand:GPR 3 "register_operand" "r")))]
  "(TARGET_ZICOND || 1) && rtx_equal_p (operands[1], operands[2])"
  "czero.nez\t%0,%3,%1"
)
The RTL semantics of this pattern are op0 = (op1 != 0) ? op1 : op2 which 
obviously doesn't match to any zicond instruction as op1 is selected 
when it is not zero.



So two of the patterns are just totally bogus as they are not 
implementable with zicond.  They are removed.  The asm template for the 
.opt3 pattern is fixed to use czero.nez and its name is changed to .opt2.


This fixes the known issues with the zicond.md bits.  Onward to the rest 
of the expansion work :-)


Committed to the trunk,

jeff

commit 1d5bc3285e8a115538442dc2aaa34d2b509e1f6e
Author: Jeff Law 
Date:   Wed Aug 2 13:16:23 2023 -0400

[committed][RISC-V] Fix 20010221-1.c with zicond

So we're being a bit too aggressive with the .opt zicond patterns.

> (define_insn "*czero.eqz..opt1"
>   [(set (match_operand:GPR 0 "register_operand"   "=r")
> (if_then_else:GPR (eq (match_operand:X 1 "register_operand" "r")
>   (const_int 0))
>   (match_operand:GPR 2 "register_operand" "1")
>   (match_operand:GPR 3 "register_operand" "r")))]
>   "(TARGET_ZICOND || 1) && rtx_equal_p (operands[1], operands[2])"
>   "czero.eqz\t%0,%3,%1"
> )
The RTL semantics here are op0 = (op1 == 0) ? op1 : op2.  That maps
directly to czero.eqz.  ie, we select op1 when we know it's zero, op2
otherwise.  So this pattern is fine.

> (define_insn "*czero.eqz..opt2"
>   [(set (match_operand:GPR 0 "register_operand"   "=r")
> (if_then_else:GPR (eq (match_operand:X 1 "register_operand" "r")
>   (const_int 0))
>   (match_operand:GPR 2 "register_operand" "r")
>   (match_operand:GPR 3 "register_operand" "1")))]
>   "(TARGET_ZICOND || 1) && rtx_equal_p (operands[1],  operands[3])"
>   "czero.nez\t%0,%2,%1"
> )

The RTL semantics of this pattern are: op0 = (op1 == 0) ? op2 : op1;

That's not something that can be expressed by the zicond extension as it
selects op1 if and only if op1 is not equal to zero.

> (define_insn "*czero.nez..opt3"
>   [(set (match_operand:GPR 0 "register_operand"   "=r")
> (if_then_else:GPR (ne (match_operand:X 1 "register_operand" "r")
>   (const_int 0))
>   (match_operand:GPR 2 "register_operand" "r")
>   

Re: [PATCH v2] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread Marek Polacek via Gcc-patches
On Wed, Aug 02, 2023 at 12:59:28PM -0400, David Malcolm wrote:
> On Wed, 2023-08-02 at 12:20 -0400, Eric Feng wrote:
> 
> Hi Eric, thanks for the updated patch.
> 
> Overall, looks good to me, although I'd drop the "Exited." from the
> "sorry" message (and thus from the dg-message directive), since the
> compiler is not exiting, it's just the particular plugin that's giving
> up (but let's not hold up the patch with a "bikeshed" discussion on the
> precise wording).
> 
> If Joseph or Marek approves the C parts of the patch, this will be OK
> to push to trunk.

[...]

> > index cf82b0306d1..617111b0f0a 100644
> > --- a/gcc/c/c-parser.cc
> > +++ b/gcc/c/c-parser.cc
> > @@ -1695,6 +1695,32 @@ public:
> >  return NULL_TREE;
> >    }
> >  
> > +  tree
> > +  lookup_type_by_id (tree id) const final override
> > +  {
> > +    if (tree type_decl = lookup_name (id))
> > +  {
> > +   if (TREE_CODE (type_decl) == TYPE_DECL)
> > + {
> > +   tree record_type = TREE_TYPE (type_decl);
> > +   if (TREE_CODE (record_type) == RECORD_TYPE)
> > + return record_type;
> > + }
> > +  }

I'd drop this set of { }, like below.  OK with that adjusted, thanks.

> > +
> > +    return NULL_TREE;
> > +  }
> > +
> > +  tree
> > +  lookup_global_var_by_id (tree id) const final override
> > +  {
> > +    if (tree var_decl = lookup_name (id))
> > +  if (TREE_CODE (var_decl) == VAR_DECL)
> > +   return var_decl;
> > +
> > +    return NULL_TREE;
> > +  }
> > +
> >  private:
> >    /* Attempt to get an INTEGER_CST from MACRO.
> >   Only handle the simplest cases: where MACRO's definition is a

Marek



Re: Where to place warning about non-optimized tail and sibling calls

2023-08-02 Thread Bradley Lucier via Gcc

On 8/1/23 6:08 PM, David Malcolm wrote:
> FWIW I added it to support Scheme from libgccjit;

Do you know of any Scheme using libgccjit?

BTW, I tried to build mainline with --enable-coverage to see which code 
is executed with -foptimize-sibling-calls, but bootstrap fails with


/home/lucier/programs/gcc/objdirs/gcc-mainline/./prev-gcc/xg++ 
-B/home/lucier/programs/gcc/objdirs/gcc-mainline/./prev-gcc/ 
-B/pkgs/gcc-mainline/x86_64-pc-linux-gnu/bin/ -nostdinc++ 
-B/home/lucier/programs/gcc/objdirs/gcc-mainline/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs 
-B/home/lucier/programs/gcc/objdirs/gcc-mainline/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs 
-I/home/lucier/programs/gcc/objdirs/gcc-mainline/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu 
-I/home/lucier/programs/gcc/objdirs/gcc-mainline/prev-x86_64-pc-linux-gnu/libstdc++-v3/include 
 -I/home/lucier/programs/gcc/gcc-mainline/libstdc++-v3/libsupc++ 
-L/home/lucier/programs/gcc/objdirs/gcc-mainline/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs 
-L/home/lucier/programs/gcc/objdirs/gcc-mainline/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs 
 -fno-PIE -c   -g -O2 -fno-checking -gtoggle -DIN_GCC  -fprofile-arcs 
-ftest-coverage -frandom-seed=opts.o -O0 -fkeep-static-functions 
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall 
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute 
-Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long 
-Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common 
-DHAVE_CONFIG_H -fno-PIE -I. -I. -I../../../gcc-mainline/gcc 
-I../../../gcc-mainline/gcc/. -I../../../gcc-mainline/gcc/../include 
-I../../../gcc-mainline/gcc/../libcpp/include 
-I../../../gcc-mainline/gcc/../libcody 
-I../../../gcc-mainline/gcc/../libdecnumber 
-I../../../gcc-mainline/gcc/../libdecnumber/bid -I../libdecnumber 
-I../../../gcc-mainline/gcc/../libbacktrace   -o opts.o -MT opts.o -MMD 
-MP -MF ./.deps/opts.TPo ../../../gcc-mainline/gcc/opts.cc
../../../gcc-mainline/gcc/opts.cc: In function 'void 
print_filtered_help(unsigned int, unsigned int, unsigned int, unsigned 
int, gcc_options*, unsigned int)':
../../../gcc-mainline/gcc/opts.cc:1687:26: error: '  ' directive output 
may be truncated writing 2 bytes into a region of size between 1 and 256 
[-Werror=format-truncation=]

 1687 |   "%s  %s", help, _(use_diagnosed_msg));
  |  ^~
../../../gcc-mainline/gcc/opts.cc:1686:22: note: 'snprintf' output 3 or 
more bytes (assuming 258) into a destination of size 256

 1686 | snprintf (new_help, sizeof new_help,
  | ~^~~
 1687 |   "%s  %s", help, _(use_diagnosed_msg));
  |   ~
cc1plus: all warnings being treated as errors
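
For readers unfamiliar with this diagnostic: it fires because "%s  %s" can need more room than the 256-byte destination. A minimal sketch of the usual way to make truncation explicit is to check snprintf's return value. The helper name join_help and its parameters are hypothetical and are not taken from opts.cc.

```c
#include <stdio.h>
#include <string.h>

/* Join HELP and MSG with two spaces into DST, which holds CAP bytes.
   Return 1 if the whole string fit, 0 if it was truncated or
   snprintf reported an error.  */
static int
join_help (char *dst, size_t cap, const char *help, const char *msg)
{
  int n = snprintf (dst, cap, "%s  %s", help, msg);
  /* snprintf returns the length it wanted to write, excluding the
     terminating NUL, so anything >= CAP means truncation.  */
  return n >= 0 && (size_t) n < cap;
}
```

On truncation the destination still holds a NUL-terminated prefix, so a caller can decide whether to use it or fall back to the unmodified help text.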



Re: [COMMITTEDv3] tree-optimization: [PR100864] `(a&!b) | b` is not opimized to `a | b` for comparisons

2023-08-02 Thread Andrew Pinski via Gcc-patches
On Wed, Aug 2, 2023 at 10:13 AM Prathamesh Kulkarni via Gcc-patches
 wrote:
>
> On Mon, 31 Jul 2023 at 22:39, Andrew Pinski via Gcc-patches
>  wrote:
> >
> > This is a new version of the patch.
> > Instead of matching the inverted comparison directly inside match.pd,
> > this version creates a new function (bitwise_inverted_equal_p) to do it.
> > It is very similar to bitwise_equal_p, which was added in
> > r14-2751-g2a3556376c69a1fb,
> > but instead it tests `expr1 == ~expr2`. A follow-on patch will
> > use this function in other patterns where we try to match `@0` and
> > `(bit_not @0)`.
> >
> > Changed the name bitwise_not_equal_p to bitwise_inverted_equal_p.
> >
> > Committed as approved after bootstrapping and testing on x86_64-linux-gnu
> > with no regressions.
> Hi Andrew,
> Unfortunately, this patch (committed in
> 2bae476b511dc441bf61da8a49cca655575e7dd6) causes
> segmentation fault for pr33133.c on aarch64-linux-gnu because of
> infinite recursion.

A similar issue is recorded as PR 110874 which I am debugging right now.

Thanks,
Andrew

>
> Running the test under gdb shows:
> Program received signal SIGSEGV, Segmentation fault.
> operand_compare::operand_equal_p (this=0x29dc680
> , arg0=0xf7789a68, arg1=0xf7789f30,
> flags=16) at ../../gcc/gcc/fold-const.cc:3088
> 3088{
> (gdb) bt
> #0  operand_compare::operand_equal_p (this=0x29dc680
> , arg0=0xf7789a68, arg1=0xf7789f30,
> flags=16) at ../../gcc/gcc/fold-const.cc:3088
> #1  0x00a90394 in operand_compare::verify_hash_value
> (this=this@entry=0x29dc680 ,
> arg0=arg0@entry=0xf7789a68, arg1=arg1@entry=0xf7789f30,
> flags=flags@entry=0, ret=ret@entry=0xfc000157)
> at ../../gcc/gcc/fold-const.cc:4074
> #2  0x00a9351c in operand_compare::verify_hash_value
> (ret=0xfc000157, flags=0, arg1=0xf7789f30,
> arg0=0xf7789a68, this=0x29dc680 ) at
> ../../gcc/gcc/fold-const.cc:4072
> #3  operand_compare::operand_equal_p (this=this@entry=0x29dc680
> , arg0=arg0@entry=0xf7789a68,
> arg1=arg1@entry=0xf7789f30, flags=flags@entry=0) at
> ../../gcc/gcc/fold-const.cc:3090
> #4  0x00a9791c in operand_equal_p
> (arg0=arg0@entry=0xf7789a68, arg1=arg1@entry=0xf7789f30,
> flags=flags@entry=0) at ../../gcc/gcc/fold-const.cc:4105
> #5  0x01d38dd0 in gimple_bitwise_inverted_equal_p
> (expr1=0xf7789a68, expr2=0xf7789f30, valueize=
> 0x112d698 ) at
> ../../gcc/gcc/gimple-match-head.cc:284
> #6  0x01d38e80 in gimple_bitwise_inverted_equal_p
> (expr1=0xf7789a68, expr2=0xf77d0240,
> valueize=0x112d698 ) at
> ../../gcc/gcc/gimple-match-head.cc:296
> #7  0x01d38e80 in gimple_bitwise_inverted_equal_p
> (expr1=0xf7789a68, expr2=0xf7789f30,
> valueize=0x112d698 ) at
> ../../gcc/gcc/gimple-match-head.cc:296
> #8  0x01d38e80 in gimple_bitwise_inverted_equal_p
> (expr1=0xf7789a68, expr2=0xf77d0240,
> ...
>
> It seems to recurse cyclically, with expr2 alternating between
> 0xf7789f30 and 0xf77d0240 and eventually leading to a segfault,
> while expr1=0xf7789a68 remains the same throughout the stack frames.
>
> Thanks,
> Prathamesh
> >
> > PR tree-optimization/100864
> >
> > gcc/ChangeLog:
> >
> > * generic-match-head.cc (bitwise_inverted_equal_p): New function.
> > * gimple-match-head.cc (bitwise_inverted_equal_p): New macro.
> > (gimple_bitwise_inverted_equal_p): New function.
> > * match.pd ((~x | y) & x): Use bitwise_inverted_equal_p
> > instead of direct matching bit_not.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/bitops-3.c: New test.
> > ---
> >  gcc/generic-match-head.cc| 42 ++
> >  gcc/gimple-match-head.cc | 71 
> >  gcc/match.pd |  5 +-
> >  gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c | 67 ++
> >  4 files changed, 183 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
> >
> > diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> > index a71c0727b0b..ddaf22f2179 100644
> > --- a/gcc/generic-match-head.cc
> > +++ b/gcc/generic-match-head.cc
> > @@ -121,3 +121,45 @@ bitwise_equal_p (tree expr1, tree expr2)
> >  return wi::to_wide (expr1) == wi::to_wide (expr2);
> >return operand_equal_p (expr1, expr2, 0);
> >  }
> > +
> > +/* Return true if EXPR1 and EXPR2 have the bitwise opposite value,
> > +   but not necessarily same type.
> > +   The types can differ through nop conversions.  */
> > +
> > +static inline bool
> > +bitwise_inverted_equal_p (tree expr1, tree expr2)
> > +{
> > +  STRIP_NOPS (expr1);
> > +  STRIP_NOPS (expr2);
> > +  if (expr1 == expr2)
> > +return false;
> > +  if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
> > +return false;
> > +  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
> > +return 

Re: [COMMITTEDv3] tree-optimization: [PR100864] `(a&!b) | b` is not opimized to `a | b` for comparisons

2023-08-02 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 31 Jul 2023 at 22:39, Andrew Pinski via Gcc-patches
 wrote:
>
> This is a new version of the patch.
> Instead of matching the inverted comparison directly inside match.pd,
> this version creates a new function (bitwise_inverted_equal_p) to do it.
> It is very similar to bitwise_equal_p, which was added in
> r14-2751-g2a3556376c69a1fb,
> but instead it tests `expr1 == ~expr2`. A follow-on patch will
> use this function in other patterns where we try to match `@0` and
> `(bit_not @0)`.
>
> Changed the name bitwise_not_equal_p to bitwise_inverted_equal_p.
>
> Committed as approved after bootstrapping and testing on x86_64-linux-gnu
> with no regressions.
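
As background for the description above, the following sketch checks the two facts the patch relies on, using plain C integers instead of GCC trees. The function names are hypothetical. The first mirrors the INTEGER_CST case of `expr1 == ~expr2`; the second verifies the PR's identity `(a & !b) | b == a | b` for truth values.

```c
#include <stdint.h>

/* Integer analogue of "expr1 == ~expr2": A and B are bitwise
   inverses of each other.  */
static int
inverted_equal_u32 (uint32_t a, uint32_t b)
{
  return a == (uint32_t) ~b;
}

/* Return 1 if (a & !b) | b equals a | b for all four combinations
   of the truth values 0 and 1.  */
static int
identity_holds (void)
{
  for (int a = 0; a <= 1; a++)
    for (int b = 0; b <= 1; b++)
      if (((a & !b) | b) != (a | b))
        return 0;
  return 1;
}
```

For comparison results `!b` and `b` are bitwise inverses in the sense of the first function (restricted to one bit), which is why a helper that recognizes inverted operands lets match.pd simplify the expression.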
Hi Andrew,
Unfortunately, this patch (committed in
2bae476b511dc441bf61da8a49cca655575e7dd6) causes
segmentation fault for pr33133.c on aarch64-linux-gnu because of
infinite recursion.

Running the test under gdb shows:
Program received signal SIGSEGV, Segmentation fault.
operand_compare::operand_equal_p (this=0x29dc680
, arg0=0xf7789a68, arg1=0xf7789f30,
flags=16) at ../../gcc/gcc/fold-const.cc:3088
3088{
(gdb) bt
#0  operand_compare::operand_equal_p (this=0x29dc680
, arg0=0xf7789a68, arg1=0xf7789f30,
flags=16) at ../../gcc/gcc/fold-const.cc:3088
#1  0x00a90394 in operand_compare::verify_hash_value
(this=this@entry=0x29dc680 ,
arg0=arg0@entry=0xf7789a68, arg1=arg1@entry=0xf7789f30,
flags=flags@entry=0, ret=ret@entry=0xfc000157)
at ../../gcc/gcc/fold-const.cc:4074
#2  0x00a9351c in operand_compare::verify_hash_value
(ret=0xfc000157, flags=0, arg1=0xf7789f30,
arg0=0xf7789a68, this=0x29dc680 ) at
../../gcc/gcc/fold-const.cc:4072
#3  operand_compare::operand_equal_p (this=this@entry=0x29dc680
, arg0=arg0@entry=0xf7789a68,
arg1=arg1@entry=0xf7789f30, flags=flags@entry=0) at
../../gcc/gcc/fold-const.cc:3090
#4  0x00a9791c in operand_equal_p
(arg0=arg0@entry=0xf7789a68, arg1=arg1@entry=0xf7789f30,
flags=flags@entry=0) at ../../gcc/gcc/fold-const.cc:4105
#5  0x01d38dd0 in gimple_bitwise_inverted_equal_p
(expr1=0xf7789a68, expr2=0xf7789f30, valueize=
0x112d698 ) at
../../gcc/gcc/gimple-match-head.cc:284
#6  0x01d38e80 in gimple_bitwise_inverted_equal_p
(expr1=0xf7789a68, expr2=0xf77d0240,
valueize=0x112d698 ) at
../../gcc/gcc/gimple-match-head.cc:296
#7  0x01d38e80 in gimple_bitwise_inverted_equal_p
(expr1=0xf7789a68, expr2=0xf7789f30,
valueize=0x112d698 ) at
../../gcc/gcc/gimple-match-head.cc:296
#8  0x01d38e80 in gimple_bitwise_inverted_equal_p
(expr1=0xf7789a68, expr2=0xf77d0240,
...

It seems to recurse cyclically, with expr2 alternating between
0xf7789f30 and 0xf77d0240 and eventually leading to a segfault,
while expr1=0xf7789a68 remains the same throughout the stack frames.
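
The failure mode in the backtrace can be illustrated with a toy model. This is not GCC code and not the fix for this bug; the node type, the `def` link and the budget parameter are all assumptions made for the sketch. It shows why following definitions through a two-node cycle never terminates unless the walk is bounded.

```c
#include <stddef.h>

/* Hypothetical node: DEF points at the node this one is defined
   from, or is NULL at the end of the chain.  */
struct node
{
  const struct node *def;
};

/* Follow DEF links from N looking for TARGET.
   Return 1 if TARGET is found, 0 if the chain ends without it,
   and -1 if BUDGET steps were used up (a possible cycle).  */
static int
reaches (const struct node *target, const struct node *n, int budget)
{
  if (n == NULL)
    return 0;
  if (n == target)
    return 1;
  if (budget == 0)
    return -1;
  return reaches (target, n->def, budget - 1);
}
```

Without the budget check, two nodes that name each other as their definition would make the walk alternate between them forever, which matches the alternating expr2 values seen in frames #5 to #8.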

Thanks,
Prathamesh
>
> PR tree-optimization/100864
>
> gcc/ChangeLog:
>
> * generic-match-head.cc (bitwise_inverted_equal_p): New function.
> * gimple-match-head.cc (bitwise_inverted_equal_p): New macro.
> (gimple_bitwise_inverted_equal_p): New function.
> * match.pd ((~x | y) & x): Use bitwise_inverted_equal_p
> instead of direct matching bit_not.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/bitops-3.c: New test.
> ---
>  gcc/generic-match-head.cc| 42 ++
>  gcc/gimple-match-head.cc | 71 
>  gcc/match.pd |  5 +-
>  gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c | 67 ++
>  4 files changed, 183 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
>
> diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> index a71c0727b0b..ddaf22f2179 100644
> --- a/gcc/generic-match-head.cc
> +++ b/gcc/generic-match-head.cc
> @@ -121,3 +121,45 @@ bitwise_equal_p (tree expr1, tree expr2)
>  return wi::to_wide (expr1) == wi::to_wide (expr2);
>return operand_equal_p (expr1, expr2, 0);
>  }
> +
> +/* Return true if EXPR1 and EXPR2 have the bitwise opposite value,
> +   but not necessarily same type.
> +   The types can differ through nop conversions.  */
> +
> +static inline bool
> +bitwise_inverted_equal_p (tree expr1, tree expr2)
> +{
> +  STRIP_NOPS (expr1);
> +  STRIP_NOPS (expr2);
> +  if (expr1 == expr2)
> +return false;
> +  if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
> +return false;
> +  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
> +return wi::to_wide (expr1) == ~wi::to_wide (expr2);
> +  if (operand_equal_p (expr1, expr2, 0))
> +return false;
> +  if (TREE_CODE (expr1) == BIT_NOT_EXPR
> +  && bitwise_equal_p (TREE_OPERAND (expr1, 0), expr2))
> +return true;
> +  if (TREE_CODE (expr2) == BIT_NOT_EXPR
> +  && bitwise_equal_p (expr1, TREE_OPERAND (expr2, 0)))
> +return true;
> +  if (COMPARISON_CLASS_P (expr1)
> +  

[PATCH v2 3/3] amdgcn, libgomp: low-latency allocator

2023-08-02 Thread Andrew Stubbs

This implements the OpenMP low-latency memory allocator for AMD GCN using the
small per-team LDS memory (Local Data Store).

Since addresses can now refer to LDS space, the "Global" address space is
no longer compatible.  This patch therefore switches the backend to use
"Flat" addressing exclusively (which supports both memories).  A future patch
will re-enable "global" instructions for cases where it is known to be safe
to do so.

gcc/ChangeLog:

* config/gcn/gcn-builtins.def (DISPATCH_PTR): New built-in.
* config/gcn/gcn.cc (gcn_init_machine_status): Disable global
addressing.
(gcn_expand_builtin_1): Implement GCN_BUILTIN_DISPATCH_PTR.

libgomp/ChangeLog:

* config/gcn/libgomp-gcn.h (TEAM_ARENA_START): Move to here.
(TEAM_ARENA_FREE): Likewise.
(TEAM_ARENA_END): Likewise.
(GCN_LOWLAT_HEAP): New.
* config/gcn/team.c (LITTLEENDIAN_CPU): New, and import hsa.h.
(__gcn_lowlat_init): New prototype.
(gomp_gcn_enter_kernel): Initialize the low-latency heap.
* libgomp.h (TEAM_ARENA_START): Move to libgomp.h.
(TEAM_ARENA_FREE): Likewise.
(TEAM_ARENA_END): Likewise.
* plugin/plugin-gcn.c (lowlat_size): New variable.
(print_kernel_dispatch): Label the group_segment_size purpose.
(init_environment_variables): Read GOMP_GCN_LOWLAT_POOL.
(create_kernel_dispatch): Pass low-latency head allocation to kernel.
(run_kernel): Use shadow; don't assume values.
* testsuite/libgomp.c/omp_alloc-traits.c: Enable for amdgcn.
* config/gcn/allocator.c: New file.
---
 gcc/config/gcn/gcn-builtins.def   |   2 +
 gcc/config/gcn/gcn.cc |  16 ++-
 libgomp/config/gcn/allocator.c| 123 ++
 libgomp/config/gcn/libgomp-gcn.h  |   6 +
 libgomp/config/gcn/team.c |  12 ++
 libgomp/libgomp.h |   3 -
 libgomp/plugin/plugin-gcn.c   |  35 -
 .../testsuite/libgomp.c/omp_alloc-traits.c|   2 +-
 8 files changed, 188 insertions(+), 11 deletions(-)
 create mode 100644 libgomp/config/gcn/allocator.c

diff --git a/gcc/config/gcn/gcn-builtins.def b/gcc/config/gcn/gcn-builtins.def
index 636a8e7a1a9..471457d7c23 100644
--- a/gcc/config/gcn/gcn-builtins.def
+++ b/gcc/config/gcn/gcn-builtins.def
@@ -164,6 +164,8 @@ DEF_BUILTIN (FIRST_CALL_THIS_THREAD_P, -1, "first_call_this_thread_p", B_INSN,
 	 _A1 (GCN_BTI_BOOL), gcn_expand_builtin_1)
 DEF_BUILTIN (KERNARG_PTR, -1, "kernarg_ptr", B_INSN, _A1 (GCN_BTI_VOIDPTR),
 	 gcn_expand_builtin_1)
+DEF_BUILTIN (DISPATCH_PTR, -1, "dispatch_ptr", B_INSN, _A1 (GCN_BTI_VOIDPTR),
+	 gcn_expand_builtin_1)
 DEF_BUILTIN (GET_STACK_LIMIT, -1, "get_stack_limit", B_INSN,
 	 _A1 (GCN_BTI_VOIDPTR), gcn_expand_builtin_1)
 
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 02f4dedec42..c4bf0e6ab92 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -109,7 +109,8 @@ gcn_init_machine_status (void)
 
   f = ggc_cleared_alloc ();
 
-  if (TARGET_GCN3)
+  // FIXME: re-enable global addressing with safety for LDS-flat addresses
+  //if (TARGET_GCN3)
 f->use_flat_addressing = true;
 
   return f;
@@ -4881,6 +4882,19 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
 	  }
 	return ptr;
   }
+case GCN_BUILTIN_DISPATCH_PTR:
+  {
+	rtx ptr;
+	if (cfun->machine->args.reg[DISPATCH_PTR_ARG] >= 0)
+	   ptr = gen_rtx_REG (DImode,
+			  cfun->machine->args.reg[DISPATCH_PTR_ARG]);
+	else
+	  {
+	ptr = gen_reg_rtx (DImode);
+	emit_move_insn (ptr, const0_rtx);
+	  }
+	return ptr;
+  }
 case GCN_BUILTIN_FIRST_CALL_THIS_THREAD_P:
   {
 	/* Stash a marker in the unused upper 16 bits of s[0:1] to indicate
diff --git a/libgomp/config/gcn/allocator.c b/libgomp/config/gcn/allocator.c
new file mode 100644
index 000..151086ea225
--- /dev/null
+++ b/libgomp/config/gcn/allocator.c
@@ -0,0 +1,123 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this 

[PATCH v2 1/3] libgomp, nvptx: low-latency memory allocator

2023-08-02 Thread Andrew Stubbs

This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU devices, via omp_low_lat_mem_space and omp_alloc.  The memory
can be allocated, reallocated, and freed using a basic but fast algorithm.
It is thread safe, and the size of the low-latency heap can be configured
using the GOMP_NVPTX_LOWLAT_POOL environment variable.

The use of the PTX dynamic_smem_size feature means that the low-latency
allocator will not work with the PTX 3.1 multilib.
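
The MEMSPACE_* fall-back macros in the patch below use a comma expression so that an argument the generic implementation does not need still counts as used. A minimal stand-alone sketch of that idiom follows; the wrapper function alloc_in_space and its int memspace parameter are hypothetical and only exist to show a call site.

```c
#include <stdlib.h>

/* Fall-back definition, used only if a target-specific file has not
   already defined MEMSPACE_ALLOC.  The comma expression evaluates
   MEMSPACE, discards it via the (void) cast, and yields SIZE, so
   malloc receives just the size and no unused-variable diagnostic
   is triggered for the memspace argument.  */
#ifndef MEMSPACE_ALLOC
#define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
  malloc (((void) (MEMSPACE), (SIZE)))
#endif

/* Hypothetical caller: MEMSPACE is ignored by the fall-back above.  */
static void *
alloc_in_space (int memspace, size_t size)
{
  return MEMSPACE_ALLOC (memspace, size);
}
```

A target can override the behaviour by defining MEMSPACE_ALLOC before this point, which is the arrangement the patch uses when config/nvptx/allocator.c includes the generic allocator.c.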

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): New macro.
(MEMSPACE_CALLOC): New macro.
(MEMSPACE_REALLOC): New macro.
(MEMSPACE_FREE): New macro.
(predefined_alloc_mapping): New array.
(omp_aligned_alloc): Use MEMSPACE_ALLOC.
Implement fall-backs for predefined allocators.
(omp_free): Use MEMSPACE_FREE.
(omp_calloc): Use MEMSPACE_CALLOC.
(omp_realloc): Use MEMSPACE_REALLOC, MEMSPACE_ALLOC, and MEMSPACE_FREE.
* config/nvptx/team.c (__nvptx_lowlat_pool): New asm variable.
(__nvptx_lowlat_init): New prototype.
(gomp_nvptx_main): Call __nvptx_lowlat_init.
* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
* basic-allocator.c: New file.
* config/nvptx/allocator.c: New file.
* testsuite/libgomp.c/omp_alloc-1.c: New test.
* testsuite/libgomp.c/omp_alloc-2.c: New test.
* testsuite/libgomp.c/omp_alloc-3.c: New test.
* testsuite/libgomp.c/omp_alloc-4.c: New test.
* testsuite/libgomp.c/omp_alloc-5.c: New test.
* testsuite/libgomp.c/omp_alloc-6.c: New test.

Co-authored-by: Kwok Cheung Yeung  
Co-Authored-By: Thomas Schwinge 
---
 libgomp/allocator.c   | 253 +-
 libgomp/basic-allocator.c | 380 ++
 libgomp/config/nvptx/allocator.c  | 120 +++
 libgomp/config/nvptx/team.c   |  18 +
 libgomp/plugin/plugin-nvptx.c |  23 +-
 libgomp/testsuite/libgomp.c/omp_alloc-1.c |  56 
 libgomp/testsuite/libgomp.c/omp_alloc-2.c |  64 
 libgomp/testsuite/libgomp.c/omp_alloc-3.c |  42 +++
 libgomp/testsuite/libgomp.c/omp_alloc-4.c | 196 +++
 libgomp/testsuite/libgomp.c/omp_alloc-5.c |  63 
 libgomp/testsuite/libgomp.c/omp_alloc-6.c | 117 +++
 11 files changed, 1244 insertions(+), 88 deletions(-)
 create mode 100644 libgomp/basic-allocator.c
 create mode 100644 libgomp/config/nvptx/allocator.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-6.c

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 90f2dcb60d6..fbf7b1ab061 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -37,6 +37,42 @@
 
 #define omp_max_predefined_alloc omp_thread_mem_alloc
 
+/* These macros may be overridden in config//allocator.c.
+   The following definitions (ab)use comma operators to avoid unused
+   variable errors.  */
+#ifndef MEMSPACE_ALLOC
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
+  malloc (((void)(MEMSPACE), (SIZE)))
+#endif
+#ifndef MEMSPACE_CALLOC
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
+  calloc (1, (((void)(MEMSPACE), (SIZE))))
+#endif
+#ifndef MEMSPACE_REALLOC
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) \
+  realloc (ADDR, (((void)(MEMSPACE), (void)(OLDSIZE), (SIZE))))
+#endif
+#ifndef MEMSPACE_FREE
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
+  free (((void)(MEMSPACE), (void)(SIZE), (ADDR)))
+#endif
+
+/* Map the predefined allocators to the correct memory space.
+   The index to this table is the omp_allocator_handle_t enum value.
+   When the user calls omp_alloc with a predefined allocator this
+   table determines what memory they get.  */
+static const omp_memspace_handle_t predefined_alloc_mapping[] = {
+  omp_default_mem_space,   /* omp_null_allocator. */
+  omp_default_mem_space,   /* omp_default_mem_alloc. */
+  omp_large_cap_mem_space, /* omp_large_cap_mem_alloc. */
+  omp_const_mem_space, /* omp_const_mem_alloc. */
+  omp_high_bw_mem_space,   /* omp_high_bw_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_low_lat_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_cgroup_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
+};
+
 enum gomp_numa_memkind_kind
 {
   GOMP_MEMKIND_NONE = 0,
@@ -522,7 +558,7 @@ retry:
 	}
   else
 #endif
-	ptr = malloc (new_size);
+	ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
   if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -554,7 +590,13 @@ retry:
 	}
   else
 

[PATCH v2 2/3] openmp, nvptx: low-lat memory access traits

2023-08-02 Thread Andrew Stubbs

The NVPTX low-latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all".  This change means that the omp_low_lat_mem_alloc predefined
allocator now implies the "pteam" trait.
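
The rule described above amounts to a single predicate. Below is a stand-alone sketch that mirrors the shape of nvptx_memspace_validate from the patch, with hypothetical enums standing in for the omp.h memory-space and access-trait values so that it compiles without OpenMP headers.

```c
/* Hypothetical stand-ins for the omp.h enumerators used by the patch.  */
enum memspace { DEFAULT_MEM_SPACE, LOW_LAT_MEM_SPACE };
enum access { ACCESS_ALL, ACCESS_CGROUP, ACCESS_PTEAM, ACCESS_THREAD };

/* Return nonzero if an allocator with trait ACC may use memory space
   MS.  Low-latency memory is team-local, so it is rejected only when
   the allocator promises access from all threads.  */
static int
memspace_validate (enum memspace ms, enum access acc)
{
  return ms != LOW_LAT_MEM_SPACE || acc != ACCESS_ALL;
}
```

In the patch the allocation routines jump to their fail path when this check returns zero, so the allocator's fallback trait then decides what the caller gets.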

libgomp/ChangeLog:

* allocator.c (MEMSPACE_VALIDATE): New macro.
(omp_aligned_alloc): Use MEMSPACE_VALIDATE.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* config/nvptx/allocator.c (nvptx_memspace_validate): New function.
(MEMSPACE_VALIDATE): New macro.
* testsuite/libgomp.c/omp_alloc-4.c (main): Add access trait.
* testsuite/libgomp.c/omp_alloc-6.c (main): Add access trait.
* testsuite/libgomp.c/omp_alloc-traits.c: New test.
---
 libgomp/allocator.c   | 16 +
 libgomp/config/nvptx/allocator.c  | 11 +++
 libgomp/testsuite/libgomp.c/omp_alloc-4.c |  7 +-
 libgomp/testsuite/libgomp.c/omp_alloc-6.c |  7 +-
 .../testsuite/libgomp.c/omp_alloc-traits.c| 68 +++
 5 files changed, 103 insertions(+), 6 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-traits.c

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index fbf7b1ab061..35b8ec71480 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -56,6 +56,10 @@
 #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
   free (((void)(MEMSPACE), (void)(SIZE), (ADDR)))
 #endif
+#ifndef MEMSPACE_VALIDATE
+#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) \
+  (((void)(MEMSPACE), (void)(ACCESS), 1))
+#endif
 
 /* Map the predefined allocators to the correct memory space.
The index to this table is the omp_allocator_handle_t enum value.
@@ -507,6 +511,10 @@ retry:
   if (__builtin_add_overflow (size, new_size, &new_size))
 goto fail;
 
+  if (allocator_data
+  && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access))
+goto fail;
+
   if (__builtin_expect (allocator_data
 			&& allocator_data->pool_size < ~(uintptr_t) 0, 0))
 {
@@ -817,6 +825,10 @@ retry:
   if (__builtin_add_overflow (size_temp, new_size, &new_size))
 goto fail;
 
+  if (allocator_data
+  && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access))
+goto fail;
+
   if (__builtin_expect (allocator_data
 			&& allocator_data->pool_size < ~(uintptr_t) 0, 0))
 {
@@ -1063,6 +1075,10 @@ retry:
 goto fail;
   old_size = data->size;
 
+  if (allocator_data
+  && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access))
+goto fail;
+
   if (__builtin_expect (allocator_data
 			&& allocator_data->pool_size < ~(uintptr_t) 0, 0))
 {
diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c
index 6014fba177f..f19ac28d32a 100644
--- a/libgomp/config/nvptx/allocator.c
+++ b/libgomp/config/nvptx/allocator.c
@@ -108,6 +108,15 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
 return realloc (addr, size);
 }
 
+static inline int
+nvptx_memspace_validate (omp_memspace_handle_t memspace, unsigned access)
+{
+  /* Disallow use of low-latency memory when it must be accessible by
+ all threads.  */
+  return (memspace != omp_low_lat_mem_space
+	  || access != omp_atv_all);
+}
+
 #define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
   nvptx_memspace_alloc (MEMSPACE, SIZE)
 #define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
@@ -116,5 +125,7 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
   nvptx_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE)
 #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
   nvptx_memspace_free (MEMSPACE, ADDR, SIZE)
+#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) \
+  nvptx_memspace_validate (MEMSPACE, ACCESS)
 
 #include "../../allocator.c"
diff --git a/libgomp/testsuite/libgomp.c/omp_alloc-4.c b/libgomp/testsuite/libgomp.c/omp_alloc-4.c
index 66e13c09234..9d169858151 100644
--- a/libgomp/testsuite/libgomp.c/omp_alloc-4.c
+++ b/libgomp/testsuite/libgomp.c/omp_alloc-4.c
@@ -23,10 +23,11 @@ main ()
   #pragma omp target
   {
 /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
-omp_alloctrait_t traits[1]
-  = { { omp_atk_fallback, omp_atv_null_fb } };
+omp_alloctrait_t traits[2]
+  = { { omp_atk_fallback, omp_atv_null_fb },
+  { omp_atk_access, omp_atv_pteam } };
 omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space,
-			1, traits);
+			2, traits);
 
 int size = 4;
 
diff --git a/libgomp/testsuite/libgomp.c/omp_alloc-6.c b/libgomp/testsuite/libgomp.c/omp_alloc-6.c
index 66bf69b0455..b5f0a296998 100644
--- a/libgomp/testsuite/libgomp.c/omp_alloc-6.c
+++ b/libgomp/testsuite/libgomp.c/omp_alloc-6.c
@@ -23,10 +23,11 @@ main ()
   #pragma omp target
   {
 /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
-omp_alloctrait_t traits[1]
-  = { { omp_atk_fallback, omp_atv_null_fb } };
+omp_alloctrait_t traits[2]
+  

[PATCH v2 0/3] libgomp: OpenMP low-latency omp_alloc

2023-08-02 Thread Andrew Stubbs
This patch series is an updated and reworked version of some of the patch set
posted about a year ago (the other features will be posted soon), this
time supporting amdgcn, in addition to nvptx:

https://patchwork.sourceware.org/project/gcc/list/?series=10748&state=%2A&archive=both

The series implements device-specific allocators and adds a low-latency
allocator for both GPU architectures.

The previous review comments have been addressed, I hope, plus a lot of
bugs have been found and fixed since the original post.  With the
addition of amdgcn I have broken out the heap implementation so both
architectures can share the code.

Andrew

Andrew Stubbs (3):
  libgomp, nvptx: low-latency memory allocator
  openmp, nvptx: low-lat memory access traits
  amdgcn, libgomp: low-latency allocator

 gcc/config/gcn/gcn-builtins.def   |   2 +
 gcc/config/gcn/gcn.cc |  16 +-
 libgomp/allocator.c   | 269 +
 libgomp/basic-allocator.c | 380 ++
 libgomp/config/gcn/allocator.c| 123 ++
 libgomp/config/gcn/libgomp-gcn.h  |   6 +
 libgomp/config/gcn/team.c |  12 +
 libgomp/config/nvptx/allocator.c  | 131 ++
 libgomp/config/nvptx/team.c   |  18 +
 libgomp/libgomp.h |   3 -
 libgomp/plugin/plugin-gcn.c   |  35 +-
 libgomp/plugin/plugin-nvptx.c |  23 +-
 libgomp/testsuite/libgomp.c/omp_alloc-1.c |  56 +++
 libgomp/testsuite/libgomp.c/omp_alloc-2.c |  64 +++
 libgomp/testsuite/libgomp.c/omp_alloc-3.c |  42 ++
 libgomp/testsuite/libgomp.c/omp_alloc-4.c | 197 +
 libgomp/testsuite/libgomp.c/omp_alloc-5.c |  63 +++
 libgomp/testsuite/libgomp.c/omp_alloc-6.c | 118 ++
 .../testsuite/libgomp.c/omp_alloc-traits.c|  68 
 19 files changed, 1528 insertions(+), 98 deletions(-)
 create mode 100644 libgomp/basic-allocator.c
 create mode 100644 libgomp/config/gcn/allocator.c
 create mode 100644 libgomp/config/nvptx/allocator.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-6.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-traits.c

-- 
2.41.0



Re: [PATCH v2] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread David Malcolm via Gcc-patches
On Wed, 2023-08-02 at 12:20 -0400, Eric Feng wrote:

Hi Eric, thanks for the updated patch.

Overall, looks good to me, although I'd drop the "Exited." from the
"sorry" message (and thus from the dg-message directive), since the
compiler is not exiting, it's just the particular plugin that's giving
up (but let's not hold up the patch with a "bikeshed" discussion on the
precise wording).

If Joseph or Marek approves the C parts of the patch, this will be OK
to push to trunk.

Dave

> Revised:
> -- Fix indentation problems
> -- Add more detail to Changelog
> -- Add new test on handling non-CPython code case
> -- Turn off debugging inform by default
> -- Make on_finish_translation_unit() static
> -- Remove superfluous null checks in init_py_structs()
> 
> Changes have been bootstrapped and tested against trunk on
> aarch64-unknown-linux-gnu.
> 
> ---
> This patch adds a hook to the end of ana::on_finish_translation_unit
> which calls relevant stashing-related callbacks registered during
> plugin initialization. This feature is used to stash named types and
> global variables for a CPython analyzer plugin [PR107646].
> 
> gcc/analyzer/ChangeLog:
> PR analyzer/107646
>     * analyzer-language.cc (run_callbacks): New function.
>     (on_finish_translation_unit): New function.
>     * analyzer-language.h (GCC_ANALYZER_LANGUAGE_H): New include.
>     (class translation_unit): New vfuncs.
> 
> gcc/c/ChangeLog:
> PR analyzer/107646
>     * c-parser.cc: New functions on stashing values for the
>   analyzer.
> 
> gcc/testsuite/ChangeLog:
> PR analyzer/107646
>     * gcc.dg/plugin/plugin.exp: Add new plugin and test.
>     * gcc.dg/plugin/analyzer_cpython_plugin.c: New plugin.
>     * gcc.dg/plugin/cpython-plugin-test-1.c: New test.
> 
> Signed-off-by: Eric Feng 
> ---
>  gcc/analyzer/analyzer-language.cc |  22 ++
>  gcc/analyzer/analyzer-language.h  |   9 +
>  gcc/c/c-parser.cc |  26 ++
>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 230
> ++
>  .../gcc.dg/plugin/cpython-plugin-test-1.c |   8 +
>  gcc/testsuite/gcc.dg/plugin/plugin.exp    |   2 +
>  6 files changed, 297 insertions(+)
>  create mode 100644
> gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-
> 1.c
> 
> diff --git a/gcc/analyzer/analyzer-language.cc
> b/gcc/analyzer/analyzer-language.cc
> index 2c8910906ee..85400288a93 100644
> --- a/gcc/analyzer/analyzer-language.cc
> +++ b/gcc/analyzer/analyzer-language.cc
> @@ -35,6 +35,26 @@ static GTY (()) hash_map <tree, tree>
> *analyzer_stashed_constants;
>  #if ENABLE_ANALYZER
>  
>  namespace ana {
> +static vec<finish_translation_unit_callback>
> +    *finish_translation_unit_callbacks;
> +
> +void
> +register_finish_translation_unit_callback (
> +    finish_translation_unit_callback callback)
> +{
> +  if (!finish_translation_unit_callbacks)
> +    vec_alloc (finish_translation_unit_callbacks, 1);
> +  finish_translation_unit_callbacks->safe_push (callback);
> +}
> +
> +static void
> +run_callbacks (logger *logger, const translation_unit &tu)
> +{
> +  for (auto const &cb : finish_translation_unit_callbacks)
> +    {
> +  cb (logger, tu);
> +    }
> +}
>  
>  /* Call into TU to try to find a value for NAME.
>     If found, stash its value within analyzer_stashed_constants.  */
> @@ -102,6 +122,8 @@ on_finish_translation_unit (const
> translation_unit &tu)
>  the_logger.set_logger (new logger (logfile, 0, 0,
>    *global_dc->printer));
>    stash_named_constants (the_logger.get_logger (), tu);
> +
> +  run_callbacks (the_logger.get_logger (), tu);
>  }
>  
>  /* Lookup NAME in the named constants stashed when the frontend TU
> finished.
> diff --git a/gcc/analyzer/analyzer-language.h
> b/gcc/analyzer/analyzer-language.h
> index 00f85aba041..8deea52d627 100644
> --- a/gcc/analyzer/analyzer-language.h
> +++ b/gcc/analyzer/analyzer-language.h
> @@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
>  #ifndef GCC_ANALYZER_LANGUAGE_H
>  #define GCC_ANALYZER_LANGUAGE_H
>  
> +#include "analyzer/analyzer-logging.h"
> +
>  #if ENABLE_ANALYZER
>  
>  namespace ana {
> @@ -35,8 +37,15 @@ class translation_unit
>   have been seen).  If it is defined and an integer (e.g. either
> as a
>   macro or enum), return the INTEGER_CST value, otherwise return
> NULL.  */
>    virtual tree lookup_constant_by_id (tree id) const = 0;
> +  virtual tree lookup_type_by_id (tree id) const = 0;
> +  virtual tree lookup_global_var_by_id (tree id) const = 0;
>  };
>  
> +typedef void (*finish_translation_unit_callback)
> +   (logger *, const translation_unit &);
> +void register_finish_translation_unit_callback (
> +    finish_translation_unit_callback callback);
> +
>  /* Analyzer hook for frontends to call at the end of the TU.  */
>  
>  void on_finish_translation_unit (const translation_unit &tu);
> diff --git 

[Bug middle-end/110874] [14 Regression] ice with -O2 with recent gcc

2023-08-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110874

--- Comment #4 from Andrew Pinski  ---
Reduced testcase:
```
struct S1 {
  unsigned f0;
};
static int g_161;
void func_109(unsigned g_227, unsigned t) {
  struct S1 l_178;
  int l_160 = 0x1FAE99D5L;
  int *l_230[] = {&l_160};
  if (l_160) {
    for (l_178.f0 = -7; l_178.f0;) {
      ++g_227;
      break;
    }
    (g_161) = g_227;
  }
  (g_161) &= t;
}
```
There is a stack overflow while executing the FRE pass.

Re: [PATCH 2/5] [RISC-V] Generate Zicond instruction for basic semantics

2023-08-02 Thread Jeff Law via Gcc-patches




On 8/2/23 04:05, Richard Sandiford wrote:
> Jeff Law via Gcc-patches  writes:
>> On 8/1/23 05:18, Richard Sandiford wrote:
>>> Where were you seeing the requirement for pointer equality?  genrecog.cc
>>> at least uses rtx_equal_p, and I think it has to.  E.g. some patterns
>>> use (match_dup ...) to match output and input mems, and mem rtxes
>>> shouldn't be shared.
>> It's a general concern due to the way we handle transforming pseudos
>> into hard registers after allocation is complete.   We can end up with
>> two REG expressions that will compare equal according to rtx_equal_p,
>> but which are not pointer equal.
>
> But isn't that OK?  I don't think there's a requirement for match_dup
> pointer equality either before or after RA.  Or at least, there
> shouldn't be.  If something happens to rely on pointer equality
> for match_dups then I think we should fix it.
>
> So IMO, like you said originally, match_dup would be the right way to
> handle this kind of pattern.
I'd assumed that match_dup required pointer equality.  If it doesn't, 
then great, we can adjust the pattern to use match_dup.  I'm about to 
submit some bits to simplify/correct a bit of zicond.md, then I can do 
some testing with match_dup in place now that things seem to be more 
stable on the code generation correctness side.

> I don't want to labour the point though.
No worries about that on my end!  I probably don't say it enough, but 
when you raise an issue, it's worth the time to make sure I understand 
your point thoroughly.

In this case I'd assumed that match_dup relied on pointer equality which 
doesn't seem to be the case.  30+ years into this codebase and I'm still 
learning new stuff!


Jeff
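
To illustrate the distinction discussed in this message, here is a minimal C sketch of "structurally equal but not pointer equal". It is only a toy: struct toy_reg, make_reg, toy_reg_equal_p and demo_equal_but_distinct are hypothetical names made up for this illustration, and they are not GCC's rtx representation or its rtx_equal_p implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a REG expression; not GCC's actual rtx layout.  */
struct toy_reg
{
  int mode;   /* hypothetical mode number  */
  int regno;  /* hard register number  */
};

/* Build a fresh object, much as a pass might create a new REG for a
   hard register instead of reusing an existing one.  */
struct toy_reg *
make_reg (int mode, int regno)
{
  struct toy_reg *r = malloc (sizeof *r);
  assert (r != NULL);
  r->mode = mode;
  r->regno = regno;
  return r;
}

/* Structural comparison, in the spirit of rtx_equal_p.  */
bool
toy_reg_equal_p (const struct toy_reg *a, const struct toy_reg *b)
{
  return a == b || (a->mode == b->mode && a->regno == b->regno);
}

/* True when two separately built objects for the same register compare
   structurally equal even though their addresses differ.  */
bool
demo_equal_but_distinct (void)
{
  struct toy_reg *x = make_reg (1, 10);
  struct toy_reg *y = make_reg (1, 10);
  bool ok = x != y && toy_reg_equal_p (x, y);
  free (x);
  free (y);
  return ok;
}
```

A matcher that compared with == would reject x and y here, while one that compares structurally accepts them. That is the property being relied on above when match_dup is said not to need pointer equality.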


Re: LRA for avr: Handling hard regs set directly at expand

2023-08-02 Thread Vladimir Makarov via Gcc



On 7/17/23 07:33, senthilkumar.selva...@microchip.com wrote:

Hi,

   The avr target has a bunch of patterns that directly set hard regs at expand 
time, like so

(define_expand "cpymemhi"
   [(parallel [(set (match_operand:BLK 0 "memory_operand" "")
(match_operand:BLK 1 "memory_operand" ""))
   (use (match_operand:HI 2 "const_int_operand" ""))
   (use (match_operand:HI 3 "const_int_operand" ""))])]
   ""
   {
 if (avr_emit_cpymemhi (operands))
   DONE;

 FAIL;
   })

where avr_emit_cpymemhi generates

(insn 14 13 15 4 (set (reg:HI 30 r30)
 (reg:HI 48 [ ivtmp.10 ])) "pr53505.c":21:22 -1
  (nil))
(insn 15 14 16 4 (set (reg:HI 26 r26)
 (reg/f:HI 38 virtual-stack-vars)) "pr53505.c":21:22 -1
  (nil))
(insn 16 15 17 4 (parallel [
 (set (mem:BLK (reg:HI 26 r26) [0  A8])
 (mem:BLK (reg:HI 30 r30) [0  A8]))
 (unspec [
 (const_int 0 [0])
 ] UNSPEC_CPYMEM)
 (use (reg:QI 52))
 (clobber (reg:HI 26 r26))
 (clobber (reg:HI 30 r30))
 (clobber (reg:QI 0 r0))
 (clobber (reg:QI 52))
 ]) "pr53505.c":21:22 -1
  (nil))

Classic reload knows about these - find_reg masks out bad_spill_regs, and 
bad_spill_regs
when ORed with chain->live_throughout in order_regs_for_reload picks up r30.

LRA, however, appears to not consider that, and proceeds to use such regs as 
reload regs.
For the same source, it generates

  Choosing alt 0 in insn 15:  (0) =r  (1) r {*movhi_split}
   Creating newreg=70, assigning class GENERAL_REGS to r70
15: r26:HI=r70:HI
   REG_EQUAL r28:HI+0x1
 Inserting insn reload before:
58: r70:HI=r28:HI+0x1

Choosing alt 3 in insn 58:  (0) d  (1) 0  (2) nYnn {*addhi3_split}
   Creating newreg=71 from oldreg=70, assigning class LD_REGS to r71
58: r71:HI=r71:HI+0x1
 Inserting insn reload before:
59: r71:HI=r28:HI
 Inserting insn reload after:
60: r70:HI=r71:HI

** Assignment #1: **

 Assigning to 71 (cl=LD_REGS, orig=70, freq=3000, tfirst=71, 
tfreq=3000)...
   Assign 30 to reload r71 (freq=3000)
Hard reg 26 is preferable by r70 with profit 1000
Hard reg 30 is preferable by r70 with profit 1000
 Assigning to 70 (cl=GENERAL_REGS, orig=70, freq=2000, tfirst=70, 
tfreq=2000)...
   Assign 30 to reload r70 (freq=2000)


(insn 14 13 59 3 (set (reg:HI 30 r30)
 (reg:HI 18 r18 [orig:48 ivtmp.10 ] [48])) "pr53505.c":21:22 101 
{*movhi_split}
  (nil))
(insn 59 14 58 3 (set (reg:HI 30 r30 [70])
 (reg/f:HI 28 r28)) "pr53505.c":21:22 101 {*movhi_split}
  (nil))
(insn 58 59 15 3 (set (reg:HI 30 r30 [70])
 (plus:HI (reg:HI 30 r30 [70])
 (const_int 1 [0x1]))) "pr53505.c":21:22 165 {*addhi3_split}
  (nil))
(insn 15 58 16 3 (set (reg:HI 26 r26)
 (reg:HI 30 r30 [70])) "pr53505.c":21:22 101 {*movhi_split}
  (expr_list:REG_EQUAL (plus:HI (reg/f:HI 28 r28)
 (const_int 1 [0x1]))
 (nil)))
(insn 16 15 17 3 (parallel [
 (set (mem:BLK (reg:HI 26 r26) [0  A8])
 (mem:BLK (reg:HI 30 r30) [0  A8]))
 (unspec [
 (const_int 0 [0])
 ] UNSPEC_CPYMEM)
 (use (reg:QI 22 r22 [52]))
 (clobber (reg:HI 26 r26))
 (clobber (reg:HI 30 r30))
 (clobber (reg:QI 0 r0))
 (clobber (reg:QI 22 r22 [52]))
 ]) "pr53505.c":21:22 132 {cpymem_qi}
  (nil))

LRA generates insn 59 that clobbers r30 set in insn 14, causing an execution
failure down the line.

How should the avr backend deal with this?


Sorry for the big delay with the answer.  I was on vacation.

There are probably some ways to fix it by changing patterns as other 
people suggested but I'd like to see the current patterns work for LRA 
as well.


Could you send me the test case on which I could reproduce the problem 
and work on implementing such functionality?
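
For readers unfamiliar with the behaviour the quoted message attributes to classic reload (masking unusable hard registers out of the candidate set before picking a reload register), a toy model in C might look like the sketch below. It is not GCC code: hard_reg_set here is just a 32-bit mask, and reg_bit and pick_reload_reg are hypothetical helpers invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: one bit per hard register, register N is bit N.  */
typedef uint32_t hard_reg_set;

/* Bit for hard register REGNO (0..31 in this toy model).  */
hard_reg_set
reg_bit (int regno)
{
  assert (regno >= 0 && regno < 32);
  return (hard_reg_set) 1 << regno;
}

/* Pick the lowest-numbered register that is in CLASS_REGS but not in
   FORBIDDEN (e.g. registers that are live across the insn).  Returns
   -1 if no register is available.  */
int
pick_reload_reg (hard_reg_set class_regs, hard_reg_set forbidden)
{
  hard_reg_set candidates = class_regs & ~forbidden;
  for (int regno = 0; regno < 32; regno++)
    if (candidates & reg_bit (regno))
      return regno;
  return -1;
}
```

In this model, if r26 and r30 are in the forbidden set because they were set explicitly at expand time and are still live, pick_reload_reg (reg_bit (24) | reg_bit (26) | reg_bit (30), reg_bit (26) | reg_bit (30)) returns 24 rather than reusing r30.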





Re: [PATCH 04/14] c++: use _P() defines from tree.h

2023-08-02 Thread Patrick Palka via Gcc-patches
On Thu, Jun 1, 2023 at 2:11 PM Bernhard Reutner-Fischer
 wrote:
>
> Hi David, Patrick,
>
> On Thu, 1 Jun 2023 18:33:46 +0200
> Bernhard Reutner-Fischer  wrote:
>
> > On Thu, 1 Jun 2023 11:24:06 -0400
> > Patrick Palka  wrote:
> >
> > > On Sat, May 13, 2023 at 7:26 PM Bernhard Reutner-Fischer via
> > > Gcc-patches  wrote:
> >
> > > > diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
> > > > index 131b212ff73..19dfb3ed782 100644
> > > > --- a/gcc/cp/tree.cc
> > > > +++ b/gcc/cp/tree.cc
> > > > @@ -1173,7 +1173,7 @@ build_cplus_array_type (tree elt_type, tree 
> > > > index_type, int dependent)
> > > >  }
> > > >
> > > >/* Avoid spurious warnings with VLAs (c++/54583).  */
> > > > -  if (TYPE_SIZE (t) && EXPR_P (TYPE_SIZE (t)))
> > > > +  if (CAN_HAVE_LOCATION_P (TYPE_SIZE (t)))
> > >
> > > Hmm, this change seems undesirable...
> >
> > mhm, yes that is misleading. I'll prepare a patch to revert this.
> > Let me have a look if there were other such CAN_HAVE_LOCATION_P changes
> > that we'd want to revert.
>
> Sorry for that!
> I'd revert the hunk above and the one in gcc-rich-location.cc
> (maybe_range_label_for_tree_type_mismatch::get_text), please see
> attached. Bootstrap running, ok for trunk if it passes?

LGTM!

>
> thanks,



[Bug fortran/88286] [OOP] gfortran reports conflicting intent(in) with an intent(in) declared class variable

2023-08-02 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88286

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kargl at gcc dot gnu.org

--- Comment #3 from kargl at gcc dot gnu.org ---
(In reply to anlauf from comment #2)
> Appears fixed in 12-branch and later.  Adding known-to work.
> 
> Can we close this one?

I think the answer is "yes".  It comes down to "too many bugs and too few
contributors".  If someone can identify the commit that fixed this bug, then by
all means it can be back-ported.  It's unclear to me if it is worth the effort
to find the commit.

Re: [C PATCH]: Add Walloc-type to warn about insufficient size in allocations

2023-08-02 Thread Qing Zhao via Gcc-patches



> On Aug 1, 2023, at 10:31 AM, Martin Uecker  wrote:
> 
> Am Dienstag, dem 01.08.2023 um 13:27 + schrieb Qing Zhao:
>> 
>>> On Aug 1, 2023, at 3:51 AM, Martin Uecker via Gcc-patches 
>>>  wrote:
>>> 
> 
> 
>>>> Hi Martin,
>>>> Just wondering if it'd be a good idea perhaps to warn if alloc size is
>>>> not a multiple of TYPE_SIZE_UNIT instead of just less-than ?
>>>> So it can catch cases like:
>>>> int *p = malloc (sizeof (int) + 2); // probably intended malloc
>>>> (sizeof (int) * 2)
>>>> 
>>>> FWIW, this is caught using -fanalyzer:
>>>> f.c: In function 'f':
>>>> f.c:3:12: warning: allocated buffer size is not a multiple of the
>>>> pointee's size [CWE-131] [-Wanalyzer-allocation-size]
>>>>     3 |   int *p = __builtin_malloc (sizeof(int) + 2);
>>>>       |            ^~
>>>> 
>>>> Thanks,
>>>> Prathamesh
>>> 
>>> Yes, this is probably a good idea.  It might need special
>>> logic for flexible array members then...
>> 
>> Why special logic for FAM on such warning? (Not a multiple of TYPE_SIZE_UNIT 
>> for the element).
>> 
> 
> For
> 
> struct { int n; char buf[]; } *p = malloc(sizeof *p + n);
> p->n = n;
> 
> the size would not be a multiple.

But n is still a multiple of sizeof (char), right? Do I miss anything here?

Qing
> 
> Martin
> 
> 
> 
> 
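
The two allocation shapes discussed in this thread can be put side by side in a small C sketch. The names struct msg, size_is_multiple_of and msg_alloc are hypothetical and exist only for this illustration; the sketch says nothing about how the proposed warning is or should be implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header followed by a flexible array member of char, as in the
   example quoted above.  */
struct msg
{
  int n;
  char buf[];
};

/* Hypothetical check: is ALLOC_SIZE a whole multiple of ELEM_SIZE?  */
int
size_is_multiple_of (size_t alloc_size, size_t elem_size)
{
  assert (elem_size != 0);
  return alloc_size % elem_size == 0;
}

/* Allocate a struct msg with room for N trailing bytes.  The total is
   sizeof (struct msg) + n, which in general is not a multiple of
   sizeof (struct msg).  */
struct msg *
msg_alloc (int n)
{
  assert (n >= 0);
  struct msg *p = malloc (sizeof *p + (size_t) n);
  if (p)
    p->n = n;
  return p;
}
```

With a 4-byte int, sizeof (int) + 2 is 6, which is not a multiple of sizeof (int); that is the suspicious case from the original suggestion. For the flexible array member, sizeof (struct msg) + 3 is likewise not a multiple of sizeof (struct msg), yet the allocation is legitimate, while the trailing 3 bytes are trivially a multiple of sizeof (char). Which of these two sizes a check compares against is what the question above turns on.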



Re: [PATCH] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread Eric Feng via Gcc-patches
Hi Dave,

Thank you for the feedback! I've incorporated the changes and sent a
revised version of the patch.

On Tue, Aug 1, 2023 at 1:02 PM David Malcolm  wrote:
>
> On Tue, 2023-08-01 at 09:52 -0400, Eric Feng wrote:
> > Hi all,
> >
> > This patch adds a hook to the end of ana::on_finish_translation_unit
> > which calls relevant stashing-related callbacks registered during
> > plugin
> > initialization. This feature is used to stash named types and global
> > variables for a CPython analyzer plugin [PR107646].
> >
> > Bootstrapped and tested on aarch64-unknown-linux-gnu. Does it look
> > okay?
>
> Hi Eric, thanks for the patch.
>
> The patch touches the C frontend, so those parts would need approval
> from the C FE maintainers/reviewers; I've CCed them.
>
> Overall, I like the patch, but it's not ready for trunk yet; various
> comments inline below...
>
> >
> > ---
> >
> > gcc/analyzer/ChangeLog:
>
> You could add: PR analyzer/107646 to these ChangeLog entries; have a
> look at how other ChangeLog entries refer to such bugzilla entries.
>
> >
> > * analyzer-language.cc (run_callbacks): New function.
> > (on_finish_translation_unit): New function.
> > * analyzer-language.h (GCC_ANALYZER_LANGUAGE_H): New include.
> > (class translation_unit): New vfuncs.
> >
> > gcc/c/ChangeLog:
> >
> > * c-parser.cc: New functions.
>
> I think this ChangeLog entry needs more detail.
Added in revised version of the patch.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/plugin/analyzer_cpython_plugin.c: New test.
> >
> > Signed-off-by: Eric Feng 
> > ---
> >  gcc/analyzer/analyzer-language.cc |  22 ++
> >  gcc/analyzer/analyzer-language.h  |   9 +
> >  gcc/c/c-parser.cc |  26 ++
> >  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 224
> > ++
> >  4 files changed, 281 insertions(+)
> >  create mode 100644
> > gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> >
> > diff --git a/gcc/analyzer/analyzer-language.cc
> > b/gcc/analyzer/analyzer-language.cc
> > index 2c8910906ee..fc41b9c17b8 100644
> > --- a/gcc/analyzer/analyzer-language.cc
> > +++ b/gcc/analyzer/analyzer-language.cc
> > @@ -35,6 +35,26 @@ static GTY (()) hash_map <tree, tree>
> > *analyzer_stashed_constants;
> >  #if ENABLE_ANALYZER
> >
> >  namespace ana {
> > +static vec<finish_translation_unit_callback>
> > +*finish_translation_unit_callbacks;
> > +
> > +void
> > +register_finish_translation_unit_callback (
> > +finish_translation_unit_callback callback)
> > +{
> > +  if (!finish_translation_unit_callbacks)
> > +vec_alloc (finish_translation_unit_callbacks, 1);
> > +  finish_translation_unit_callbacks->safe_push (callback);
> > +}
> > +
> > +void
> > +run_callbacks (logger *logger, const translation_unit &tu)
>
> This function could be "static" since it's not needed outside of
> analyzer-language.cc
>
> > +{
> > +  for (auto const &cb : finish_translation_unit_callbacks)
> > +{
> > +  cb (logger, tu);
> > +}
> > +}
> >
> >  /* Call into TU to try to find a value for NAME.
> > If found, stash its value within analyzer_stashed_constants.  */
> > @@ -102,6 +122,8 @@ on_finish_translation_unit (const
> > translation_unit &tu)
> >  the_logger.set_logger (new logger (logfile, 0, 0,
> >  *global_dc->printer));
> >stash_named_constants (the_logger.get_logger (), tu);
> > +
> > +  run_callbacks (the_logger.get_logger (), tu);
> >  }
> >
> >  /* Lookup NAME in the named constants stashed when the frontend TU
> > finished.
> > diff --git a/gcc/analyzer/analyzer-language.h
> > b/gcc/analyzer/analyzer-language.h
> > index 00f85aba041..8deea52d627 100644
> > --- a/gcc/analyzer/analyzer-language.h
> > +++ b/gcc/analyzer/analyzer-language.h
> > @@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
> >  #ifndef GCC_ANALYZER_LANGUAGE_H
> >  #define GCC_ANALYZER_LANGUAGE_H
> >
> > +#include "analyzer/analyzer-logging.h"
> > +
> >  #if ENABLE_ANALYZER
> >
> >  namespace ana {
> > @@ -35,8 +37,15 @@ class translation_unit
> >   have been seen).  If it is defined and an integer (e.g. either
> > as a
> >   macro or enum), return the INTEGER_CST value, otherwise return
> > NULL.  */
> >virtual tree lookup_constant_by_id (tree id) const = 0;
> > +  virtual tree lookup_type_by_id (tree id) const = 0;
> > +  virtual tree lookup_global_var_by_id (tree id) const = 0;
> >  };
> >
> > +typedef void (*finish_translation_unit_callback)
> > +   (logger *, const translation_unit &);
> > +void register_finish_translation_unit_callback (
> > +finish_translation_unit_callback callback);
> > +
> >  /* Analyzer hook for frontends to call at the end of the TU.  */
> >
> >  void on_finish_translation_unit (const translation_unit &tu);
> > diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> > index 80920b31f83..f0ee55e416b 100644
> > --- a/gcc/c/c-parser.cc
> > +++ b/gcc/c/c-parser.cc
> > @@ -1695,6 +1695,32 @@ public:
> >  return NULL_TREE;
> >}
> >
> > 

[PATCH] match.pd: Canonicalize (signed x << c) >> c [PR101955]

2023-08-02 Thread Drew Ross via Gcc-patches
Canonicalizes (signed x << c) >> c into the lowest
precision(type) - c bits of x IF those bits have a mode precision or a
precision of 1. Also combines this rule with (unsigned x << c) >> c -> x &
((unsigned)-1 >> c) to prevent duplicate pattern. Tested successfully on
x86_64 and x86 targets.

  PR middle-end/101955

gcc/ChangeLog:

  * match.pd ((signed x << c) >> c): New canonicalization.

gcc/testsuite/ChangeLog:

  * gcc.dg/pr101955.c: New test.
---
 gcc/match.pd| 20 +++
 gcc/testsuite/gcc.dg/pr101955.c | 63 +
 2 files changed, 77 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr101955.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 8543f777a28..62de97f0186 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3758,13 +3758,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
- TYPE_PRECISION (TREE_TYPE (@2)
   (bit_and (convert @0) (lshift { build_minus_one_cst (type); } @1
 
-/* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
-   types.  */
+/* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
+   unsigned x OR truncate into the precision(type) - c lowest bits
+   of signed x (if they have mode precision or a precision of 1).  */
 (simplify
- (rshift (lshift @0 INTEGER_CST@1) @1)
- (if (TYPE_UNSIGNED (type)
-  && (wi::ltu_p (wi::to_wide (@1), element_precision (type))))
-  (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
+ (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
+ (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
+  (if (TYPE_UNSIGNED (type))
+   (bit_and (convert @0) (rshift { build_minus_one_cst (type); } @1))
+   (if (INTEGRAL_TYPE_P (type))
+(with {
+  int width = element_precision (type) - tree_to_uhwi (@1);
+  tree stype = build_nonstandard_integer_type (width, 0);
+ }
+ (if (width == 1 || type_has_mode_precision_p (stype))
+  (convert (convert:stype @0))))))))
 
 /* Optimize x >> x into 0 */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/pr101955.c b/gcc/testsuite/gcc.dg/pr101955.c
new file mode 100644
index 000..6a04288511f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr101955.c
@@ -0,0 +1,63 @@
+/* { dg-do compile { target int32 } } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+__attribute__((noipa)) int
+t1 (int x)
+{
+  int y = x << 31;
+  int z = y >> 31;
+  return z;
+}
+
+__attribute__((noipa)) int
+t2 (unsigned int x)
+{
+  int y = x << 31;
+  int z = y >> 31;
+  return z;
+}
+
+__attribute__((noipa)) int
+t3 (int x)
+{
+  return (x << 31) >> 31;
+}
+
+__attribute__((noipa)) int
+t4 (int x)
+{
+  return (x << 24) >> 24;
+}
+
+__attribute__((noipa)) int
+t5 (int x)
+{
+  return (x << 16) >> 16;
+}
+
+__attribute__((noipa)) long long
+t6 (long long x)
+{
+  return (x << 63) >> 63;
+}
+
+__attribute__((noipa)) long long
+t7 (long long x)
+{
+  return (x << 56) >> 56;
+}
+
+__attribute__((noipa)) long long
+t8 (long long x)
+{
+  return (x << 48) >> 48;
+}
+
+__attribute__((noipa)) long long
+t9 (long long x)
+{
+  return (x << 32) >> 32;
+}
+
+/* { dg-final { scan-tree-dump-not " >> " "optimized" } } */
+/* { dg-final { scan-tree-dump-not " << " "optimized" } } */
-- 
2.39.3
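
The identities behind the canonicalization described at the top of this message can be checked with a small C sketch. The function names below are hypothetical. The sketch also assumes GCC's documented implementation-defined behaviour: converting an out-of-range value to a signed type wraps modulo 2^N, and right-shifting a negative signed value is an arithmetic shift. The left shift is done on the unsigned type so that the example avoids signed-overflow undefined behaviour in ISO C.

```c
#include <assert.h>
#include <stdint.h>

/* Signed case: shift left, then arithmetic-shift right by the same
   count C (0 <= C < 32).  */
int32_t
shift_pair_signed (int32_t x, unsigned c)
{
  assert (c < 32);
  return (int32_t) ((uint32_t) x << c) >> c;
}

/* What the signed form amounts to for C == 24: truncate to the low
   8 bits, then sign-extend back to 32 bits.  */
int32_t
low8_sign_extended (int32_t x)
{
  return (int32_t) (int8_t) x;
}

/* Unsigned case: the same shift pair only clears the top C bits.  */
uint32_t
shift_pair_unsigned (uint32_t x, unsigned c)
{
  assert (c < 32);
  return (x << c) >> c;
}

/* The mask form x & ((unsigned)-1 >> c) from the existing rule.  */
uint32_t
mask_low_bits (uint32_t x, unsigned c)
{
  assert (c < 32);
  return x & (UINT32_MAX >> c);
}
```

For example, shift_pair_signed (0x1FF, 24) and low8_sign_extended (0x1FF) both give -1, because the low 8 bits are 0xFF. A count of 24 on a 32-bit int leaves 8 bits, which is a mode precision, so t4 in the new test is a case the rule is meant to cover.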



[PATCH v2] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread Eric Feng via Gcc-patches
Revised:
-- Fix indentation problems
-- Add more detail to Changelog
-- Add new test on handling non-CPython code case
-- Turn off debugging inform by default
-- Make on_finish_translation_unit() static
-- Remove superfluous null checks in init_py_structs()

Changes have been bootstrapped and tested against trunk on 
aarch64-unknown-linux-gnu.

---
This patch adds a hook to the end of ana::on_finish_translation_unit
which calls relevant stashing-related callbacks registered during plugin
initialization. This feature is used to stash named types and global
variables for a CPython analyzer plugin [PR107646].
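
As a rough picture of the register-then-run pattern described above, here is a minimal sketch in plain C. It is not the patch's code (which is C++ and uses GCC's vec): struct logger, struct translation_unit, the fixed-size table and all function names below are simplified, hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the analyzer's logger and translation_unit.  */
struct logger { int calls_seen; };
struct translation_unit { const char *name; };

typedef void (*finish_tu_callback) (struct logger *,
                                    const struct translation_unit *);

enum { MAX_CALLBACKS = 8 };
static finish_tu_callback callbacks[MAX_CALLBACKS];
static size_t num_callbacks;

/* Called by a plugin during its initialization.  Returns 0 on success,
   -1 if the fixed-size table is full.  */
int
register_finish_tu_callback (finish_tu_callback cb)
{
  assert (cb != NULL);
  if (num_callbacks == MAX_CALLBACKS)
    return -1;
  callbacks[num_callbacks++] = cb;
  return 0;
}

/* Called once at the end of the translation unit: invoke every
   registered callback in registration order.  */
void
run_finish_tu_callbacks (struct logger *logger,
                         const struct translation_unit *tu)
{
  for (size_t i = 0; i < num_callbacks; i++)
    callbacks[i] (logger, tu);
}

/* Example callback: a real plugin would look up and stash types and
   globals here; this one only counts how often it ran.  */
void
count_call (struct logger *logger, const struct translation_unit *tu)
{
  (void) tu;
  logger->calls_seen++;
}

/* Usage: register the callback twice, run the hook once, and report
   how many callback invocations happened.  */
int
demo_run_two_callbacks (void)
{
  struct logger lg = { 0 };
  struct translation_unit tu = { "demo.c" };
  register_finish_tu_callback (count_call);
  register_finish_tu_callback (count_call);
  run_finish_tu_callbacks (&lg, &tu);
  return lg.calls_seen;
}
```

The point of the indirection is that the hook itself needs no knowledge of any particular plugin: it only walks whatever was registered before the translation unit finished.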

gcc/analyzer/ChangeLog:
PR analyzer/107646
* analyzer-language.cc (run_callbacks): New function.
(on_finish_translation_unit): New function.
* analyzer-language.h (GCC_ANALYZER_LANGUAGE_H): New include.
(class translation_unit): New vfuncs.

gcc/c/ChangeLog:
PR analyzer/107646
* c-parser.cc: New functions on stashing values for the
  analyzer.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/plugin.exp: Add new plugin and test.
* gcc.dg/plugin/analyzer_cpython_plugin.c: New plugin.
* gcc.dg/plugin/cpython-plugin-test-1.c: New test.

Signed-off-by: Eric Feng 
---
 gcc/analyzer/analyzer-language.cc |  22 ++
 gcc/analyzer/analyzer-language.h  |   9 +
 gcc/c/c-parser.cc |  26 ++
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 230 ++
 .../gcc.dg/plugin/cpython-plugin-test-1.c |   8 +
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   2 +
 6 files changed, 297 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c

diff --git a/gcc/analyzer/analyzer-language.cc 
b/gcc/analyzer/analyzer-language.cc
index 2c8910906ee..85400288a93 100644
--- a/gcc/analyzer/analyzer-language.cc
+++ b/gcc/analyzer/analyzer-language.cc
@@ -35,6 +35,26 @@ static GTY (()) hash_map <tree, tree>
*analyzer_stashed_constants;
 #if ENABLE_ANALYZER
 
 namespace ana {
+static vec<finish_translation_unit_callback>
+    *finish_translation_unit_callbacks;
+
+void
+register_finish_translation_unit_callback (
+finish_translation_unit_callback callback)
+{
+  if (!finish_translation_unit_callbacks)
+vec_alloc (finish_translation_unit_callbacks, 1);
+  finish_translation_unit_callbacks->safe_push (callback);
+}
+
+static void
+run_callbacks (logger *logger, const translation_unit &tu)
+{
+  for (auto const &cb : finish_translation_unit_callbacks)
+{
+  cb (logger, tu);
+}
+}
 
 /* Call into TU to try to find a value for NAME.
If found, stash its value within analyzer_stashed_constants.  */
@@ -102,6 +122,8 @@ on_finish_translation_unit (const translation_unit &tu)
 the_logger.set_logger (new logger (logfile, 0, 0,
   *global_dc->printer));
   stash_named_constants (the_logger.get_logger (), tu);
+
+  run_callbacks (the_logger.get_logger (), tu);
 }
 
 /* Lookup NAME in the named constants stashed when the frontend TU finished.
diff --git a/gcc/analyzer/analyzer-language.h b/gcc/analyzer/analyzer-language.h
index 00f85aba041..8deea52d627 100644
--- a/gcc/analyzer/analyzer-language.h
+++ b/gcc/analyzer/analyzer-language.h
@@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_ANALYZER_LANGUAGE_H
 #define GCC_ANALYZER_LANGUAGE_H
 
+#include "analyzer/analyzer-logging.h"
+
 #if ENABLE_ANALYZER
 
 namespace ana {
@@ -35,8 +37,15 @@ class translation_unit
  have been seen).  If it is defined and an integer (e.g. either as a
  macro or enum), return the INTEGER_CST value, otherwise return NULL.  */
   virtual tree lookup_constant_by_id (tree id) const = 0;
+  virtual tree lookup_type_by_id (tree id) const = 0;
+  virtual tree lookup_global_var_by_id (tree id) const = 0;
 };
 
+typedef void (*finish_translation_unit_callback)
+   (logger *, const translation_unit &);
+void register_finish_translation_unit_callback (
+finish_translation_unit_callback callback);
+
 /* Analyzer hook for frontends to call at the end of the TU.  */
 
 void on_finish_translation_unit (const translation_unit &tu);
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index cf82b0306d1..617111b0f0a 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1695,6 +1695,32 @@ public:
 return NULL_TREE;
   }
 
+  tree
+  lookup_type_by_id (tree id) const final override
+  {
+if (tree type_decl = lookup_name (id))
+  {
+   if (TREE_CODE (type_decl) == TYPE_DECL)
+ {
+   tree record_type = TREE_TYPE (type_decl);
+   if (TREE_CODE (record_type) == RECORD_TYPE)
+ return record_type;
+ }
+  }
+
+return NULL_TREE;
+  }
+
+  tree
+  lookup_global_var_by_id (tree id) const final override
+  {
+if (tree var_decl = lookup_name (id))
+  if (TREE_CODE (var_decl) == VAR_DECL)
+   return 

Re: [RFC] light expander sra for parameters and returns

2023-08-02 Thread guojiufu via Gcc-patches

On 2023-08-02 20:41, Richard Biener wrote:

On Tue, 1 Aug 2023, Jiufu Guo wrote:



Hi,

Richard Biener  writes:

> On Mon, 24 Jul 2023, Jiufu Guo wrote:
>
>>
>> Hi Martin,
>>
>> Not sure about your current option about re-using the ipa-sra code
>> in the light-expander-sra. And if anything I could input please
>> let me know.
>>
>> And I'm thinking about the difference between the expander-sra, ipa-sra
>> and tree-sra. 1. For stmts walking, expander-sra has special behavior
>> for return-stmt, and also a little special on assign-stmt. And phi
>> stmts are not checked by ipa-sra/tree-sra. 2. For the access structure,
>> I'm also thinking if we need a tree structure; it would be useful when
>> checking overlaps, it was not used now in the expander-sra.
>>
>> For ipa-sra and tree-sra, I notice that there is some similar code,
>> but of course there are differences. While it seems the difference
>> is 'intended', for example: 1. when creating and accessing,
>> 'size != max_size' is acceptable in tree-sra but not for ipa-sra.
>> 2. 'AGGREGATE_TYPE_P' for ipa-sra is accepted for some cases, but
>> not ok for tree-ipa.
>> I'm wondering if those slight differences block re-using the code
>> between ipa-sra and tree-sra.
>>
>> The expander-sra may be more light, for example, maybe we can use
>> FOR_EACH_IMM_USE_STMT to check the usage of each parameter, and not
>> need to walk all the stmts.
>
> What I was hoping for is shared stmt-level analysis and a shared
> data structure for the "access"(es) a stmt performs.  Because that
> can come up handy in multiple places.  The existing SRA data
> structures could easily embed that subset for example if sharing
> the whole data structure of [IPA] SRA seems too unwieldy.

Understand.
The stmt-level analysis and "access" data structure are similar
between ipa-sra/tree-sra and the expander-sra.

I have just updated the patch; this version does not change the behavior
of the previous version.  It only cleans up and merges some functions.

The patch is attached.

This version (and tree-sra/ipa-sra) is still using the similar
"stmt analyze" and "access struct".  This could be extracted as
shared code.
I'm thinking to update the code to use the same "base_access" and
"walk function".

>
> With a stmt-level API using FOR_EACH_IMM_USE_STMT would still be
> possible (though RTL expansion pre-walks all stmts anyway).

Yeap, I also notice that "FOR_EACH_IMM_USE_STMT" is not enough.
For struct parameters, walking stmt is needed.


I think I mentioned this before, RTL expansion already
pre-walks the whole function looking for variables it has to
expand to the stack in discover_nonconstant_array_refs (which is
now badly named), I'd appreciate if the "SRA" walk would piggy-back
on that existing walk.


Yes.  I also had a look at discover_nonconstant_array_refs, it seems
this function takes care only of 'call_internal' and 'vdef' stmt for
array access.   But sra cares more about 'assign/call'.
The common thing is just the loop header between these two "stmt-walk"s.

  FOR_EACH_BB_FN (bb, cfun)
    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
      {
        gimple *stmt = gsi_stmt (gsi);

So, the existing walk is not used.
Another reason to have a new walk is that: the sra walk code may be
shared for tree-sra/ipa-sra.



For RTL expansion I think a critical part is to create accesses
based on the incoming/outgoing RTL which is specified by the ABI.
As I understand we are optimizing the argument setup code which
assigns the incoming arguments to either pseudo(s) or the stack
and thus we get to choose an optimized "mode" for that virtual
location of the incoming arguments (but we can't alter their
hardregs/stack assignment obviously).


Yes, this is what I'm trying to do.
It is "set_scalar_rtx_for_aggregate_access", which is called after
incoming arguments are set up, and then assigns the incoming hard
registers to pseudo(s).  Those pseudo(s) are the scalarized rtx for the
argument.


 So when we have an
incoming register pair we should create an artificial access
for the pieces those two registers represent.

You seem to do quite some adjustment to the parameter setup
where I was hoping we get away with simply choosing a different
mode for the virtual argument representation?


I inserted the code in the parameter setup, where the incoming registers
are computed and assigned to scalar pseudo(s).
(Copying the incoming registers to the stack would be optimized out by
RTL passes.  Yes, it would be better to avoid generating those copies.)



But I'm not too familiar with the innards of parameter/return
value initial RTL expansion.  I hope somebody else can chime
in here as well.


Thanks so much for your very helpful comments!

BR,
Jeff (Jiufu Guo)



Richard.




BR,
Jeff (Jiufu Guo)

-
diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index edf292cfbe9..8c36ad5df79 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -97,6 +97,502 @@ 
