date:20230427

Re: Ping: [PATCH] testsuite/C++: suppress filename canonicalization in module tests

2023-04-27 Thread Jan Beulich via Gcc-patches

On 28.04.2023 00:24, Nathan Sidwell wrote:
> On 4/25/23 11:04, Jan Beulich wrote:
>> On 28.06.2022 16:06, Jan Beulich wrote:
>>> The pathname underneath gcm.cache/ is determined from the effective name
>>> used for the main input file of a particular module. When modules are
>>> built, no canonicalization occurs for the main input file. Hence the
>>> module file wouldn't be found if a different (the canonicalized) file
>>> name was used when importing that same module. (This is an effect of
>>> importing happening in the preprocessor, just like #include handling.)
>>>
>>> Since it doesn't look easy to make module generation use libcpp's
>>> maybe_shorter_path() (in fact I'd consider this a layering violation,
>>> while cloning the logic would - at least in principle - be prone to both
>>> going out of sync), simply suppress system header path canonicalization
>>> for the respective tests.
>>
>> Ping: This still looks to apply as is.
> 
> ok -- I was unaware of this.  might be sensible to file a defect about this?

Sure: 109660.

Jan
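
For illustration only, the mismatch described above can be modelled in a few lines of C. This is a deliberately simplified sketch: the helper names are hypothetical, and the "gcm.cache/<name>.gcm" mapping is an assumption that only approximates how the cache path is derived. It shows that two spellings of the same file yield different cache paths unless both sides agree on canonicalization.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model: derive a cache path from the effective file name.
   The real mapping under gcm.cache/ is more involved; this only
   demonstrates that the path depends on the spelling of the name.  */
static int cache_path (char *buf, size_t size, const char *effective_name)
{
  int n = snprintf (buf, size, "gcm.cache/%s.gcm", effective_name);
  return n >= 0 && (size_t) n < size;
}

/* Return nonzero if both spellings map to the same cache entry.  */
static int same_cache_entry (const char *built_as, const char *imported_as)
{
  char built[256], imported[256];

  if (!cache_path (built, sizeof built, built_as)
      || !cache_path (imported, sizeof imported, imported_as))
    return 0;
  return strcmp (built, imported) == 0;
}
```

With this model, building under a non-canonical name and importing under the canonicalized one fails to find the entry, which is the situation the tests avoid by suppressing canonicalization.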

RE: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

2023-04-27 Thread Li, Pan2 via Gcc-patches

Thanks, kito.

Yes, you are right. I am investigating this right now on the simplify rtl
side, given that we had a similar case with VMORN previously.

Pan

-----Original Message-----
From: Kito Cheng  
Sent: Friday, April 28, 2023 2:41 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

LGTM

I thought it could also optimize __riscv_vmseq_vv_i8m8_b1(v1, v1, vl), but
I don't know why it does not evaluate

(eq:VNx128BI (reg/v:VNx128QI 137 [ v1 ])
   (reg/v:VNx128QI 137 [ v1 ]))

to true, anyway, I guess it should be your next step to investigate :)

On Fri, Apr 28, 2023 at 10:46 AM  wrote:
>
> From: Pan Li 
>
> When some RVV integer compare operators act on the same vector
> registers without a mask, they can be simplified to VMCLR.
>
> This patch allows ne, lt, ltu, gt and gtu to perform this kind of
> simplification by adding one new define_split.
>
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
>   return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
> }
>
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v24,0(a1)
> vmslt.vv v8,v24,v24
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.v    v8,0(a0)
> ret
>
> After this patch:
> vsetvli zero,a2,e8,mf8,ta,ma
> vmclr.m v24    <- optimized to vmclr.m
> vsetvli zero,a5,e8,mf8,ta,ma
> vsm.v   v24,0(a0)
> ret
>
> As shown above, one instruction is eliminated and fewer vector
> registers are required.
>
> gcc/ChangeLog:
>
> * config/riscv/vector.md: Add new define split to perform
>   the simplification.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.
>
> Signed-off-by: Pan Li 
> Co-authored-by: kito-cheng 
> ---
>  gcc/config/riscv/vector.md                    |  32 ++
>  .../rvv/base/integer_compare_insn_shortcut.c  | 291 ++
>  2 files changed, 323 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md 
> index b3d23441679..1642822d098 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -7689,3 +7689,35 @@ (define_insn "@pred_fault_load"
>    "vleff.v\t%0,%3%p1"
>    [(set_attr "type" "vldff")
>     (set_attr "mode" "<MODE>")])
> +
> +;; -------------------------------------------------------------------------
> +;;  Integer Compare Instructions Simplification
> +;; -------------------------------------------------------------------------
> +;; Simplify to VMCLR.m Includes:
> +;; - 1.  VMSNE
> +;; - 2.  VMSLT
> +;; - 3.  VMSLTU
> +;; - 4.  VMSGT
> +;; - 5.  VMSGTU
> +;; -------------------------------------------------------------------------
> +(define_split
> +  [(set (match_operand:VB 0 "register_operand")
> +        (if_then_else:VB
> +          (unspec:VB
> +            [(match_operand:VB 1 "vector_all_trues_mask_operand")
> +             (match_operand 4 "vector_length_operand")
> +             (match_operand 5 "const_int_operand")
> +             (match_operand 6 "const_int_operand")
> +             (reg:SI VL_REGNUM)
> +             (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> +          (match_operand:VB 3 "vector_move_operand")
> +          (match_operand:VB 2 "vector_undef_operand")))]
> +  "TARGET_VECTOR"
> +  [(const_int 0)]
> +  {
> +    emit_insn (gen_pred_mov (<MODE>mode, operands[0], CONST1_RTX (<MODE>mode),
> +                             RVV_VUNDEF (<MODE>mode), operands[3],
> +                             operands[4], operands[5]));
> +    DONE;
> +  }
> +)
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
> new file mode 100644
> index 000..8954adad09d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
> @@ -0,0 +1,291 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmseq_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmseq_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmseq_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmseq_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmseq_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmseq_

Re: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

2023-04-27 Thread Kito Cheng via Gcc-patches

LGTM

I thought it could also optimize __riscv_vmseq_vv_i8m8_b1(v1, v1, vl),
but I don't know why it does not evaluate

(eq:VNx128BI (reg/v:VNx128QI 137 [ v1 ])
   (reg/v:VNx128QI 137 [ v1 ]))

to true, anyway, I guess it should be your next step to investigate :)
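
As an aside, the reasoning behind both the patch and the vmseq remark can be checked with a scalar model. The sketch below is not GCC code; the enum and helper names are hypothetical. It compares every element with itself: ne, lt, ltu, gt and gtu never set a mask bit (hence VMCLR), while eq always does, which is why the eq case would be expected to fold to an all-ones mask instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operator tags for the scalar model.  */
enum cmp_op { CMP_EQ, CMP_NE, CMP_LT, CMP_LTU, CMP_GT, CMP_GTU };

/* Compare two 8-bit elements the way the corresponding vector compare
   would, returning the resulting mask bit (0 or 1).  */
static int cmp_elem (enum cmp_op op, int8_t a, int8_t b)
{
  switch (op)
    {
    case CMP_EQ:  return a == b;
    case CMP_NE:  return a != b;
    case CMP_LT:  return a < b;
    case CMP_LTU: return (uint8_t) a < (uint8_t) b;
    case CMP_GT:  return a > b;
    case CMP_GTU: return (uint8_t) a > (uint8_t) b;
    }
  return 0;
}

/* Count the mask bits set when V1 is compared with itself, unmasked.  */
static size_t self_compare_popcount (enum cmp_op op, const int8_t *v1,
                                     size_t vl)
{
  size_t set = 0;

  for (size_t i = 0; i < vl; i++)
    set += (size_t) cmp_elem (op, v1[i], v1[i]);
  return set;
}
```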

On Fri, Apr 28, 2023 at 10:46 AM  wrote:
>
> From: Pan Li 
>
> When some RVV integer compare operators act on the same vector
> registers without a mask, they can be simplified to VMCLR.
>
> This patch allows ne, lt, ltu, gt and gtu to perform this kind of
> simplification by adding one new define_split.
>
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
>   return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
> }
>
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v24,0(a1)
> vmslt.vv v8,v24,v24
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.v    v8,0(a0)
> ret
>
> After this patch:
> vsetvli zero,a2,e8,mf8,ta,ma
> vmclr.m v24    <- optimized to vmclr.m
> vsetvli zero,a5,e8,mf8,ta,ma
> vsm.v   v24,0(a0)
> ret
>
> As shown above, one instruction is eliminated and fewer vector
> registers are required.
>
> gcc/ChangeLog:
>
> * config/riscv/vector.md: Add new define split to perform
>   the simplification.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.
>
> Signed-off-by: Pan Li 
> Co-authored-by: kito-cheng 
> ---
>  gcc/config/riscv/vector.md                    |  32 ++
>  .../rvv/base/integer_compare_insn_shortcut.c  | 291 ++
>  2 files changed, 323 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index b3d23441679..1642822d098 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -7689,3 +7689,35 @@ (define_insn "@pred_fault_load"
>    "vleff.v\t%0,%3%p1"
>    [(set_attr "type" "vldff")
>     (set_attr "mode" "<MODE>")])
> +
> +;; -------------------------------------------------------------------------
> +;;  Integer Compare Instructions Simplification
> +;; -------------------------------------------------------------------------
> +;; Simplify to VMCLR.m Includes:
> +;; - 1.  VMSNE
> +;; - 2.  VMSLT
> +;; - 3.  VMSLTU
> +;; - 4.  VMSGT
> +;; - 5.  VMSGTU
> +;; -------------------------------------------------------------------------
> +(define_split
> +  [(set (match_operand:VB 0 "register_operand")
> +        (if_then_else:VB
> +          (unspec:VB
> +            [(match_operand:VB 1 "vector_all_trues_mask_operand")
> +             (match_operand 4 "vector_length_operand")
> +             (match_operand 5 "const_int_operand")
> +             (match_operand 6 "const_int_operand")
> +             (reg:SI VL_REGNUM)
> +             (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> +          (match_operand:VB 3 "vector_move_operand")
> +          (match_operand:VB 2 "vector_undef_operand")))]
> +  "TARGET_VECTOR"
> +  [(const_int 0)]
> +  {
> +    emit_insn (gen_pred_mov (<MODE>mode, operands[0], CONST1_RTX (<MODE>mode),
> +                             RVV_VUNDEF (<MODE>mode), operands[3],
> +                             operands[4], operands[5]));
> +    DONE;
> +  }
> +)
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
> new file mode 100644
> index 000..8954adad09d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
> @@ -0,0 +1,291 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmseq_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmseq_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmseq_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmseq_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmseq_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmseq_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsne_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsne_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m4_b2(v1, v1, vl

Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

2023-04-27 Thread Kito Cheng via Gcc-patches

> The predicate vector_move_operand is defined as (non-imm || (const
> vector && (reload_completed ? constraint_vi (op) : constraint_wc0 (op)))).
> I may not quite understand why we group them together and name it vector_move.

I forget the detailed reason, but I vaguely remember that it is for
optimization; we may need Ju-Zhe to come back and tell us the reason :P

On Fri, Apr 28, 2023 at 10:06 AM Li, Pan2  wrote:
>
> Thanks Kito for the better approach. It works well with the prepared test 
> cases but I may have one question about the semantics of the 
> vector_move_operand.
>
> The defined predicate of vector_move_operand composes of (non-imm || (const 
> vector && (reload_completed ? constraint_vi (op) : constraint_wc0(op))).
> I may not quit understand why we group them together and named as vector_move.
>
> Another difference is that it acts in the combine pass, which is more
> generic than PATCH v1 (which acts in the split2 pass).
>
> Pan
>
> -----Original Message-----
> From: Kito Cheng 
> Sent: Thursday, April 27, 2023 11:00 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
> 
> Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR
>
> > Could you try something like this? that should be more generic:
> >
> > (define_split
> >  [(set (match_operand:VB 0 "register_operand")
> >(if_then_else:VB
> >  (unspec:VB
> >[(match_operand:VB 1 "vector_all_trues_mask_operand")
> > (match_operand 4 "vector_length_operand")
> > (match_operand 5 "const_int_operand")
> > (match_operand 6 "const_int_operand")
> > (reg:SI VL_REGNUM)
> > (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> >  (match_operand:VB 3 "vector_move_operand")
> >  (match_operand:VB 2 "vector_undef_operand")))]
> > "TARGET_VECTOR && reload_completed"
>
> Removing the reload_completed condition should work well, but you might
> need more testing; I didn't run the full test suite on this change :P
>
> >  [(const_int 0)]
> >  {
> >    emit_insn (gen_pred_mov (<MODE>mode, operands[0], CONST1_RTX (<MODE>mode),
> >                             RVV_VUNDEF (<MODE>mode), CONST0_RTX (<MODE>mode),
> >                             operands[4], operands[5]));
> >DONE;
> >  }
> > )

[PATCH 10/11] riscv: thead: Add support for the XTheadMemIdx ISA extension

2023-04-27 Thread Christoph Muellner

From: Christoph Müllner 

The XTheadMemIdx ISA extension provides additional load and store
instructions with new addressing modes.

The following memory access types are supported:
* ltype = [b,bu,h,hu,w,wu,d]
* stype = [b,h,w,d]

The following addressing modes are supported:
* immediate offset with PRE_MODIFY or POST_MODIFY (22 instructions):
  l<ltype>.ia, l<ltype>.ib, s<stype>.ia, s<stype>.ib
* register offset with additional immediate offset (11 instructions):
  lr<ltype>, sr<stype>
* zero-extended register offset with additional immediate offset
  (11 instructions): lur<ltype>, sur<stype>
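
The three addressing modes listed above can be sketched as plain address arithmetic in C. This is a hedged model, not code from the patch: the function names are hypothetical, and the operand encoding (a 5-bit signed immediate scaled by a 2-bit shift, with .ia meaning POST_MODIFY and .ib meaning PRE_MODIFY) is an assumption based on my reading of the vendor extension, not something stated in this message.

```c
#include <stdint.h>

/* Sign-extend the low 5 bits of IMM5 (assumed immediate width).  */
static int64_t sext5 (uint32_t imm5)
{
  imm5 &= 0x1f;
  return (imm5 & 0x10) ? (int64_t) imm5 - 32 : (int64_t) imm5;
}

/* Register offset (lr/sr): base + (index << imm2).  */
static uint64_t addr_reg (uint64_t base, uint64_t index, unsigned imm2)
{
  return base + (index << (imm2 & 3));
}

/* Zero-extended register offset (lur/sur): only the low 32 bits of the
   index register take part in the address computation.  */
static uint64_t addr_ureg (uint64_t base, uint64_t index, unsigned imm2)
{
  return base + ((uint64_t) (uint32_t) index << (imm2 & 3));
}

/* POST_MODIFY (assumed .ia): access at the old base, then update it.  */
static uint64_t addr_post_modify (uint64_t *base, uint32_t imm5,
                                  unsigned imm2)
{
  uint64_t addr = *base;

  *base += (uint64_t) (sext5 (imm5) * ((int64_t) 1 << (imm2 & 3)));
  return addr;
}

/* PRE_MODIFY (assumed .ib): update the base first, then access there.  */
static uint64_t addr_pre_modify (uint64_t *base, uint32_t imm5,
                                 unsigned imm2)
{
  *base += (uint64_t) (sext5 (imm5) * ((int64_t) 1 << (imm2 & 3)));
  return *base;
}
```

The zero-extended variant is the one that needs the extra peephole handling mentioned below, since the zero-extension of the index has to be recognized as part of the address.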

The RISC-V base ISA does not support index registers, so the changes
are kept separate from the RISC-V standard support.

Similar to other extensions (Zbb, XTheadBb), this patch needs to
prevent the conversion of sign-extensions/zero-extensions into
shift instructions. The case of the zero-extended register offset
addressing mode is handled by a new peephole pass.

Handling the different cases of extensions results in a couple of INSNs
that look redundant at first glance, but they are just the equivalent
of what we already have for Zbb as well. The only difference is that
we have many more load instructions.

To fully utilize the capabilities of the instructions, there are
a few new peephole passes which fold shift amounts into the RTX
if possible. The added tests ensure that this feature won't
regress without notice.

We already have a constraint with the name 'th_f_fmv', therefore,
the new constraints follow this pattern and have the same length
as required ('th_m_mia', 'th_m_mib', 'th_m_mir', 'th_m_miu').

gcc/ChangeLog:

* config/riscv/constraints.md (th_m_mia): New constraint.
(th_m_mib): Likewise.
(th_m_mir): Likewise.
(th_m_miu): Likewise.
* config/riscv/riscv-protos.h (enum riscv_address_type):
Add new address types ADDRESS_REG_REG, ADDRESS_REG_UREG,
and ADDRESS_REG_WB and their documentation.
(struct riscv_address_info): Add new field 'shift' and
document the field usage for the new address types.
(riscv_valid_base_register_p): New prototype.
(th_memidx_legitimate_modify_p): Likewise.
(th_memidx_legitimate_index_p): Likewise.
(th_classify_address): Likewise.
(th_output_move): Likewise.
(th_print_operand_address): Likewise.
* config/riscv/riscv.cc (riscv_index_reg_class):
Return GR_REGS for XTheadMemIdx.
(riscv_regno_ok_for_index_p): Add support for XTheadMemIdx.
(riscv_classify_address): Call th_classify_address() on top.
(riscv_output_move): Call th_output_move() on top.
(riscv_print_operand_address): Call th_print_operand_address()
on top.
* config/riscv/riscv.h (HAVE_POST_MODIFY_DISP): New macro.
(HAVE_PRE_MODIFY_DISP): Likewise.
* config/riscv/riscv.md (zero_extendqi2): Disable
for XTheadMemIdx.
(*zero_extendqi2_internal): Convert to expand,
create INSN with same name and disable it for XTheadMemIdx.
(extendsidi2): Likewise.
(*extendsidi2_internal): Disable for XTheadMemIdx.
* config/riscv/thead-peephole.md: Add helper peephole passes.
* config/riscv/thead.cc (valid_signed_immediate): New helper
function.
(th_memidx_classify_address_modify): New function.
(th_memidx_legitimate_modify_p): Likewise.
(th_memidx_output_modify): Likewise.
(is_memidx_mode): Likewise.
(th_memidx_classify_address_index): Likewise.
(th_memidx_legitimate_index_p): Likewise.
(th_memidx_output_index): Likewise.
(th_classify_address): Likewise.
(th_output_move): Likewise.
(th_print_operand_address): Likewise.
* config/riscv/thead.md (*th_memidx_mov2):
New INSN.
(*th_memidx_zero_extendqi2): Likewise.
(*th_memidx_extendsidi2): Likewise
(*th_memidx_zero_extendsidi2): Likewise.
(*th_memidx_zero_extendhi2): Likewise.
(*th_memidx_extend2): Likewise
(*th_memidx_bb_zero_extendsidi2): Likewise.
(*th_memidx_bb_zero_extendhi2): Likewise.
(*th_memidx_bb_extendhi2): Likewise.
(*th_memidx_bb_extendqi2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadmemidx-helpers.h: New test.
* gcc.target/riscv/xtheadmemidx-index-update.c: New test.
* gcc.target/riscv/xtheadmemidx-index-xtheadbb-update.c: New test.
* gcc.target/riscv/xtheadmemidx-index-xtheadbb.c: New test.
* gcc.target/riscv/xtheadmemidx-index.c: New test.
* gcc.target/riscv/xtheadmemidx-modify-xtheadbb.c: New test.
* gcc.target/riscv/xtheadmemidx-modify.c: New test.
* gcc.target/riscv/xtheadmemidx-uindex-update.c: New test.
* gcc.target/riscv/xtheadmemidx-uindex-xtheadbb-update.c: New test.
* gcc.target/riscv/xtheadmemidx-uindex-xtheadbb.c: New test.
* gcc.target/riscv/xtheadmemidx-uindex.c: New test.

Signed-off-by:

[PATCH 11/11] riscv: thead: Add support for the XTheadFMemIdx ISA extension

2023-04-27 Thread Christoph Muellner

From: Christoph Müllner 

The XTheadFMemIdx ISA extension provides additional load and store
instructions for floating-point registers with new addressing modes.

The following memory access types are supported:
* ftype = [w,d] (single-precision, double-precision)

The following addressing modes are supported:
* register offset with additional immediate offset (4 instructions):
  flr<ftype>, fsr<ftype>
* zero-extended register offset with additional immediate offset
  (4 instructions): flur<ftype>, fsur<ftype>

These addressing modes are also part of the similar XTheadMemIdx
ISA extension support, whose code is reused and extended to support
floating-point registers.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_index_reg_class): Also allow
for XTheadFMemIdx.
(riscv_regno_ok_for_index_p): Likewise.
* config/riscv/thead-peephole.md (TARGET_64BIT):
Generalize peepholes for XTheadFMemIdx.
* config/riscv/thead.cc (is_fmemidx_mode): New function.
(th_memidx_classify_address_index): Add support for
XTheadFMemIdx.
(th_fmemidx_output_index): New function.
(th_output_move): Add support for XTheadFMemIdx.
* config/riscv/thead.md (*th_fmemidx_movsf_hardfloat): New INSN.
(*th_fmemidx_movdf_hardfloat_rv64): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadmemidx-helpers.h: Add helpers for
  XTheadFMemIdx.
* gcc.target/riscv/xtheadfmemidx-index-update.c: New test.
* gcc.target/riscv/xtheadfmemidx-index-xtheadbb-update.c: New test.
* gcc.target/riscv/xtheadfmemidx-index-xtheadbb.c: New test.
* gcc.target/riscv/xtheadfmemidx-index.c: New test.
* gcc.target/riscv/xtheadfmemidx-uindex-update.c: New test.
* gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb-update.c: New test.
* gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb.c: New test.
* gcc.target/riscv/xtheadfmemidx-uindex.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.cc |  4 +-
 gcc/config/riscv/thead-peephole.md| 24 ---
 gcc/config/riscv/thead.cc | 68 +-
 gcc/config/riscv/thead.md | 22 ++
 .../riscv/xtheadfmemidx-index-update.c| 20 ++
 .../xtheadfmemidx-index-xtheadbb-update.c | 20 ++
 .../riscv/xtheadfmemidx-index-xtheadbb.c  | 22 ++
 .../gcc.target/riscv/xtheadfmemidx-index.c| 22 ++
 .../riscv/xtheadfmemidx-uindex-update.c   | 20 ++
 .../xtheadfmemidx-uindex-xtheadbb-update.c| 20 ++
 .../riscv/xtheadfmemidx-uindex-xtheadbb.c | 24 +++
 .../gcc.target/riscv/xtheadfmemidx-uindex.c   | 25 +++
 .../gcc.target/riscv/xtheadmemidx-helpers.h   | 70 +++
 13 files changed, 348 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-index-update.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-index-xtheadbb-update.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-index-xtheadbb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-index.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-uindex-update.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb-update.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-uindex.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1691ecf3a94..d1e08974023 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -834,7 +834,7 @@ riscv_regno_mode_ok_for_base_p (int regno,
 enum reg_class
 riscv_index_reg_class ()
 {
-  if (TARGET_XTHEADMEMIDX)
+  if (TARGET_XTHEADMEMIDX || TARGET_XTHEADFMEMIDX)
     return GR_REGS;
 
   return NO_REGS;
@@ -847,7 +847,7 @@ riscv_index_reg_class ()
 int
 riscv_regno_ok_for_index_p (int regno)
 {
-  if (TARGET_XTHEADMEMIDX)
+  if (TARGET_XTHEADMEMIDX || TARGET_XTHEADFMEMIDX)
     return riscv_regno_mode_ok_for_base_p (regno, VOIDmode, 1);
 
   return 0;
diff --git a/gcc/config/riscv/thead-peephole.md b/gcc/config/riscv/thead-peephole.md
index 2a4c734a220..e3de9b8e8d3 100644
--- a/gcc/config/riscv/thead-peephole.md
+++ b/gcc/config/riscv/thead-peephole.md
@@ -73,11 +73,15 @@ (define_peephole2
   th_mempair_order_operands (operands, true, SImode);
 })
 
-;; All modes that are supported by XTheadMemIdx
-(define_mode_iterator TH_M_ANY [QI HI SI (DI "TARGET_64BIT")])
+;; All modes that are supported by XTheadMemIdx and XTheadFMemIdx
+(define_mode_iterator TH_M_ANY [QI HI SI (DI "TARGET_64BIT")
+(SF "TARGET_HARD_FLOAT")
+(DF "TARGET_DOUBLE_FLOAT")])
 
-;; All non-extension modes that are supported by XTheadMemIdx
-(define_mode_iterator TH_M_NOEXT [(SI "!TARGET_64BIT") (DI "TARGET_64BIT")])
+;; All non-extension modes that

Re: [PATCH] Synchronize include/ctf.h with upstream binutils/libctf.

2023-04-27 Thread Richard Biener via Gcc-patches

On Thu, Apr 27, 2023 at 5:15 PM Roger Sayle  wrote:
>
>
> This patch updates include/ctf.h to match the current libctf version in
> binutils' include/.  I recently attempted to build an uber tree (following
> some notes that are so old they used CVS) and noticed that binutils won't
> build with GCC's top-level include, due to CTF_F_IDXSORTED not being
> defined in ctf.h.  There was also a discrepancy with ansidecl.h but
> I'm unsure how (best) to resolve that issue.
>
> This patch was tested on x86_64-pc-linux-gnu with a make bootstrap and
> make -k check, both with and without --target_board=unix{-m32}, with
> no new failures, to confirm that the usage of ctf.h in ctfout.cc and
> dwarf2ctf.cc is compatible with the new version.  Ok for mainline?

OK.

>
> 2023-04-27  Roger Sayle  
>
> include/ChangeLog
> * ctf.h: Import latest version from binutils/libctf.
>
>
> Thanks in advance,
> Roger
> --
>

[PATCH 08/11] riscv: Prepare backend for index registers

2023-04-27 Thread Christoph Muellner

From: Christoph Müllner 

RISC-V does not currently support index registers.
However, there are some vendor extensions that specify them.
Let's do the necessary changes in the backend so that we can
add support for such a vendor extension in the future.

This is a non-functional change without any intended side-effects.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_regno_ok_for_index_p):
New prototype.
(riscv_index_reg_class): Likewise.
* config/riscv/riscv.cc (riscv_regno_ok_for_index_p): New function.
(riscv_index_reg_class): New function.
* config/riscv/riscv.h (INDEX_REG_CLASS): Call new function
riscv_index_reg_class().
(REGNO_OK_FOR_INDEX_P): Call new function
riscv_regno_ok_for_index_p().

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv-protos.h |  2 ++
 gcc/config/riscv/riscv.cc   | 20 
 gcc/config/riscv/riscv.h|  6 --
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 628c64cf628..b7417e97d99 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -82,6 +82,8 @@ struct riscv_address_info {
 extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
 extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
 extern int riscv_regno_mode_ok_for_base_p (int, machine_mode, bool);
+extern enum reg_class riscv_index_reg_class ();
+extern int riscv_regno_ok_for_index_p (int);
 extern int riscv_address_insns (rtx, machine_mode, bool);
 extern int riscv_const_insns (rtx);
 extern int riscv_split_const_insns (rtx);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8388235d8cc..a33f0fff8ea 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -827,6 +827,26 @@ riscv_regno_mode_ok_for_base_p (int regno,
   return GP_REG_P (regno);
 }
 
+/* Get valid index register class.
+   The RISC-V base instructions don't support index registers,
+   but extensions might support that.  */
+
+enum reg_class
+riscv_index_reg_class ()
+{
+  return NO_REGS;
+}
+
+/* Return true if register REGNO is a valid index register.
+   The RISC-V base instructions don't support index registers,
+   but extensions might support that.  */
+
+int
+riscv_regno_ok_for_index_p (int regno)
+{
+  return 0;
+}
+
 /* Return true if X is a valid base register for mode MODE.
STRICT_P is true if REG_OK_STRICT is in effect.  */
 
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 90746fe14e3..21b81c22dea 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -535,7 +535,7 @@ enum reg_class
factor or added to another register (as well as added to a
displacement).  */
 
-#define INDEX_REG_CLASS NO_REGS
+#define INDEX_REG_CLASS riscv_index_reg_class()
 
 /* We generally want to put call-clobbered registers ahead of
call-saved ones.  (IRA expects this.)  */
@@ -705,7 +705,9 @@ typedef struct {
 
 /* Addressing modes, and classification of registers for them.  */
 
-#define REGNO_OK_FOR_INDEX_P(REGNO) 0
+#define REGNO_OK_FOR_INDEX_P(REGNO) \
+  riscv_regno_ok_for_index_p (REGNO)
+
 #define REGNO_MODE_OK_FOR_BASE_P(REGNO, MODE) \
   riscv_regno_mode_ok_for_base_p (REGNO, MODE, 1)
 
-- 
2.40.1

[PATCH 03/11] riscv: xtheadmempair: Fix doc for th_mempair_order_operands()

2023-04-27 Thread Christoph Muellner

From: Christoph Müllner 

There is an incorrect sentence in the documentation of the function
th_mempair_order_operands(). Let's remove it.

gcc/ChangeLog:

* config/riscv/thead.cc (th_mempair_operands_p):
Fix documentation of th_mempair_order_operands().

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/thead.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
index d7e3cf80d9b..507c912bc39 100644
--- a/gcc/config/riscv/thead.cc
+++ b/gcc/config/riscv/thead.cc
@@ -336,8 +336,8 @@ th_mempair_operands_p (rtx operands[4], bool load_p,
 }
 
 /* Given OPERANDS of consecutive load/store that can be merged,
-   swap them if they are not in ascending order.
-   Return true if swap was performed.  */
+   swap them if they are not in ascending order.  */
+
 void
 th_mempair_order_operands (rtx operands[4], bool load_p, machine_mode mode)
 {
-- 
2.40.1

[PATCH 09/11] riscv: thead: Factor out XThead*-specific peepholes

2023-04-27 Thread Christoph Muellner

From: Christoph Müllner 

This patch moves the XThead*-specific peephole passes
into thead-peephole.md with the intent to keep vendor-specific
code separated from RISC-V standard code.

This patch does not contain any functional changes.

gcc/ChangeLog:

* config/riscv/peephole.md: Remove XThead* peephole passes.
* config/riscv/thead.md: Include thead-peephole.md.
* config/riscv/thead-peephole.md: New file.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/peephole.md   | 56 --
 gcc/config/riscv/thead-peephole.md | 74 ++
 gcc/config/riscv/thead.md  |  2 +
 3 files changed, 76 insertions(+), 56 deletions(-)
 create mode 100644 gcc/config/riscv/thead-peephole.md

diff --git a/gcc/config/riscv/peephole.md b/gcc/config/riscv/peephole.md
index 67e7046d7e6..0ef0c04410b 100644
--- a/gcc/config/riscv/peephole.md
+++ b/gcc/config/riscv/peephole.md
@@ -38,59 +38,3 @@ (define_peephole2
 {
   operands[5] = GEN_INT (INTVAL (operands[2]) - INTVAL (operands[5]));
 })
-
-;; XTheadMemPair: merge two SI or DI loads
-(define_peephole2
-  [(set (match_operand:GPR 0 "register_operand" "")
-   (match_operand:GPR 1 "memory_operand" ""))
-   (set (match_operand:GPR 2 "register_operand" "")
-   (match_operand:GPR 3 "memory_operand" ""))]
-  "TARGET_XTHEADMEMPAIR
-  && th_mempair_operands_p (operands, true, mode)"
-  [(parallel [(set (match_dup 0) (match_dup 1))
- (set (match_dup 2) (match_dup 3))])]
-{
-  th_mempair_order_operands (operands, true, mode);
-})
-
-;; XTheadMemPair: merge two SI or DI stores
-(define_peephole2
-  [(set (match_operand:GPR 0 "memory_operand" "")
-   (match_operand:GPR 1 "register_operand" ""))
-   (set (match_operand:GPR 2 "memory_operand" "")
-   (match_operand:GPR 3 "register_operand" ""))]
-  "TARGET_XTHEADMEMPAIR
-  && th_mempair_operands_p (operands, false, mode)"
-  [(parallel [(set (match_dup 0) (match_dup 1))
-  (set (match_dup 2) (match_dup 3))])]
-{
-  th_mempair_order_operands (operands, false, mode);
-})
-
-;; XTheadMemPair: merge two SI loads with sign-extension
-(define_peephole2
-  [(set (match_operand:DI 0 "register_operand" "")
-   (sign_extend:DI (match_operand:SI 1 "memory_operand" "")))
-   (set (match_operand:DI 2 "register_operand" "")
-   (sign_extend:DI (match_operand:SI 3 "memory_operand" "")))]
-  "TARGET_XTHEADMEMPAIR && TARGET_64BIT
-  && th_mempair_operands_p (operands, true, SImode)"
-  [(parallel [(set (match_dup 0) (sign_extend:DI (match_dup 1)))
-  (set (match_dup 2) (sign_extend:DI (match_dup 3)))])]
-{
-  th_mempair_order_operands (operands, true, SImode);
-})
-
-;; XTheadMemPair: merge two SI loads with zero-extension
-(define_peephole2
-  [(set (match_operand:DI 0 "register_operand" "")
-   (zero_extend:DI (match_operand:SI 1 "memory_operand" "")))
-   (set (match_operand:DI 2 "register_operand" "")
-   (zero_extend:DI (match_operand:SI 3 "memory_operand" "")))]
-  "TARGET_XTHEADMEMPAIR && TARGET_64BIT
-  && th_mempair_operands_p (operands, true, SImode)"
-  [(parallel [(set (match_dup 0) (zero_extend:DI (match_dup 1)))
-  (set (match_dup 2) (zero_extend:DI (match_dup 3)))])]
-{
-  th_mempair_order_operands (operands, true, SImode);
-})
diff --git a/gcc/config/riscv/thead-peephole.md b/gcc/config/riscv/thead-peephole.md
new file mode 100644
index 000..5b829b5b968
--- /dev/null
+++ b/gcc/config/riscv/thead-peephole.md
@@ -0,0 +1,74 @@
+;; Machine description for T-Head vendor extensions
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; XTheadMemPair: merge two SI or DI loads
+(define_peephole2
+  [(set (match_operand:GPR 0 "register_operand" "")
+   (match_operand:GPR 1 "memory_operand" ""))
+   (set (match_operand:GPR 2 "register_operand" "")
+   (match_operand:GPR 3 "memory_operand" ""))]
+  "TARGET_XTHEADMEMPAIR
+  && th_mempair_operands_p (operands, true, mode)"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2) (match_dup 3))])]
+{
+  th_mempair_order_operands (operands, true, mode);
+})
+
+;; XTheadMemPair: merge two SI or DI stores
+(define_peephole2
+  [(set (match_operand:GPR 0 "memory_operand" "")
+   (match_operand:GPR 1 "register_operand" ""))
+   (set (match_o

[PATCH 06/11] riscv: Define Xmode macro

2023-04-27 Thread Christoph Muellner

From: Christoph Müllner 

Define an Xmode macro that specifies the register size (XLEN),
similar to Pmode. This allows the backend code to write generic
RV32/RV64 C code (under certain circumstances).

gcc/ChangeLog:

* config/riscv/riscv.h (Xmode): New macro.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 66fb07d6652..90746fe14e3 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -791,6 +791,10 @@ typedef struct {
 
 #define Pmode word_mode
 
+/* Specify the machine mode that registers have.  */
+
+#define Xmode (TARGET_64BIT ? DImode : SImode)
+
 /* Give call MEMs SImode since it is the "most permissive" mode
for both 32-bit and 64-bit targets.  */
 
-- 
2.40.1

[PATCH 07/11] riscv: Move address classification info types to riscv-protos.h

2023-04-27 Thread Christoph Muellner

From: Christoph Müllner 

enum riscv_address_type and struct riscv_address_info are used
to store address classification information. Let's move these types
into our common header file in order to share them with other
compilation units.

This is a non-functional change without any intended side-effects.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum riscv_address_type):
New location of type definition.
(struct riscv_address_info): Likewise.
* config/riscv/riscv.cc (enum riscv_address_type):
Old location of type definition.
(struct riscv_address_info): Likewise.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv-protos.h | 43 +
 gcc/config/riscv/riscv.cc   | 43 -
 2 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5244e8dcbf0..628c64cf628 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -35,6 +35,49 @@ enum riscv_symbol_type {
 };
 #define NUM_SYMBOL_TYPES (SYMBOL_TLS_GD + 1)
 
+/* Classifies an address.
+
+   ADDRESS_REG
+   A natural register + offset address.  The register satisfies
+   riscv_valid_base_register_p and the offset is a const_arith_operand.
+
+   ADDRESS_LO_SUM
+   A LO_SUM rtx.  The first operand is a valid base register and
+   the second operand is a symbolic address.
+
+   ADDRESS_CONST_INT
+   A signed 16-bit constant address.
+
+   ADDRESS_SYMBOLIC:
+   A constant symbolic address.  */
+enum riscv_address_type {
+  ADDRESS_REG,
+  ADDRESS_LO_SUM,
+  ADDRESS_CONST_INT,
+  ADDRESS_SYMBOLIC
+};
+
+/* Information about an address described by riscv_address_type.
+
+   ADDRESS_CONST_INT
+   No fields are used.
+
+   ADDRESS_REG
+   REG is the base register and OFFSET is the constant offset.
+
+   ADDRESS_LO_SUM
+   REG and OFFSET are the operands to the LO_SUM and SYMBOL_TYPE
+   is the type of symbol it references.
+
+   ADDRESS_SYMBOLIC
+   SYMBOL_TYPE is the type of symbol that the address references.  */
+struct riscv_address_info {
+  enum riscv_address_type type;
+  rtx reg;
+  rtx offset;
+  enum riscv_symbol_type symbol_type;
+};
+
 /* Routines implemented in riscv.cc.  */
 extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
 extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 92043236b17..8388235d8cc 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -81,28 +81,6 @@ along with GCC; see the file COPYING3.  If not see
 /* True if bit BIT is set in VALUE.  */
 #define BITSET_P(VALUE, BIT) (((VALUE) & (1ULL << (BIT))) != 0)
 
-/* Classifies an address.
-
-   ADDRESS_REG
-   A natural register + offset address.  The register satisfies
-   riscv_valid_base_register_p and the offset is a const_arith_operand.
-
-   ADDRESS_LO_SUM
-   A LO_SUM rtx.  The first operand is a valid base register and
-   the second operand is a symbolic address.
-
-   ADDRESS_CONST_INT
-   A signed 16-bit constant address.
-
-   ADDRESS_SYMBOLIC:
-   A constant symbolic address.  */
-enum riscv_address_type {
-  ADDRESS_REG,
-  ADDRESS_LO_SUM,
-  ADDRESS_CONST_INT,
-  ADDRESS_SYMBOLIC
-};
-
 /* Information about a function's frame layout.  */
 struct GTY(())  riscv_frame_info {
   /* The size of the frame in bytes.  */
@@ -182,27 +160,6 @@ struct riscv_arg_info {
   unsigned int fpr_offset;
 };
 
-/* Information about an address described by riscv_address_type.
-
-   ADDRESS_CONST_INT
-   No fields are used.
-
-   ADDRESS_REG
-   REG is the base register and OFFSET is the constant offset.
-
-   ADDRESS_LO_SUM
-   REG and OFFSET are the operands to the LO_SUM and SYMBOL_TYPE
-   is the type of symbol it references.
-
-   ADDRESS_SYMBOLIC
-   SYMBOL_TYPE is the type of symbol that the address references.  */
-struct riscv_address_info {
-  enum riscv_address_type type;
-  rtx reg;
-  rtx offset;
-  enum riscv_symbol_type symbol_type;
-};
-
 /* One stage in a constant building sequence.  These sequences have
the form:
 
-- 
2.40.1

[PATCH 02/11] riscv: xtheadmempair: Fix CFA reg notes

2023-04-27 Thread Christoph Muellner

From: Christoph Müllner 

The current implementation triggers an assertion in
dwarf2out_frame_debug_cfa_offset() under certain circumstances.
The standard code uses REG_FRAME_RELATED_EXPR notes instead
of REG_CFA_OFFSET notes when saving registers on the stack.
So let's do this as well.

gcc/ChangeLog:

* config/riscv/thead.cc (th_mempair_save_regs):
Emit REG_FRAME_RELATED_EXPR notes in prologue.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/thead.cc | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
index 75203805310..d7e3cf80d9b 100644
--- a/gcc/config/riscv/thead.cc
+++ b/gcc/config/riscv/thead.cc
@@ -368,8 +368,12 @@ th_mempair_save_regs (rtx operands[4])
   rtx set2 = gen_rtx_SET (operands[2], operands[3]);
   rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set1, set2)));
   RTX_FRAME_RELATED_P (insn) = 1;
-  add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set1));
-  add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set2));
+
+  REG_NOTES (insn) = alloc_EXPR_LIST (REG_FRAME_RELATED_EXPR,
+ copy_rtx (set1), REG_NOTES (insn));
+
+  REG_NOTES (insn) = alloc_EXPR_LIST (REG_FRAME_RELATED_EXPR,
+ copy_rtx (set2), REG_NOTES (insn));
 }
 
 /* Similar like riscv_restore_reg, but restores two registers from memory
-- 
2.40.1

[PATCH 04/11] riscv: thead: Adjust constraints of th_addsl INSN

2023-04-27 Thread Christoph Muellner

From: Christoph Müllner 

A recent change adjusted the constraints of ZBA's shNadd INSN.
Let's mirror this change here as well.

gcc/ChangeLog:

* config/riscv/thead.md: Adjust constraints of th_addsl.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/thead.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
index 6a06d0dfcf2..aa933960a98 100644
--- a/gcc/config/riscv/thead.md
+++ b/gcc/config/riscv/thead.md
@@ -22,10 +22,9 @@
 (define_insn "*th_addsl4"
   [(set (match_operand:X 0 "register_operand" "=r")
(plus:X (ashift:X (match_operand:X 1 "register_operand" "r")
- (match_operand 2 "const_int_operand" "n"))
+ (match_operand:QI 2 "imm123_operand" "Ds3"))
(match_operand:X 3 "register_operand" "r")))]
-  "TARGET_XTHEADBA
-   && (INTVAL (operands[2]) >= 0) && (INTVAL (operands[2]) <= 3)"
+  "TARGET_XTHEADBA"
   "th.addsl\t%0,%3,%1,%2"
   [(set_attr "type" "bitmanip")
(set_attr "mode" "")])
-- 
2.40.1

[PATCH 05/11] riscv: Simplify output of MEM addresses

2023-04-27 Thread Christoph Muellner

From: Christoph Müllner 

We have the following situation for MEM RTX objects:
* TARGET_PRINT_OPERAND expands to riscv_print_operand()
* This falls into the default case (unknown or no letter) of the outer
  switch-case-block and the MEM case of the inner switch-case-block and
  calls output_address() in final.cc with XEXP (op, 0) (the address)
* This calls targetm.asm_out.print_operand_address() which is
  riscv_print_operand_address()
* riscv_print_operand_address() is targeting the address of a MEM RTX
* riscv_print_operand_address() calls riscv_print_operand() for the offset
  and directly prints the register if the address is classified as ADDRESS_REG
* This falls into the default case (unknown or no letter) of the outer
  switch-case-block and the default case of the inner switch-case-block and
  calls output_addr_const().

However, since we know that offset must be a CONST_INT (which will be
followed by a '()' string), there is no need to call
riscv_print_operand() for the offset.
Instead we can take the shortcut and use output_addr_const().

This change also brings the code in riscv_print_operand_address()
in line with the other cases, where output_addr_const() is used
to print offsets.

Tested with GCC regression test suite and SPEC intrate.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5d2550871c7..92043236b17 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4581,7 +4581,7 @@ riscv_print_operand_address (FILE *file, machine_mode mode ATTRIBUTE_UNUSED, rtx
 switch (addr.type)
   {
   case ADDRESS_REG:
-   riscv_print_operand (file, addr.offset, 0);
+   output_addr_const (file, riscv_strip_unspec_address (addr.offset));
fprintf (file, "(%s)", reg_names[REGNO (addr.reg)]);
return;
 
-- 
2.40.1

[PATCH 01/11] riscv: xtheadbb: Add sign/zero extension support for th.ext and th.extu

2023-04-27 Thread Christoph Muellner

From: Christoph Müllner 

The current support of the bitfield-extraction instructions
th.ext and th.extu (XTheadBb extension) only covers sign_extract
and zero_extract. This patch adds support for sign_extend and
zero_extend to avoid any shifts for sign or zero extensions.
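
As a rough illustration of why a bitfield extract can stand in for an
extension: extracting bits [15:0] with sign extension gives the same result
as sign-extending a 16-bit value, which is what "th.ext %0,%1,15,0" in the
patch relies on. The sketch below is a plain C model, not GCC code. The
helper names model_th_ext and model_th_extu are hypothetical, and the
assumed semantics (extract bits [msb:lsb], then sign- or zero-extend) are
my reading of the T-Head extension spec.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of th.extu: extract bits [msb:lsb] of X and
   zero-extend them to 64 bits.  Requires 0 <= lsb <= msb <= 63.  */
static uint64_t
model_th_extu (uint64_t x, unsigned msb, unsigned lsb)
{
  unsigned width = msb - lsb + 1;
  uint64_t mask = width == 64 ? ~UINT64_C (0) : (UINT64_C (1) << width) - 1;
  return (x >> lsb) & mask;
}

/* Hypothetical model of th.ext: same extraction, but sign-extended.  */
static int64_t
model_th_ext (uint64_t x, unsigned msb, unsigned lsb)
{
  unsigned width = msb - lsb + 1;
  uint64_t field = model_th_extu (x, msb, lsb);
  uint64_t sign = UINT64_C (1) << (width - 1);
  /* (field ^ sign) - sign copies the top bit of the field upwards.  */
  return (int64_t) ((field ^ sign) - sign);
}

/* The extract forms agree with ordinary C extensions of short values.  */
static void
check_extension_models (void)
{
  for (int v = -32768; v <= 32767; v++)
    {
      short s16 = (short) v;
      unsigned short u16 = (unsigned short) v;
      assert (model_th_ext ((uint64_t) (int64_t) s16, 15, 0) == (long long) s16);
      assert (model_th_extu ((uint64_t) (int64_t) s16, 15, 0) == u16);
    }
}
```

The same model with msb = 31 corresponds to the "th.extu %0,%1,31,0" form
used for zero_extendsidi2.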

gcc/ChangeLog:

* config/riscv/riscv.md: No base-ISA extension splitter for XThead*.
* config/riscv/thead.md (*extend2_th_ext):
New XThead extension INSN.
(*zero_extendsidi2_th_extu): New XThead extension INSN.
(*zero_extendhi2_th_extu): New XThead extension INSN.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-ext-1.c: New test.
* gcc.target/riscv/xtheadbb-extu-1.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.md |  6 +-
 gcc/config/riscv/thead.md | 31 +
 .../gcc.target/riscv/xtheadbb-ext-1.c | 67 +++
 .../gcc.target/riscv/xtheadbb-extu-1.c| 67 +++
 4 files changed, 168 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-ext-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-extu-1.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 1fb29da8a0b..f4cc99187ed 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1368,7 +1368,7 @@ (define_insn_and_split "*zero_extendsidi2_internal"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
(zero_extend:DI
(match_operand:SI 1 "nonimmediate_operand" " r,m")))]
-  "TARGET_64BIT && !TARGET_ZBA
+  "TARGET_64BIT && !TARGET_ZBA && !TARGET_XTHEADBB
&& !(register_operand (operands[1], SImode)
 && reg_or_subregno (operands[1]) == VL_REGNUM)"
   "@
@@ -1395,7 +1395,7 @@ (define_insn_and_split "*zero_extendhi2"
   [(set (match_operand:GPR0 "register_operand" "=r,r")
(zero_extend:GPR
(match_operand:HI 1 "nonimmediate_operand" " r,m")))]
-  "!TARGET_ZBB"
+  "!TARGET_ZBB && !TARGET_XTHEADBB"
   "@
#
lhu\t%0,%1"
@@ -1451,7 +1451,7 @@ (define_insn_and_split "*extend2"
   [(set (match_operand:SUPERQI   0 "register_operand" "=r,r")
(sign_extend:SUPERQI
(match_operand:SHORT 1 "nonimmediate_operand" " r,m")))]
-  "!TARGET_ZBB"
+  "!TARGET_ZBB && !TARGET_XTHEADBB"
   "@
#
l\t%0,%1"
diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
index 0623607d3dc..6a06d0dfcf2 100644
--- a/gcc/config/riscv/thead.md
+++ b/gcc/config/riscv/thead.md
@@ -59,6 +59,17 @@ (define_insn "*th_ext4"
   [(set_attr "type" "bitmanip")
(set_attr "mode" "")])
 
+(define_insn "*extend2_th_ext"
+  [(set (match_operand:SUPERQI 0 "register_operand" "=r,r")
+   (sign_extend:SUPERQI
+   (match_operand:SHORT 1 "nonimmediate_operand" "r,m")))]
+  "TARGET_XTHEADBB"
+  "@
+   th.ext\t%0,%1,15,0
+   l\t%0,%1"
+  [(set_attr "type" "bitmanip,load")
+   (set_attr "mode" "")])
+
 (define_insn "*th_extu4"
   [(set (match_operand:GPR 0 "register_operand" "=r")
(zero_extract:GPR (match_operand:GPR 1 "register_operand" "r")
@@ -72,6 +83,26 @@ (define_insn "*th_extu4"
   [(set_attr "type" "bitmanip")
(set_attr "mode" "")])
 
+(define_insn "*zero_extendsidi2_th_extu"
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
+   (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,m")))]
+  "TARGET_64BIT && TARGET_XTHEADBB"
+  "@
+   th.extu\t%0,%1,31,0
+   lwu\t%0,%1"
+  [(set_attr "type" "bitmanip,load")
+   (set_attr "mode" "SI")])
+
+(define_insn "*zero_extendhi2_th_extu"
+  [(set (match_operand:GPR 0 "register_operand" "=r,r")
+   (zero_extend:GPR (match_operand:HI 1 "nonimmediate_operand" "r,m")))]
+  "TARGET_XTHEADBB"
+  "@
+   th.extu\t%0,%1,15,0
+   lhu\t%0,%1"
+  [(set_attr "type" "bitmanip,load")
+   (set_attr "mode" "HI")])
+
 (define_insn "*th_clz2"
   [(set (match_operand:X 0 "register_operand" "=r")
(clz:X (match_operand:X 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadbb-ext-1.c b/gcc/testsuite/gcc.target/riscv/xtheadbb-ext-1.c
new file mode 100644
index 000..02f6ec1417d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadbb-ext-1.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_xtheadbb" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_xtheadbb" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" } } */
+
+long sext64_32(int s32)
+{
+return s32;
+}
+
+long sext64_16(short s16)
+{
+return s16;
+}
+
+long sext64_8(char s8)
+{
+return s8;
+}
+
+int sext32_64(long s64)
+{
+return s64;
+}
+
+int sext32_16(short s16)
+{
+return s16;
+}
+
+int sext32_8(char s8)
+{
+return s8;
+}
+
+short sext16_64(long s64)
+{
+return s64;
+}
+
+short sext16_32(int s32)
+{
+return s32;
+}
+
+short sext16_8(char s8)
+{
+return s8;
+}
+
+char sext8_64(long s64)
+{
+return s64;
+}
+
+

[PATCH 00/11] Improvements for XThead* support

2023-04-27 Thread Christoph Muellner

From: Christoph Müllner 

This series improves the support for the XThead* ISA extensions
which are available e.g. on the T-Head XuanTie C906.

The ISA spec can be found here:
  https://github.com/T-head-Semi/thead-extension-spec

So far the following extension support has been merged in GCC:
* XTheadBa
* XTheadBb
* XTheadBs
* XTheadCmo
* XTheadCondMov
* XTheadMemPair

This patchset builds upon that and contains the following changes:
* Fix for sign/zero extension support for th.ext and th.extu
  This is actually a resend of a patch that has not been merged yet.
  Jeff Law acked the patch last Friday.
* Fix for CFA reg notes creation
* Small fix for documentation of th_mempair_order_operands()
* Introduction of Xmode macro
* Two non-functional preparation commits for additional addressing modes
* A patch that moves XThead* specific peephole passes into their own file
* Support for XTheadMemIdx and its addressing modes
* Support for XTheadFMemIdx, which is similar to XTheadMemIdx

All patches have been tested and don't introduce regressions
for RV32 or RV64. The patches have also been tested with
SPEC CPU2017 on QEMU (multiple combinations of extensions).

Support patches of these extensions for Binutils, QEMU, and
LLVM have already been merged in the corresponding upstream
projects.

Support patches for XTheadMemIdx and XTheadFMemIdx have been
submitted in an earlier series as well and received a couple of
rework-comments from Kito. We rewrote the whole support to
better meet the (reasonable) goal of keeping vendor extension
code separated from RISC-V standard code and to address other issues.
The resulting code is structured much better, which can be seen
in the small number of changes that are required for the last patch
(XTheadFMemIdx support).

Christoph Müllner (11):
  riscv: xtheadbb: Add sign/zero extension support for th.ext and
th.extu
  riscv: xtheadmempair: Fix CFA reg notes
  riscv: xtheadmempair: Fix doc for th_mempair_order_operands()
  riscv: thead: Adjust constraints of th_addsl INSN
  riscv: Simplify output of MEM addresses
  riscv: Define Xmode macro
  riscv: Move address classification info types to riscv-protos.h
  riscv: Prepare backend for index registers
  riscv: thead: Factor out XThead*-specific peepholes
  riscv: thead: Add support for the XTheadMemIdx ISA extension
  riscv: thead: Add support for the XTheadFMemIdx ISA extension

 gcc/config/riscv/constraints.md   |  24 +
 gcc/config/riscv/peephole.md  |  56 --
 gcc/config/riscv/riscv-protos.h   |  74 +++
 gcc/config/riscv/riscv.cc |  87 ++-
 gcc/config/riscv/riscv.h  |  13 +-
 gcc/config/riscv/riscv.md |  26 +-
 gcc/config/riscv/thead-peephole.md| 292 ++
 gcc/config/riscv/thead.cc | 506 +-
 gcc/config/riscv/thead.md | 240 -
 .../gcc.target/riscv/xtheadbb-ext-1.c |  67 +++
 .../gcc.target/riscv/xtheadbb-extu-1.c|  67 +++
 .../riscv/xtheadfmemidx-index-update.c|  20 +
 .../xtheadfmemidx-index-xtheadbb-update.c |  20 +
 .../riscv/xtheadfmemidx-index-xtheadbb.c  |  22 +
 .../gcc.target/riscv/xtheadfmemidx-index.c|  22 +
 .../riscv/xtheadfmemidx-uindex-update.c   |  20 +
 .../xtheadfmemidx-uindex-xtheadbb-update.c|  20 +
 .../riscv/xtheadfmemidx-uindex-xtheadbb.c |  24 +
 .../gcc.target/riscv/xtheadfmemidx-uindex.c   |  25 +
 .../gcc.target/riscv/xtheadmemidx-helpers.h   | 222 
 .../riscv/xtheadmemidx-index-update.c |  27 +
 .../xtheadmemidx-index-xtheadbb-update.c  |  27 +
 .../riscv/xtheadmemidx-index-xtheadbb.c   |  36 ++
 .../gcc.target/riscv/xtheadmemidx-index.c |  36 ++
 .../riscv/xtheadmemidx-modify-xtheadbb.c  |  74 +++
 .../gcc.target/riscv/xtheadmemidx-modify.c|  74 +++
 .../riscv/xtheadmemidx-uindex-update.c|  27 +
 .../xtheadmemidx-uindex-xtheadbb-update.c |  27 +
 .../riscv/xtheadmemidx-uindex-xtheadbb.c  |  44 ++
 .../gcc.target/riscv/xtheadmemidx-uindex.c|  44 ++
 30 files changed, 2146 insertions(+), 117 deletions(-)
 create mode 100644 gcc/config/riscv/thead-peephole.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-ext-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-extu-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-index-update.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-index-xtheadbb-update.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-index-xtheadbb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-index.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-uindex-update.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb-update.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-uindex.c

[PATCH] PHIOPT: Move two_value_replacement to match.pd

2023-04-27 Thread Andrew Pinski via Gcc-patches

This patch converts the two_value_replacement function
into a match.pd pattern.
It is a direct translation with only one minor change:
it does not check for the {0,+-1} case, as that is handled
earlier in match.pd, so there is no reason to do the extra
check for it.
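
To make the transformation concrete, here is a small stand-alone C sketch of
the arithmetic identity behind it. It is not the match.pd pattern itself and
ignores the type and overflow handling the pattern performs. The function
names are hypothetical; x is assumed to be either c1 or c1 + 1, and a and b
are assumed to differ by exactly one.

```c
#include <assert.h>

/* Reference form: a select between two adjacent constants.  */
static int
two_value_select (int x, int c1, int a, int b)
{
  return x == c1 ? a : b;
}

/* Arithmetic form.  In the compiler a, b and c1 are constants, so the
   choice between the two expressions is made at compile time.  */
static int
two_value_arith (int x, int c1, int a, int b)
{
  if (a < b)
    return x + (a - c1);   /* x == c1 -> a, x == c1 + 1 -> a + 1 == b */
  return (a + c1) - x;     /* x == c1 -> a, x == c1 + 1 -> a - 1 == b */
}

/* Compare both forms over a small range of constants.  */
static void
check_two_value_forms (void)
{
  for (int c1 = -3; c1 <= 3; c1++)
    for (int a = -3; a <= 3; a++)
      for (int d = -1; d <= 1; d += 2)
        for (int x = c1; x <= c1 + 1; x++)
          assert (two_value_select (x, c1, a, a + d)
                  == two_value_arith (x, c1, a, a + d));
}
```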

OK? Bootstrapped and tested on x86_64-linux-gnu with
no regressions.

gcc/ChangeLog:

PR tree-optimization/100958
* tree-ssa-phiopt.cc (two_value_replacement): Remove.
(pass_phiopt::execute): Don't call two_value_replacement.
* match.pd (a !=/== CST1 ? CST2 : CST3): Add pattern to
handle what two_value_replacement did.
---
 gcc/match.pd   |  94 
 gcc/tree-ssa-phiopt.cc | 157 +
 2 files changed, 96 insertions(+), 155 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 31fe5093218..e17597ead26 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4632,6 +4632,100 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   )
  )
 )
+
+/* Optimize
+   # x_5 in range [cst1, cst2] where cst2 = cst1 + 1
+   x_5 ? cstN ? cst4 : cst3
+   # op is == or != and N is 1 or 2
+   to r_6 = x_5 + (min (cst3, cst4) - cst1) or
+   r_6 = (min (cst3, cst4) + cst1) - x_5 depending on op, N and which
+   of cst3 and cst4 is smaller.
+   This was originally done by two_value_replacement in phiopt (PR 88676).  */
+(for eqne (ne eq)
+ (simplify
+  (cond (eqne SSA_NAME@0 INTEGER_CST@1) INTEGER_CST@2 INTEGER_CST@3)
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+   && INTEGRAL_TYPE_P (type)
+   && (wi::to_widest (@2) + 1 == wi::to_widest (@3)
+   || wi::to_widest (@2) == wi::to_widest (@3) + 1))
+   (with {
+ value_range r;
+ get_range_query (cfun)->range_of_expr (r, @0);
+ if (r.undefined_p ())
+   r.set_varying (TREE_TYPE (@0));
+
+ wide_int min = r.lower_bound ();
+ wide_int max = r.upper_bound ();
+}
+(if (min + 1 == max
+&& (wi::to_wide (@1) == min
+|| wi::to_wide (@1) == max))
+ (with {
+   tree arg0 = @2, arg1 = @3;
+   tree type1;
+   if ((eqne == EQ_EXPR) ^ (wi::to_wide (@1) == min))
+ std::swap (arg0, arg1);
+   if (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (type))
+{
+  /* Avoid performing the arithmetics in bool type which has different
+ semantics, otherwise prefer unsigned types from the two with
+the same precision.  */
+  if (TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE
+  || !TYPE_UNSIGNED (type))
+type1 = TREE_TYPE (@0);
+  else
+type1 = TREE_TYPE (arg0);
+}
+   else if (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (type))
+type1 = TREE_TYPE (@0);
+   else
+type1 = type;
+   min = wide_int::from (min, TYPE_PRECISION (type1),
+TYPE_SIGN (TREE_TYPE (@0)));
+   wide_int a = wide_int::from (wi::to_wide (arg0), TYPE_PRECISION (type1),
+   TYPE_SIGN (type));
+   enum tree_code code;
+   wi::overflow_type ovf;
+   if (tree_int_cst_lt (arg0, arg1))
+{
+  code = PLUS_EXPR;
+  a -= min;
+  if (!TYPE_UNSIGNED (type1))
+{
+  /* lhs is known to be in range [min, min+1] and we want to add a
+ to it.  Check if that operation can overflow for those 2 values
+ and if yes, force unsigned type.  */
+  wi::add (min + (wi::neg_p (a) ? 0 : 1), a, SIGNED, &ovf);
+  if (ovf)
+type1 = unsigned_type_for (type1);
+}
+}
+   else
+{
+  code = MINUS_EXPR;
+  a += min;
+  if (!TYPE_UNSIGNED (type1))
+{
+  /* lhs is known to be in range [min, min+1] and we want to subtract
+ it from a.  Check if that operation can overflow for those 2
+ values and if yes, force unsigned type.  */
+  wi::sub (a, min + (wi::neg_p (min) ? 0 : 1), SIGNED, &ovf);
+  if (ovf)
+   type1 = unsigned_type_for (type1);
+}
+}
+   tree arg = wide_int_to_tree (type1, a);
+  }
+  (if (code == PLUS_EXPR)
+   (convert (plus (convert:type1 @0) { arg; }))
+   (convert (minus { arg; } (convert:type1 @0)))
+  )
+ )
+)
+   )
+  )
+ )
+)
 #endif
 
 (simplify
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 7fc6ac17b4a..4b43f1abdbc 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -373,155 +373,6 @@ factor_out_conditional_conversion (edge e0, edge e1, gphi *phi,
   return newphi;
 }
 
-/* Optimize
-   # x_5 in range [cst1, cst2] where cst2 = cst1 + 1
-   if (x_5 op cstN) # where op is == or != and N is 1 or 2
- goto bb3;
-   else
- goto bb4;
-   bb3:
-   bb4:
-   # r_6 = PHI # where cst3 == cst4 + 1 or cst4 == cst3 + 1
-
-   to r_6 = x_5 + (min (cst3, cst4) - cst1) or
-

[PATCHv2] MATCH: Factor out code that for min max detection with constants

2023-04-27 Thread Andrew Pinski via Gcc-patches

This factors out some of the min/max detection code
from match.pd into a function so it can be reused in other
places. It is mainly used to detect the conversion
of >= to >, which causes the integer constants to change by
one.
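
The off-by-one cases the helper looks for can be checked with a small
stand-alone C sketch. This only illustrates the identities for plain ints,
not the fold-const.cc code. The helper names min_int, max_int and
check_minmax_forms are made up for the example.

```c
#include <assert.h>

static int min_int (int a, int b) { return a < b ? a : b; }
static int max_int (int a, int b) { return a > b ? a : b; }

/* Each select compares X against C - 1 or C + 1 but yields X or C,
   which is the same as a MIN or MAX of X and C.  */
static void
check_minmax_forms (void)
{
  for (int c = -4; c <= 4; c++)
    for (int x = -8; x <= 8; x++)
      {
        /* X <= C - 1 equals X < C, so this is MIN (X, C).  */
        assert ((x <= c - 1 ? x : c) == min_int (x, c));
        /* X > C - 1 equals X >= C, so this is MAX (X, C).  */
        assert ((x > c - 1 ? x : c) == max_int (x, c));
        /* X < C + 1 equals X <= C, so this is MIN (X, C).  */
        assert ((x < c + 1 ? x : c) == min_int (x, c));
        /* X >= C + 1 equals X > C, so this is MAX (X, C).  */
        assert ((x >= c + 1 ? x : c) == max_int (x, c));
      }
}
```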

Changes since v1:
* factor out the checks for INTEGER_CSTs so it is more obvious.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd: Factor out the deciding the min/max from
the "(cond (cmp (convert1? x) c1) (convert2? x) c2)"
pattern to ...
* fold-const.cc (minmax_from_comparison): this new function.
* fold-const.h (minmax_from_comparison): New prototype.
---
 gcc/fold-const.cc | 44 
 gcc/fold-const.h  |  3 +++
 gcc/match.pd  | 29 +
 3 files changed, 48 insertions(+), 28 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 3b397ae2941..7d2352dbcdd 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -150,6 +150,50 @@ static tree fold_convert_const (enum tree_code, tree, tree);
 static tree fold_view_convert_expr (tree, tree);
 static tree fold_negate_expr (location_t, tree);
 
+/* This is a helper function to detect min/max for some operands of COND_EXPR.
+   The form is "(EXP0 CMP EXP1) ? EXP2 : EXP3". */
+tree_code
+minmax_from_comparison (tree_code cmp, tree exp0, tree exp1, tree exp2, tree exp3)
+{
+  enum tree_code code = ERROR_MARK;
+
+  if (HONOR_NANS (exp0) || HONOR_SIGNED_ZEROS (exp0))
+return ERROR_MARK;
+
+  if (!operand_equal_p (exp0, exp2))
+return ERROR_MARK;
+
+  if (TREE_CODE (exp3) == INTEGER_CST && TREE_CODE (exp1) == INTEGER_CST)
+{
+  if (wi::to_widest (exp1) == (wi::to_widest (exp3) - 1))
+   {
+ /* X <= Y - 1 equals to X < Y.  */
+ if (cmp == LE_EXPR)
+   code = LT_EXPR;
+ /* X > Y - 1 equals to X >= Y.  */
+ if (cmp == GT_EXPR)
+   code = GE_EXPR;
+   }
+  if (wi::to_widest (exp1) == (wi::to_widest (exp3) + 1))
+   {
+ /* X < Y + 1 equals to X <= Y.  */
+ if (cmp == LT_EXPR)
+   code = LE_EXPR;
+ /* X >= Y + 1 equals to X > Y.  */
+ if (cmp == GE_EXPR)
+ code = GT_EXPR;
+   }
+}
+  if (code != ERROR_MARK
+  || operand_equal_p (exp1, exp3))
+{
+  if (cmp == LT_EXPR || cmp == LE_EXPR)
+   code = MIN_EXPR;
+  if (cmp == GT_EXPR || cmp == GE_EXPR)
+   code = MAX_EXPR;
+}
+  return code;
+}
 
 /* Return EXPR_LOCATION of T if it is not UNKNOWN_LOCATION.
Otherwise, return LOC.  */
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index 56ecaa87fc6..b828badc42f 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -246,6 +246,9 @@ extern tree fold_build_pointer_plus_hwi_loc (location_t loc, tree ptr, HOST_WIDE
 #define fold_build_pointer_plus_hwi(p,o) \
fold_build_pointer_plus_hwi_loc (UNKNOWN_LOCATION, p, o)
 
+extern tree_code minmax_from_comparison (tree_code, tree, tree,
+tree, tree);
+
 /* In gimple-fold.cc.  */
 extern void clear_type_padding_in_mask (tree, unsigned char *);
 extern bool clear_padding_type_may_have_padding_p (tree);
diff --git a/gcc/match.pd b/gcc/match.pd
index c4320781f5b..fad337a9697 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4677,34 +4677,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 || TYPE_SIGN (c2_type) == TYPE_SIGN (from_type)
{
 if (cmp != EQ_EXPR)
-  {
-if (wi::to_widest (@3) == (wi::to_widest (@2) - 1))
-  {
-/* X <= Y - 1 equals to X < Y.  */
-if (cmp == LE_EXPR)
-  code = LT_EXPR;
-/* X > Y - 1 equals to X >= Y.  */
-if (cmp == GT_EXPR)
-  code = GE_EXPR;
-  }
-if (wi::to_widest (@3) == (wi::to_widest (@2) + 1))
-  {
-/* X < Y + 1 equals to X <= Y.  */
-if (cmp == LT_EXPR)
-  code = LE_EXPR;
-/* X >= Y + 1 equals to X > Y.  */
-if (cmp == GE_EXPR)
-  code = GT_EXPR;
-  }
-if (code != ERROR_MARK
-|| wi::to_widest (@2) == wi::to_widest (@3))
-  {
-if (cmp == LT_EXPR || cmp == LE_EXPR)
-  code = MIN_EXPR;
-if (cmp == GT_EXPR || cmp == GE_EXPR)
-  code = MAX_EXPR;
-  }
-  }
+  code = minmax_from_comparison (cmp, @1, @3, @1, @2);
 /* Can do A == C1 ? A : C2  ->  A == C1 ? C1 : C2?  */
 else if (int_fits_type_p (@3, from_type))
   code = EQ_EXPR;
-- 
2.31.1

[PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

2023-04-27 Thread Pan Li via Gcc-patches

From: Pan Li 

When some RVV integer compare operators act on the same vector
register without a mask, they can be simplified to VMCLR.

This patch allows ne, lt, ltu, gt and gtu to perform this kind
of simplification by adding one new define_split.

Given we have:
vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
  return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
}

Before this patch:
vsetvli  zero,a2,e8,m8,ta,ma
vl8re8.v v24,0(a1)
vmslt.vv v8,v24,v24
vsetvli  a5,zero,e8,m8,ta,ma
vsm.v    v8,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,mf8,ta,ma
vmclr.m v24    <- optimized to vmclr.m
vsetvli zero,a5,e8,mf8,ta,ma
vsm.v   v24,0(a0)
ret

As shown above, one instruction is eliminated and fewer
vector registers are required.
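
The reason the result is always an all-zero mask can be seen in a scalar
model of a single lane: each of the listed comparisons is false when both
operands are the same value. The sketch below is ordinary C, not RVV
intrinsics. The enum and function names are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

enum cmp_kind { CMP_NE, CMP_LT, CMP_LTU, CMP_GT, CMP_GTU };

/* Scalar model of one mask bit produced by an integer compare.  */
static int
lane_compare (enum cmp_kind kind, int8_t a, int8_t b)
{
  switch (kind)
    {
    case CMP_NE:  return a != b;
    case CMP_LT:  return a < b;
    case CMP_LTU: return (uint8_t) a < (uint8_t) b;
    case CMP_GT:  return a > b;
    case CMP_GTU: return (uint8_t) a > (uint8_t) b;
    }
  return 0;
}

/* Comparing any lane value with itself clears the mask bit.  */
static void
check_self_compare_is_zero (void)
{
  for (int kind = CMP_NE; kind <= CMP_GTU; kind++)
    for (int v = -128; v <= 127; v++)
      assert (lane_compare ((enum cmp_kind) kind, (int8_t) v, (int8_t) v) == 0);
}
```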

gcc/ChangeLog:

* config/riscv/vector.md: Add new define split to perform
  the simplification.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.

Signed-off-by: Pan Li 
Co-authored-by: kito-cheng 
---
 gcc/config/riscv/vector.md|  32 ++
 .../rvv/base/integer_compare_insn_shortcut.c  | 291 ++
 2 files changed, 323 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index b3d23441679..1642822d098 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7689,3 +7689,35 @@ (define_insn "@pred_fault_load"
   "vleff.v\t%0,%3%p1"
   [(set_attr "type" "vldff")
(set_attr "mode" "")])
+
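+;; -------------------------------------------------------------------------------
+;;  Integer Compare Instructions Simplification
+;; -------------------------------------------------------------------------------
+;; Simplify to VMCLR.m Includes:
+;; - 1.  VMSNE
+;; - 2.  VMSLT
+;; - 3.  VMSLTU
+;; - 4.  VMSGT
+;; - 5.  VMSGTU
+;; -------------------------------------------------------------------------------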
+(define_split
+  [(set (match_operand:VB  0 "register_operand")
+   (if_then_else:VB
+ (unspec:VB
+   [(match_operand:VB 1 "vector_all_trues_mask_operand")
+(match_operand 4 "vector_length_operand")
+(match_operand 5 "const_int_operand")
+(match_operand 6 "const_int_operand")
+(reg:SI VL_REGNUM)
+(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+ (match_operand:VB 3 "vector_move_operand")
+ (match_operand:VB 2 "vector_undef_operand")))]
+  "TARGET_VECTOR"
+  [(const_int 0)]
+  {
+emit_insn (gen_pred_mov (mode, operands[0], CONST1_RTX (mode),
+RVV_VUNDEF (mode), operands[3],
+operands[4], operands[5]));
+DONE;
+  }
+)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
new file mode 100644
index 000..8954adad09d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
@@ -0,0 +1,291 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmseq_case_1(vint8m4_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmseq_case_2(vint8m2_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmseq_case_3(vint8m1_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmseq_case_4(vint8mf2_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmseq_case_5(vint8mf4_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmseq_case_6(vint8mf8_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsne_case_0(vint8m8_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsne_case_1(vint8m4_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsne_case_2(vint8m2_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsne_case_3(vint8m1_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsne_case_4(vint8mf2_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsne_case_5(vint8mf4_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsne_case_6(vint8mf8_t v1, size

[PATCH] [RISC-V] Fix riscv_expand_conditional_move.

2023-04-27 Thread Die Li

Two issues have been observed in the current riscv_expand_conditional_move
implementation.
1. Before the introduction of TARGET_XTHEADCONDMOV, op0 of the comparison
expression was used for the mode comparison with word_mode, but after
TARGET_XTHEADCONDMOV was merged with TARGET_SFB_ALU, dest of the if-then-else
is used for the mode comparison with word_mode, and according to the md file
the mode of dest is DI or SI, which can differ from word_mode on RV64.

2. TARGET_XTHEADCONDMOV cannot be generated when the mode of the comparison is
E_VOIDmode.

This patch solves the issues above.

An example from the newly added test case follows.

Testcase:
int ConNmv_reg_reg_reg(int x, int y, int z, int n){
  if (x != y) return z;
  return n;
}

Cflags:
-O2 -march=rv64gc_xtheadcondmov -mabi=lp64d

before patch:
ConNmv_reg_reg_reg:
bne a0,a1,.L23
mv  a2,a3
.L23:
mv  a0,a2
ret

after patch:
ConNmv_reg_reg_reg:
sub a1,a0,a1
th.mveqz a2,zero,a1
th.mvnez a3,zero,a1
or  a0,a2,a3
ret
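
The "after patch" sequence can be read as a branchless select. What follows
is only a minimal C sketch of that reading, assuming the usual XTheadCondMov
semantics (th.mveqz writes rd only when rs2 is zero, th.mvnez only when rs2 is
nonzero). The helper names th_mveqz, th_mvnez, con_nmv_model and con_nmv_check
are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of "th.mveqz rd,rs1,rs2": rd becomes rs1 only if rs2 == 0.  */
static int64_t
th_mveqz (int64_t rd, int64_t rs1, int64_t rs2)
{
  return rs2 == 0 ? rs1 : rd;
}

/* Assumed model of "th.mvnez rd,rs1,rs2": rd becomes rs1 only if rs2 != 0.  */
static int64_t
th_mvnez (int64_t rd, int64_t rs1, int64_t rs2)
{
  return rs2 != 0 ? rs1 : rd;
}

/* Hypothetical helper mirroring the four instructions emitted for
   ConNmv_reg_reg_reg after the patch.  */
int
con_nmv_model (int x, int y, int z, int n)
{
  int64_t a1 = (int64_t) x - (int64_t) y; /* sub a1,a0,a1 */
  int64_t a2 = th_mveqz (z, 0, a1);       /* th.mveqz a2,zero,a1 */
  int64_t a3 = th_mvnez (n, 0, a1);       /* th.mvnez a3,zero,a1 */
  return (int) (a2 | a3);                 /* or a0,a2,a3 */
}

/* The model must agree with the C source of the test case.  */
void
con_nmv_check (int x, int y, int z, int n)
{
  assert (con_nmv_model (x, y, z, n) == (x != y ? z : n));
}
```

Exactly one of a2 and a3 is cleared to zero, so the final OR simply passes the
surviving value through; that is why no branch is needed.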

Co-authored-by: Fei Gao
Signed-off-by: Die Li 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_conditional_move): Fix mode 
checking.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: New test.
* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: New test.
---
 gcc/config/riscv/riscv.cc |   4 +-
 .../riscv/xtheadcondmov-indirect-rv32.c   | 116 ++
 .../riscv/xtheadcondmov-indirect-rv64.c   | 116 ++
 3 files changed, 234 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv64.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1529855a2b4..30ace45dc5f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3411,7 +3411,7 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
cons, rtx alt)
   && GET_MODE_CLASS (mode) == MODE_INT
   && reg_or_0_operand (cons, mode)
   && reg_or_0_operand (alt, mode)
-  && GET_MODE (op) == mode
+  && (GET_MODE (op) == mode || GET_MODE (op) == E_VOIDmode)
   && GET_MODE (op0) == mode
   && GET_MODE (op1) == mode
   && (code == EQ || code == NE))
@@ -3420,7 +3420,7 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
cons, rtx alt)
   return true;
 }
   else if (TARGET_SFB_ALU
-  && mode == word_mode)
+  && GET_MODE (op0) == word_mode)
 {
   riscv_emit_int_compare (&code, &op0, &op1);
   rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c 
b/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
new file mode 100644
index 000..9afdc2eabfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
@@ -0,0 +1,116 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32gc_xtheadcondmov -mabi=ilp32 
-mriscv-attribute" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" } } */
+/* { dg-final { check-function-bodies "**" ""  } } */
+
+/*
+**ConEmv_imm_imm_reg:
+** addi a5,a0,-1000
+** li  a0,10
+** th.mvnez a0,zero,a5
+** th.mveqz a1,zero,a5
+** or  a0,a0,a1
+** ret
+*/
+int ConEmv_imm_imm_reg(int x, int y){
+  if (x == 1000) return 10;
+  return y;
+}
+
+/*
+**ConEmv_imm_reg_reg:
+** addi a5,a0,-1000
+** th.mvnez a1,zero,a5
+** th.mveqz a2,zero,a5
+** or  a0,a1,a2
+** ret
+*/
+int ConEmv_imm_reg_reg(int x, int y, int z){
+  if (x == 1000) return y;
+  return z;
+}
+
+/*
+**ConEmv_reg_imm_reg:
+** sub a1,a0,a1
+** li  a0,10
+** th.mvnez a0,zero,a1
+** th.mveqz a2,zero,a1
+** or  a0,a0,a2
+** ret
+*/
+int ConEmv_reg_imm_reg(int x, int y, int z){
+  if (x == y) return 10;
+  return z;
+}
+
+/*
+**ConEmv_reg_reg_reg:
+** sub a1,a0,a1
+** th.mvnez a2,zero,a1
+** th.mveqz a3,zero,a1
+** or  a0,a2,a3
+** ret
+*/
+int ConEmv_reg_reg_reg(int x, int y, int z, int n){
+  if (x == y) return z;
+  return n;
+}
+
+/*
+**ConNmv_imm_imm_reg:
+** li  a5,9998336
+** addi a4,a0,-1000
+** addi a5,a5,1664
+** th.mvnez a1,zero,a4
+** th.mveqz a5,zero,a4
+** or  a0,a1,a5
+** ret
+*/
+int ConNmv_imm_imm_reg(int x, int y){
+  if (x != 1000) return 1000;
+  return y;
+}
+
+/*
+**ConNmv_imm_reg_reg:
+** addi a5,a0,-1000
+** th.mveqz a1,zero,a5
+** th.mvnez a2,zero,a5
+** or  a0,a1,a2
+** ret
+*/
+int ConNmv_imm_reg_reg(int x, int y, int z){
+  if (x != 1000) return y;
+  return z;
+}
+
+/*
+**ConNmv_reg_imm_reg:
+** sub a1,a0,a1
+** li  a0,10
+** th.mveqz a0,zero,a1
+** th.mvnez a2,z

RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

2023-04-27 Thread Li, Pan2 via Gcc-patches

Thanks Kito for the better approach. It works well with the prepared test cases,
but I have one question about the semantics of vector_move_operand.

The vector_move_operand predicate is defined as (non-imm || (const
vector && (reload_completed ? constraint_vi (op) : constraint_wc0 (op)))).
I don't quite understand why these are grouped together and named vector_move.
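
To make the shape of that condition easier to discuss, here is a rough C
sketch of the boolean structure as quoted above. It is only an illustration:
struct operand_facts and vector_move_operand_sketch are hypothetical names, and
the fields merely stand in for whatever the real predicate and constraints
test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the facts the predicate looks at; none of these
   field names exist in GCC.  */
struct operand_facts
{
  bool is_nonimmediate; /* the "non-imm" part of the condition */
  bool is_const_vector; /* the "const vector" part */
  bool satisfies_vi;    /* stands in for constraint_vi (op) */
  bool satisfies_wc0;   /* stands in for constraint_wc0 (op) */
};

/* Mirrors: non-imm || (const vector
                        && (reload_completed ? vi : wc0)).  */
bool
vector_move_operand_sketch (const struct operand_facts *op,
                            bool reload_completed)
{
  assert (op != 0);
  if (op->is_nonimmediate)
    return true;
  if (!op->is_const_vector)
    return false;
  return reload_completed ? op->satisfies_vi : op->satisfies_wc0;
}
```

Read this way, the predicate accepts a different set of constant vectors
before and after reload, which is the part the question is about.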

Another difference is that it acts in the combine pass, which is more generic
than PATCH v1 (which acts in the split2 pass).

Pan

-Original Message-
From: Kito Cheng  
Sent: Thursday, April 27, 2023 11:00 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

> Could you try something like this? that should be more generic:
>
> (define_split
>  [(set (match_operand:VB 0 "register_operand")
>(if_then_else:VB
>  (unspec:VB
>[(match_operand:VB 1 "vector_all_trues_mask_operand")
> (match_operand 4 "vector_length_operand")
> (match_operand 5 "const_int_operand")
> (match_operand 6 "const_int_operand")
> (reg:SI VL_REGNUM)
> (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>  (match_operand:VB 3 "vector_move_operand")
>  (match_operand:VB 2 "vector_undef_operand")))]  
> "TARGET_VECTOR && reload_completed"

Removing reload_completed should work well, but you might need more testing; I
didn't run the full test suite on this change :P

>  [(const_int 0)]
>  {
>emit_insn (gen_pred_mov (mode, operands[0], CONST1_RTX (mode),
> RVV_VUNDEF (mode), CONST0_RTX (mode),
> operands[4], operands[5]));
>DONE;
>  }
> )

[PATCH] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-04-27 Thread Fangrui Song via Gcc-patches

When using -mcmodel=medium, large data is placed into .l* sections.  GNU ld
places .l* sections into separate output sections.  If small and medium
code model object files are mixed, the .l* sections won't cause
relocation overflow pressure on sections in -mcmodel=small object files.

However, when using -mcmodel=large, -mlarge-data-threshold doesn't apply.  This
means that the .rodata/.data/.bss sections may cause relocation overflow
pressure on sections in -mcmodel=small object files.

This patch allows -mcmodel=large to generate .l* sections.
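
As a rough illustration of the intended behaviour, the placement decision can
be sketched in C as below. This is a simplified model, not the code in
i386.cc: enum cmodel_sketch and in_large_data_sketch are hypothetical names,
and the real ix86_in_large_data_p handles further cases that are ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's code-model enumerators.  */
enum cmodel_sketch
{
  SK_SMALL,
  SK_MEDIUM,
  SK_MEDIUM_PIC,
  SK_LARGE,
  SK_LARGE_PIC
};

/* Returns true when an object of SIZE bytes would be placed in a .l*
   section for the given code model and -mlarge-data-threshold value.  */
bool
in_large_data_sketch (enum cmodel_sketch model, size_t size, size_t threshold)
{
  bool model_uses_large_sections
    = (model == SK_MEDIUM || model == SK_MEDIUM_PIC
       || model == SK_LARGE || model == SK_LARGE_PIC);
  return model_uses_large_sections && size > threshold;
}

/* With a threshold of 4, a 4-byte object stays in the regular section
   and a 5-byte object moves to the large one.  */
void
in_large_data_examples (void)
{
  assert (!in_large_data_sketch (SK_LARGE, sizeof "abc", 4));
  assert (in_large_data_sketch (SK_LARGE, sizeof "abcd", 4));
}
```

The sizes in in_large_data_examples match the two string constants used by the
new test, which is presumably why it passes -mlarge-data-threshold=4.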

Signed-off-by: Fangrui Song 
---
 gcc/config/i386/i386.cc| 15 +--
 gcc/config/i386/i386.opt   |  2 +-
 gcc/doc/invoke.texi|  7 ---
 gcc/testsuite/gcc.target/i386/large-data.c | 13 +
 4 files changed, 27 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/large-data.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index a3db55642e3..c68c66a5567 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -637,7 +637,8 @@ ix86_can_inline_p (tree caller, tree callee)
 static bool
 ix86_in_large_data_p (tree exp)
 {
-  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC)
+  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC &&
+  ix86_cmodel != CM_LARGE && ix86_cmodel != CM_LARGE_PIC)
 return false;
 
   if (exp == NULL_TREE)
@@ -848,8 +849,9 @@ x86_elf_aligned_decl_common (FILE *file, tree decl,
const char *name, unsigned HOST_WIDE_INT size,
unsigned align)
 {
-  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
-  && size > (unsigned int)ix86_section_threshold)
+  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
+  ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
+ size > (unsigned int)ix86_section_threshold)
 {
   switch_to_section (get_named_section (decl, ".lbss", 0));
   fputs (LARGECOMM_SECTION_ASM_OP, file);
@@ -869,9 +871,10 @@ void
 x86_output_aligned_bss (FILE *file, tree decl, const char *name,
unsigned HOST_WIDE_INT size, unsigned align)
 {
-  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
-  && size > (unsigned int)ix86_section_threshold)
-switch_to_section (get_named_section (decl, ".lbss", 0));
+  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
+   ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
+  size > (unsigned int)ix86_section_threshold)
+switch_to_section(get_named_section(decl, ".lbss", 0));
   else
 switch_to_section (bss_section);
   ASM_OUTPUT_ALIGN (file, floor_log2 (align / BITS_PER_UNIT));
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index d74f6b1f8fc..de8e722cd62 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -282,7 +282,7 @@ Branches are this expensive (arbitrary units).
 
 mlarge-data-threshold=
 Target RejectNegative Joined UInteger Var(ix86_section_threshold) 
Init(DEFAULT_LARGE_SECTION_THRESHOLD)
--mlarge-data-threshold=Data greater than given threshold will 
go into .ldata section in x86-64 medium model.
+-mlarge-data-threshold=Data greater than given threshold will 
go into a large data section in x86-64 medium and large code models.
 
 mcmodel=
 Target RejectNegative Joined Enum(cmodel) Var(ix86_cmodel) Init(CM_32)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e5ee2d536fc..4a20eef92e5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -32927,9 +32927,10 @@ the cache line size.  @samp{compat} is the default.
 
 @opindex mlarge-data-threshold
 @item -mlarge-data-threshold=@var{threshold}
-When @option{-mcmodel=medium} is specified, data objects larger than
-@var{threshold} are placed in the large data section.  This value must be the
-same across all objects linked into the binary, and defaults to 65535.
+When @option{-mcmodel=medium} or @option{-mcmodel=large} is specified, data
+objects larger than @var{threshold} are placed in large data sections.  This
+value must be the same across all objects linked into the binary, and defaults
+to 65535.
 
 @opindex mrtd
 @item -mrtd
diff --git a/gcc/testsuite/gcc.target/i386/large-data.c 
b/gcc/testsuite/gcc.target/i386/large-data.c
new file mode 100644
index 000..09a917431d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/large-data.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mcmodel=large -mlarge-data-threshold=4" } */
+/* { dg-final { scan-assembler ".lbss" } } */
+/* { dg-final { scan-assembler ".bss" } } */
+/* { dg-final { scan-assembler ".ldata" } } */
+/* { dg-final { scan-assembler ".data" } } */
+/* { dg-final { scan-assembler ".lrodata" } } */
+/* { dg-final { scan-assembler ".rodata" } } */
+
+const char rodata_a[] = "abc", rodata_b[] = "abcd";
+char d

Re: [PATCH] c++: NSDMI instantiation from template context [PR109506]

2023-04-27 Thread Sam James via Gcc-patches

FWIW, we'd love to be able to backport this to GCC 13 (maybe 12, but no
big deal there) in Gentoo so we can continue doing large testing builds with
a lot of checking enabled, given this affects Chromium.

But it's no big deal if it's too invasive.



[committed] libstdc++: Fix error in doxygen comments in

2023-04-27 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux.

Pushed to gcc-12 and gcc-11 (not needed on gcc-13 or trunk).

-- >8 --

libstdc++-v3/ChangeLog:

* include/std/atomic: Add missing @endcond doxygen comment.
---
 libstdc++-v3/include/std/atomic | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index faf7eb95594..94150c1181f 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -1287,6 +1287,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 using __atomic_val_t = __type_identity_t<_Tp>;
   template<typename _Tp>
 using __atomic_diff_t = typename atomic<_Tp>::difference_type;
+  /// @endcond
 
   // [atomics.nonmembers] Non-member functions.
   // Function templates generally applicable to atomic types.
-- 
2.40.0

Re: Ping: [PATCH] testsuite/C++: suppress filename canonicalization in module tests

2023-04-27 Thread Nathan Sidwell via Gcc-patches


On 4/25/23 11:04, Jan Beulich wrote:

On 28.06.2022 16:06, Jan Beulich wrote:

The pathname underneath gcm.cache/ is determined from the effective name
used for the main input file of a particular module. When modules are
built, no canonicalization occurs for the main input file. Hence the
module file wouldn't be found if a different (the canonicalized) file
name was used when importing that same module. (This is an effect of
importing happening in the preprocessor, just like #include handling.)

Since it doesn't look easy to make module generation use libcpp's
maybe_shorter_path() (in fact I'd consider this a layering violation,
while cloning the logic would - at least in principle - be prone to both
going out of sync), simply suppress system header path canonicalization
for the respective tests.
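
The mismatch described above can be pictured with a small C sketch. It assumes
a deliberately simplified naming scheme in which the cache entry is derived
from the file name exactly as spelled; cmi_path_sketch and same_cmi_sketch are
hypothetical helpers, and GCC's real mapping under gcm.cache/ is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical naming scheme: the cache entry is derived from the file
   name exactly as spelled, with no canonicalization.  */
int
cmi_path_sketch (char *out, size_t out_size, const char *spelled_name)
{
  assert (out != NULL && spelled_name != NULL);
  return snprintf (out, out_size, "gcm.cache/%s.gcm", spelled_name);
}

/* True when two spellings of a file name lead to the same cache entry.  */
bool
same_cmi_sketch (const char *spelling_a, const char *spelling_b)
{
  char a[256], b[256];
  cmi_path_sketch (a, sizeof a, spelling_a);
  cmi_path_sketch (b, sizeof b, spelling_b);
  return strcmp (a, b) == 0;
}
```

Under this model, "dir/../dir/alias-1_a.H" and "dir/alias-1_a.H" name the same
file but yield different cache entries, so an importer that shortened the path
would not find what the builder wrote.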


Ping: This still looks to apply as is.


ok -- I was unaware of this.  might be sensible to file a defect about this?



Thanks, Jan


---
Strictly speaking it could be necessary to also suppress
canonicalization when generating the modules, but for now they're self-
contained, i.e. don't include any "real" system headers. IOW at the
moment the tests aren't susceptible to the issue at generation time.

--- a/gcc/testsuite/g++.dg/modules/alias-1_b.C
+++ b/gcc/testsuite/g++.dg/modules/alias-1_b.C
@@ -1,4 +1,4 @@
-// { dg-additional-options "-fmodules-ts -fdump-lang-module -isystem [srcdir]" 
}
+// { dg-additional-options "-fmodules-ts -fdump-lang-module -isystem [srcdir] 
-fno-canonical-system-headers" }
  
  // Alias at the header file.  We have one CMI file

  import "alias-1_a.H";
--- a/gcc/testsuite/g++.dg/modules/alias-1_d.C
+++ b/gcc/testsuite/g++.dg/modules/alias-1_d.C
@@ -1,4 +1,4 @@
-// { dg-additional-options "-fmodules-ts -isystem [srcdir]" }
+// { dg-additional-options "-fmodules-ts -isystem [srcdir] 
-fno-canonical-system-headers" }
  // { dg-module-cmi kevin }
  
  export module kevin;

--- a/gcc/testsuite/g++.dg/modules/alias-1_e.C
+++ b/gcc/testsuite/g++.dg/modules/alias-1_e.C
@@ -1,4 +1,4 @@
-// { dg-additional-options "-fmodules-ts -isystem [srcdir]" }
+// { dg-additional-options "-fmodules-ts -isystem [srcdir] 
-fno-canonical-system-headers" }
  
  import bob;

  import kevin;
--- a/gcc/testsuite/g++.dg/modules/alias-1_f.C
+++ b/gcc/testsuite/g++.dg/modules/alias-1_f.C
@@ -1,4 +1,4 @@
-// { dg-additional-options "-fmodules-ts -fdump-lang-module -isystem [srcdir]" 
}
+// { dg-additional-options "-fmodules-ts -fdump-lang-module -isystem [srcdir] 
-fno-canonical-system-headers" }
  
  import kevin;

  import bob;
--- a/gcc/testsuite/g++.dg/modules/cpp-6_c.C
+++ b/gcc/testsuite/g++.dg/modules/cpp-6_c.C
@@ -1,5 +1,5 @@
  // { dg-do preprocess }
-// { dg-additional-options "-fmodules-ts -isystem [srcdir]" }
+// { dg-additional-options "-fmodules-ts -isystem [srcdir] 
-fno-canonical-system-headers" }
  
  #define empty

  #define nop(X) X
--- a/gcc/testsuite/g++.dg/modules/dir-only-2_b.C
+++ b/gcc/testsuite/g++.dg/modules/dir-only-2_b.C
@@ -1,5 +1,5 @@
  // { dg-do preprocess }
-// { dg-additional-options "-fmodules-ts -fdirectives-only -isystem [srcdir]" }
+// { dg-additional-options "-fmodules-ts -fdirectives-only -isystem [srcdir] 
-fno-canonical-system-headers" }
  // a comment
  module; // line
  frob




--
Nathan Sidwell

Re: [PATCH v2] testsuite/C++: cope with IPv6 being unavailable

2023-04-27 Thread Nathan Sidwell via Gcc-patches


On 4/25/23 11:00, Jan Beulich wrote:

When IPv6 is disabled in the kernel, the error message coming back from
Cody::OpenInet6() is different from the sole so far expected one.


ok -- i couldn't find such a system :)

---
v2: Re-base.

--- a/gcc/testsuite/g++.dg/modules/bad-mapper-3.C
+++ b/gcc/testsuite/g++.dg/modules/bad-mapper-3.C
@@ -1,6 +1,6 @@
  //  { dg-additional-options "-fmodules-ts 
-fmodule-mapper=localhost:172477262" }
  import unique3.bob;
-// { dg-error {failed (connecting|disabled) mapper 'localhost:172477262'} "" { 
target *-*-* } 0 }
+// { dg-error {failed (socket|connecting|disabled) mapper 'localhost:172477262'} 
"" { target *-*-* } 0 }
  // { dg-prune-output "fatal error:" }
  // { dg-prune-output "failed to read" }
  // { dg-prune-output "compilation terminated" }


--
Nathan Sidwell

Re: [PATCH] c++: outer args for level-lowered ttp [PR109651]

2023-04-27 Thread Patrick Palka via Gcc-patches

On Thu, Apr 27, 2023 at 4:46 PM Patrick Palka  wrote:
>
> Now that r14-11-g2245459c85a3f4 made us coerce the template
> arguments of a bound ttp again after level-lowering, this unfortunately
> causes a crash from coerce_template_args_for_ttp in the below testcase.
>
> During the level-lowering substitution T=int into the bound ttp TT
> as part of substitution into the lambda signature, current_template_parms
> is just U=U rather than the ideal TT=TT, U=U.  And because we don't
> consistently set DECL_CONTEXT for level-lowered ttps (it's kind of a
> chicken-and-egg problem in this case), we attempt to use
> current_template_parms to obtain the outer arguments during
> coerce_template_args_for_ttp.  But the depth 1 of
> current_template_parms is less than the depth 2 of the level-lowered TT,
> and we end up segfaulting from there.
>
> So for level-lowered ttps it seems we need to get the outer arguments a
> different way -- namely, we can look at the trailing parms of its
> DECL_TEMPLATE_PARMS.

Note this is not an ideal solution because TREE_CHAIN of
DECL_TEMPLATE_PARMS in this case is just "2 , 1 U", so we're
missing tparm information for the level that the ttp belongs to :/ So
the only difference compared to using current_template_parms in this
case is the extra empty level of args corresponding to the ttp's
level.



>
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk/13?
>
> PR c++/109651
>
> gcc/cp/ChangeLog:
>
> * pt.cc (coerce_template_args_for_ttp): For level-lowered
> ttps with DECL_CONTEXT not set, obtain the outer template
> arguments via its DECL_TEMPLATE_PARMS.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/template/ttp37.C: New test.
> ---
>  gcc/cp/pt.cc  |  5 +
>  gcc/testsuite/g++.dg/template/ttp37.C | 11 +++
>  2 files changed, 16 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/template/ttp37.C
>
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 678cb7930e3..bbde61061f6 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -7872,6 +7872,11 @@ coerce_template_args_for_ttp (tree templ, tree arglist,
>tree outer = DECL_CONTEXT (templ);
>if (outer)
>  outer = generic_targs_for (outer);
> +  else if (TEMPLATE_TYPE_LEVEL (TREE_TYPE (templ))
> +  != TEMPLATE_TYPE_ORIG_LEVEL (TREE_TYPE (templ)))
> +/* This is a level-lowered template template parameter, for which
> +   we don't consistently set DECL_CONTEXT (FIXME).  */
> +outer = template_parms_to_args (TREE_CHAIN (DECL_TEMPLATE_PARMS 
> (templ)));
>else if (current_template_parms)
>  {
>/* This is an argument of the current template, so we haven't set
> diff --git a/gcc/testsuite/g++.dg/template/ttp37.C 
> b/gcc/testsuite/g++.dg/template/ttp37.C
> new file mode 100644
> index 000..876e5b6232a
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/template/ttp37.C
> @@ -0,0 +1,11 @@
> +// PR c++/109651
> +// { dg-do compile { target c++20 } }
> +
> +template
> +void f() {
> +  []() {
> +[] class TT>(TT) { };
> +  };
> +}
> +
> +template void f();
> --
> 2.40.1.423.g2807bd2c10
>

[PATCH] c++: outer args for level-lowered ttp [PR109651]

2023-04-27 Thread Patrick Palka via Gcc-patches

Now that r14-11-g2245459c85a3f4 made us coerce the template
arguments of a bound ttp again after level-lowering, this unfortunately
causes a crash from coerce_template_args_for_ttp in the below testcase.

During the level-lowering substitution T=int into the bound ttp TT
as part of substitution into the lambda signature, current_template_parms
is just U=U rather than the ideal TT=TT, U=U.  And because we don't
consistently set DECL_CONTEXT for level-lowered ttps (it's kind of a
chicken-and-egg problem in this case), we attempt to use
current_template_parms to obtain the outer arguments during
coerce_template_args_for_ttp.  But the depth 1 of
current_template_parms is less than the depth 2 of the level-lowered TT,
and we end up segfaulting from there.

So for level-lowered ttps it seems we need to get the outer arguments a
different way -- namely, we can look at the trailing parms of its
DECL_TEMPLATE_PARMS.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13?

PR c++/109651

gcc/cp/ChangeLog:

* pt.cc (coerce_template_args_for_ttp): For level-lowered
ttps with DECL_CONTEXT not set, obtain the outer template
arguments via its DECL_TEMPLATE_PARMS.

gcc/testsuite/ChangeLog:

* g++.dg/template/ttp37.C: New test.
---
 gcc/cp/pt.cc  |  5 +
 gcc/testsuite/g++.dg/template/ttp37.C | 11 +++
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/template/ttp37.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 678cb7930e3..bbde61061f6 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -7872,6 +7872,11 @@ coerce_template_args_for_ttp (tree templ, tree arglist,
   tree outer = DECL_CONTEXT (templ);
   if (outer)
 outer = generic_targs_for (outer);
+  else if (TEMPLATE_TYPE_LEVEL (TREE_TYPE (templ))
+  != TEMPLATE_TYPE_ORIG_LEVEL (TREE_TYPE (templ)))
+/* This is a level-lowered template template parameter, for which
+   we don't consistently set DECL_CONTEXT (FIXME).  */
+outer = template_parms_to_args (TREE_CHAIN (DECL_TEMPLATE_PARMS (templ)));
   else if (current_template_parms)
 {
   /* This is an argument of the current template, so we haven't set
diff --git a/gcc/testsuite/g++.dg/template/ttp37.C 
b/gcc/testsuite/g++.dg/template/ttp37.C
new file mode 100644
index 000..876e5b6232a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/ttp37.C
@@ -0,0 +1,11 @@
+// PR c++/109651
+// { dg-do compile { target c++20 } }
+
+template
+void f() {
+  []() {
+[] class TT>(TT) { };
+  };
+}
+
+template void f();
-- 
2.40.1.423.g2807bd2c10

New Croatian PO file for 'gcc' (version 13.1.0)

2023-04-27 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Croatian team of translators.  The file is available at:

https://translationproject.org/latest/gcc/hr.po

(This file, 'gcc-13.1.0.hr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

[PATCH] OpenACC: Stand-alone attach/detach clause fixes for Fortran [PR109622]

2023-04-27 Thread Julian Brown

This patch fixes several cases where multiple attach or detach mapping
nodes were being created for stand-alone attach or detach clauses
in Fortran.  After the introduction of stricter checking later during
compilation, these extra nodes could cause ICEs, as seen in the PR.

The patch also fixes cases that "happened to work" previously where
the user attaches/detaches a pointer to array using a descriptor, and
(I think!) the "_data" field has offset zero, hence the same address as
the descriptor as a whole.
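
The "happened to work" situation can be illustrated with a small C sketch. It
uses a made-up, simplified descriptor layout (struct descriptor_sketch is
hypothetical and is not gfortran's real descriptor type) only to show why a
first field at offset zero makes the two addresses coincide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified array descriptor; not gfortran's real layout.  */
struct descriptor_sketch
{
  void *data;      /* base address of the array payload */
  size_t offset;
  size_t elem_len;
};

/* Returns nonzero when the address of the descriptor and the address of
   its data field are the same, i.e. when attaching either one would
   operate on the same location.  */
int
data_field_aliases_descriptor (const struct descriptor_sketch *d)
{
  assert (d != NULL);
  return (const void *) d == (const void *) &d->data;
}
```

Because data is the first member, offsetof (struct descriptor_sketch, data) is
zero and the function returns nonzero for any descriptor; code that passed the
whole descriptor where the data pointer was meant would still hit the right
address.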

Tested with offloading to nvptx. OK?

Thanks,

Julian

2023-04-27  Julian Brown  

PR fortran/109622

gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_clauses): Attach/detach clause fixes.

gcc/testsuite/
* gfortran.dg/goacc/attach-descriptor.f90: Adjust expected output.

libgomp/
* testsuite/libgomp.fortran/pr109622.f90: New test.
* testsuite/libgomp.fortran/pr109622-2.f90: New test.
* testsuite/libgomp.fortran/pr109622-3.f90: New test.
---
 gcc/fortran/trans-openmp.cc   | 36 +--
 .../gfortran.dg/goacc/attach-descriptor.f90   | 12 +++
 .../testsuite/libgomp.fortran/pr109622-2.f90  | 32 +
 .../testsuite/libgomp.fortran/pr109622-3.f90  | 32 +
 .../testsuite/libgomp.fortran/pr109622.f90| 32 +
 5 files changed, 135 insertions(+), 9 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.fortran/pr109622-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/pr109622-3.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/pr109622.f90

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 4ff9c59df5cb..dbb4a335ab57 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -3388,6 +3388,17 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
  gfc_add_block_to_block (block, &se.post);
  if (pointer || allocatable)
{
+ /* If it's a bare attach/detach clause, we just want
+to perform a single attach/detach operation, of the
+pointer itself, not of the pointed-to object.  */
+ if (openacc
+ && (n->u.map_op == OMP_MAP_ATTACH
+ || n->u.map_op == OMP_MAP_DETACH))
+   {
+ OMP_CLAUSE_SIZE (node) = size_zero_node;
+ goto finalize_map_clause;
+   }
+
  node2 = build_omp_clause (input_location,
OMP_CLAUSE_MAP);
  gomp_map_kind kind
@@ -3458,6 +3469,19 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
{
  if (pointer || (openacc && allocatable))
{
+ /* If it's a bare attach/detach clause, we just want
+to perform a single attach/detach operation, of the
+pointer itself, not of the pointed-to object.  */
+ if (openacc
+ && (n->u.map_op == OMP_MAP_ATTACH
+ || n->u.map_op == OMP_MAP_DETACH))
+   {
+ OMP_CLAUSE_DECL (node)
+   = build_fold_addr_expr (inner);
+ OMP_CLAUSE_SIZE (node) = size_zero_node;
+ goto finalize_map_clause;
+   }
+
  tree data, size;
 
  if (lastref->u.c.component->ts.type == BT_CLASS)
@@ -3494,12 +3518,18 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
  else if (lastref->type == REF_ARRAY
   && lastref->u.ar.type == AR_FULL)
{
- /* Just pass the (auto-dereferenced) decl through for
-bare attach and detach clauses.  */
+ /* Bare attach and detach clauses don't want any
+additional nodes.  */
  if (n->u.map_op == OMP_MAP_ATTACH
  || n->u.map_op == OMP_MAP_DETACH)
{
- OMP_CLAUSE_DECL (node) = inner;
+ if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (inner)))
+   {
+ tree ptr = gfc_conv_descriptor_data_get (inner);
+ OMP_CLAUSE_DECL (node) = ptr;
+   }
+ else
+   OMP_CLAUSE_DECL (node) = inner;
  OMP_CLAUSE_SIZE (node) = size_zero_node;
  goto finalize_map_clause;
}
diff --git a/gcc/testsuite/gfor

Re: [PATCH v5 00/11] RISC-V: Implement ISA Manual Table A.6 Mappings

2023-04-27 Thread Andrea Parri

On Thu, Apr 27, 2023 at 09:22:50AM -0700, Patrick O'Neill wrote:
> This patchset aims to make the RISCV atomics implementation stronger
> than the recommended mapping present in table A.6 of the ISA manual.
> https://github.com/riscv/riscv-isa-manual/blob/c7cf84547b3aefacab5463add1734c1602b67a49/src/memory.tex#L1083-L1157
>  
> 
> Context
> -------
> GCC defined RISC-V mappings [1] before the Memory Model task group
> finalized their work and provided the ISA Manual Table A.6/A.7 mappings[2].
> 
> For at least a year now, we've known that the mappings were different,
> but it wasn't clear if these unique mappings had correctness issues.
> 
> Andrea Parri found an issue with the GCC mappings, showing that
> atomic_compare_exchange_weak_explicit(-,-,-,release,relaxed) mappings do
> not enforce release ordering guarantees. (Meaning the GCC mappings have
> a correctness issue).
>   https://inbox.sourceware.org/gcc-patches/Y1GbJuhcBFpPGJQ0@andrea/ 
> 
> Why not A.6?
> ------------
> We can update our mappings now, so the obvious choice would be to
> implement Table A.6 (what LLVM implements/ISA manual recommends).
> 
> The reason why that isn't the best path forward for GCC is due to a
> proposal by Hans Boehm to add L{d|w|b|h}.aq/rl and S{d|w|b|h}.aq/rl.
> 
> For context, there is discussion about fast-tracking the addition of
> these instructions. The RISCV architectural review committee supports
> adopting a "new and common atomics ABI for gcc and LLVM toolchains ...
> that assumes the addition of the preceding instructions". That common
> ABI is likely to be A.7.
>   https://lists.riscv.org/g/tech-privileged/message/1284 
> 
> Transitioning from A.6 to A.7 will cause an ABI break. We can hedge
> against that risk by emitting a conservative fence after SEQ_CST stores
> to make the mapping compatible with both A.6 and A.7.
> 
> What does a mapping compatible with both A.6 & A.7 look like?
> -------------------------------------------------------------
> It is exactly the same as Table A.6, but SEQ_CST stores have a trailing
> fence rw,rw. It's strictly stronger than Table A.6.
> 
> Microbenchmark
> --------------
> Hans Boehm helpfully wrote a microbenchmark [3] that uses ARM to give a
> rough estimate for the performance benefits/penalties of the different
> mappings. The microbenchmark is single threaded and almost-write-only.
> This case seems unlikely but is useful for getting a rough idea of the
> workload that would be impacted the most.
> 
> Testcases
> ---------
> Control: A simple volatile store. This is most similar to a relaxed
> store.
> Release Store: This is most similar to Sw.rl (one of the instructions in
> Hans' proposal).
> Store with release fence: This is most similar to the mapping present in
> Table A.6.
> Store with two fences: This is most similar to the compatibility mapping
> present in this patchset.
> 
> Machines
> --------
> Intel(R) Core(TM) i7-8650U (sanity check only): x86 TSO
> Cortex A53 (Raspberry pi): ARM In order core
> Cortex A55 (Pixel 6 Pro): ARM In order core
> Cortex A76 (Pixel 6 Pro): ARM Out of order core
> Cortex X1 (Pixel 6 Pro): ARM Out of order core
> 
> Microbenchmark Results [4]
> 
> Units are nsecs per iteration.
> 
> Sanity check
> Machine          CONTROL   REL_STORE   STORE_REL_FENCE   STORE_TWO_FENCE
> --------------   -------   ---------   ---------------   ---------------
> Intel i7-8650U   1.34812   1.30038     1.2933            18.0474
> 
> 
> Machine      CONTROL   REL_STORE   STORE_REL_FENCE   STORE_TWO_FENCE
> ----------   -------   ---------   ---------------   ---------------
> Cortex A53   7.15224   10.7282     7.15221           10.013
> Cortex A55   2.77965   8.89654     4.44787           7.78331
> Cortex A76   1.78021   1.86095     5.33088           8.88462
> Cortex X1    2.14252   2.14258     4.32982           7.05234
> 
> Reordered tests (using -r flag on microbenchmark)
> Machine      CONTROL   REL_STORE   STORE_REL_FENCE   STORE_TWO_FENCE
> ----------   -------   ---------   ---------------   ---------------
> Cortex A53   7.15227   10.7282     7.16113           10.034
> Cortex A55   2.78024   8.89574     4.44844           7.78428
> Cortex A76   1.77686   1.81081     5.3301            8.88346
> Cortex X1    2.14254   2.14251     4.3273            7.05239
> 
> Benchmark Interpretation
> 
> As expected, out of order machines are significantly faster with the
> REL_STORE mappings. Unexpectedly, the in-order machines are
> significantly slower with REL_STORE rather than STORE_REL_FENCE.
> 
> Most machines in the wild are expected to use Table A.7 once the
> instructions are introduced. 
> Incurring this added cost now will make it easier for compiled RISC-V
> binaries to transition to the A.7 memory model mapping.
> 
> The performance benefits of moving to A.7 can be more clearly seen using
> an almost-all-load microbenchmark (included on page 3 of Hans’
> proposal). The code for that microbenchmark is attached below [5].
>   
> https://lists.riscv.org/g/tech-unprivileged/attach

[PATCH] c++: NSDMI instantiation from template context [PR109506]

2023-04-27 Thread Patrick Palka via Gcc-patches

The testcase from this PR fails to link when -fchecking=2 with the error:

  /usr/bin/ld: /tmp/ccpHiXEY.o: in function `bar::bar()':
  ...: undefined reference to `foo::foo()'

ultimately because we end up instantiating the NSDMI for bar::alloc_
from the template context func1 for which in_template_function is
true and thus mark_used is inhibited from scheduling other templates for
instantiation.  So we end up never instantiating foo's ctor.

Although maybe_instantiate_nsdmi_init does call push_to_top_level, which
would have gotten us out of the template context, it doesn't happen in
this case because currently_open_class (ctx) is true, thanks to an
earlier call to push_scope from synthesized_method_walk.

We could perhaps arrange to call push_to_top_level unconditionally in
maybe_instantiate_nsdmi_init or even from synthesized_method_walk,
but that seems rather heavyweight for this situation.  Ideally we just
want a way to allow mark_used to work here despite being in a template
context.

To that end, this patch first generalizes the in_template_function test
in mark_used to instead test current_template_parms, which has two
benefits: it works for all template contexts, not just function template
contexts, and it can be cheaply disabled by simply clearing
current_template_parms.  This patch then makes us disable this test from
maybe_instantiate_nsdmi_init.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  This doesn't seem worth backporting since the bug seems to
manifest only with -fchecking=2.
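
For readers following the description above, the shape of the problem can be sketched roughly as follows. This is a hedged reconstruction and not the committed testcase: the template parameter lists and the <int> arguments are assumptions (the archived copy of the test elides them), and the value member, get() and run_example() are hypothetical additions so the sketch has an observable result.

```cpp
// Sketch of the PR109506 scenario; template parameters are assumed.
template <typename T>
struct foo {
  foo() : value(42) {}   // the constructor that must end up instantiated
  int value;
};

template <typename T>
class bar {
public:
  int get() const { return alloc_.value; }
private:
  foo<int> alloc_{};     // NSDMI whose instantiation uses foo<int>::foo()
};

template <typename T>
bar<int> func1() {
  // bar<int>{} does not depend on T, so the compiler may process it while
  // func1 is still an uninstantiated template (a template context).
  return bar<int>{};
}

// Hypothetical wrapper: forces func1<int> and, through it, bar<int>'s
// synthesized default constructor and foo<int>'s constructor.
int run_example() {
  return func1<int>().get();
}
```

If the constructor of foo<int> were never scheduled for instantiation, a program like this would compile but fail at link time, which matches the error quoted at the top of this message.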

PR c++/109506

gcc/cp/ChangeLog:

* decl2.cc (mark_used): Check current_template_parms instead
of in_template_function.
* init.cc (maybe_instantiate_nsdmi_init): Clear
current_template_parms before instantiating.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Waddress-of-packed-member2.C: No longer expect
a "used but never defined" warning due to the use from an
uninstantiated template.
* g++.dg/cpp0x/nsdmi-template26.C: New test.
---
 gcc/cp/decl2.cc   |  6 -
 gcc/cp/init.cc|  3 +++
 gcc/testsuite/g++.dg/cpp0x/nsdmi-template26.C | 23 +++
 .../g++.dg/warn/Waddress-of-packed-member2.C  |  2 +-
 4 files changed, 32 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-template26.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 9594be4092c..b9d37d76bf6 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -5781,7 +5781,11 @@ mark_used (tree decl, tsubst_flags_t complain /* = 
tf_warning_or_error */)
  && DECL_OMP_DECLARE_REDUCTION_P (decl)))
 maybe_instantiate_decl (decl);
 
-  if (processing_template_decl || in_template_function ())
+  /* We don't want to instantiate templates based on uses from other
+ uninstantiated templates.  Since processing_template_decl is cleared
+ during instantiate_non_dependent_expr, we check current_template_parms
+ as well.  */
+  if (processing_template_decl || current_template_parms)
 return true;
 
   /* Check this too in case we're within instantiate_non_dependent_expr.  */
diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index 1dd24e30d7c..ef32ef2a8c2 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -610,6 +610,9 @@ maybe_instantiate_nsdmi_init (tree member, tsubst_flags_t 
complain)
  push_deferring_access_checks (dk_no_deferred);
  pushed = true;
}
+ /* Make sure current_template_parms is cleared so that mark_used
+is uninhibited.  */
+ auto ctpo = make_temp_override (current_template_parms, NULL_TREE);
 
  /* If we didn't push_to_top_level, still step out of constructor
 scope so build_base_path doesn't try to use its __in_chrg.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi-template26.C 
b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template26.C
new file mode 100644
index 000..d9e17ea6724
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template26.C
@@ -0,0 +1,23 @@
+// PR c++/109506
+// { dg-do link { target c++11 } }
+// { dg-additional-options "-fchecking=2" }
+
+template
+struct foo {
+  foo() { };
+};
+
+template
+class bar {
+  foo alloc_{};
+};
+
+template
+bar func1() {
+  return bar{};
+}
+
+int main() {
+  func1();
+}
+
diff --git a/gcc/testsuite/g++.dg/warn/Waddress-of-packed-member2.C 
b/gcc/testsuite/g++.dg/warn/Waddress-of-packed-member2.C
index e9bf7cac04c..d619b28cfe1 100644
--- a/gcc/testsuite/g++.dg/warn/Waddress-of-packed-member2.C
+++ b/gcc/testsuite/g++.dg/warn/Waddress-of-packed-member2.C
@@ -1,7 +1,7 @@
 // PR c++/89973
 // { dg-do compile { target c++14 } }
 
-constexpr int a(); // { dg-warning "used but never defined" }
+constexpr int a();
 
 template 
 constexpr void *b = a(); // { dg-error "invalid conversion" }
-- 
2.40.1.423.g2807bd2c10

[committed] amdgcn: Fix addsub bug

2023-04-27 Thread Andrew Stubbs

I've committed this patch to fix a couple of bugs introduced in the 
recent CMul patch.


First, the fmsubadd insn was accidentally all adds and no subtracts.

Second, there were input dependencies on the undefined output register 
which caused the compiler to reserve unnecessary slots in the stack-frame.


Both issues are now fixed.

This patch is already committed to OG12. I'll backport it to GCC 13 shortly.

Andrew

amdgcn: Fix addsub bug

The vec_fmsubadd instruction actually had add twice, by mistake.

Also improve code-gen for all the complex patterns by using properly
undefined values.  Mostly this just prevents the compiler reserving space
in the stack frame.
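
As a hedged illustration of the intended lane behaviour, here is a scalar reference model of the two fused patterns. It is a sketch under assumptions, not the GCN implementation: lanes are numbered from 0, t1 in the patch is taken to be the product of the first two operands, all three inputs are assumed to have the same length, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// vec_fmaddsub model: even lanes compute a*b - c, odd lanes a*b + c.
std::vector<double> fmaddsub_ref(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 const std::vector<double>& c) {
  std::vector<double> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double prod = a[i] * b[i];
    out[i] = (i % 2 == 0) ? prod - c[i] : prod + c[i];
  }
  return out;
}

// vec_fmsubadd model: even lanes compute a*b + c, odd lanes a*b - c.
// The bug described above corresponds to using "+" on both branches.
std::vector<double> fmsubadd_ref(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 const std::vector<double>& c) {
  std::vector<double> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double prod = a[i] * b[i];
    out[i] = (i % 2 == 0) ? prod + c[i] : prod - c[i];
  }
  return out;
}
```

With a = {1,2,3,4}, b = {1,1,1,1} and c = {10,10,10,10}, the fmsubadd model yields {11,-8,13,-6}; the all-adds version would have produced {11,12,13,14}.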

gcc/ChangeLog:

* config/gcn/gcn-valu.md (cmul3): Use gcn_gen_undef.
(cml4): Likewise.
(vec_addsub3): Likewise.
(cadd3): Likewise.
(vec_fmaddsub4): Likewise.
(vec_fmsubadd4): Likewise, and use sub for the odd lanes.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 44c48468dd6..7290cdc2fd0 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -2323,8 +2323,9 @@ (define_expand "cmul3"
 rtx even = gen_rtx_REG (DImode, EXEC_REG);
 emit_move_insn (even, get_exec (0xUL));
 rtx dest = operands[0];
-emit_insn (gen_3_exec (dest, t1, t1_perm, dest, even));
- // a*c-b*d 0
+emit_insn (gen_3_exec (dest, t1, t1_perm,
+  gcn_gen_undef (mode),
+  even));// a*c-b*d 0
 
 rtx t2_perm = gen_reg_rtx (mode);
 emit_insn (gen_dpp_swap_pairs (t2_perm, t2));  // b*c a*d
@@ -2368,7 +2369,8 @@ (define_expand "cml4"
 rtx even = gen_rtx_REG (DImode, EXEC_REG);
 emit_move_insn (even, get_exec (0xUL));
 rtx dest = operands[0];
-emit_insn (gen_sub3_exec (dest, t1, t2_perm, dest, even));
+emit_insn (gen_sub3_exec (dest, t1, t2_perm,
+gcn_gen_undef (mode), even));
 
 rtx odd = gen_rtx_REG (DImode, EXEC_REG);
 emit_move_insn (odd, get_exec (0xUL));
@@ -2392,7 +2394,8 @@ (define_expand "vec_addsub3"
 rtx dest = operands[0];
 rtx x = operands[1];
 rtx y = operands[2];
-emit_insn (gen_sub3_exec (dest, x, y, dest, even));
+emit_insn (gen_sub3_exec (dest, x, y, gcn_gen_undef (mode),
+even));
 rtx odd = gen_rtx_REG (DImode, EXEC_REG);
 emit_move_insn (odd, get_exec (0xUL));
 emit_insn (gen_add3_exec (dest, x, y, dest, odd));
@@ -2419,7 +2422,9 @@ (define_expand "cadd3"
 
 rtx even = gen_rtx_REG (DImode, EXEC_REG);
 emit_move_insn (even, get_exec (0xUL));
-emit_insn (gen_3_exec (dest, x, y, dest, even));
+emit_insn (gen_3_exec (dest, x, y,
+  gcn_gen_undef (mode),
+  even));
 rtx odd = gen_rtx_REG (DImode, EXEC_REG);
 emit_move_insn (odd, get_exec (0xUL));
 emit_insn (gen_3_exec (dest, x, y, dest, odd));
@@ -2439,7 +2444,8 @@ (define_expand "vec_fmaddsub4"
 rtx even = gen_rtx_REG (DImode, EXEC_REG);
 emit_move_insn (even, get_exec (0xUL));
 rtx dest = operands[0];
-emit_insn (gen_sub3_exec (dest, t1, operands[3], dest, even));
+emit_insn (gen_sub3_exec (dest, t1, operands[3],
+gcn_gen_undef (mode), even));
 rtx odd = gen_rtx_REG (DImode, EXEC_REG);
 emit_move_insn (odd, get_exec (0xUL));
 emit_insn (gen_add3_exec (dest, t1, operands[3], dest, odd));
@@ -2459,10 +2465,11 @@ (define_expand "vec_fmsubadd4"
 rtx even = gen_rtx_REG (DImode, EXEC_REG);
 emit_move_insn (even, get_exec (0xUL));
 rtx dest = operands[0];
-emit_insn (gen_add3_exec (dest, t1, operands[3], dest, even));
+emit_insn (gen_add3_exec (dest, t1, operands[3],
+gcn_gen_undef (mode), even));
 rtx odd = gen_rtx_REG (DImode, EXEC_REG);
 emit_move_insn (odd, get_exec (0xUL));
-emit_insn (gen_add3_exec (dest, t1, operands[3], dest, odd));
+emit_insn (gen_sub3_exec (dest, t1, operands[3], dest, odd));
 
 DONE;
   })

Re: [PATCH] libsanitizer: cherry-pick commit 05551c658269 from upstream

2023-04-27 Thread H.J. Lu via Gcc-patches

On Thu, Apr 27, 2023 at 12:03 AM Martin Liška  wrote:
>
> On 4/27/23 04:32, H.J. Lu via Gcc-patches wrote:
> > cherry-pick:
>
> Can you please wait a few days before it? I'm going to merge again
> in the near future after https://reviews.llvm.org/D144073 got handled.

Sure.

> Martin
>
> >
> > 05551c658269 [sanitizer] Correct alignment of x32 __sanitizer_siginfo
> >
> >   * sanitizer_common/sanitizer_platform_limits_posix.h
> >   (__sanitizer_siginfo_pad): Use u64 to align x32
> >   __sanitizer_siginfo to 8 bytes.
> > ---
> >  .../sanitizer_common/sanitizer_platform_limits_posix.h   | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git 
> > a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h 
> > b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
> > index cfca7bdedbe..e6f298c26e1 100644
> > --- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
> > +++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
> > @@ -578,8 +578,13 @@ struct __sanitizer_sigset_t {
> >  #endif
> >
> >  struct __sanitizer_siginfo_pad {
> > +#if SANITIZER_X32
> > +  // x32 siginfo_t is aligned to 8 bytes.
> > +  u64 pad[128 / sizeof(u64)];
> > +#else
> >// Require uptr, because siginfo_t is always pointer-size aligned on 
> > Linux.
> >uptr pad[128 / sizeof(uptr)];
> > +#endif
> >  };
> >
> >  #if SANITIZER_LINUX
>


-- 
H.J.

[PATCH v5 05/11] RISC-V: Add AMO release bits

2023-04-27 Thread Patrick O'Neill

This patch sets the relevant .rl bits on amo operations.
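
A rough standalone model of the suffix selection may help when reading the diff below. It is a sketch, not the GCC code: std::memory_order stands in for GCC's memmodel enum, and the two predicates are assumptions about which models carry acquire or release semantics (the patch itself does not show their bodies).

```cpp
#include <atomic>
#include <string>

// Assumed: consume, acquire, acq_rel and seq_cst need acquire semantics.
bool needs_acquire(std::memory_order m) {
  return m == std::memory_order_consume || m == std::memory_order_acquire ||
         m == std::memory_order_acq_rel || m == std::memory_order_seq_cst;
}

// Assumed: release, acq_rel and seq_cst need release semantics.
bool needs_release(std::memory_order m) {
  return m == std::memory_order_release || m == std::memory_order_acq_rel ||
         m == std::memory_order_seq_cst;
}

// Mirrors the three-way choice made for the 'A' operand letter.
std::string amo_suffix(std::memory_order m) {
  if (needs_acquire(m) && needs_release(m))
    return ".aqrl";
  if (needs_acquire(m))
    return ".aq";
  if (needs_release(m))
    return ".rl";
  return "";
}
```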

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Change behavior
of %A to include release bits.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/riscv.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 02eb5125ac1..d46781d8981 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4503,8 +4503,13 @@ riscv_print_operand (FILE *file, rtx op, int letter)
   break;
 
 case 'A':
-  if (riscv_memmodel_needs_amo_acquire (model))
+  if (riscv_memmodel_needs_amo_acquire (model)
+ && riscv_memmodel_needs_release_fence (model))
+   fputs (".aqrl", file);
+  else if (riscv_memmodel_needs_amo_acquire (model))
fputs (".aq", file);
+  else if (riscv_memmodel_needs_release_fence (model))
+   fputs (".rl", file);
   break;
 
 case 'F':
-- 
2.34.1

[PATCH v5 09/11] RISC-V: Weaken mem_thread_fence

2023-04-27 Thread Patrick O'Neill

This change brings atomic fences in line with table A.6 of the ISA
manual.

Relax mem_thread_fence according to the memmodel given.
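
The selection made in the diff below can be modelled in isolation as follows. This is a sketch under assumptions: std::memory_order stands in for GCC's memmodel enum, fence_for is a hypothetical name, and returning an empty string for models the patch does not list (relaxed, consume) is a choice of this sketch rather than something the patch states.

```cpp
#include <atomic>
#include <string>

// Returns the fence chosen for a thread fence with the given model.
std::string fence_for(std::memory_order m) {
  if (m == std::memory_order_seq_cst)
    return "fence rw,rw";
  if (m == std::memory_order_acq_rel)
    return "fence.tso";
  if (m == std::memory_order_acquire)
    return "fence r,rw";
  if (m == std::memory_order_release)
    return "fence rw,w";
  return "";  // not covered by the patch; empty here by assumption
}
```

In source terms this corresponds to calls such as std::atomic_thread_fence(std::memory_order_acquire), which previously always produced the full fence iorw,iorw.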

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/sync.md (mem_thread_fence_1): Change fence
depending on the given memory model.

Signed-off-by: Patrick O'Neill 
---
v3 Changelog:
* Consolidate tests in [PATCH v3 10/10]
* Remove helper functions
---
 gcc/config/riscv/sync.md | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 3e6345e83a3..ba132d8a1ce 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -45,14 +45,24 @@
   DONE;
 })
 
-;; Until the RISC-V memory model (hence its mapping from C++) is finalized,
-;; conservatively emit a full FENCE.
 (define_insn "mem_thread_fence_1"
   [(set (match_operand:BLK 0 "" "")
(unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))
(match_operand:SI 1 "const_int_operand" "")] ;; model
   ""
-  "fence\tiorw,iorw")
+  {
+enum memmodel model = (enum memmodel) INTVAL (operands[1]);
+model = memmodel_base (model);
+if (model == MEMMODEL_SEQ_CST)
+   return "fence\trw,rw";
+else if (model == MEMMODEL_ACQ_REL)
+   return "fence.tso";
+else if (model == MEMMODEL_ACQUIRE)
+   return "fence\tr,rw";
+else if (model == MEMMODEL_RELEASE)
+   return "fence\trw,w";
+  }
+  [(set (attr "length") (const_int 4))])
 
 ;; Atomic memory operations.
 
-- 
2.34.1

[PATCH v5 03/11] RISC-V: Enforce subword atomic LR/SC SEQ_CST

2023-04-27 Thread Patrick O'Neill

Replace LR.aq/SC.rl pairs with the SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.
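
For context, the inline subword sequences touched below all follow the same load, modify, mask, merge, store-conditional, retry structure. A portable sketch of that structure, using a compare-exchange loop in place of LR/SC, could look like this. The helper name and its shift parameter are hypothetical, and the sketch only handles a byte-sized add.

```cpp
#include <atomic>
#include <cstdint>

// Atomically adds `value` to the byte located `shift` bits (0, 8, 16 or 24)
// into `word`, leaving the other three bytes untouched. Returns the old byte.
std::uint8_t subword_fetch_add(std::atomic<std::uint32_t>& word,
                               unsigned shift, std::uint8_t value) {
  const std::uint32_t mask = std::uint32_t{0xff} << shift;
  std::uint32_t old = word.load(std::memory_order_relaxed);
  std::uint32_t desired;
  do {
    // Apply the operation, then keep only the target byte ("and mask").
    const std::uint32_t updated =
        (old + (std::uint32_t{value} << shift)) & mask;
    // Merge with the untouched bytes ("and not_mask", then "or").
    desired = (old & ~mask) | updated;
    // On failure compare_exchange_weak reloads `old`, like a retried LR.
  } while (!word.compare_exchange_weak(old, desired,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed));
  return static_cast<std::uint8_t>((old & mask) >> shift);
}
```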

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/sync.md: Change LR.aq/SC.rl pairs into
sequentially consistent LR.aqrl/SC.rl pairs.

Signed-off-by: Patrick O'Neill 
---
v5 Changelog:
* Add this patch to address the added inline subword atomic sequences.
---
 gcc/config/riscv/sync.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 19274528262..0c83ef04607 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -109,7 +109,7 @@
   "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
   {
 return "1:\;"
-  "lr.w.aq\t%0, %1\;"
+  "lr.w.aqrl\t%0, %1\;"
   "\t%5, %0, %2\;"
   "and\t%5, %5, %3\;"
   "and\t%6, %0, %4\;"
@@ -173,7 +173,7 @@
   "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
   {
 return "1:\;"
-  "lr.w.aq\t%0, %1\;"
+  "lr.w.aqrl\t%0, %1\;"
   "and\t%5, %0, %2\;"
   "not\t%5, %5\;"
   "and\t%5, %5, %3\;"
@@ -278,7 +278,7 @@
   "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
   {
 return "1:\;"
-  "lr.w.aq\t%0, %1\;"
+  "lr.w.aqrl\t%0, %1\;"
   "and\t%4, %0, %3\;"
   "or\t%4, %4, %2\;"
   "sc.w.rl\t%4, %4, %1\;"
@@ -443,7 +443,7 @@
   "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
   {
 return "1:\;"
-  "lr.w.aq\t%0, %1\;"
+  "lr.w.aqrl\t%0, %1\;"
   "and\t%6, %0, %4\;"
   "bne\t%6, %z2, 1f\;"
   "and\t%6, %0, %5\;"
-- 
2.34.1

[PATCH v5 11/11] RISC-V: Table A.6 conformance tests

2023-04-27 Thread Patrick O'Neill

These tests cover basic cases to ensure the atomic mappings follow the
strengthened Table A.6 mappings that are compatible with Table A.7.

2023-04-27 Patrick O'Neill 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-a-6-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-5.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-1.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-2.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-3.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-4.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-5.c: New test.
* gcc.target/riscv/amo-table-a-6-load-1.c: New test.
* gcc.target/riscv/amo-table-a-6-load-2.c: New test.
* gcc.target/riscv/amo-table-a-6-load-3.c: New test.
* gcc.target/riscv/amo-table-a-6-store-1.c: New test.
* gcc.target/riscv/amo-table-a-6-store-2.c: New test.
* gcc.target/riscv/amo-table-a-6-store-compat-3.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: New test.

Signed-off-by: Patrick O'Neill 
---
 .../gcc.target/riscv/amo-table-a-6-amo-add-1.c|  8 
 .../gcc.target/riscv/amo-table-a-6-amo-add-2.c|  8 
 .../gcc.target/riscv/amo-table-a-6-amo-add-3.c|  8 
 .../gcc.target/riscv/amo-table-a-6-amo-add-4.c|  8 
 .../gcc.target/riscv/amo-table-a-6-amo-add-5.c|  8 
 .../riscv/amo-table-a-6-compare-exchange-1.c  | 10 ++
 .../riscv/amo-table-a-6-compare-exchange-2.c  | 10 ++
 .../riscv/amo-table-a-6-compare-exchange-3.c  | 10 ++
 .../riscv/amo-table-a-6-compare-exchange-4.c  | 10 ++
 .../riscv/amo-table-a-6-compare-exchange-5.c  | 10 ++
 .../riscv/amo-table-a-6-compare-exchange-6.c  | 11 +++
 .../riscv/amo-table-a-6-compare-exchange-7.c  | 10 ++
 .../gcc.target/riscv/amo-table-a-6-fence-1.c  |  8 
 .../gcc.target/riscv/amo-table-a-6-fence-2.c  | 10 ++
 .../gcc.target/riscv/amo-table-a-6-fence-3.c  | 10 ++
 .../gcc.target/riscv/amo-table-a-6-fence-4.c  | 10 ++
 .../gcc.target/riscv/amo-table-a-6-fence-5.c  | 10 ++
 gcc/testsuite/gcc.target/riscv/amo-table-a-6-load-1.c |  9 +
 gcc/testsuite/gcc.target/riscv/amo-table-a-6-load-2.c | 11 +++
 gcc/testsuite/gcc.target/riscv/amo-table-a-6-load-3.c | 11 +++
 .../gcc.target/riscv/amo-table-a-6-store-1.c  |  9 +
 .../gcc.target/riscv/amo-table-a-6-store-2.c  | 11 +++
 .../gcc.target/riscv/amo-table-a-6-store-compat-3.c   | 11 +++
 .../riscv/amo-table-a-6-subword-amo-add-1.c   |  9 +
 .../riscv/amo-table-a-6-subword-amo-add-2.c   |  9 +
 .../riscv/amo-table-a-6-subword-amo-add-3.c   |  9 +
 .../riscv/amo-table-a-6-subword-amo-add-4.c   |  9 +
 .../riscv/amo-table-a-6-subword-amo-add-5.c   |  9 +
 28 files changed, 266 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/amo-tab

[PATCH v5 08/11] RISC-V: Weaken LR/SC pairs

2023-04-27 Thread Patrick O'Neill

Introduce the %I and %J flags for setting the .aqrl bits on LR/SC pairs
as needed.

Atomic compare and exchange ops provide success and failure memory
models. C++17 and later place no restrictions on the relative strength
of each model, so ensure we cover both by using a model that enforces
the ordering of both given models.

This change brings LR/SC ops in line with table A.6 of the ISA manual.
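
A standalone model of the union logic and of the two new operand letters is sketched below. It is illustrative only: the Model enum is a hypothetical stand-in whose ordering is assumed to match GCC's memmodel values after memmodel_base, and the acquire/release classification inside lr_suffix and sc_suffix is likewise an assumption.

```cpp
#include <string>

// Assumed to mirror the numeric order of GCC's base memory models.
enum class Model { Relaxed, Consume, Acquire, Release, AcqRel, SeqCst };

// Smallest model that enforces the ordering of both inputs.
Model union_models(Model a, Model b) {
  const Model weaker = a <= b ? a : b;
  const Model stronger = a > b ? a : b;
  // Release combined with an acquire-like model needs both directions.
  if (stronger == Model::Release &&
      (weaker == Model::Acquire || weaker == Model::Consume))
    return Model::AcqRel;
  return stronger;
}

// Model of %I: suffix for the LR half of an LR/SC pair.
std::string lr_suffix(Model m) {
  if (m == Model::SeqCst)
    return ".aqrl";
  if (m == Model::Consume || m == Model::Acquire || m == Model::AcqRel)
    return ".aq";
  return "";
}

// Model of %J: suffix for the SC half of an LR/SC pair.
std::string sc_suffix(Model m) {
  if (m == Model::Release || m == Model::AcqRel || m == Model::SeqCst)
    return ".rl";
  return "";
}
```

For example, a compare-exchange with success model release and failure model acquire would be treated as acq_rel, giving lr.w.aq paired with sc.w.rl in this model.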

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_union_memmodels): Expose
riscv_union_memmodels function to sync.md.
* config/riscv/riscv.cc (riscv_union_memmodels): Add function to
get the union of two memmodels in sync.md.
(riscv_print_operand): Add %I and %J flags that output the
optimal LR/SC flag bits for a given memory model.
* config/riscv/sync.md: Remove static .aqrl bits on LR op/.rl
bits on SC op and replace with optimized %I, %J flags.

Signed-off-by: Patrick O'Neill 
---
v3 Changelog:
* Consolidate tests in [PATCH v3 10/10]
---
v5 Changelog:
* Also optimize subword LR/SC ops based on given memory model.
---
 gcc/config/riscv/riscv-protos.h |   3 +
 gcc/config/riscv/riscv.cc   |  44 
 gcc/config/riscv/sync.md| 114 +++-
 3 files changed, 114 insertions(+), 47 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index f87661bde2c..5fa9e1122ab 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -22,6 +22,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_RISCV_PROTOS_H
 #define GCC_RISCV_PROTOS_H
 
+#include "memmodel.h"
+
 /* Symbol types we understand.  The order of this list must match that of
the unspec enum in riscv.md, subsequent to UNSPEC_ADDRESS_FIRST.  */
 enum riscv_symbol_type {
@@ -81,6 +83,7 @@ extern bool riscv_v_ext_vector_mode_p (machine_mode);
 extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
 extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
 extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
+extern enum memmodel riscv_union_memmodels (enum memmodel, enum memmodel);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9eba03ac189..69e9b2aa548 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4289,6 +4289,36 @@ riscv_print_operand_reloc (FILE *file, rtx op, bool 
hi_reloc)
   fputc (')', file);
 }
 
+/* Return the memory model that encapsulates both given models.  */
+
+enum memmodel
+riscv_union_memmodels (enum memmodel model1, enum memmodel model2)
+{
+  model1 = memmodel_base (model1);
+  model2 = memmodel_base (model2);
+
+  enum memmodel weaker = model1 <= model2 ? model1: model2;
+  enum memmodel stronger = model1 > model2 ? model1: model2;
+
+  switch (stronger)
+{
+  case MEMMODEL_SEQ_CST:
+  case MEMMODEL_ACQ_REL:
+   return stronger;
+  case MEMMODEL_RELEASE:
+   if (weaker == MEMMODEL_ACQUIRE || weaker == MEMMODEL_CONSUME)
+ return MEMMODEL_ACQ_REL;
+   else
+ return stronger;
+  case MEMMODEL_ACQUIRE:
+  case MEMMODEL_CONSUME:
+  case MEMMODEL_RELAXED:
+   return stronger;
+  default:
+   gcc_unreachable ();
+}
+}
+
 /* Return true if the .AQ suffix should be added to an AMO to implement the
acquire portion of memory model MODEL.  */
 
@@ -4342,6 +4372,8 @@ riscv_memmodel_needs_amo_release (enum memmodel model)
'R' Print the low-part relocation associated with OP.
'C' Print the integer branch condition for comparison OP.
'A' Print the atomic operation suffix for memory model OP.
+   'I' Print the LR suffix for memory model OP.
+   'J' Print the SC suffix for memory model OP.
'z' Print x0 if OP is zero, otherwise print OP normally.
'i' Print i if the operand is not a register.
'S' Print shift-index of single-bit mask OP.
@@ -4511,6 +4543,18 @@ riscv_print_operand (FILE *file, rtx op, int letter)
fputs (".rl", file);
   break;
 
+case 'I':
+  if (model == MEMMODEL_SEQ_CST)
+   fputs (".aqrl", file);
+  else if (riscv_memmodel_needs_amo_acquire (model))
+   fputs (".aq", file);
+  break;
+
+case 'J':
+  if (riscv_memmodel_needs_amo_release (model))
+   fputs (".rl", file);
+  break;
+
 case 'i':
   if (code != REG)
 fputs ("i", file);
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 9a3b57bd09f..3e6345e83a3 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -116,21 +116,22 @@
(unspec_volatile:SI
  [(any_atomic:SI (match_dup 1)
 (match_operand:SI 2 "register_operand" "rI")) ;; value for 
op
-  (match_operand:SI 3 "register_operand" "rI")]   ;; mask
+  (match_operand:SI 3 "const_int_operand")]

[PATCH v5 02/11] RISC-V: Enforce Libatomic LR/SC SEQ_CST

2023-04-27 Thread Patrick O'Neill

Replace LR.aq/SC.rl pairs with the SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill 

libgcc/ChangeLog:

* config/riscv/atomic.c: Change LR.aq/SC.rl pairs into
sequentially consistent LR.aqrl/SC.rl pairs.

Signed-off-by: Patrick O'Neill 
---
 libgcc/config/riscv/atomic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgcc/config/riscv/atomic.c b/libgcc/config/riscv/atomic.c
index 573d163ea04..bd2b033132b 100644
--- a/libgcc/config/riscv/atomic.c
+++ b/libgcc/config/riscv/atomic.c
@@ -41,7 +41,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 unsigned old, tmp1, tmp2;  \
\
 asm volatile ("1:\n\t" \
- "lr.w.aq %[old], %[mem]\n\t"  \
+ "lr.w.aqrl %[old], %[mem]\n\t"\
  #insn " %[tmp1], %[old], %[value]\n\t"\
  invert\
  "and %[tmp1], %[tmp1], %[mask]\n\t"   \
@@ -75,7 +75,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 unsigned old, tmp1;
\
\
 asm volatile ("1:\n\t" \
- "lr.w.aq %[old], %[mem]\n\t"  \
+ "lr.w.aqrl %[old], %[mem]\n\t"\
  "and %[tmp1], %[old], %[mask]\n\t"\
  "bne %[tmp1], %[o], 1f\n\t"   \
  "and %[tmp1], %[old], %[not_mask]\n\t"\
-- 
2.34.1

[PATCH v5 07/11] RISC-V: Eliminate AMO op fences

2023-04-27 Thread Patrick O'Neill

Atomic operations with the appropriate bits set already enforce release
semantics. Remove unnecessary release fences from atomic ops.

This change brings AMO ops in line with table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/riscv.cc
(riscv_memmodel_needs_amo_release): Change function name.
(riscv_print_operand): Remove unneeded %F case.
* config/riscv/sync.md: Remove unneeded fences.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/riscv.cc | 16 +---
 gcc/config/riscv/sync.md  | 12 ++--
 2 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d46781d8981..9eba03ac189 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4312,11 +4312,11 @@ riscv_memmodel_needs_amo_acquire (enum memmodel model)
 }
 }
 
-/* Return true if a FENCE should be emitted to before a memory access to
-   implement the release portion of memory model MODEL.  */
+/* Return true if the .RL suffix should be added to an AMO to implement the
+   release portion of memory model MODEL.  */
 
 static bool
-riscv_memmodel_needs_release_fence (enum memmodel model)
+riscv_memmodel_needs_amo_release (enum memmodel model)
 {
   switch (model)
 {
@@ -4342,7 +4342,6 @@ riscv_memmodel_needs_release_fence (enum memmodel model)
'R' Print the low-part relocation associated with OP.
'C' Print the integer branch condition for comparison OP.
'A' Print the atomic operation suffix for memory model OP.
-   'F' Print a FENCE if the memory model requires a release.
'z' Print x0 if OP is zero, otherwise print OP normally.
'i' Print i if the operand is not a register.
'S' Print shift-index of single-bit mask OP.
@@ -4504,19 +4503,14 @@ riscv_print_operand (FILE *file, rtx op, int letter)
 
 case 'A':
   if (riscv_memmodel_needs_amo_acquire (model)
- && riscv_memmodel_needs_release_fence (model))
+ && riscv_memmodel_needs_amo_release (model))
fputs (".aqrl", file);
   else if (riscv_memmodel_needs_amo_acquire (model))
fputs (".aq", file);
-  else if (riscv_memmodel_needs_release_fence (model))
+  else if (riscv_memmodel_needs_amo_release (model))
fputs (".rl", file);
   break;
 
-case 'F':
-  if (riscv_memmodel_needs_release_fence (model))
-   fputs ("fence iorw,ow; ", file);
-  break;
-
 case 'i':
   if (code != REG)
 fputs ("i", file);
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 1acb78a9ae4..9a3b57bd09f 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -91,9 +91,9 @@
   (match_operand:SI 2 "const_int_operand")] ;; model
 UNSPEC_SYNC_OLD_OP))]
   "TARGET_ATOMIC"
-  "%F2amo.%A2 zero,%z1,%0"
+  "amo.%A2\tzero,%z1,%0"
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
+   (set (attr "length") (const_int 4))])
 
 (define_insn "atomic_fetch_"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
@@ -105,9 +105,9 @@
   (match_operand:SI 3 "const_int_operand")] ;; model
 UNSPEC_SYNC_OLD_OP))]
   "TARGET_ATOMIC"
-  "%F3amo.%A3 %0,%z2,%1"
+  "amo.%A3\t%0,%z2,%1"
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
+   (set (attr "length") (const_int 4))])
 
 (define_insn "subword_atomic_fetch_strong_"
   [(set (match_operand:SI 0 "register_operand" "=&r") ;; old value 
at mem
@@ -247,9 +247,9 @@
(set (match_dup 1)
(match_operand:GPR 2 "register_operand" "0"))]
   "TARGET_ATOMIC"
-  "%F3amoswap.%A3 %0,%z2,%1"
+  "amoswap.%A3\t%0,%z2,%1"
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
+   (set (attr "length") (const_int 4))])
 
 (define_expand "atomic_exchange"
   [(match_operand:SHORT 0 "register_operand") ;; old value at mem
-- 
2.34.1

[PATCH v5 10/11] RISC-V: Weaken atomic loads

2023-04-27 Thread Patrick O'Neill

This change brings atomic loads in line with table A.6 of the ISA
manual.
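
The load mapping in the diff below can be summarised with a small model. This is a sketch under assumptions: std::memory_order stands in for GCC's memmodel enum, load_sequence is a hypothetical name, and "lw" is used as a placeholder for the width-specific load, since the archived diff elides the mode suffix.

```cpp
#include <atomic>
#include <string>
#include <vector>

// Instruction sequence chosen for an atomic load with the given model.
std::vector<std::string> load_sequence(std::memory_order m) {
  if (m == std::memory_order_seq_cst)
    return {"fence rw,rw", "lw", "fence r,rw"};
  if (m == std::memory_order_acquire)
    return {"lw", "fence r,rw"};
  return {"lw"};  // every other model falls through to a plain load
}
```

The seq_cst case has three instructions, which is consistent with the pattern's length attribute of 12 bytes.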

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/sync.md (atomic_load): Implement atomic
load mapping.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/sync.md | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index ba132d8a1ce..6e7c762ac57 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -26,6 +26,7 @@
   UNSPEC_SYNC_OLD_OP_SUBWORD
   UNSPEC_SYNC_EXCHANGE
   UNSPEC_SYNC_EXCHANGE_SUBWORD
+  UNSPEC_ATOMIC_LOAD
   UNSPEC_ATOMIC_STORE
   UNSPEC_MEMORY_BARRIER
 ])
@@ -66,8 +67,31 @@
 
 ;; Atomic memory operations.
 
-;; Implement atomic stores with conservative fences.  Fall back to fences for
-;; atomic loads.
+(define_insn "atomic_load"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+(unspec_volatile:GPR
+  [(match_operand:GPR 1 "memory_operand" "A")
+   (match_operand:SI 2 "const_int_operand")]  ;; model
+  UNSPEC_ATOMIC_LOAD))]
+  "TARGET_ATOMIC"
+  {
+enum memmodel model = (enum memmodel) INTVAL (operands[2]);
+model = memmodel_base (model);
+
+if (model == MEMMODEL_SEQ_CST)
+  return "fence\trw,rw\;"
+"l\t%0,%1\;"
+"fence\tr,rw";
+if (model == MEMMODEL_ACQUIRE)
+  return "l\t%0,%1\;"
+"fence\tr,rw";
+else
+  return "l\t%0,%1";
+  }
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 12))])
+
+;; Implement atomic stores with conservative fences.
 ;; This allows us to be compatible with the ISA manual Table A.6 and Table A.7.
 (define_insn "atomic_store"
   [(set (match_operand:GPR 0 "memory_operand" "=A")
-- 
2.34.1

[PATCH v5 00/11] RISC-V: Implement ISA Manual Table A.6 Mappings

2023-04-27 Thread Patrick O'Neill

This patchset aims to make the RISC-V atomics implementation stronger
than the recommended mapping present in table A.6 of the ISA manual.
https://github.com/riscv/riscv-isa-manual/blob/c7cf84547b3aefacab5463add1734c1602b67a49/src/memory.tex#L1083-L1157
 

Context
-------
GCC defined RISC-V mappings [1] before the Memory Model task group
finalized their work and provided the ISA Manual Table A.6/A.7 mappings[2].

For at least a year now, we've known that the mappings were different,
but it wasn't clear if these unique mappings had correctness issues.

Andrea Parri found an issue with the GCC mappings, showing that
atomic_compare_exchange_weak_explicit(-,-,-,release,relaxed) mappings do
not enforce release ordering guarantees. (Meaning the GCC mappings have
a correctness issue).
  https://inbox.sourceware.org/gcc-patches/Y1GbJuhcBFpPGJQ0@andrea/ 
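For readers who want to see the operation in question, here is a minimal C++ sketch of a compare-exchange with release ordering on success and relaxed ordering on failure. The helper names try_publish and demo_publish are made up for this illustration and do not come from the patchset or from Andrea's report; the sketch only shows the source-level call whose mapping is being discussed, not the litmus test itself.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: store `desired` only if the slot still holds
// `expected`, with release ordering on success and relaxed on failure.
inline bool try_publish(std::atomic<int>& slot, int expected, int desired)
{
  int observed = expected;
  // compare_exchange_weak may fail spuriously, so retry for as long as the
  // observed value still equals what the caller expected.
  while (!slot.compare_exchange_weak(observed, desired,
                                     std::memory_order_release,
                                     std::memory_order_relaxed))
    {
      if (observed != expected)
        return false;  // Genuine mismatch: the slot holds another value.
    }
  return true;
}

// Small usage demo: the first publish succeeds, the second one does not.
inline bool demo_publish()
{
  std::atomic<int> slot{1};
  bool first = try_publish(slot, 1, 2);
  bool second = try_publish(slot, 1, 3);
  assert(slot.load() == 2);
  return first && !second;
}
```

The release ordering on the success path is the guarantee that the report says the old GCC mapping fails to enforce.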

Why not A.6?
------------
We can update our mappings now, so the obvious choice would be to
implement Table A.6 (what LLVM implements/ISA manual recommends).

The reason why that isn't the best path forward for GCC is due to a
proposal by Hans Boehm to add L{d|w|b|h}.aq/rl and S{d|w|b|h}.aq/rl.

For context, there is discussion about fast-tracking the addition of
these instructions. The RISCV architectural review committee supports
adopting a "new and common atomics ABI for gcc and LLVM toolchains ...
that assumes the addition of the preceding instructions". That common
ABI is likely to be A.7.
  https://lists.riscv.org/g/tech-privileged/message/1284 

Transitioning from A.6 to A.7 will cause an ABI break. We can hedge
against that risk by emitting a conservative fence after SEQ_CST stores
to make the mapping compatible with both A.6 and A.7.

What does a mapping compatible with both A.6 & A.7 look like?
-------------------------------------------------------------
It is exactly the same as Table A.6, but SEQ_CST stores have a trailing
fence rw,rw. It's strictly stronger than Table A.6.
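As a rough illustration of the compatibility mapping described above, the C++ sketch below pairs each store ordering with the instruction sequence that patch 06/11 of this series emits. The helper names are invented for this example, and the "sw" mnemonic assumes a 32-bit int; the sequences in the comments restate the patch and were not obtained by compiling this file.

```cpp
#include <atomic>
#include <cassert>

// Illustrative helpers, not part of the patchset.

inline void store_relaxed(std::atomic<int>& x, int v)
{
  x.store(v, std::memory_order_relaxed);  // sw
}

inline void store_release(std::atomic<int>& x, int v)
{
  x.store(v, std::memory_order_release);  // fence rw,w ; sw
}

inline void store_seq_cst(std::atomic<int>& x, int v)
{
  // fence rw,w ; sw ; fence rw,rw
  // The trailing fence is the part that goes beyond Table A.6.
  x.store(v, std::memory_order_seq_cst);
}

// Usage demo: the last store wins regardless of the ordering used.
inline int demo_stores()
{
  std::atomic<int> x{0};
  store_relaxed(x, 1);
  store_release(x, 2);
  store_seq_cst(x, 3);
  assert(x.load() == 3);
  return x.load();
}
```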

Microbenchmark
--------------
Hans Boehm helpfully wrote a microbenchmark [3] that uses ARM to give a
rough estimate for the performance benefits/penalties of the different
mappings. The microbenchmark is single threaded and almost-write-only.
This case seems unlikely but is useful for getting a rough idea of the
workload that would be impacted the most.

Testcases
---------
Control: A simple volatile store. This is most similar to a relaxed
store.
Release Store: This is most similar to Sw.rl (one of the instructions in
Hans' proposal).
Store with release fence: This is most similar to the mapping present in
Table A.6.
Store with two fences: This is most similar to the compatibility mapping
present in this patchset.
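The four testcases can be approximated in portable C++ as follows. This is only a sketch: the function names are invented here, and the real microbenchmark [3] may express the variants differently (for example with inline assembly or different fence flavours).

```cpp
#include <atomic>
#include <cassert>

// CONTROL: a plain volatile store, closest to a relaxed store.
inline void control_store(volatile int& x, int v)
{
  x = v;
}

// REL_STORE: a store-release on the object itself (closest to sw.rl).
inline void rel_store(std::atomic<int>& x, int v)
{
  x.store(v, std::memory_order_release);
}

// STORE_REL_FENCE: release fence followed by a relaxed store (Table A.6).
inline void store_rel_fence(std::atomic<int>& x, int v)
{
  std::atomic_thread_fence(std::memory_order_release);
  x.store(v, std::memory_order_relaxed);
}

// STORE_TWO_FENCE: as above plus a trailing full fence (compatibility mapping).
inline void store_two_fence(std::atomic<int>& x, int v)
{
  std::atomic_thread_fence(std::memory_order_release);
  x.store(v, std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Usage demo: run each atomic variant once and return the final value.
inline int demo_variants()
{
  std::atomic<int> x{0};
  rel_store(x, 1);
  store_rel_fence(x, 2);
  store_two_fence(x, 3);
  assert(x.load() == 3);
  return x.load();
}
```

A benchmark would call one variant in a tight loop and divide the elapsed time by the iteration count, which matches the "nsecs per iteration" unit used in the results.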

Machines
--------
Intel(R) Core(TM) i7-8650U (sanity check only): x86 TSO
Cortex A53 (Raspberry pi): ARM In order core
Cortex A55 (Pixel 6 Pro): ARM In order core
Cortex A76 (Pixel 6 Pro): ARM Out of order core
Cortex X1 (Pixel 6 Pro): ARM Out of order core

Microbenchmark Results [4]

Units are nsecs per iteration.

Sanity check
Machine          CONTROL   REL_STORE   STORE_REL_FENCE   STORE_TWO_FENCE
-------          -------   ---------   ---------------   ---------------
Intel i7-8650U   1.34812   1.30038     1.2933            18.0474


Machine      CONTROL   REL_STORE   STORE_REL_FENCE   STORE_TWO_FENCE
-------      -------   ---------   ---------------   ---------------
Cortex A53   7.15224   10.7282     7.15221           10.013
Cortex A55   2.77965   8.89654     4.44787           7.78331
Cortex A76   1.78021   1.86095     5.33088           8.88462
Cortex X1    2.14252   2.14258     4.32982           7.05234

Reordered tests (using -r flag on microbenchmark)
Machine      CONTROL   REL_STORE   STORE_REL_FENCE   STORE_TWO_FENCE
-------      -------   ---------   ---------------   ---------------
Cortex A53   7.15227   10.7282     7.16113           10.034
Cortex A55   2.78024   8.89574     4.44844           7.78428
Cortex A76   1.77686   1.81081     5.3301            8.88346
Cortex X1    2.14254   2.14251     4.3273            7.05239

Benchmark Interpretation

As expected, out of order machines are significantly faster with the
REL_STORE mappings. Unexpectedly, the in-order machines are
significantly slower with REL_STORE than with STORE_REL_FENCE.

Most machines in the wild are expected to use Table A.7 once the
instructions are introduced. 
Incurring this added cost now will make it easier for compiled RISC-V
binaries to transition to the A.7 memory model mapping.

The performance benefits of moving to A.7 can be more clearly seen using
an almost-all-load microbenchmark (included on page 3 of Hans’
proposal). The code for that microbenchmark is attached below [5].
  https://lists.riscv.org/g/tech-unprivileged/attachment/382/0/load-acquire110422.pdf
  https://lists.riscv.org/g/tech-unprivileged/topic/92916241 

Caveats

This is a very synthetic microbenchmark that represents what is expected
to be a very unlikely workload. Nevertheless, it's helpful to see the

[PATCH v5 06/11] RISC-V: Strengthen atomic stores

2023-04-27 Thread Patrick O'Neill

This change makes atomic stores strictly stronger than table A.6 of the
ISA manual. This mapping makes the overall patchset compatible with
table A.7 as well.

2023-04-27 Patrick O'Neill 

PR 89835

gcc/ChangeLog:

* config/riscv/sync.md:

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr89835.c: New test.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/sync.md | 21 ++---
 gcc/testsuite/gcc.target/riscv/pr89835.c |  9 +
 2 files changed, 27 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr89835.c

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 5620d6ffa58..1acb78a9ae4 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -56,7 +56,9 @@
 
 ;; Atomic memory operations.
 
-;; Implement atomic stores with amoswap.  Fall back to fences for atomic loads.
+;; Implement atomic stores with conservative fences.  Fall back to fences for
+;; atomic loads.
+;; This allows us to be compatible with the ISA manual Table A.6 and Table A.7.
 (define_insn "atomic_store<mode>"
   [(set (match_operand:GPR 0 "memory_operand" "=A")
 (unspec_volatile:GPR
@@ -64,9 +66,22 @@
(match_operand:SI 2 "const_int_operand")]  ;; model
   UNSPEC_ATOMIC_STORE))]
   "TARGET_ATOMIC"
-  "%F2amoswap.<amo>%A2 zero,%z1,%0"
+  {
+enum memmodel model = (enum memmodel) INTVAL (operands[2]);
+model = memmodel_base (model);
+
+if (model == MEMMODEL_SEQ_CST)
+  return "fence\trw,w\;"
+"s<amo>\t%z1,%0\;"
+"fence\trw,rw";
+if (model == MEMMODEL_RELEASE)
+  return "fence\trw,w\;"
+"s<amo>\t%z1,%0";
+else
+  return "s<amo>\t%z1,%0";
+  }
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
+   (set (attr "length") (const_int 12))])
 
 (define_insn "atomic_"
   [(set (match_operand:GPR 0 "memory_operand" "+A")
diff --git a/gcc/testsuite/gcc.target/riscv/pr89835.c b/gcc/testsuite/gcc.target/riscv/pr89835.c
new file mode 100644
index 00000000000..ab190e11b60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr89835.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* Verify that relaxed atomic stores use simple store instructions.  */
+/* { dg-final { scan-assembler-not "amoswap" } } */
+
+void
+foo(int bar, int baz)
+{
+  __atomic_store_n(&bar, baz, __ATOMIC_RELAXED);
+}
-- 
2.34.1

[PATCH v5 04/11] RISC-V: Enforce atomic compare_exchange SEQ_CST

2023-04-27 Thread Patrick O'Neill

This patch enforces SEQ_CST for atomic compare_exchange ops.

Replace Fence/LR.aq/SC.aq pairs with SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/sync.md: Change FENCE/LR.aq/SC.aq into
sequentially consistent LR.aqrl/SC.rl pair.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/sync.md | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 0c83ef04607..5620d6ffa58 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -297,9 +297,16 @@
 UNSPEC_COMPARE_AND_SWAP))
(clobber (match_scratch:GPR 6 "=&r"))]
   "TARGET_ATOMIC"
-  "%F5 1: lr.<amo>%A5 %0,%1; bne %0,%z2,1f; sc.<amo>%A4 %6,%z3,%1; bnez %6,1b; 1:"
+  {
+return "1:\;"
+  "lr.<amo>.aqrl\t%0,%1\;"
+  "bne\t%0,%z2,1f\;"
+  "sc.<amo>.rl\t%6,%z3,%1\;"
+  "bnez\t%6,1b\;"
+  "1:";
+  }
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 20))])
+   (set (attr "length") (const_int 16))])
 
 (define_expand "atomic_compare_and_swap"
   [(match_operand:SI 0 "register_operand" "")   ;; bool output
-- 
2.34.1

[PATCH v5 01/11] RISC-V: Eliminate SYNC memory models

2023-04-27 Thread Patrick O'Neill

Remove references to MEMMODEL_SYNC_* models by converting via
memmodel_base().
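To make the idea concrete, here is a simplified, self-contained C++ sketch of how stripping a "sync" flag lets a switch handle only the base memory models. It is an assumption-laden stand-in, not GCC's code: the sketch_ and SKETCH_ names, the flag position, and the enumerator values are illustrative and should be checked against GCC's memmodel.h before relying on them.

```cpp
#include <cassert>

// Assumption: legacy __sync models are a base model with one flag bit set.
constexpr unsigned sketch_sync_flag = 1u << 15;
constexpr unsigned sketch_base_mask = sketch_sync_flag - 1;

enum sketch_memmodel : unsigned
{
  SKETCH_RELAXED = 0,
  SKETCH_CONSUME = 1,
  SKETCH_ACQUIRE = 2,
  SKETCH_RELEASE = 3,
  SKETCH_ACQ_REL = 4,
  SKETCH_SEQ_CST = 5,
  SKETCH_SYNC_ACQUIRE = SKETCH_ACQUIRE | sketch_sync_flag,
  SKETCH_SYNC_RELEASE = SKETCH_RELEASE | sketch_sync_flag,
  SKETCH_SYNC_SEQ_CST = SKETCH_SEQ_CST | sketch_sync_flag
};

// Drop the sync flag so callers only ever see a base model.
inline sketch_memmodel sketch_memmodel_base(unsigned val)
{
  return static_cast<sketch_memmodel>(val & sketch_base_mask);
}

// With the flag stripped, no SYNC_* cases are needed in the switch.
inline bool sketch_needs_release_fence(unsigned raw_model)
{
  switch (sketch_memmodel_base(raw_model))
    {
    case SKETCH_ACQ_REL:
    case SKETCH_SEQ_CST:
    case SKETCH_RELEASE:
      return true;
    default:
      return false;
    }
}

// Usage demo: a sync variant and its base model give the same answer.
inline void sketch_self_check()
{
  assert(sketch_needs_release_fence(SKETCH_SYNC_RELEASE)
         == sketch_needs_release_fence(SKETCH_RELEASE));
}
```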

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/riscv.cc: Remove MEMMODEL_SYNC_* cases and
sanitize memmodel input with memmodel_base.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/riscv.cc | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1529855a2b4..02eb5125ac1 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4299,14 +4299,11 @@ riscv_memmodel_needs_amo_acquire (enum memmodel model)
 {
   case MEMMODEL_ACQ_REL:
   case MEMMODEL_SEQ_CST:
-  case MEMMODEL_SYNC_SEQ_CST:
   case MEMMODEL_ACQUIRE:
   case MEMMODEL_CONSUME:
-  case MEMMODEL_SYNC_ACQUIRE:
return true;
 
   case MEMMODEL_RELEASE:
-  case MEMMODEL_SYNC_RELEASE:
   case MEMMODEL_RELAXED:
return false;
 
@@ -4325,14 +4322,11 @@ riscv_memmodel_needs_release_fence (enum memmodel model)
 {
   case MEMMODEL_ACQ_REL:
   case MEMMODEL_SEQ_CST:
-  case MEMMODEL_SYNC_SEQ_CST:
   case MEMMODEL_RELEASE:
-  case MEMMODEL_SYNC_RELEASE:
return true;
 
   case MEMMODEL_ACQUIRE:
   case MEMMODEL_CONSUME:
-  case MEMMODEL_SYNC_ACQUIRE:
   case MEMMODEL_RELAXED:
return false;
 
@@ -4371,6 +4365,7 @@ riscv_print_operand (FILE *file, rtx op, int letter)
 }
   machine_mode mode = GET_MODE (op);
   enum rtx_code code = GET_CODE (op);
+  const enum memmodel model = memmodel_base (INTVAL (op));
 
   switch (letter)
 {
@@ -4508,12 +4503,12 @@ riscv_print_operand (FILE *file, rtx op, int letter)
   break;
 
 case 'A':
-  if (riscv_memmodel_needs_amo_acquire ((enum memmodel) INTVAL (op)))
+  if (riscv_memmodel_needs_amo_acquire (model))
fputs (".aq", file);
   break;
 
 case 'F':
-  if (riscv_memmodel_needs_release_fence ((enum memmodel) INTVAL (op)))
+  if (riscv_memmodel_needs_release_fence (model))
fputs ("fence iorw,ow; ", file);
   break;
 
-- 
2.34.1

[PATCH] OpenMP: Noncontiguous "target update" for Fortran

2023-04-27 Thread Julian Brown

This patch implements noncontiguous "target update" for Fortran, on top
of the following in-review patch series:

  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609031.html

(with followup:
  https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609566.html)

and:

  https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613785.html

The existing middle end/runtime bits relating to C++ support are reused,
with some small adjustments, e.g.:

  1. The node used to map the OMP "array descriptor" (from omp-low.cc
 onwards) now uses the OMP_CLAUSE_SIZE field as a bias (the difference
 between the "virtual origin" element with zero indices in each
 dimension and the first element actually stored in memory).

  2. The OMP_CLAUSE_SIZE field of a GOMP_MAP_DIM_STRIDE node may now be
 used to store a "span", which is the distance in bytes between
 two adjacent elements in an array (with unit stride) when that is
 different from the element size, as it can be in Fortran.

The implementation goes to some effort to massage Fortran array metadata
(array descriptors) into a form that can ultimately be consumed by
omp_target_memcpy_rect_worker. The method for doing this is described
in comments in the patch body.
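The "span" notion from item 2 can be illustrated with a small host-only C++ sketch. It is not libgomp code and says nothing about how omp_target_memcpy_rect_worker is written; gather_section and particle are hypothetical names. The case shown is roughly what a Fortran section over one component of a derived-type array looks like in memory: the elements are ints, but neighbouring ints sit one whole record apart.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: pack `count` elements of `elem_size` bytes each.
// Adjacent array elements are `span` bytes apart in memory, and the section
// selects every `stride`-th element starting at `first`.
inline void gather_section(const void* first, std::size_t elem_size,
                           std::size_t span, std::size_t stride,
                           std::size_t count, void* out)
{
  assert(span >= elem_size);  // Elements must not overlap.
  const unsigned char* src = static_cast<const unsigned char*>(first);
  unsigned char* dst = static_cast<unsigned char*>(out);
  for (std::size_t i = 0; i < count; ++i)
    std::memcpy(dst + i * elem_size, src + i * stride * span, elem_size);
}

// Example record: the `id` members of an array of particles have a span of
// sizeof(particle), which is larger than sizeof(int).
struct particle
{
  int id;
  double mass;
};
```

When span equals elem_size this reduces to an ordinary strided copy, which is the only case the pre-existing C++ support had to consider according to the description above.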

Tested with offloading to nvptx. OK?

2023-04-27  Julian Brown  

gcc/fortran/
* openmp.cc (resolve_omp_clauses): Don't forbid "target update" with
non-unit stride.
* trans-openmp.cc (gfc_trans_omp_arrayshape_type,
gfc_omp_calculate_gcd, gfc_desc_to_omp_noncontig_array,
gfc_omp_contiguous_update_p): New functions.
(gfc_trans_omp_clauses): Handle noncontiguous to/from clauses for OMP
"target update" directives.

gcc/
* gimplify.cc (gimplify_adjust_omp_clauses): Don't gimplify
VIEW_CONVERT_EXPR away in GOMP_MAP_TO_GRID/GOMP_MAP_FROM_GRID clauses.
* omp-low.cc (omp_noncontig_descriptor_type): Add SPAN field.
(scan_sharing_clauses): Don't store descriptor size in its
OMP_CLAUSE_SIZE field.
(lower_omp_target): Add missing OMP_CLAUSE_MAP check.  Add special-case
string handling.  Handle span and bias.  Use low bound instead of zero
as index for trailing full dimensions.

libgomp/
* libgomp.h (omp_noncontig_array_desc): Add span field.
* target.c (omp_target_memcpy_rect_worker): Add span parameter. Update
forward declaration. Handle span != element_size.
(gomp_update): Handle bias in descriptor's size slot.  Update calls to
omp_target_memcpy_rect_worker.
* testsuite/libgomp.fortran/noncontig-updates-1.f90: New test.
* testsuite/libgomp.fortran/noncontig-updates-2.f90: New test.
* testsuite/libgomp.fortran/noncontig-updates-3.f90: New test.
* testsuite/libgomp.fortran/noncontig-updates-4.f90: New test.
* testsuite/libgomp.fortran/noncontig-updates-5.f90: New test.
* testsuite/libgomp.fortran/noncontig-updates-6.f90: New test.
* testsuite/libgomp.fortran/noncontig-updates-7.f90: New test.
* testsuite/libgomp.fortran/noncontig-updates-8.f90: New test.
* testsuite/libgomp.fortran/noncontig-updates-9.f90: New test.
* testsuite/libgomp.fortran/noncontig-updates-10.f90: New test.
* testsuite/libgomp.fortran/noncontig-updates-11.f90: New test.
* testsuite/libgomp.fortran/noncontig-updates-12.f90: New test.
* testsuite/libgomp.fortran/noncontig-updates-13.f90: New test.

gcc/testsuite/
* gfortran.dg/gomp/noncontig-updates-1.f90: New test.
* gfortran.dg/gomp/noncontig-updates-2.f90: New test.
* gfortran.dg/gomp/noncontig-updates-3.f90: New test.
* gfortran.dg/gomp/noncontig-updates-4.f90: New test.
---
 gcc/fortran/openmp.cc |   5 +-
 gcc/fortran/trans-openmp.cc   | 496 ++
 gcc/gimplify.cc   |  10 +
 gcc/omp-low.cc|  73 ++-
 .../gfortran.dg/gomp/noncontig-updates-1.f90  |  19 +
 .../gfortran.dg/gomp/noncontig-updates-2.f90  |  16 +
 .../gfortran.dg/gomp/noncontig-updates-3.f90  |  16 +
 .../gfortran.dg/gomp/noncontig-updates-4.f90  |  15 +
 libgomp/libgomp.h |   1 +
 libgomp/target.c  |  47 +-
 .../libgomp.fortran/noncontig-updates-1.f90   |  54 ++
 .../libgomp.fortran/noncontig-updates-10.f90  |  29 +
 .../libgomp.fortran/noncontig-updates-11.f90  |  51 ++
 .../libgomp.fortran/noncontig-updates-12.f90  |  59 +++
 .../libgomp.fortran/noncontig-updates-13.f90  |  42 ++
 .../libgomp.fortran/noncontig-updates-2.f90   | 101 
 .../libgomp.fortran/noncontig-updates-3.f90   |  47 ++
 .../libgomp.fortran/noncontig-updates-4.f90   |  78 +++
 .../libgomp.fortran/noncontig-updates-5.f90   |  55 ++
 .../libgomp.fortran/noncontig-updates-6.f90   |  34 ++
 .../libgomp.fortran/noncontig-updates-7.f90   |  36 ++
 .../libgomp.fortran/noncontig-update

Re: [PATCH v4 05/10] RISC-V: autovec: Add autovectorization patterns for binary integer operations

2023-04-27 Thread Palmer Dabbelt


On Wed, 26 Apr 2023 17:04:17 PDT (-0700), colli...@rivosinc.com wrote:

Hi Robin and Juzhe,

Just took a look and I like the approach.


I assume it's best to just squash these into the series?  That seems 
reasonable to me, the only issue is that Michael's on PTO for a few days 
(this week and the first half of next week), so it might take a bit 
longer than expected.  There's a v5 on the lists, but we didn't have 
time to pick this all up and figured it'd be better to just get out 
whatever was ready.


Kevin: do you have time to squash these in and re-spin the tests?  The 
changes are big enough to warrant a v6 already, so might as well get 
started now.



On 4/26/23 19:43, juzhe.zhong wrote:

Yeah, Robin's stuff is what I want and makes perfect sense to me.
---- Replied Message ----
From: Robin Dapp
Date: 04/27/2023 02:15
To: juzhe.zh...@rivai.ai, collison, gcc-patches
Cc: jeffreyalaw, Kito.cheng, kito.cheng, palmer, palmer
Subject: Re: [PATCH v4 05/10] RISC-V: autovec: Add autovectorization
patterns for binary integer operations

Hi Michael,

I have the diff below for the binops in my tree locally.
Maybe something like this works for you? Untested but compiles and
the expander helpers would need to be fortified obviously.

Regards
Robin

--

gcc/ChangeLog:

   * config/riscv/autovec.md (3): New binops expander.
   * config/riscv/riscv-protos.h (emit_nonvlmax_binop): Define.
   * config/riscv/riscv-v.cc (emit_pred_binop): New function.
   (emit_nonvlmax_binop): New function.
   * config/riscv/vector-iterators.md: New iterator.
---
gcc/config/riscv/autovec.md  | 12 
gcc/config/riscv/riscv-protos.h  |  1 +
gcc/config/riscv/riscv-v.cc  | 89 
gcc/config/riscv/vector-iterators.md | 20 +++
4 files changed, 97 insertions(+), 25 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b5d46ff57ab..c21d241f426 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -47,3 +47,15 @@ (define_expand "len_store_"
                 operands[1], operands[2], mode);
  DONE;
})
+
+(define_expand "<optab><mode>3"
+  [(set (match_operand:VI 0 "register_operand")
+    (any_int_binop:VI (match_operand:VI 1 "register_operand")
+              (match_operand:VI 2 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_binop (code_for_pred (<CODE>, <MODE>mode),
+                 operands[0], operands[1], operands[2],
+                 gen_reg_rtx (Pmode), <MODE>mode);
+  DONE;
+})
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index f6ea6846736..5cca543c773 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -163,6 +163,7 @@ void emit_hard_vlmax_vsetvl (machine_mode, rtx);
void emit_vlmax_op (unsigned, rtx, rtx, machine_mode);
void emit_vlmax_op (unsigned, rtx, rtx, rtx, machine_mode);
void emit_nonvlmax_op (unsigned, rtx, rtx, rtx, machine_mode);
+void emit_nonvlmax_binop (unsigned, rtx, rtx, rtx, rtx, machine_mode);
enum vlmul_type get_vlmul (machine_mode);
unsigned int get_ratio (machine_mode);
int get_ta (rtx);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 5e69427ac54..98ebc052340 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -52,7 +52,7 @@ namespace riscv_vector {
template  class insn_expander
{
public:
-  insn_expander () : m_opno (0) {}
+  insn_expander () : m_opno (0), has_dest(false) {}
  void add_output_operand (rtx x, machine_mode mode)
  {
create_output_operand (&m_ops[m_opno++], x, mode);
@@ -83,6 +83,44 @@ public:
add_input_operand (gen_int_mode (type, Pmode), Pmode);
  }

+  void set_dest_and_mask (rtx mask, rtx dest, machine_mode mask_mode)
+  {
+    dest_mode = GET_MODE (dest);
+    has_dest = true;
+
+    add_output_operand (dest, dest_mode);
+
+    if (mask)
+  add_input_operand (mask, GET_MODE (mask));
+    else
+  add_all_one_mask_operand (mask_mode);
+
+    add_vundef_operand (dest_mode);
+  }
+
+  void set_len_and_policy (rtx len, bool vlmax_p)
+    {
+  gcc_assert (has_dest);
+  gcc_assert (len || vlmax_p);
+
+  if (len)
+    add_input_operand (len, Pmode);
+  else
+    {
+      rtx vlmax = gen_reg_rtx (Pmode);
+      emit_vlmax_vsetvl (dest_mode, vlmax);
+      add_input_operand (vlmax, Pmode);
+    }
+
+  if (GET_MODE_CLASS (dest_mode) != MODE_VECTOR_BOOL)
+    add_policy_operand (get_prefer_tail_policy (),
get_prefer_mask_policy ());
+
+  if (vlmax_p)
+    add_avl_type_operand (avl_type::VLMAX);
+  else
+    add_avl_type_operand (avl_type::NONVLMAX);
+    }
+
  void expand (enum insn_code icode, bool temporary_volatile_p

Re: [PATCH] doc: Describe behaviour of enums with fixed underlying type

2023-04-27 Thread Marek Polacek via Gcc-patches

On Thu, Apr 27, 2023 at 12:16:34PM +0100, Jonathan Wakely via Gcc-patches wrote:
> C2x adds the ability to give an enumeration type a fixed underlying
> type, as C++ already has. The -fshort-enums option alters the compiler's
> choice of underlying type, but when it's fixed the compiler can't
> choose.
> 
> Similarly for C++ -fstrict-enums has no effect with a fixed underlying
> type, because every value of the underlying type is a valid value of the
> enumeration type.
> 
> This caused confusion recently: https://gcc.gnu.org/PR109532
> 
> OK for trunk?

LGTM.
 
> -- >8 --
> 
> gcc/ChangeLog:
> 
>   * doc/invoke.texi (Code Gen Options): Note that -fshort-enums
>   is ignored for a fixed underlying type.
>   (C++ Dialect Options): Likewise for -fstrict-enums.
> ---
>  gcc/doc/invoke.texi | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 2f40c58b21c..0f91464f8c0 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -3495,6 +3495,8 @@ defined in the C++ standard; basically, a value that can be
>  represented in the minimum number of bits needed to represent all the
>  enumerators).  This assumption may not be valid if the program uses a
>  cast to convert an arbitrary integer value to the enumerated type.
> +This option has no effect for an enumeration type with a fixed underlying
> +type.
>  
>  @opindex fstrong-eval-order
>  @item -fstrong-eval-order
> @@ -18303,6 +18305,8 @@ Use it to conform to a non-default application binary interface.
>  Allocate to an @code{enum} type only as many bytes as it needs for the
>  declared range of possible values.  Specifically, the @code{enum} type
>  is equivalent to the smallest integer type that has enough room.
> +This option has no effect for an enumeration type with a fixed underlying
> +type.
>  
>  @strong{Warning:} the @option{-fshort-enums} switch causes GCC to generate
>  code that is not binary compatible with code generated without that switch.
> -- 
> 2.40.0
> 

Marek
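As a small illustration of the behaviour the documentation patch above describes, the C++ sketch below declares two enumerations with fixed underlying types. The type and function names are invented for this example. The static_asserts hold by the language rules for fixed underlying types, so per the patch text they are not expected to change under -fshort-enums.

```cpp
#include <cassert>

// Enumerations with a fixed underlying type: the compiler has no choice to
// make about their representation.
enum fixed_small : unsigned char { fs_zero = 0, fs_one = 1 };
enum fixed_wide : int { fw_zero = 0, fw_one = 1 };

static_assert(sizeof(fixed_small) == sizeof(unsigned char),
              "size follows the fixed underlying type");
static_assert(sizeof(fixed_wide) == sizeof(int),
              "size follows the fixed underlying type");

// Every value of the underlying type is a valid value of the enumeration,
// even one that is far outside the range of the declared enumerators.
inline int round_trip(int v)
{
  assert(v >= 0 && v <= 255);  // Stay within unsigned char for this demo.
  return static_cast<int>(static_cast<fixed_small>(v));
}
```

Because 200 is a valid value of fixed_small, an optimizer cannot assume the value is only 0 or 1, which is why -fstrict-enums has nothing to act on here.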

[pushed] c++: print conversion error at candidate location

2023-04-27 Thread Jason Merrill via Gcc-patches

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

In testcases like this one, the printing of candidates in a diagnostic has
been longer than necessary because it jumps back and forth between the call
site and the candidate site.  So here, we first say at the call site that no
match was found; then we note the candidate site, and then explain why it's
not suitable back at the call site, which means printing the call site line
with caret again.  With this patch, the conversion diagnostic is at the same
location as the candidate, so we don't need to print any input line.

gcc/cp/ChangeLog:

* call.cc (print_conversion_rejection): Use iloc_sentinel.

gcc/testsuite/ChangeLog:

* g++.dg/template/copy1.C: Adjust error lines.
---
 gcc/cp/call.cc| 1 +
 gcc/testsuite/g++.dg/template/copy1.C | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index cdd7701b9e7..2a06520c0c1 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -3847,6 +3847,7 @@ print_conversion_rejection (location_t loc, struct conversion_info *info,
   if (info->n_arg >= 0)
inform (loc, "  conversion of argument %d would be ill-formed:",
info->n_arg + 1);
+  iloc_sentinel ils = loc;
   perform_implicit_conversion (info->to_type, info->from,
   tf_warning_or_error);
 }
diff --git a/gcc/testsuite/g++.dg/template/copy1.C b/gcc/testsuite/g++.dg/template/copy1.C
index a34221df38b..eacd9e2c025 100644
--- a/gcc/testsuite/g++.dg/template/copy1.C
+++ b/gcc/testsuite/g++.dg/template/copy1.C
@@ -6,9 +6,10 @@
 
 struct A
 {
+  // { dg-error "reference" "" { target c++14_down } .+1 }
   A(A&);   // { dg-message "A::A" "" { target c++14_down } }
   template <class T> A(T); // { dg-message "A::A" "" { target c++14_down } }
 };
 
-A a = 0; // { dg-error "" "" { target c++14_down } }
+A a = 0; // { dg-error "no match" "" { target c++14_down } }
 

base-commit: e0cf929d99bebd9a740db6db45d69957514e0c12
-- 
2.31.1

Re: [PATCH v3] RISCV: Add vector psabi checking.

2023-04-27 Thread Kito Cheng via Gcc-patches

Ooops, I found that it also warns on intrinsic functions, could you
try to find some way to exclude that?

e.g.
#include "riscv_vector.h"

void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out,
size_t n, int cond) {

 size_t vl;
 if (cond)
   vl = __riscv_vsetvlmax_e32m1();
 else
   vl = __riscv_vsetvlmax_e16mf2();
 for (size_t i = 0; i < n; i += 1) {
   /* warning: ABI for the scalable vector type is currently in experimental
      stage and may changes in the upcoming version of GCC. [-Wpsabi]  */
   vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
   vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl);
   vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl);
   __riscv_vse32_v_i32m1(out, c, vl);

}
}

On Thu, Apr 27, 2023 at 11:13 AM yanzhang.wang--- via Gcc-patches
 wrote:
>
> From: Yanzhang Wang 
>
> This patch checks whether a function's argument or return value has a
> vector type and emits a warning if so.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc:
> (riscv_scalable_vector_type_p): Determine whether the type is 
> scalable vector.
> (riscv_arg_has_vector): Determine whether the arg is vector type.
> (riscv_pass_in_vector_p): Check the vector type param is passed by 
> value.
> (riscv_get_arg_info): Add the checking.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/vector-abi-1.c: New test.
> * gcc.target/riscv/vector-abi-2.c: New test.
> * gcc.target/riscv/vector-abi-3.c: New test.
> * gcc.target/riscv/vector-abi-4.c: New test.
> * gcc.target/riscv/vector-abi-5.c: New test.
>
> Signed-off-by: Yanzhang Wang 
> Co-authored-by: Kito Cheng 
> ---
>  gcc/config/riscv/riscv.cc | 73 +++
>  gcc/testsuite/gcc.target/riscv/vector-abi-1.c | 14 
>  gcc/testsuite/gcc.target/riscv/vector-abi-2.c | 14 
>  gcc/testsuite/gcc.target/riscv/vector-abi-3.c | 14 
>  gcc/testsuite/gcc.target/riscv/vector-abi-4.c | 16 
>  gcc/testsuite/gcc.target/riscv/vector-abi-5.c | 15 
>  6 files changed, 146 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-5.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 76eee4a55e9..06e9fe7d924 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -3728,6 +3728,76 @@ riscv_pass_fpr_pair (machine_mode mode, unsigned regno1,
>GEN_INT (offset2;
>  }
>
> +/* Use the TYPE_SIZE to distinguish the type with vector_size attribute and
> +   intrinsic vector type.  Because we can't get the decl for the params.  */
> +
> +static bool
> +riscv_scalable_vector_type_p (const_tree type)
> +{
> +  tree size = TYPE_SIZE (type);
> +  if (size && TREE_CODE (size) == INTEGER_CST)
> +return false;
> +
> +  /* For the data type like vint32m1_t, the size code is POLY_INT_CST.  */
> +  return true;
> +}
> +
> +static bool
> +riscv_arg_has_vector (const_tree type)
> +{
> +  bool is_vector = false;
> +
> +  switch (TREE_CODE (type))
> +{
> +case RECORD_TYPE:
> +  if (!COMPLETE_TYPE_P (type))
> +   break;
> +
> +  for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
> +   if (TREE_CODE (f) == FIELD_DECL)
> + {
> +   tree field_type = TREE_TYPE (f);
> +   if (!TYPE_P (field_type))
> + break;
> +
> +   /* Ignore it if it's fixed length vector.  */
> +   if (VECTOR_TYPE_P (field_type))
> + is_vector = riscv_scalable_vector_type_p (field_type);
> +   else
> + is_vector = riscv_arg_has_vector (field_type);
> + }
> +
> +  break;
> +
> +case VECTOR_TYPE:
> +  is_vector = riscv_scalable_vector_type_p (type);
> +  break;
> +
> +default:
> +  is_vector = false;
> +  break;
> +}
> +
> +  return is_vector;
> +}
> +
> +/* Pass the type to check whether it's a vector type or contains vector type.
> +   Only check the value type and no checking for vector pointer type.  */
> +
> +static void
> +riscv_pass_in_vector_p (const_tree type)
> +{
> +  static int warned = 0;
> +
> +  if (type && riscv_arg_has_vector (type) && !warned)
> +{
> +  warning (OPT_Wpsabi, "ABI for the scalable vector type is currently in "
> +  "experimental stage and may changes in the upcoming version of "
> +  "GCC.");
> +  warned = 1;
> +}
> +}
> +
>  /* Fill INFO with information about a single argument, and return an
> RTL pattern to pass or return the argument.  CUM is the cumulative
> state for earlier arguments.  MODE is the mode of this argument and
> @@ -3812,6 +3882,9 @@ riscv_get_arg_info (struct riscv_arg_info *info, const 
>

Re: [PATCH] RISC-V: Add required tls to read thread pointer test

2023-04-27 Thread Kito Cheng via Gcc-patches

Thanks, pushed :)

On Thu, Apr 27, 2023 at 11:32 AM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> The read-thread-pointer test may require GCC to be configured
> with --enable-tls. If it is not, the x4 (aka tp) register will not
> be present in the assembly code.
>
> This patch adds a tls requirement to the dg directives. The test
> runs when GCC is configured with --enable-tls and is marked as
> unsupported when it is configured with --disable-tls.
>
> Configured with --enable-tls:
> === gcc Summary ===
> # of expected passes      16
>
> Configured with --disable-tls:
> === gcc Summary ===
> # of unsupported tests    8
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/read-thread-pointer.c: Add required tls.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/testsuite/gcc.target/riscv/read-thread-pointer.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/read-thread-pointer.c b/gcc/testsuite/gcc.target/riscv/read-thread-pointer.c
> index 401fb421129..5f460b5f746 100644
> --- a/gcc/testsuite/gcc.target/riscv/read-thread-pointer.c
> +++ b/gcc/testsuite/gcc.target/riscv/read-thread-pointer.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-require-effective-target tls_native } */
>
>  void *get_tp()
>  {
> --
> 2.34.1
>

[PATCH] Synchronize include/ctf.h with upstream binutils/libctf.

2023-04-27 Thread Roger Sayle


This patch updates include/ctf.h to match the current libctf version in
binutils' include/.  I recently attempted to build a uber tree (following
some notes that are so old they used CVS) and noticed that binutils won't
build with GCC's top-level include, due to CTF_F_IDXSORTED not being
defined in ctf.h.  There was also a discrepancy with ansidecl.h but
I'm unsure how (best) to resolve that issue.

This patch was tested on x86_64-pc-linux-gnu with a make bootstrap and
make -k check, both with and without --target_board=unix{-m32}, with
no new failures, to confirm that the usage of ctf.h in ctfout.cc and
dwarf2ctf.cc is compatible with the new version.  Ok for mainline?


2023-04-27  Roger Sayle  

include/ChangeLog
* ctf.h: Import latest version from binutils/libctf.


Thanks in advance,
Roger
--

diff --git a/include/ctf.h b/include/ctf.h
index b867fc5..4263799 100644
--- a/include/ctf.h
+++ b/include/ctf.h
@@ -1,5 +1,5 @@
 /* CTF format description.
-   Copyright (C) 2021-2023 Free Software Foundation, Inc.
+   Copyright (C) 2019-2023 Free Software Foundation, Inc.
 
This file is part of libctf.
 
@@ -89,13 +89,13 @@ extern "C"
entries and reorder them accordingly (dropping the indexes in the process).
 
Variable records (as distinct from data objects) provide a modicum of support
-   for non-ELF systems, mapping a variable name to a CTF type ID.  The variable
-   names are sorted into ASCIIbetical order, permitting binary searching.  We do
-   not define how the consumer maps these variable names to addresses or
+   for non-ELF systems, mapping a variable or function name to a CTF type ID.
+   The names are sorted into ASCIIbetical order, permitting binary searching.
+   We do not define how the consumer maps these variable names to addresses or
anything else, or indeed what these names represent: they might be names
looked up at runtime via dlsym() or names extracted at runtime by a debugger
or anything else the consumer likes.  Variable records with identically-
-   named entries in the data object section are removed.
+   named entries in the data object or function index section are removed.
 
The data types section is a list of variable size records that represent each
type, in order by their ID.  The types themselves form a directed graph,
@@ -132,6 +132,12 @@ extern "C"
 #define CTF_MAX_SIZE   0xfffffffe  /* Max size of a v2 type in bytes. */
 #define CTF_LSIZE_SENT 0xffffffff  /* Sentinel for v2 ctt_size.  */
 
+# define CTF_MAX_TYPE_V1   0xffff  /* Max type identifier value.  */
+# define CTF_MAX_PTYPE_V1  0x7fff  /* Max parent type identifier value.  */
+# define CTF_MAX_VLEN_V1   0x3ff   /* Max struct, union, enums or args.  */
+# define CTF_MAX_SIZE_V1   0xfffe  /* Max size of a type in bytes. */
+# define CTF_LSIZE_SENT_V1 0xffff  /* Sentinel for v1 ctt_size.  */
+
   /* Start of actual data structure definitions.
 
  Every field in these structures must have corresponding code in the
@@ -144,6 +150,20 @@ typedef struct ctf_preamble
   unsigned char ctp_flags; /* Flags (see below).  */
 } ctf_preamble_t;
 
+typedef struct ctf_header_v2
+{
+  ctf_preamble_t cth_preamble;
+  uint32_t cth_parlabel;   /* Ref to name of parent lbl uniq'd against.  */
+  uint32_t cth_parname;/* Ref to basename of parent.  */
+  uint32_t cth_lbloff; /* Offset of label section.  */
+  uint32_t cth_objtoff;/* Offset of object section.  */
+  uint32_t cth_funcoff;/* Offset of function section.  */
+  uint32_t cth_varoff; /* Offset of variable section.  */
+  uint32_t cth_typeoff;/* Offset of type section.  */
+  uint32_t cth_stroff; /* Offset of string section.  */
+  uint32_t cth_strlen; /* Length of string section in bytes.  */
+} ctf_header_v2_t;
+
 typedef struct ctf_header
 {
   ctf_preamble_t cth_preamble;
@@ -182,13 +202,19 @@ typedef struct ctf_header
 # define CTF_VERSION_1_UPGRADED_3 2
 # define CTF_VERSION_2 3
 
-/* Note: some flags may be valid only in particular format versions.  */
-
 #define CTF_VERSION_3 4
 #define CTF_VERSION CTF_VERSION_3 /* Current version.  */
 
-#define CTF_F_COMPRESS 0x1 /* Data buffer is compressed by libctf.  */
-#define CTF_F_NEWFUNCINFO 0x2  /* New v3 func info section format.  */
+/* All of these flags bar CTF_F_COMPRESS and CTF_F_IDXSORTED are bug-workaround
+   flags and are valid only in format v3: in v2 and below they cannot occur and
+   in v4 and later, they will be recycled for other purposes.  */
+
+#define CTF_F_COMPRESS 0x1 /* Data buffer is compressed by libctf.  */
+#define CTF_F_NEWFUNCINFO 0x2  /* New v3 func info section format.  */
+#define CTF_F_IDXSORTED 0x4/* Index sections already sorted.  */
+#define CTF_F_DYNSTR 0x8   /* Strings come from .dynstr.  */
+#define CTF_F_MAX (CTF_F_COMPRESS | CTF_F_NEWFUNCINFO | CTF_F_IDXSORTED

Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

2023-04-27 Thread Kito Cheng via Gcc-patches

> Could you try something like this? that should be more generic:
>
> (define_split
>  [(set (match_operand:VB 0 "register_operand")
>(if_then_else:VB
>  (unspec:VB
>[(match_operand:VB 1 "vector_all_trues_mask_operand")
> (match_operand 4 "vector_length_operand")
> (match_operand 5 "const_int_operand")
> (match_operand 6 "const_int_operand")
> (reg:SI VL_REGNUM)
> (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>  (match_operand:VB 3 "vector_move_operand")
>  (match_operand:VB 2 "vector_undef_operand")))]
>  "TARGET_VECTOR && reload_completed"

Removing the reload_completed condition should work well, but you might
need more testing; I didn't run the full test suite on this change :P

>  [(const_int 0)]
>  {
>emit_insn (gen_pred_mov (mode, operands[0], CONST1_RTX (mode),
> RVV_VUNDEF (mode), CONST0_RTX (mode),
> operands[4], operands[5]));
>DONE;
>  }
> )

Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

2023-04-27 Thread Kito Cheng via Gcc-patches

> +(define_split
> +  [(set (match_operand: 0 "register_operand")
> +   (if_then_else:
> + (unspec:
> +   [(match_operand: 1 "vector_all_trues_mask_operand")
> +(match_operand  6 "vector_length_operand")
> +(match_operand  7 "const_int_operand")
> +(match_operand  8 "const_int_operand")
> +(reg:SI VL_REGNUM)
> +(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> + (match_operator:   3 "comparison_simplify_to_clear_operator"
> +   [(match_operand:VI   4 "register_operand")
> +(match_operand:VI   5 "vector_arith_operand")])
> + (match_operand:2 "vector_merge_operand")))]
> +  "TARGET_VECTOR && reload_completed && operands[4] == operands[5]"

Could you try something like this? That should be more generic:

(define_split
 [(set (match_operand:VB 0 "register_operand")
   (if_then_else:VB
 (unspec:VB
   [(match_operand:VB 1 "vector_all_trues_mask_operand")
(match_operand 4 "vector_length_operand")
(match_operand 5 "const_int_operand")
(match_operand 6 "const_int_operand")
(reg:SI VL_REGNUM)
(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
 (match_operand:VB 3 "vector_move_operand")
 (match_operand:VB 2 "vector_undef_operand")))]
 "TARGET_VECTOR && reload_completed"
 [(const_int 0)]
 {
   emit_insn (gen_pred_mov (mode, operands[0], CONST1_RTX (mode),
RVV_VUNDEF (mode), CONST0_RTX (mode),
operands[4], operands[5]));
   DONE;
 }
)

[PATCH] MAINTAINERS: Change my email address.

2023-04-27 Thread Robin Dapp

Hi,

I'm at Ventana now.  Change my email address accordingly.  Also, add
myself to the DCO list.

Regards
 Robin

--

ChangeLog:

* MAINTAINERS: Change my email address.
---
 MAINTAINERS | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 169418d44f7..8b609411a30 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -385,7 +385,8 @@ Lawrence Crowl  

 Lili Cui   
 Ian Dall   
 David Daney
-Robin Dapp 
+Robin Dapp 
+Robin Dapp 
 Simon Dardis   
 Sudakshina Das 
 Bud Davis  
@@ -731,6 +732,8 @@ Certificate of Origin Version 1.1.  See https://gcc.gnu.org/dco.html for more
 information.
 
 
+Robin Dapp 
+Robin Dapp 
 Matthias Kretz 
 Tim Lange  
 Jeff Law   
-- 
2.40.0

Re: [PATCH] Add targetm.libm_function_max_error

2023-04-27 Thread Michael Matz via Gcc-patches

Hello,

On Thu, 27 Apr 2023, Jakub Jelinek wrote:

> The first really large error I see is for sinl with
> x/2gx &val
> 0x748160ed90d9425b 0xefd8b811d6293294
> i.e.
> 1.5926552660973502228303666578452949e+253
> with most significant double being
> 1.5926552660973502e+253
> and low double
> -5.9963639272208416e+230

Such large numbers will always be a problem with the range reduction step 
in sin/cos.  With double-double the possible mantissa length can be much 
larger than 106, and the range reduction needs to be precise to at 
least those many bits to give anything sensible.

Realistically I think you can only expect reasonably exact results for 
double-double on operations that require range-reductions for
(a) "smallish" values.  Where the low double is (say) <= 2^16 * PI, or
(b) where the low double is consecutive to the high double, i.e. the
overall mantissa size (including the implicit zeros in the middle) is 
less than 107 (or whatever the current precision for the 
range-reduction step on large values is)

> given is
> -0.4025472157704263326278375983156912
> and expected (mpfr computed)
> -0.46994008859023245970759964236618727
> But if I try on x86_64:
> #define _GNU_SOURCE
> #include 
> 
> int
> main ()
> {
>   _Float128 f, f2, f3, f4;
>   double d, d2;
>   f = 1.5926552660973502228303666578452949e+253f128;
>   d = 1.5926552660973502e+253;
>   f2 = d;
>   f2 += -5.9963639272208416e+230;
>   f3 = sinf128 (f);
>   f4 = sinf128 (f2);
>   d2 = sin (d);
>   return 0;
> }
> where I think f2 is what matches most closely the 106 bit precision value,
> (gdb) p f
> $7 = 1.5926552660973502228303666578452949e+253
> (gdb) p f2
> $8 = 1.59265526609735022283036665784527174e+253
> (gdb) p f3
> $9 = -0.277062522218693980443596385112227247
> (gdb) p f4
> $10 = -0.402547215770426332627837598315693221
> and f4 is much closer to the given than to expected.

Sure, but that's because f2 is only "close" to the double-double exact 
value of (1.5926552660973502e+253 + -5.9963639272208416e+230) relatively, 
not absolutely.  As you already wrote the mantissa to represent the exact 
value (which double-double and mpfr can!) is larger than 106 bits.  The 
necessary error of rounding to represent it in f128 is small in ULPs, but 
very very large in magnitude.  Large magnitude changes of input value to 
sin/cos essentially put the value into completely different quadrants and 
positions within those quadrants, and hence the result under such rounding 
in input can be _wildly_ off.

E.g. imagine a double-double representing (2^107 * PI + PI/2) exactly 
(assume PI is the 53-bit representation of pi, that's why I say 
"exactly").  The correct result of sin() on that is 1.  The result on the 
nearest f128 input value (2^107 * PI) will be 0.  So you really can't 
compare f128 arithmetic with double-double one when the mantissas are too 
far away.
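To put a number on "small in ULPs, but very large in magnitude", here is a minimal C sketch that is not part of the thread. The helper name `ulp_gap` is made up for illustration, and it assumes a positive, finite IEEE binary64 input, so that incrementing the bit pattern yields the next representable value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: distance from x to the next larger double.
   Assumes x is positive, finite and IEEE binary64, so that adding 1 to
   the bit pattern gives the next representable value.  */
static double
ulp_gap (double x)
{
  uint64_t bits;
  double next;

  memcpy (&bits, &x, sizeof bits);
  bits += 1;
  memcpy (&next, &bits, sizeof next);
  return next - x;
}
```

Under those assumptions, the gap at 1e22 is already 2^21 = 2097152, far more than one period of sin(), so rounding an input of that size by a single ULP can move it across many periods. At 1e253 the gap is larger still.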

So, maybe you want to only let your tester test "good" double-double 
values, i.e. those that can be interpreted as an about-106-bit number where 
(high-exp - low-exp) <= about 53.
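Such a filter could look roughly like the following C sketch. It is only an illustration of the (high-exp - low-exp) <= 53 idea: the names `biased_exponent` and `good_double_double` are hypothetical, the threshold is the "about 53" from above taken literally, and both halves are assumed to be normal IEEE binary64 numbers (no subnormals, infinities or NaNs).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Biased binary exponent field of an IEEE binary64 value.  */
static int
biased_exponent (double x)
{
  uint64_t bits;

  memcpy (&bits, &x, sizeof bits);
  return (int) ((bits >> 52) & 0x7ff);
}

/* Hypothetical filter: accept a (hi, lo) pair only if lo is zero or the
   two exponents are at most 53 apart.  Assumes both parts are normal
   numbers; subnormals would need separate handling.  */
static bool
good_double_double (double hi, double lo)
{
  if (lo == 0.0)
    return true;
  return biased_exponent (hi) - biased_exponent (lo) <= 53;
}
```

With this check, the pair quoted above (1.5926552660973502e+253 and -5.9963639272208416e+230) is rejected: the exponent fields of the two halves differ by 75.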

(Or just not care about the similarities of cos() on double-double to a 
random number generator :) )

Ciao,
Michael.

[PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

2023-04-27 Thread Pan Li via Gcc-patches

From: Pan Li 

When some RVV integer compare operators act on the same vector
register without a mask, they can be simplified to VMCLR.

This patch allows ne, lt, ltu, gt and gtu to be simplified this way
by adding one new define_split.

Given we have:
vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
  return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
}

Before this patch:
vsetvli  zero,a2,e8,m8,ta,ma
vl8re8.v v24,0(a1)
vmslt.vv v8,v24,v24
vsetvli  a5,zero,e8,m8,ta,ma
vsm.v    v8,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,mf8,ta,ma
vmclr.m v24    <- optimized to vmclr.m
vsetvli zero,a5,e8,mf8,ta,ma
vsm.v   v24,0(a0)
ret

As above, we may have one instruction eliminated and require fewer
vector registers.
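The reason only ne, lt, ltu, gt and gtu qualify can be checked with a scalar analogy. The following is a minimal C sketch, not part of the patch; `cmp_code`, `compare` and `always_false_on_same_operand` are names invented here. It checks, for every int8_t value, which comparisons of a value against itself are always false.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical comparison codes mirroring the RTL codes in the patch.  */
enum cmp_code { CMP_EQ, CMP_NE, CMP_LT, CMP_LTU, CMP_GT, CMP_GTU, CMP_LE, CMP_GE };

/* Evaluate one comparison on a pair of 8-bit elements.  */
static bool
compare (enum cmp_code code, int8_t a, int8_t b)
{
  switch (code)
    {
    case CMP_EQ:  return a == b;
    case CMP_NE:  return a != b;
    case CMP_LT:  return a < b;
    case CMP_LTU: return (uint8_t) a < (uint8_t) b;
    case CMP_GT:  return a > b;
    case CMP_GTU: return (uint8_t) a > (uint8_t) b;
    case CMP_LE:  return a <= b;
    case CMP_GE:  return a >= b;
    }
  return false;
}

/* True if comparing any value with itself under CODE never yields true,
   i.e. the per-element result is always 0 and the mask can be cleared.  */
static bool
always_false_on_same_operand (enum cmp_code code)
{
  for (int v = INT8_MIN; v <= INT8_MAX; v++)
    if (compare (code, (int8_t) v, (int8_t) v))
      return false;
  return true;
}
```

In this model eq, le and ge are always true on identical operands instead, so they would correspond to setting the mask rather than clearing it.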

gcc/ChangeLog:

* config/riscv/predicates.md (comparison_simplify_to_clear_operator):
  Add new predicate of the simplification operators.
* config/riscv/vector.md: Add new define split to perform
  the simplification.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.

Signed-off-by: Pan Li 
Co-authored-by: kito-cheng 
---
 gcc/config/riscv/predicates.md|   6 +
 gcc/config/riscv/vector.md|  34 ++
 .../rvv/base/integer_compare_insn_shortcut.c  | 291 ++
 3 files changed, 331 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index e5adf06fa25..1626665825b 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -328,6 +328,12 @@ (define_predicate "ltge_operator"
 (define_predicate "comparison_except_ltge_operator"
   (match_code "eq,ne,le,leu,gt,gtu"))
 
+;; Some comparison operators with same operands can be simplified to clear.
+;; For example, op[0] = ne (op[1], op[1]) => op[0] = clr (op[0]).  We sort
+;; similar comparison operators here.
+(define_predicate "comparison_simplify_to_clear_operator"
+  (match_code "ne,lt,ltu,gt,gtu"))
+
 (define_predicate "comparison_except_eqge_operator"
   (match_code "le,leu,gt,gtu,lt,ltu"))
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index b3d23441679..47b97dfe69d 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7689,3 +7689,37 @@ (define_insn "@pred_fault_load"
   "vleff.v\t%0,%3%p1"
   [(set_attr "type" "vldff")
(set_attr "mode" "")])
+
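+;; -------------------------------------------------------------------------
+;;  Integer Compare Instructions Simplification
+;; -------------------------------------------------------------------------
+;; Simplify to VMCLR.m Includes:
+;; - 1. VMSNE
+;; - 2. VMSLT
+;; - 3. VMSLTU
+;; - 4. VMSGT
+;; - 5. VMSGTU
+;; -------------------------------------------------------------------------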
+(define_split
+  [(set (match_operand: 0 "register_operand")
+   (if_then_else:
+ (unspec:
+   [(match_operand: 1 "vector_all_trues_mask_operand")
+(match_operand  6 "vector_length_operand")
+(match_operand  7 "const_int_operand")
+(match_operand  8 "const_int_operand")
+(reg:SI VL_REGNUM)
+(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+ (match_operator:   3 "comparison_simplify_to_clear_operator"
+   [(match_operand:VI   4 "register_operand")
+(match_operand:VI   5 "vector_arith_operand")])
+ (match_operand:2 "vector_merge_operand")))]
+  "TARGET_VECTOR && reload_completed && operands[4] == operands[5]"
+  [(const_int 0)]
+  {
+emit_insn (gen_pred_mov (mode, operands[0], CONST1_RTX (mode),
+RVV_VUNDEF (mode), CONST0_RTX (mode),
+operands[6], operands[8]));
+DONE;
+  }
+)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
new file mode 100644
index 000..8954adad09d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
@@ -0,0 +1,291 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmseq_case_1(vint8m4_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmseq_case_2(vint8m2_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmseq_case_3(vint8m1_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmseq_case_4(vint8mf2_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t te

RISC-V: Added support clmul[r,h] instructions for Zbc extension.

2023-04-27 Thread Karen Sargsyan via Gcc-patches

The clmul[h] instructions were previously available only with the ZBKC
extension.  This patch enables them for the ZBC extension too.
It also adds support for the 'clmulr' instruction for the ZBC extension.
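For readers unfamiliar with the three instructions, here is a software model in plain C of what they compute for XLEN = 64. This is a sketch written for this note, not code from the patch; the function names `clmul64`, `clmulh64` and `clmulr64` are made up, and the model simply treats the result as the low, high, and "reversed" (bits 126..63) part of the 128-bit carry-less product.

```c
#include <stdint.h>

/* Low 64 bits of the carry-less product of rs1 and rs2.  */
static uint64_t
clmul64 (uint64_t rs1, uint64_t rs2)
{
  uint64_t out = 0;

  for (int i = 0; i < 64; i++)
    if ((rs2 >> i) & 1)
      out ^= rs1 << i;
  return out;
}

/* High 64 bits (bits 127..64) of the carry-less product.  */
static uint64_t
clmulh64 (uint64_t rs1, uint64_t rs2)
{
  uint64_t out = 0;

  /* i starts at 1: bit 0 of rs2 contributes nothing above bit 63.  */
  for (int i = 1; i < 64; i++)
    if ((rs2 >> i) & 1)
      out ^= rs1 >> (64 - i);
  return out;
}

/* Bits 126..63 of the carry-less product.  */
static uint64_t
clmulr64 (uint64_t rs1, uint64_t rs2)
{
  uint64_t out = 0;

  for (int i = 0; i < 64; i++)
    if ((rs2 >> i) & 1)
      out ^= rs1 >> (64 - i - 1);
  return out;
}
```

In this model `clmulr64 (a, b)` equals `(clmulh64 (a, b) << 1) | (clmul64 (a, b) >> 63)`, which is a convenient cross-check when writing tests for the new built-ins.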

gcc/ChangeLog:

 * config/riscv/bitmanip.md: Added clmulr instruction.
 * config/riscv/riscv-builtins.cc (AVAIL): Add new.
 * config/riscv/riscv.md: (UNSPEC_CLMULR): Add new unspec type.
 * config/riscv/riscv-cmo.def: Added built-in function for clmulr.
 * config/riscv/crypto.md: Move clmul[h] instructions to bitmanip.md.
 * config/riscv/riscv-scalar-crypto.def: Move clmul[h] built-in
functions to riscv-cmo.def.

gcc/testsuite/ChangeLog:

   * gcc.target/riscv/zbc32.c: New test.
   * gcc.target/riscv/zbc64.c: New test.
diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 44ad350c747..10ffb2d3f51 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -696,3 +696,32 @@
operands[8] = GEN_INT (setbit);
operands[9] = GEN_INT (clearbit);
 })
+
+;; ZBKC or ZBC extension
+(define_insn "riscv_clmul_"
+  [(set (match_operand:X 0 "register_operand" "=r")
+(unspec:X [(match_operand:X 1 "register_operand" "r")
+  (match_operand:X 2 "register_operand" "r")]
+  UNSPEC_CLMUL))]
+  "TARGET_ZBKC || TARGET_ZBC"
+  "clmul\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")])
+
+(define_insn "riscv_clmulh_"
+  [(set (match_operand:X 0 "register_operand" "=r")
+(unspec:X [(match_operand:X 1 "register_operand" "r")
+  (match_operand:X 2 "register_operand" "r")]
+  UNSPEC_CLMULH))]
+  "TARGET_ZBKC || TARGET_ZBC"
+  "clmulh\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")])
+
+;; ZBC extension
+(define_insn "riscv_clmulr_"
+  [(set (match_operand:X 0 "register_operand" "=r")
+(unspec:X [(match_operand:X 1 "register_operand" "r")
+  (match_operand:X 2 "register_operand" "r")]
+  UNSPEC_CLMULR))]
+  "TARGET_ZBC"
+  "clmulr\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")])
diff --git a/gcc/config/riscv/crypto.md b/gcc/config/riscv/crypto.md
index 777aa529005..e4b7f0190df 100644
--- a/gcc/config/riscv/crypto.md
+++ b/gcc/config/riscv/crypto.md
@@ -26,10 +26,6 @@
 UNSPEC_PACKH
 UNSPEC_PACKW
 
-;; Zbkc unspecs
-UNSPEC_CLMUL
-UNSPEC_CLMULH
-
 ;; Zbkx unspecs
 UNSPEC_XPERM8
 UNSPEC_XPERM4
@@ -126,26 +122,6 @@
   "packw\t%0,%1,%2"
   [(set_attr "type" "crypto")])
 
-;; ZBKC extension
-
-(define_insn "riscv_clmul_"
-  [(set (match_operand:X 0 "register_operand" "=r")
-(unspec:X [(match_operand:X 1 "register_operand" "r")
-  (match_operand:X 2 "register_operand" "r")]
-  UNSPEC_CLMUL))]
-  "TARGET_ZBKC"
-  "clmul\t%0,%1,%2"
-  [(set_attr "type" "crypto")])
-
-(define_insn "riscv_clmulh_"
-  [(set (match_operand:X 0 "register_operand" "=r")
-(unspec:X [(match_operand:X 1 "register_operand" "r")
-  (match_operand:X 2 "register_operand" "r")]
-  UNSPEC_CLMULH))]
-  "TARGET_ZBKC"
-  "clmulh\t%0,%1,%2"
-  [(set_attr "type" "crypto")])
-
 ;; ZBKX extension
 
 (define_insn "riscv_xperm4_"
diff --git a/gcc/config/riscv/riscv-builtins.cc b/gcc/config/riscv/riscv-builtins.cc
index b1c4b7547d7..79681d75962 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -105,8 +105,6 @@ AVAIL (prefetchi32, TARGET_ZICBOP && !TARGET_64BIT)
 AVAIL (prefetchi64, TARGET_ZICBOP && TARGET_64BIT)
 AVAIL (crypto_zbkb32, TARGET_ZBKB && !TARGET_64BIT)
 AVAIL (crypto_zbkb64, TARGET_ZBKB && TARGET_64BIT)
-AVAIL (crypto_zbkc32, TARGET_ZBKC && !TARGET_64BIT)
-AVAIL (crypto_zbkc64, TARGET_ZBKC && TARGET_64BIT)
 AVAIL (crypto_zbkx32, TARGET_ZBKX && !TARGET_64BIT)
 AVAIL (crypto_zbkx64, TARGET_ZBKX && TARGET_64BIT)
 AVAIL (crypto_zknd32, TARGET_ZKND && !TARGET_64BIT)
@@ -120,6 +118,10 @@ AVAIL (crypto_zksh32, TARGET_ZKSH && !TARGET_64BIT)
 AVAIL (crypto_zksh64, TARGET_ZKSH && TARGET_64BIT)
 AVAIL (crypto_zksed32, TARGET_ZKSED && !TARGET_64BIT)
 AVAIL (crypto_zksed64, TARGET_ZKSED && TARGET_64BIT)
+AVAIL (clmul_zbkc32_or_zbc32, (TARGET_ZBKC || TARGET_ZBC) && !TARGET_64BIT)
+AVAIL (clmul_zbkc64_or_zbc64, (TARGET_ZBKC || TARGET_ZBC) && TARGET_64BIT)
+AVAIL (clmulr_zbc32, TARGET_ZBC && !TARGET_64BIT)
+AVAIL (clmulr_zbc64, TARGET_ZBC && TARGET_64BIT)
 AVAIL (always, (!0))
 
 /* Construct a riscv_builtin_description from the given arguments.
diff --git a/gcc/config/riscv/riscv-cmo.def b/gcc/config/riscv/riscv-cmo.def
index 9fe5094ce1a..b92044dc6ff 100644
--- a/gcc/config/riscv/riscv-cmo.def
+++ b/gcc/config/riscv/riscv-cmo.def
@@ -15,3 +15,13 @@ RISCV_BUILTIN (zero_di, "zicboz_cbo_zero", RISCV_BUILTIN_DIRECT_NO_TARGET, RISCV
 // zicbop
 RISCV_BUILTIN (prefetchi_si, "zicbop_cbo_prefetchi", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE_SI, prefetchi32),
 RISCV_BUILTIN (prefetchi_di, "zicbop_cbo_prefetchi", RISCV_BUILTIN_DIRECT, RISCV_DI_FTYPE_DI, prefetchi64),
+
+// zbkc or zbc
+RISCV_B

Ping: [PATCH][ARM] MVE: Implementing auto-vectorized array * scalar instructions

2023-04-27 Thread Victor L. Do Nascimento via Gcc-patches

May I please ping this one??

https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612152.html

Many Thanks!

Victor

On 2/16/23 15:48, Victor L. Do Nascimento wrote:
> Hi all,
> 
> The back-end pattern for mapping the auto-vectorized representation of
> vector * scalar to the machine instruction VMUL was missing, and
> multiple instructions were needed to reproduce this behavior as a
> result of a failed RTL pattern match in the combine pass.
> 
> RTL patterns were introduced to reproduce the behavior of the
> intrinsics vmulq_n_ and vmulq_n_f.
> 
> In the case of literal constants, an intermediate instruction was
> added to the initial RTL expansion to ensure a general-purpose register
> was allocated to store the constant, which could then be extracted
> from the constant vector.
> 
> For the function
> 
> void test_vmulimm_s32x4 (int32_t * __restrict__ dest, int32_t *a)
> {
>int i;
>for (i=0; i<4; i++) {
>  dest[i] = a[i] * 5;
>}
> }
> 
> 
> The GIMPLE -> RTL expansion is modified to produce:
> (set (reg:SI 119)
>   (const_int 5 [0x5]))
> (set (reg:V4SI 118)
>   (mult:V4SI (vec_duplicate:V4SI (reg:SI 119))
>  (reg:V4SI 117)))
> 
> instead of:
> (set (reg:V4SI 119)
>   (const_vector:V4SI [
>  (const_int 5 [0x5]) repeated x4
>]))
> (set (reg:V4SI 118)
>   (mult:V4SI (reg:V4SI 117)
>  (reg:V4SI 119)))
> 
> The end assembly for the above function introduces the emission of the 
> following insn:
> vmul.i32 q3, q3, r3
> 
> as opposed to:
> vmul.i32 q3, q3, q2
> 
> All tests in gcc.target/arm/simd/mve-vmul-scalar-1.c now pass.
> 
> Added new RTL templates, amended unit test and checked for regressions on 
> arm-none-eabi.
> 
> Thanks,
> Victor
> 
> gcc:
>   * gcc/config/arm/arm.cc (neon_vdup_constant): static keyword
>   removed.
>   * gcc/config/arm/arm-protos.h (neon_vdup_constant): prototype
>   added.
>   * gcc/config/arm/mve.md (@mve_vmulq_n_2): New.
>   * gcc/config/arm/predicates.md (reg_or_mve_replicated_const_operand):
>   New.
>   * gcc/config/arm/vec-common.md (mul3): Modify to use
>   `reg_or_mve_replicated_const_operand'.
> 
> testsuite:
>   * gcc.target/arm/simd/mve-vmul-scalar-1.c: Corrected typo,
>   xfails removed.
> ---
>   gcc/config/arm/arm-protos.h|  1 +
>   gcc/config/arm/arm.cc  |  2 +-
>   gcc/config/arm/mve.md  | 11 +++
>   gcc/config/arm/predicates.md   |  8 
>   gcc/config/arm/vec-common.md   | 14 --
>   .../gcc.target/arm/simd/mve-vmul-scalar-1.c| 13 ++---
>   6 files changed, 39 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index aea472bfbb9..4cf9fb00e01 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -199,6 +199,7 @@ extern rtx arm_load_tp (rtx);
>   extern bool arm_coproc_builtin_available (enum unspecv);
>   extern bool arm_coproc_ldc_stc_legitimate_address (rtx);
>   extern rtx arm_stack_protect_tls_canary_mem (bool);
> +extern rtx neon_vdup_constant (rtx, bool);
>   
>   
>   #if defined TREE_CODE
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index efc48349dd3..7d9d265b0a7 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -13301,7 +13301,7 @@ neon_pairwise_reduce (rtx op0, rtx op1, machine_mode mode,
>  If this is the case, and GENERATE is set, we also generate
>  instructions to do this and return an RTX to assign to the register.  */
>   
> -static rtx
> +rtx
>   neon_vdup_constant (rtx vals, bool generate)
>   {
> machine_mode mode = GET_MODE (vals);
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 555ad1b66c8..806c24e33aa 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -1376,6 +1376,17 @@
> [(set_attr "type" "mve_move")
>   ])
>   
> +(define_insn "@mve_vmulq_n_2"
> +  [
> +   (set (match_operand:MVE_VLD_ST 0 "s_register_operand" "=w")
> + (mult:MVE_VLD_ST (vec_duplicate:MVE_VLD_ST (match_operand: 1 "s_register_operand" "r"))
> +(match_operand:MVE_VLD_ST 2 "s_register_operand" "w")))
> +  ]
> +  "TARGET_HAVE_MVE"
> +  "vmul.%#\t%q0, %q2, %r1"
> +  [(set_attr "type" "mve_move")
> +])
> +
>   ;;
>   ;; [vmulq_u, vmulq_s])
>   ;;
> diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
> index 3139750c606..31eadfa2d3b 100644
> --- a/gcc/config/arm/predicates.md
> +++ b/gcc/config/arm/predicates.md
> @@ -113,6 +113,14 @@
> && neon_immediate_valid_for_logic (op, mode, 1, NULL, NULL));
>   })
>   
> +(define_predicate "reg_or_mve_replicated_const_operand"
> +  (if_then_else (and (match_test "TARGET_HAVE_MVE")
> +  (match_code "const_vector")
> +  (match_test "const_vec_duplicate_p (o

New French PO file for 'gcc' (version 13.1.0)

2023-04-27 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the French team of translators.  The file is available at:

https://translationproject.org/latest/gcc/fr.po

(This file, 'gcc-13.1.0.fr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: [PATCH] gimple-range-op: Handle sqrt (basic bounds only)

2023-04-27 Thread Aldy Hernandez via Gcc-patches

Ok

On Thu, Apr 27, 2023, 15:30 Jakub Jelinek  wrote:

> Hi!
>
> The following patch adds sqrt support (but similarly to sincos, only
> dumb basic ranges only).
>
> Ok for trunk if it passes bootstrap/regtest?
>
> Will improve this incrementally and sin/cos as well.
>
> 2023-04-27  Jakub Jelinek  
>
> * gimple-range-op.cc (class cfn_sqrt): New type.
> (op_cfn_sqrt): New variable.
> (gimple_range_op_handler::maybe_builtin_call): Handle
> CASE_CFN_SQRT{,_FN}.
>
> * gcc.dg/tree-ssa/range-sqrt.c: New test.
>
> --- gcc/gimple-range-op.cc.jj   2023-04-27 11:57:09.865879982 +0200
> +++ gcc/gimple-range-op.cc  2023-04-27 15:15:05.089787859 +0200
> @@ -400,6 +400,83 @@ public:
>}
>  } op_cfn_copysign;
>
> +class cfn_sqrt : public range_operator_float
> +{
> +public:
> +  using range_operator_float::fold_range;
> +  using range_operator_float::op1_range;
> +  virtual bool fold_range (frange &r, tree type,
> +  const frange &lh, const frange &,
> +  relation_trio) const final override
> +  {
> +if (lh.undefined_p ())
> +  return false;
> +if (lh.known_isnan () || real_less (&lh.upper_bound (), &dconstm0))
> +  {
> +   r.set_nan (type);
> +   return true;
> +  }
> +unsigned bulps
> +  = targetm.libm_function_max_error (CFN_SQRT, TYPE_MODE (type), true);
> +if (bulps == ~0U)
> +  r.set_varying (type);
> +else if (bulps == 0)
> +  r.set (type, dconstm0, dconstinf);
> +else
> +  {
> +   REAL_VALUE_TYPE boundmin = dconstm0;
> +   while (bulps--)
> + frange_nextafter (TYPE_MODE (type), boundmin, dconstninf);
> +   r.set (type, boundmin, dconstinf);
> +  }
> +if (!lh.maybe_isnan () && !real_less (&lh.lower_bound (), &dconst0))
> +  r.clear_nan ();
> +return true;
> +  }
> +  virtual bool op1_range (frange &r, tree type,
> + const frange &lhs, const frange &,
> + relation_trio) const final override
> +  {
> +if (lhs.undefined_p ())
> +  return false;
> +
> +// A known NAN means the input is [-INF,-0.) U +-NAN.
> +if (lhs.known_isnan ())
> +  {
> +  known_nan:
> +   REAL_VALUE_TYPE ub = dconstm0;
> +   frange_nextafter (TYPE_MODE (type), ub, dconstninf);
> +   r.set (type, dconstninf, ub);
> +   // No r.flush_denormals_to_zero (); here - it is a reverse op.
> +   return true;
> +  }
> +
> +// Results outside of [-0.0, +Inf] are impossible.
> +const REAL_VALUE_TYPE &ub = lhs.upper_bound ();
> +if (real_less (&ub, &dconstm0))
> +  {
> +   if (!lhs.maybe_isnan ())
> + r.set_undefined ();
> +   else
> + // If lhs could be NAN and finite result is impossible,
> + // the range is like lhs.known_isnan () above.
> + goto known_nan;
> +   return true;
> +  }
> +
> +if (!lhs.maybe_isnan ())
> +  {
> +   // If NAN is not valid result, the input cannot include either
> +   // a NAN nor values smaller than -0.
> +   r.set (type, dconstm0, dconstinf, nan_state (false, false));
> +   return true;
> +  }
> +
> +r.set_varying (type);
> +return true;
> +  }
> +} op_cfn_sqrt;
> +
>  class cfn_sincos : public range_operator_float
>  {
>  public:
> @@ -961,6 +1038,13 @@ gimple_range_op_handler::maybe_builtin_c
>m_valid = true;
>break;
>
> +CASE_CFN_SQRT:
> +CASE_CFN_SQRT_FN:
> +  m_op1 = gimple_call_arg (call, 0);
> +  m_float = &op_cfn_sqrt;
> +  m_valid = true;
> +  break;
> +
>  CASE_CFN_SIN:
>  CASE_CFN_SIN_FN:
>m_op1 = gimple_call_arg (call, 0);
> --- gcc/testsuite/gcc.dg/tree-ssa/range-sqrt.c.jj   2023-04-27 15:10:09.285102144 +0200
> +++ gcc/testsuite/gcc.dg/tree-ssa/range-sqrt.c  2023-04-27 15:12:01.478465821 +0200
> @@ -0,0 +1,41 @@
> +// { dg-do compile }
> +// { dg-options "-O2 -fdump-tree-evrp -fno-thread-jumps" }
> +
> +#include 
> +
> +void use (double);
> +void link_error ();
> +
> +void
> +foo (double x)
> +{
> +  if (__builtin_isnan (x))
> +__builtin_unreachable ();
> +  x = sqrt (x);
> +  if (x < -0.0)
> +link_error ();
> +  use (x);
> +}
> +
> +void
> +bar (double x)
> +{
> +  if (!__builtin_isnan (sqrt (x)))
> +{
> +  if (__builtin_isnan (x))
> +   link_error ();
> +  if (x < -0.0)
> +   link_error ();
> +}
> +}
> +
> +void
> +stool (double x)
> +{
> +  double res1 = sqrt (x);
> +  double res2 = __builtin_sqrt (x);
> +  if (res1 < -0.0 || res2 < -0.0)
> +link_error ();
> +}
> +
> +// { dg-final { scan-tree-dump-not "link_error" "evrp" { target { { *-*-linux* } && { glibc } } } } }
>
> Jakub
>
>
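The forward fold in the quoted patch can be modelled in a few lines of plain C. This is a simplified sketch and not GCC code: `sqrt_range` and `fold_sqrt` are hypothetical names, the input range is passed as plain doubles instead of an frange, and the model assumes the target reports 0 ulps of error for sqrt (the `bulps == 0` branch), so the lower bound is exactly -0.0.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the resulting frange.  */
struct sqrt_range
{
  double lo, hi;    /* Bounds of the non-NaN part (NaN if only_nan).  */
  bool maybe_nan;   /* The result may be NaN.  */
  bool only_nan;    /* The result is always NaN.  */
};

/* Model of the basic forward fold for sqrt on inputs in [lo, hi],
   where in_maybe_nan says whether the input may also be NaN.
   Assumes a sqrt with 0 ulps of error at the boundary.  */
static struct sqrt_range
fold_sqrt (double lo, double hi, bool in_maybe_nan)
{
  struct sqrt_range out;

  if (hi < 0.0)
    {
      /* Every non-NaN input is strictly negative: result is NaN.  */
      out.lo = out.hi = NAN;
      out.maybe_nan = out.only_nan = true;
      return out;
    }

  /* Otherwise the non-NaN results lie in [-0.0, +Inf].  */
  out.lo = -0.0;
  out.hi = INFINITY;
  out.only_nan = false;
  /* NaN stays possible if the input may be NaN or may be negative.
     Note that -0.0 < 0.0 is false, so an input of -0.0 is fine.  */
  out.maybe_nan = in_maybe_nan || lo < 0.0;
  return out;
}
```

For example, an input range of [1.0, 4.0] without NaN folds to [-0.0, +Inf] without NaN in this model, which is exactly as coarse as the "basic bounds only" in the subject line suggests.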

[PATCH v2] Docs: Add vector register constarint for asm operands

2023-04-27 Thread Kito Cheng via Gcc-patches

Add the `vr`, `vm` and `vd` constraints for vector registers; these 3
constraints have been implemented in LLVM as well.

gcc/ChangeLog:

* doc/md.texi (RISC-V): Add vr, vm, vd constraints.

---

V2 changes:
- Drop unrelated changes.

---
 gcc/doc/md.texi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 07bf8bdebffb..cc4a93a87638 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3535,6 +3535,15 @@ An address that is held in a general-purpose register.
 @item S
 A constraint that matches an absolute symbolic address.
 
+@item vr
+A vector register (if available).
+
+@item vd
+A vector register, excluding v0 (if available).
+
+@item vm
+A vector register, only v0 (if available).
+
 @end table
 
 @item RX---@file{config/rx/constraints.md}
-- 
2.39.2

Re: [PATCH] Docs: Add vector register constarint for asm operands

2023-04-27 Thread Kito Cheng via Gcc-patches

Damn, I mixed something else into this patch.


On Thu, Apr 27, 2023 at 4:53 PM Kito Cheng via Gcc-patches
 wrote:
>
> `vr`, `vm` and `vd` constarint for vector register constarint, those 3
> constarint has implemented on LLVM as well.
>
> gcc/ChangeLog:
>
> * doc/md.texi (RISC-V): Add vr, vm, vd constarint.
> ---
>  gcc/config/riscv/riscv-modes.def|  4 
>  gcc/config/riscv/riscv-protos.h |  2 ++
>  gcc/config/riscv/riscv-selftests.cc | 16 +---
>  gcc/config/riscv/riscv-v.cc | 12 
>  gcc/config/riscv/riscv.cc   |  2 ++
>  gcc/config/riscv/riscv.md   |  1 +
>  gcc/doc/md.texi |  9 +
>  7 files changed, 43 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-modes.def 
> b/gcc/config/riscv/riscv-modes.def
> index b1669609eec4..4f9e5ed5f3e9 100644
> --- a/gcc/config/riscv/riscv-modes.def
> +++ b/gcc/config/riscv/riscv-modes.def
> @@ -185,6 +185,10 @@ VECTOR_MODE_WITH_PREFIX (VNx, INT, QI, 1, 0);
>  ADJUST_NUNITS (VNx1QI, riscv_v_adjust_nunits (VNx1QImode, 1));
>  ADJUST_ALIGNMENT (VNx1QI, 1);
>
> +
> +/* VLS modes.  */
> +VECTOR_MODES (INT, 16);   /* V16QI V8HI V4SI V2DI */
> +
>  /* TODO: According to RISC-V 'V' ISA spec, the maximun vector length can
> be 65536 for a single vector register which means the vector mode in
> GCC can be maximum = 65536 * 8 bits (LMUL=8).
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 607ff6ea697b..28d9e4f5bb82 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -207,6 +207,8 @@ enum vlen_enum
>  bool slide1_sew64_helper (int, machine_mode, machine_mode,
>   machine_mode, rtx *);
>  rtx gen_avl_for_scalar_move (rtx);
> +machine_mode minimal_vls_mode (machine_mode);
> +machine_mode mask_mode(machine_mode);
>  }
>
>  /* We classify builtin types into two classes:
> diff --git a/gcc/config/riscv/riscv-selftests.cc 
> b/gcc/config/riscv/riscv-selftests.cc
> index 1bf1a648fa1f..56c1260a64b1 100644
> --- a/gcc/config/riscv/riscv-selftests.cc
> +++ b/gcc/config/riscv/riscv-selftests.cc
> @@ -234,6 +234,16 @@ run_poly_int_selftests (void)
>  worklist);
>  }
>
> +static bool
> +vls_mode_p (machine_mode mode)
> +{
> +  if (!riscv_v_ext_vector_mode_p(mode))
> +return false;
> +  poly_int64 sz = GET_MODE_SIZE (mode);
> +  return sz.is_constant();
> +}
> +
> +
>  static void
>  run_const_vector_selftests (void)
>  {
> @@ -248,7 +258,7 @@ run_const_vector_selftests (void)
>
>FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
>  {
> -  if (riscv_v_ext_vector_mode_p (mode))
> +  if (riscv_v_ext_vector_mode_p (mode) && !vls_mode_p (mode))
> {
>   for (const HOST_WIDE_INT &val : worklist)
> {
> @@ -273,7 +283,7 @@ run_const_vector_selftests (void)
>
>FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_FLOAT)
>  {
> -  if (riscv_v_ext_vector_mode_p (mode))
> +  if (riscv_v_ext_vector_mode_p (mode) && !vls_mode_p (mode))
> {
>   scalar_mode inner_mode = GET_MODE_INNER (mode);
>   REAL_VALUE_TYPE f = REAL_VALUE_ATOF ("0.2928932", inner_mode);
> @@ -322,7 +332,7 @@ run_broadcast_selftests (void)
>  #define BROADCAST_TEST(MODE_CLASS)   
>   \
>FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT) 
>   \
>  {
>   \
> -  if (riscv_v_ext_vector_mode_p (mode))  
>   \
> +  if (riscv_v_ext_vector_mode_p (mode) && !vls_mode_p(mode)) 
>   \
> { 
>  \
>   rtx_insn *insn; 
>  \
>   rtx src;
>  \
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 99c414cc9102..9d41da945290 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -742,4 +742,16 @@ gen_avl_for_scalar_move (rtx avl)
>  }
>  }
>
> +machine_mode
> +minimal_vls_mode (machine_mode)
> +{
> +  return VNx4SImode;
> +}
> +
> +machine_mode
> +mask_mode(machine_mode)
> +{
> +  return VNx4BImode;
> +}
> +
>  } // namespace riscv_vector
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index a2d2dd0bb670..c2bebc42bff7 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -985,6 +985,8 @@ riscv_v_ext_vector_mode_p (machine_mode mode)
>switch (mode)
>  {
>  #include "riscv-vector-switch.def"
> +case E_V4SImode:
> +  return true;
>  default:
>return false;
>  }
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index dd845cc1ed33..05b41559ff8e 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/ri

[PATCH] gimple-range-op: Handle sqrt (basic bounds only)

2023-04-27 Thread Jakub Jelinek via Gcc-patches

Hi!

The following patch adds sqrt support (but, similarly to sincos,
dumb basic ranges only).

Ok for trunk if it passes bootstrap/regtest?

Will improve this incrementally and sin/cos as well.
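
As background for why the basic range starts at -0.0 rather than +0.0:
under IEEE 754, sqrt (-0.0) is -0.0 and sqrt of anything below -0.0 is
a NaN.  A minimal standalone C++ sketch of that property follows; the
helper names are made up for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of the patch): true if sqrt (x) is either
// a NaN or lies in [-0.0, +Inf], i.e. the "basic" range described above.
bool
sqrt_result_in_basic_range (double x)
{
  double r = std::sqrt (x);
  // -0.0 compares equal to +0.0, so r >= -0.0 also accepts a -0.0 result.
  return std::isnan (r) || r >= -0.0;
}

// True if sqrt keeps the sign of a negative zero argument.
bool
sqrt_preserves_negative_zero ()
{
  return std::signbit (std::sqrt (-0.0));
}
```

This assumes IEEE 754 semantics (no -ffast-math or similar); the patch
itself additionally widens the lower bound by the target's reported
libm error.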

2023-04-27  Jakub Jelinek  

* gimple-range-op.cc (class cfn_sqrt): New type.
(op_cfn_sqrt): New variable.
(gimple_range_op_handler::maybe_builtin_call): Handle
CASE_CFN_SQRT{,_FN}.

* gcc.dg/tree-ssa/range-sqrt.c: New test.

--- gcc/gimple-range-op.cc.jj   2023-04-27 11:57:09.865879982 +0200
+++ gcc/gimple-range-op.cc  2023-04-27 15:15:05.089787859 +0200
@@ -400,6 +400,83 @@ public:
   }
 } op_cfn_copysign;
 
+class cfn_sqrt : public range_operator_float
+{
+public:
+  using range_operator_float::fold_range;
+  using range_operator_float::op1_range;
+  virtual bool fold_range (frange &r, tree type,
+  const frange &lh, const frange &,
+  relation_trio) const final override
+  {
+if (lh.undefined_p ())
+  return false;
+if (lh.known_isnan () || real_less (&lh.upper_bound (), &dconstm0))
+  {
+   r.set_nan (type);
+   return true;
+  }
+unsigned bulps
+  = targetm.libm_function_max_error (CFN_SQRT, TYPE_MODE (type), true);
+if (bulps == ~0U)
+  r.set_varying (type);
+else if (bulps == 0)
+  r.set (type, dconstm0, dconstinf);
+else
+  {
+   REAL_VALUE_TYPE boundmin = dconstm0;
+   while (bulps--)
+ frange_nextafter (TYPE_MODE (type), boundmin, dconstninf);
+   r.set (type, boundmin, dconstinf);
+  }
+if (!lh.maybe_isnan () && !real_less (&lh.lower_bound (), &dconst0))
+  r.clear_nan ();
+return true;
+  }
+  virtual bool op1_range (frange &r, tree type,
+ const frange &lhs, const frange &,
+ relation_trio) const final override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+// A known NAN means the input is [-INF,-0.) U +-NAN.
+if (lhs.known_isnan ())
+  {
+  known_nan:
+   REAL_VALUE_TYPE ub = dconstm0;
+   frange_nextafter (TYPE_MODE (type), ub, dconstninf);
+   r.set (type, dconstninf, ub);
+   // No r.flush_denormals_to_zero (); here - it is a reverse op.
+   return true;
+  }
+
+// Results outside of [-0.0, +Inf] are impossible.
+const REAL_VALUE_TYPE &ub = lhs.upper_bound ();
+if (real_less (&ub, &dconstm0))
+  {
+   if (!lhs.maybe_isnan ())
+ r.set_undefined ();
+   else
+ // If lhs could be NAN and finite result is impossible,
+ // the range is like lhs.known_isnan () above.
+ goto known_nan;
+   return true;
+  }
+
+if (!lhs.maybe_isnan ())
+  {
+   // If NAN is not valid result, the input cannot include either
+   // a NAN nor values smaller than -0.
+   r.set (type, dconstm0, dconstinf, nan_state (false, false));
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_sqrt;
+
 class cfn_sincos : public range_operator_float
 {
 public:
@@ -961,6 +1038,13 @@ gimple_range_op_handler::maybe_builtin_c
   m_valid = true;
   break;
 
+CASE_CFN_SQRT:
+CASE_CFN_SQRT_FN:
+  m_op1 = gimple_call_arg (call, 0);
+  m_float = &op_cfn_sqrt;
+  m_valid = true;
+  break;
+
 CASE_CFN_SIN:
 CASE_CFN_SIN_FN:
   m_op1 = gimple_call_arg (call, 0);
--- gcc/testsuite/gcc.dg/tree-ssa/range-sqrt.c.jj   2023-04-27 
15:10:09.285102144 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/range-sqrt.c  2023-04-27 15:12:01.478465821 
+0200
@@ -0,0 +1,41 @@
+// { dg-do compile }
+// { dg-options "-O2 -fdump-tree-evrp -fno-thread-jumps" }
+
+#include <math.h>
+
+void use (double);
+void link_error ();
+
+void
+foo (double x)
+{
+  if (__builtin_isnan (x))
+__builtin_unreachable ();
+  x = sqrt (x);
+  if (x < -0.0)
+link_error ();
+  use (x);
+}
+
+void
+bar (double x)
+{
+  if (!__builtin_isnan (sqrt (x)))
+{
+  if (__builtin_isnan (x))
+   link_error ();
+  if (x < -0.0)
+   link_error ();
+}
+}
+
+void
+stool (double x)
+{
+  double res1 = sqrt (x);
+  double res2 = __builtin_sqrt (x);
+  if (res1 < -0.0 || res2 < -0.0)
+link_error ();
+}
+
+// { dg-final { scan-tree-dump-not "link_error" "evrp" { target { { *-*-linux* 
} && { glibc } } } } }

Jakub

Re: harden-conditionals: detach values before compares

2023-04-27 Thread Richard Biener via Gcc-patches

On Thu, Apr 27, 2023 at 1:48 PM Alexandre Oliva via Gcc-patches
 wrote:
>
>
> The optimization barriers inserted after compares enable GCC to derive
> information about the values from e.g. the taken paths, or the absence
> of exceptions.  Move them before the original compares, so that the
> reversed compares test copies of the original operands, without
> further optimizations.
>
> Regstrapped on x86_64-linux-gnu, and also bootstrapped with both passes
> enabled.  Further tested on multiple other targets with gcc-12.  Ok to
> install?

OK.

>
> for  gcc/ChangeLog
>
> * gimple-harden-conditionals.cc (insert_edge_check_and_trap):
> Move detach value calls...
> (pass_harden_conditional_branches::execute): ... here.
> (pass_harden_compares::execute): Detach values before
> compares.
>
> for  gcc/testsuite/ChangeLog
>
> * c-c++-common/torture/harden-cond-comp.c: New.
> ---
>  gcc/gimple-harden-conditionals.cc  |   25 
> 
>  .../c-c++-common/torture/harden-cond-comp.c|   24 +++
>  2 files changed, 39 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cond-comp.c
>
> diff --git a/gcc/gimple-harden-conditionals.cc 
> b/gcc/gimple-harden-conditionals.cc
> index 78b8d5692d76f..2e5a42e9e71b1 100644
> --- a/gcc/gimple-harden-conditionals.cc
> +++ b/gcc/gimple-harden-conditionals.cc
> @@ -276,8 +276,8 @@ insert_check_and_trap (location_t loc, 
> gimple_stmt_iterator *gsip,
>  }
>
>  /* Split edge E, and insert_check_and_trap (see above) in the
> -   newly-created block, using detached copies of LHS's and RHS's
> -   values (see detach_value above) for the COP compare.  */
> +   newly-created block, using already-detached copies of LHS's and
> +   RHS's values (see detach_value above) for the COP compare.  */
>
>  static inline void
>  insert_edge_check_and_trap (location_t loc, edge e,
> @@ -301,10 +301,6 @@ insert_edge_check_and_trap (location_t loc, edge e,
>
>gimple_stmt_iterator gsik = gsi_after_labels (chk);
>
> -  bool same_p = (lhs == rhs);
> -  lhs = detach_value (loc, &gsik, lhs);
> -  rhs = same_p ? lhs : detach_value (loc, &gsik, rhs);
> -
>insert_check_and_trap (loc, &gsik, flags, cop, lhs, rhs);
>  }
>
> @@ -366,6 +362,12 @@ pass_harden_conditional_branches::execute (function *fun)
> /* ??? Can we do better?  */
> continue;
>
> +  /* Detach the values before the compares.  If we do so later,
> +the compiler may use values inferred from the compares.  */
> +  bool same_p = (lhs == rhs);
> +  lhs = detach_value (loc, &gsi, lhs);
> +  rhs = same_p ? lhs : detach_value (loc, &gsi, rhs);
> +
>insert_edge_check_and_trap (loc, EDGE_SUCC (bb, 0), cop, lhs, rhs);
>insert_edge_check_and_trap (loc, EDGE_SUCC (bb, 1), cop, lhs, rhs);
>  }
> @@ -508,6 +510,13 @@ pass_harden_compares::execute (function *fun)
>
>   tree rhs = copy_ssa_name (lhs);
>
> + /* Detach the values before the compares, so that the
> +compiler infers nothing from them, not even from a
> +throwing compare that didn't throw.  */
> + bool same_p = (op1 == op2);
> + op1 = detach_value (loc, &gsi, op1);
> + op2 = same_p ? op1 : detach_value (loc, &gsi, op2);
> +
>   gimple_stmt_iterator gsi_split = gsi;
>   /* Don't separate the original assignment from debug stmts
>  that might be associated with it, and arrange to split the
> @@ -529,10 +538,6 @@ pass_harden_compares::execute (function *fun)
>  gimple_bb (asgn)->index, nbb->index);
> }
>
> - bool same_p = (op1 == op2);
> - op1 = detach_value (loc, &gsi_split, op1);
> - op2 = same_p ? op1 : detach_value (loc, &gsi_split, op2);
> -
>   gassign *asgnck = gimple_build_assign (rhs, cop, op1, op2);
>   gimple_set_location (asgnck, loc);
>   gsi_insert_before (&gsi_split, asgnck, GSI_SAME_STMT);
> diff --git a/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c 
> b/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c
> new file mode 100644
> index 0..5aad890a1d3b6
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fharden-conditional-branches -fharden-compares 
> -fdump-tree-hardcbr -fdump-tree-hardcmp -ffat-lto-objects" } */
> +
> +int f(int i, int j) {
> +  if (i == 0)
> +return j != 0;
> +  else
> +return i * j != 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Splitting edge" 2 "hardcbr" } } */
> +/* { dg-final { scan-tree-dump-times "Adding reversed compare" 2 "hardcbr" } 
> } */
> +/* { dg-final { scan-tree-dump-times "__builtin_trap" 2 "hardcbr" } } */
> +
> +/* { dg-final { scan-tree-dump-times "Splitting block" 2 "hardcmp" } } */
> +/* { dg-final { scan-tree-dump-times "Adding reversed c

Re: [PATCH] tree-optimization/109170 - bogus use-after-free with __builtin_expect

2023-04-27 Thread Jakub Jelinek via Gcc-patches

On Thu, Apr 27, 2023 at 12:10:40PM +, Richard Biener wrote:
> The following generalizes the range-op for __builtin_expect
> by using the fnspec machinery.
> 
> We've deferred this to stage1 - bootstrapped and tested on 
> x86_64-unknown-linux-gnu.
> 
> OK?
> 
> Thanks,
> Richard.
> 
>   PR tree-optimization/109170
>   * gimple-range-op.cc (gimple_range_op_handler::maybe_builtin_call):
>   Handle __builtin_expect and similar via cfn_pass_through_arg1
>   and inspecting the calls fnspec.
>   * builtins.cc (builtin_fnspec): Handle BUILT_IN_EXPECT
>   and BUILT_IN_EXPECT_WITH_PROBABILITY.

Ok, thanks.

Jakub

[PATCH] tree-optimization/109170 - bogus use-after-free with __builtin_expect

2023-04-27 Thread Richard Biener via Gcc-patches

The following generalizes the range-op for __builtin_expect
by using the fnspec machinery.

We've deferred this to stage1 - bootstrapped and tested on 
x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

PR tree-optimization/109170
* gimple-range-op.cc (gimple_range_op_handler::maybe_builtin_call):
Handle __builtin_expect and similar via cfn_pass_through_arg1
and inspecting the calls fnspec.
* builtins.cc (builtin_fnspec): Handle BUILT_IN_EXPECT
and BUILT_IN_EXPECT_WITH_PROBABILITY.
---
 gcc/builtins.cc|  2 ++
 gcc/gimple-range-op.cc | 19 +++
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 878596c240a..bd07873a80e 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -11718,6 +11718,8 @@ builtin_fnspec (tree callee)
   case BUILT_IN_RETURN_ADDRESS:
return ".c";
   case BUILT_IN_ASSUME_ALIGNED:
+  case BUILT_IN_EXPECT:
+  case BUILT_IN_EXPECT_WITH_PROBABILITY:
return "1cX ";
   /* But posix_memalign stores a pointer into the memory pointed to
 by its first argument.  */
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index f7409e35a99..04e27d6aa05 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "range.h"
 #include "value-query.h"
 #include "gimple-range.h"
+#include "attr-fnspec.h"
 
 // Given stmt S, fill VEC, up to VEC_SIZE elements, with relevant ssa-names
 // on the statement.  For efficiency, it is an error to not pass in enough
@@ -984,14 +985,16 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_int = &op_cfn_parity;
   break;
 
-case CFN_BUILT_IN_EXPECT:
-case CFN_BUILT_IN_EXPECT_WITH_PROBABILITY:
-  m_valid = true;
-  m_op1 = gimple_call_arg (call, 0);
-  m_int = &op_cfn_pass_through_arg1;
-  break;
-
 default:
-  break;
+  {
+   unsigned arg;
+   if (gimple_call_fnspec (call).returns_arg (&arg) && arg == 0)
+ {
+   m_valid = true;
+   m_op1 = gimple_call_arg (call, 0);
+   m_int = &op_cfn_pass_through_arg1;
+ }
+   break;
+  }
 }
 }
-- 
2.35.3

Re: [PATCH] v2: Implement range-op entry for sin/cos

2023-04-27 Thread Aldy Hernandez via Gcc-patches





On 4/27/23 13:53, Jakub Jelinek wrote:
> On Thu, Apr 27, 2023 at 01:46:19PM +0200, Aldy Hernandez wrote:
> > On 4/27/23 13:13, Jakub Jelinek wrote:
> >
> > > +unsigned bulps = targetm.libm_function_max_error (m_cfn, TYPE_MODE (type),
> > > + true);
> > > +if (bulps == ~0U)
> > > +  r.set_varying (type);
> > > +else if (bulps == 0)
> > > +  r.set (type, dconstm1, dconst1);
> > > +else
> > > +  {
> > > +   REAL_VALUE_TYPE boundmin, boundmax;
> > > +   boundmax = dconst1;
> > > +   while (bulps--)
> > > + frange_nextafter (TYPE_MODE (type), boundmax, dconstinf);
> > > +   real_arithmetic (&boundmin, NEGATE_EXPR, &boundmax, NULL);
> > > +   r.set (type, boundmin, boundmax);
> > > +  }
> >
> > This seems like something we'll do over and over for other operations,
> > right?  If so, could you abstract it into a separate function?
>
> Not easily.  E.g. take the difference between sin/cos, where the above
> grows the interval on both sides by bulps, vs. what I'm working on right now
> (sqrt), where it grows in one direction only (from dconstm0 toward
> dconstninf).  There could be other functions which only grow the positive
> bound.


Fair enough.

Then OK.

Thanks for working on this.
Aldy

[COMMITTED] Normalize addresses in IPA before calling range_op_handler [PR109639]

2023-04-27 Thread Aldy Hernandez via Gcc-patches

The old legacy code would allow building ranges containing symbolics,
even though the entire ranger ecosystem does not handle them.  These
were normalized into non-zero ranges by helper functions in VRP
(range_fold_*_expr) before calling the ranger.  The only users of
these functions should have been legacy VRP, which is no more.
However, a handful of users crept into IPA, even though these
functions should never have been called outside of VRP or vr-values.

The issue here is that IPA is building a range of [&foo, &foo] and
expecting range_fold_binary to normalize it to non-zero.  Fixed by
adding a helper function before calling the range_op handler.

I think this covers the problematic ranges.  If not, I'll come up
with something more generalized that does not involve polluting
irange::set with the normalization code.  After all, this only
involves a handful of IPA places.

I've also added an assert in irange::set() making it easier to detect
any possible fallout without having to drill deep into the setter.
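
To illustrate the idea of normalizing before handing a range to the
folder, here is a toy C++17 model.  It is only a sketch: Operand,
ToyRange and set_and_normalize are hypothetical stand-ins invented for
this note, not GCC's tree, irange or ipa_range_set_and_normalize.  An
integer constant becomes a singleton range, while the address of a
symbol is never stored symbolically and becomes "non-zero" instead.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Toy operand: either an integer constant or the address of a symbol.
struct IntConst { std::int64_t value; };
struct AddrOf { const char *symbol; };
using Operand = std::variant<IntConst, AddrOf>;

// Toy range: a single known value, or "anything but zero".
struct ToyRange
{
  enum Kind { Singleton, NonZero } kind;
  std::int64_t value;  // Meaningful only when kind == Singleton.
};

// Hypothetical normalizing setter: never builds a symbolic [&foo, &foo].
ToyRange
set_and_normalize (const Operand &op)
{
  if (const IntConst *c = std::get_if<IntConst> (&op))
    return {ToyRange::Singleton, c->value};
  // An address is only known to be non-null.
  return {ToyRange::NonZero, 0};
}
```

With this shape, callers never see a range whose bounds are anything
other than integer constants, which is the property the new assert in
the setter is meant to check.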

gcc/ChangeLog:

PR tree-optimization/109639
* ipa-cp.cc (ipa_value_range_from_jfunc): Normalize range.
(propagate_vr_across_jump_function): Same.
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Same.
* ipa-prop.h (ipa_range_set_and_normalize): New.
* value-range.cc (irange::set): Assert min and max are INTEGER_CST.
---
 gcc/ipa-cp.cc|  8 ++--
 gcc/ipa-fnsummary.cc | 10 +-
 gcc/ipa-prop.h   | 14 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr109639.c | 20 
 gcc/testsuite/gcc.dg/tree-ssa/pr109643.c | 18 ++
 gcc/value-range.cc   |  3 +++
 6 files changed, 66 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109639.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109643.c

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 9ec86d77992..a5b45a8e6b9 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -1963,9 +1963,11 @@ ipa_value_range_from_jfunc (ipa_node_params *info, 
cgraph_edge *cs,
{
  value_range op_res, res;
  tree op = ipa_get_jf_pass_through_operand (jfunc);
- value_range op_vr (op, op);
+ value_range op_vr;
  range_op_handler handler (operation, vr_type);
 
+ ipa_range_set_and_normalize (op_vr, op);
+
  if (!handler
  || !op_res.supports_type_p (vr_type)
  || !handler.fold_range (op_res, vr_type, srcvr, op_vr))
@@ -2757,10 +2759,12 @@ propagate_vr_across_jump_function (cgraph_edge *cs, 
ipa_jump_func *jfunc,
   else if (!ipa_edge_within_scc (cs))
{
  tree op = ipa_get_jf_pass_through_operand (jfunc);
- value_range op_vr (op, op);
+ value_range op_vr;
  value_range op_res,res;
  range_op_handler handler (operation, operand_type);
 
+ ipa_range_set_and_normalize (op_vr, op);
+
  if (!handler
  || !op_res.supports_type_p (operand_type)
  || !handler.fold_range (op_res, operand_type,
diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
index 48093a8b623..b328bb8ce14 100644
--- a/gcc/ipa-fnsummary.cc
+++ b/gcc/ipa-fnsummary.cc
@@ -500,9 +500,11 @@ evaluate_conditions_for_known_args (struct cgraph_node 
*node,
}
  else if (!op->val[1])
{
- value_range op0 (op->val[0], op->val[0]);
+ value_range op0;
  range_op_handler handler (op->code, op->type);
 
+ ipa_range_set_and_normalize (op0, op->val[0]);
+
  if (!handler
  || !res.supports_type_p (op->type)
  || !handler.fold_range (res, op->type,
@@ -518,12 +520,10 @@ evaluate_conditions_for_known_args (struct cgraph_node 
*node,
{
  value_range res;
  value_range val_vr;
- if (TREE_CODE (c->val) == INTEGER_CST)
-   val_vr.set (c->val, c->val);
- else
-   val_vr.set_varying (TREE_TYPE (c->val));
  range_op_handler handler (c->code, boolean_type_node);
 
+ ipa_range_set_and_normalize (val_vr, c->val);
+
  if (!handler
  || !res.supports_type_p (boolean_type_node)
  || !handler.fold_range (res, boolean_type_node, vr, 
val_vr))
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 7eb5c8f44ea..93785a6a8e6 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -1201,4 +1201,18 @@ tree build_ref_for_offset (location_t, tree, poly_int64, 
bool, tree,
 /* In ipa-cp.cc  */
 void ipa_cp_cc_finalize (void);
 
+/* Set R to the range of [VAL, VAL] while normalizing addresses to
+   non-zero.  */
+
+inline void
+ipa_range_set_and_normalize (irange &r, tree val)
+{
+  if (T

[FYI] Use CONFIG_SHELL-/bin/sh in genmultilib

2023-04-27 Thread Alexandre Oliva via Gcc-patches



There are still shells on some systems that lack the ability to start
scripts when not using the shell name explicitly.  Adjust genmultilib
to use ${CONFIG_SHELL-/bin/sh} the same way configure does.

Regstrapped on x86_64-linux-gnu.  Also built riscv64-elf on an affected
platform.  I'm checking this in.


for  gcc/ChangeLog

* genmultilib: Use CONFIG_SHELL to run sub-scripts.
---
 gcc/genmultilib |   30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/gcc/genmultilib b/gcc/genmultilib
index f8bd90b116cd1..6683c3490266f 100644
--- a/gcc/genmultilib
+++ b/gcc/genmultilib
@@ -170,23 +170,23 @@ if [ "$#" != "0" ]; then
 all=${initial}`echo $first | sed -e 's_|_/_'g`
 first=`echo $first | sed -e 's_|_ _'g`
 echo ${all}/
-initial="${initial}${all}/" ./tmpmultilib $@
-./tmpmultilib $first $@ | grep -v "^${all}"
+initial="${initial}${all}/" ${CONFIG_SHELL-/bin/sh} ./tmpmultilib $@
+${CONFIG_SHELL-/bin/sh} ./tmpmultilib $first $@ | grep -v "^${all}"
 ;;
   *)
 for opt in `echo $first | sed -e 's|/| |'g`; do
   echo ${initial}${opt}/
 done
-./tmpmultilib $@
+${CONFIG_SHELL-/bin/sh} ./tmpmultilib $@
 for opt in `echo $first | sed -e 's|/| |'g`; do
-  initial="${initial}${opt}/" ./tmpmultilib $@
+  initial="${initial}${opt}/" ${CONFIG_SHELL-/bin/sh} ./tmpmultilib $@
 done
   esac
 fi
 EOF
 chmod +x tmpmultilib
 
-combinations=`initial=/ ./tmpmultilib ${options}`
+combinations=`initial=/ ${CONFIG_SHELL-/bin/sh} ./tmpmultilib ${options}`
 
 # If there exceptions, weed them out now
 if [ -n "${exceptions}" ]; then
@@ -210,7 +210,7 @@ cat >>tmpmultilib2 <<\EOF
   done
 EOF
   chmod +x tmpmultilib2
-  combinations=`./tmpmultilib2 ${combinations}`
+  combinations=`${CONFIG_SHELL-/bin/sh} ./tmpmultilib2 ${combinations}`
 fi
 
 # If the MULTILIB_REQUIRED list are provided,
@@ -236,7 +236,7 @@ cat >>tmpmultilib2 <<\EOF
 EOF
 
chmod +x tmpmultilib2
-   combinations=`./tmpmultilib2 ${combinations}`
+   combinations=`${CONFIG_SHELL-/bin/sh} ./tmpmultilib2 ${combinations}`
 
 fi
 
@@ -348,12 +348,12 @@ if [ "$#" = "0" ]; then
 else
   first=$1
   shift
-  dirout="${dirout}" optout="${optout}" ./tmpmultilib2 $@
+  dirout="${dirout}" optout="${optout}" ${CONFIG_SHELL-/bin/sh} ./tmpmultilib2 
$@
   l=`echo ${first} | sed -e 's/=.*$//' -e 's/?/=/g'`
   r=`echo ${first} | sed -e 's/^.*=//' -e 's/?/=/g'`
   if expr " ${optout} " : ".* ${l} .*" > /dev/null; then
 newopt=`echo " ${optout} " | sed -e "s/ ${l} / ${r} /" -e 's/^ //' -e 's/ 
$//'`
-dirout="${dirout}" optout="${newopt}" ./tmpmultilib2 $@
+dirout="${dirout}" optout="${newopt}" ${CONFIG_SHELL-/bin/sh} 
./tmpmultilib2 $@
   fi
 fi
 EOF
@@ -453,14 +453,14 @@ chmod +x tmpmultilib4
 # correct list of options and negations.
 for combo in ${combinations}; do
   # Use the directory names rather than the option names.
-  dirout=`./tmpmultilib3 "${combo}" "${todirnames}" "${toosdirnames}" 
"${enable_multilib}"`
+  dirout=`${CONFIG_SHELL-/bin/sh} ./tmpmultilib3 "${combo}" "${todirnames}" 
"${toosdirnames}" "${enable_multilib}"`
 
   # Look through the options.  We must output each option that is
   # present, and negate each option that is not present.
-  optout=`./tmpmultilib4 "${combo}" "${options}"`
+  optout=`${CONFIG_SHELL-/bin/sh} ./tmpmultilib4 "${combo}" "${options}"`
 
   # Output the line with all appropriate matches.
-  dirout="${dirout}" optout="${optout}" ./tmpmultilib2
+  dirout="${dirout}" optout="${optout}" ${CONFIG_SHELL-/bin/sh} ./tmpmultilib2
 done
 
 # Terminate the list of string.
@@ -491,11 +491,11 @@ for rrule in ${multilib_reuse}; do
   if expr "${combinations} " : ".*/${combo}/.*" > /dev/null; then
 if echo "/${copts}/" | grep -E "${options_re}" > /dev/null; then
   combo="/${combo}/"
-  dirout=`./tmpmultilib3 "${combo}" "${todirnames}" "${toosdirnames}" 
"${enable_multilib}"`
+  dirout=`${CONFIG_SHELL-/bin/sh} ./tmpmultilib3 "${combo}" 
"${todirnames}" "${toosdirnames}" "${enable_multilib}"`
   copts="/${copts}/"
-  optout=`./tmpmultilib4 "${copts}" "${options}"`
+  optout=`${CONFIG_SHELL-/bin/sh} ./tmpmultilib4 "${copts}" "${options}"`
   # Output the line with all appropriate matches.
-  dirout="${dirout}" optout="${optout}" ./tmpmultilib2
+  dirout="${dirout}" optout="${optout}" ${CONFIG_SHELL-/bin/sh} 
./tmpmultilib2
 else
   echo "The rule ${rrule} contains an option absent from 
MULTILIB_OPTIONS." >&2
   exit 1

-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about

Re: [PATCH] v2: Implement range-op entry for sin/cos

2023-04-27 Thread Jakub Jelinek via Gcc-patches

On Thu, Apr 27, 2023 at 01:46:19PM +0200, Aldy Hernandez wrote:
> On 4/27/23 13:13, Jakub Jelinek wrote:
> 
> > +unsigned bulps = targetm.libm_function_max_error (m_cfn, TYPE_MODE 
> > (type),
> > + true);
> > +if (bulps == ~0U)
> > +  r.set_varying (type);
> > +else if (bulps == 0)
> > +  r.set (type, dconstm1, dconst1);
> > +else
> > +  {
> > +   REAL_VALUE_TYPE boundmin, boundmax;
> > +   boundmax = dconst1;
> > +   while (bulps--)
> > + frange_nextafter (TYPE_MODE (type), boundmax, dconstinf);
> > +   real_arithmetic (&boundmin, NEGATE_EXPR, &boundmax, NULL);
> > +   r.set (type, boundmin, boundmax);
> > +  }
> 
> This seems like something we'll do over and over for other operations,
> right?  If so, could you abstract it into a separate function?

Not easily.  E.g. take the difference between sin/cos, where the above
grows the interval on both sides by bulps, vs. what I'm working on right now
(sqrt), where it grows in one direction only (from dconstm0 toward
dconstninf).  There could be other functions which only grow the positive
bound.

Jakub
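
The difference described above (widening both bounds for sin/cos
versus only the lower bound for sqrt) can be sketched outside of GCC
with std::nextafter.  This is a minimal illustration for double only;
widen_both and widen_lower are hypothetical names, and the real code
uses frange_nextafter on REAL_VALUE_TYPE instead.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <utility>

// Hypothetical helper: grow [lo, hi] outward by ULPS representable
// steps on each side, as the sin/cos code does around [-1, 1].
std::pair<double, double>
widen_both (double lo, double hi, unsigned ulps)
{
  const double inf = std::numeric_limits<double>::infinity ();
  while (ulps--)
    {
      lo = std::nextafter (lo, -inf);
      hi = std::nextafter (hi, inf);
    }
  return {lo, hi};
}

// Hypothetical helper: grow only the lower bound toward -Inf, as the
// sqrt code does starting from -0.0.
double
widen_lower (double lo, unsigned ulps)
{
  const double inf = std::numeric_limits<double>::infinity ();
  while (ulps--)
    lo = std::nextafter (lo, -inf);
  return lo;
}
```

A shared helper would need to know which bounds to move and in which
direction, which is roughly the objection raised in the reply.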

harden-conditionals: detach values before compares

2023-04-27 Thread Alexandre Oliva via Gcc-patches



The optimization barriers inserted after compares enable GCC to derive
information about the values from e.g. the taken paths, or the absence
of exceptions.  Move them before the original compares, so that the
reversed compares test copies of the original operands, without
further optimizations.

Regstrapped on x86_64-linux-gnu, and also bootstrapped with both passes
enabled.  Further tested on multiple other targets with gcc-12.  Ok to
install?
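
For readers who have not looked at these passes, the following is a
rough standalone sketch of the pattern, assuming GNU extended asm (GCC
or Clang).  The function names are hypothetical; the empty asm mirrors
the barrier shape matched by the new test, but this is not the code
the passes emit.

```cpp
#include <cassert>

// Hypothetical stand-in for detach_value: returns a copy of X that the
// optimizer cannot relate back to X, using an empty asm as a barrier.
static inline int
detach (int x)
{
  int copy;
  __asm__ ("" : "=g" (copy) : "0" (x));
  return copy;
}

// Hardened "i == 0": the copy is detached BEFORE the original compare,
// then the reversed compare is run on the copy and must disagree.
bool
hardened_is_zero (int i)
{
  int i2 = detach (i);
  bool res = (i == 0);
  bool rev = (i2 != 0);  // Reversed compare on the detached copy.
  if (res == rev)
    __builtin_trap ();   // Both cannot be true (or false) at once.
  return res;
}
```

Detaching first means nothing learned from the original compare (the
taken path, or the fact that it did not throw) can be applied to the
copy used by the reversed compare.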


for  gcc/ChangeLog

* gimple-harden-conditionals.cc (insert_edge_check_and_trap):
Move detach value calls...
(pass_harden_conditional_branches::execute): ... here.
(pass_harden_compares::execute): Detach values before
compares.

for  gcc/testsuite/ChangeLog

* c-c++-common/torture/harden-cond-comp.c: New.
---
 gcc/gimple-harden-conditionals.cc  |   25 
 .../c-c++-common/torture/harden-cond-comp.c|   24 +++
 2 files changed, 39 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cond-comp.c

diff --git a/gcc/gimple-harden-conditionals.cc 
b/gcc/gimple-harden-conditionals.cc
index 78b8d5692d76f..2e5a42e9e71b1 100644
--- a/gcc/gimple-harden-conditionals.cc
+++ b/gcc/gimple-harden-conditionals.cc
@@ -276,8 +276,8 @@ insert_check_and_trap (location_t loc, gimple_stmt_iterator 
*gsip,
 }
 
 /* Split edge E, and insert_check_and_trap (see above) in the
-   newly-created block, using detached copies of LHS's and RHS's
-   values (see detach_value above) for the COP compare.  */
+   newly-created block, using already-detached copies of LHS's and
+   RHS's values (see detach_value above) for the COP compare.  */
 
 static inline void
 insert_edge_check_and_trap (location_t loc, edge e,
@@ -301,10 +301,6 @@ insert_edge_check_and_trap (location_t loc, edge e,
 
   gimple_stmt_iterator gsik = gsi_after_labels (chk);
 
-  bool same_p = (lhs == rhs);
-  lhs = detach_value (loc, &gsik, lhs);
-  rhs = same_p ? lhs : detach_value (loc, &gsik, rhs);
-
   insert_check_and_trap (loc, &gsik, flags, cop, lhs, rhs);
 }
 
@@ -366,6 +362,12 @@ pass_harden_conditional_branches::execute (function *fun)
/* ??? Can we do better?  */
continue;
 
+  /* Detach the values before the compares.  If we do so later,
+the compiler may use values inferred from the compares.  */
+  bool same_p = (lhs == rhs);
+  lhs = detach_value (loc, &gsi, lhs);
+  rhs = same_p ? lhs : detach_value (loc, &gsi, rhs);
+
   insert_edge_check_and_trap (loc, EDGE_SUCC (bb, 0), cop, lhs, rhs);
   insert_edge_check_and_trap (loc, EDGE_SUCC (bb, 1), cop, lhs, rhs);
 }
@@ -508,6 +510,13 @@ pass_harden_compares::execute (function *fun)
 
  tree rhs = copy_ssa_name (lhs);
 
+ /* Detach the values before the compares, so that the
+compiler infers nothing from them, not even from a
+throwing compare that didn't throw.  */
+ bool same_p = (op1 == op2);
+ op1 = detach_value (loc, &gsi, op1);
+ op2 = same_p ? op1 : detach_value (loc, &gsi, op2);
+
  gimple_stmt_iterator gsi_split = gsi;
  /* Don't separate the original assignment from debug stmts
 that might be associated with it, and arrange to split the
@@ -529,10 +538,6 @@ pass_harden_compares::execute (function *fun)
 gimple_bb (asgn)->index, nbb->index);
}
 
- bool same_p = (op1 == op2);
- op1 = detach_value (loc, &gsi_split, op1);
- op2 = same_p ? op1 : detach_value (loc, &gsi_split, op2);
-
  gassign *asgnck = gimple_build_assign (rhs, cop, op1, op2);
  gimple_set_location (asgnck, loc);
  gsi_insert_before (&gsi_split, asgnck, GSI_SAME_STMT);
diff --git a/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c 
b/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c
new file mode 100644
index 0..5aad890a1d3b6
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/torture/harden-cond-comp.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-fharden-conditional-branches -fharden-compares 
-fdump-tree-hardcbr -fdump-tree-hardcmp -ffat-lto-objects" } */
+
+int f(int i, int j) {
+  if (i == 0)
+return j != 0;
+  else
+return i * j != 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Splitting edge" 2 "hardcbr" } } */
+/* { dg-final { scan-tree-dump-times "Adding reversed compare" 2 "hardcbr" } } 
*/
+/* { dg-final { scan-tree-dump-times "__builtin_trap" 2 "hardcbr" } } */
+
+/* { dg-final { scan-tree-dump-times "Splitting block" 2 "hardcmp" } } */
+/* { dg-final { scan-tree-dump-times "Adding reversed compare" 2 "hardcmp" } } 
*/
+/* { dg-final { scan-tree-dump-times "__builtin_trap" 4 "hardcmp" } } */
+
+/* Check that the optimization barrier is placed before the original compare.  
*/
+/* { dg-final { scan-tree-dump-times {__asm__[(]"" : "=g" _[0-9]* : "0" 
i_[0-9]*[(]D[)][)][;][\n][ ]*if

Re: [PATCH] v2: Implement range-op entry for sin/cos

2023-04-27 Thread Aldy Hernandez via Gcc-patches





On 4/27/23 13:13, Jakub Jelinek wrote:

> +unsigned bulps = targetm.libm_function_max_error (m_cfn, TYPE_MODE (type),
> + true);
> +if (bulps == ~0U)
> +  r.set_varying (type);
> +else if (bulps == 0)
> +  r.set (type, dconstm1, dconst1);
> +else
> +  {
> +   REAL_VALUE_TYPE boundmin, boundmax;
> +   boundmax = dconst1;
> +   while (bulps--)
> + frange_nextafter (TYPE_MODE (type), boundmax, dconstinf);
> +   real_arithmetic (&boundmin, NEGATE_EXPR, &boundmax, NULL);
> +   r.set (type, boundmin, boundmax);
> +  }


This seems like something we'll do over and over for other operations, 
right?  If so, could you abstract it into a separate function?


Thanks.
Aldy

Re: [PATCH] Add support for vrange streaming.

2023-04-27 Thread Aldy Hernandez via Gcc-patches

Thanks. I will put it aside until I start posting the IPA patches.

Aldy

On Thu, Apr 27, 2023, 13:02 Richard Biener 
wrote:

> On Tue, Apr 18, 2023 at 2:48 PM Aldy Hernandez  wrote:
> >
> >
> >
> > On 4/18/23 11:06, Aldy Hernandez wrote:
> > > I think it's time for the ranger folk to start owning range streaming
> > > instead of passes (IPA, etc) doing their own thing.  I have plans for
> > > overhauling the IPA code later this cycle to support generic ranges,
> > > and I'd like to start cleaning up the streaming and hashing interface.
> > >
> > > This patch adds generic streaming support for vrange.
> > >
> > > I'd appreciate another set of eyes.
> > >
> > > Thoughts?
> >
> > We recently added support for querying and storing an frange's NAN
> > without the need to be friends with the class.
> >
> > Adjusted patch in testing...
>
> I think this is reasonable once you find use for it.
>
> Thanks,
> Richard.
>
> > Aldy
>
>

[PATCH] wrong GIMPLE from (bit_field_ref CTOR ..) simplification

2023-04-27 Thread Richard Biener via Gcc-patches

When we simplify a BIT_FIELD_REF of a CTOR like { _1, _2, _3, _4 }
and attempt to produce (view converted) { _1, _2 } for a selected
subset we fail to realize this cannot be done from match.pd since
we have no way to write the resulting CTOR "operation" and the
built CTOR { _1, _2 } isn't a GIMPLE value.

This kind of simplification has to be done in forwprop (or would
need a match.pd syntax extension) where we can split out the CTOR
to a separate stmt.

The following disables this particular simplification when we are
simplifying GIMPLE.  With enhanced IL checking this otherwise
causes ICEs in the testsuite from vectorized code.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* match.pd (BIT_FIELD_REF CONSTRUCTOR@0 @1 @2): Do not
create a CTOR operand in the result when simplifying GIMPLE.
---
 gcc/match.pd | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 34e1a5c1b46..c4320781f5b 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7456,10 +7456,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 ? type
 : build_vector_type (TREE_TYPE (TREE_TYPE (ctor)),
  count * k));
+ /* We used to build a CTOR in the non-constant case here
+but that's not a GIMPLE value.  We'd have to expose this
+operation somehow so the code generation can properly
+split it out to a separate stmt.  */
  res = (constant_p ? build_vector_from_ctor (evtype, vals)
-: build_constructor (evtype, vals));
+: (GIMPLE ? NULL_TREE : build_constructor (evtype, vals)));
}
-   (view_convert { res; }))
+   (if (res)
+(view_convert { res; })))
   /* The bitfield references a single constructor element.  */
   (if (k.is_constant (&const_k)
   && idx + n <= (idx / const_k + 1) * const_k)
-- 
2.35.3

[PATCH] Properly gimplify handled component chains on registers

2023-04-27 Thread Richard Biener via Gcc-patches

When for example complex lowering wants to extract the imaginary
part of a complex variable for lowering a complex move we can
end up with it generating __imag  > which
is valid GENERIC.  It then feeds that to the gimplifier via
force_gimple_operand but that fails to split up this chain
of handled components, generating invalid GIMPLE that is caught by
verification when PR109644 is fixed.

The following rectifies this by noting in gimplify_compound_lval
when the base object which we gimplify first ends up being a
register.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* gimplify.cc (gimplify_compound_lval): When the base
gimplified to a register make sure to split up chains
of operations.
---
 gcc/gimplify.cc | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index b0bdd1b46cd..9755f79fb2d 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -3281,15 +3281,21 @@ gimplify_compound_lval (tree *expr_p, gimple_seq 
*pre_p, gimple_seq *post_p,
   if (need_non_reg && (fallback & fb_rvalue))
 prepare_gimple_addressable (p, pre_p);
 
+
   /* Step 3: gimplify size expressions and the indices and operands of
- ARRAY_REF.  During this loop we also remove any useless conversions.  */
+ ARRAY_REF.  During this loop we also remove any useless conversions.
+ If we operate on a register also make sure to properly gimplify
+ to individual operations.  */
 
+  bool reg_operations = is_gimple_reg (*p);
   for (; expr_stack.length () > 0; )
 {
   tree t = expr_stack.pop ();
 
   if (TREE_CODE (t) == ARRAY_REF || TREE_CODE (t) == ARRAY_RANGE_REF)
{
+ gcc_assert (!reg_operations);
+
  /* Gimplify the low bound and element type size. */
  tret = gimplify_expr (&TREE_OPERAND (t, 2), pre_p, post_p,
is_gimple_reg, fb_rvalue);
@@ -3306,10 +3312,18 @@ gimplify_compound_lval (tree *expr_p, gimple_seq 
*pre_p, gimple_seq *post_p,
}
   else if (TREE_CODE (t) == COMPONENT_REF)
{
+ gcc_assert (!reg_operations);
+
  tret = gimplify_expr (&TREE_OPERAND (t, 2), pre_p, post_p,
is_gimple_reg, fb_rvalue);
  ret = MIN (ret, tret);
}
+  else if (reg_operations)
+   {
+ tret = gimplify_expr (&TREE_OPERAND (t, 0), pre_p, post_p,
+   is_gimple_val, fb_rvalue);
+ ret = MIN (ret, tret);
+   }
 
   STRIP_USELESS_TYPE_CONVERSION (TREE_OPERAND (t, 0));
 
-- 
2.35.3

[PATCH] doc: Describe behaviour of enums with fixed underlying type

2023-04-27 Thread Jonathan Wakely via Gcc-patches

C2x adds the ability to give an enumeration type a fixed underlying
type, as C++ already has. The -fshort-enums option alters the compiler's
choice of underlying type, but when it's fixed the compiler can't
choose.

Similarly for C++ -fstrict-enums has no effect with a fixed underlying
type, because every value of the underlying type is a valid value of the
enumeration type.
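
A small sketch of what a fixed underlying type means on the C++ side.  The
enum and function names are made up for this example and are not taken from
the patch or the PR.

```cpp
#include <cassert>
#include <type_traits>

// Fixed underlying type: the programmer picks the representation, so
// there is nothing left for the compiler to choose.
enum Fixed : long long { F_ZERO, F_ONE };

// No fixed underlying type: the implementation picks the representation.
enum Unfixed { U_ZERO, U_ONE };

static bool
fixed_is_long_long ()
{
  return std::is_same<std::underlying_type<Fixed>::type, long long>::value;
}

// Every value of the underlying type is a valid value of Fixed, so this
// conversion is well defined even though no enumerator has that value.
static long long
round_trip (long long v)
{
  return static_cast<long long> (static_cast<Fixed> (v));
}
```

For Unfixed the analogous round trip is only guaranteed for values within
the range of the enumeration, which is the case -fstrict-enums exploits.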

This caused confusion recently: https://gcc.gnu.org/PR109532

OK for trunk?

-- >8 --

gcc/ChangeLog:

* doc/invoke.texi (Code Gen Options): Note that -fshort-enums
is ignored for a fixed underlying type.
(C++ Dialect Options): Likewise for -fstrict-enums.
---
 gcc/doc/invoke.texi | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2f40c58b21c..0f91464f8c0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -3495,6 +3495,8 @@ defined in the C++ standard; basically, a value that can 
be
 represented in the minimum number of bits needed to represent all the
 enumerators).  This assumption may not be valid if the program uses a
 cast to convert an arbitrary integer value to the enumerated type.
+This option has no effect for an enumeration type with a fixed underlying
+type.
 
 @opindex fstrong-eval-order
 @item -fstrong-eval-order
@@ -18303,6 +18305,8 @@ Use it to conform to a non-default application binary 
interface.
 Allocate to an @code{enum} type only as many bytes as it needs for the
 declared range of possible values.  Specifically, the @code{enum} type
 is equivalent to the smallest integer type that has enough room.
+This option has no effect for an enumeration type with a fixed underlying
+type.
 
 @strong{Warning:} the @option{-fshort-enums} switch causes GCC to generate
 code that is not binary compatible with code generated without that switch.
-- 
2.40.0

Re: [PATCH] [vect]Enhance NARROW FLOAT_EXPR vectorization by truncating integer to lower precision.

2023-04-27 Thread Richard Biener via Gcc-patches

On Wed, Apr 26, 2023 at 9:36 AM liuhongt via Gcc-patches wrote:
>
> Similar to WIDEN FLOAT_EXPR, when no direct optab exists, try an
> intermediate integer type whenever gimple ranger can tell it's safe.
>
> I.e.
> when there's no direct optab for vector long long -> vector float, but
> the value range of the integer can be represented as int, try vector int
> -> vector float if available.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR tree-optimization/108804
> * tree-vect-patterns.cc (vect_get_range_info): Remove static.
> * tree-vect-stmts.cc (vect_create_vectorized_demotion_stmts):
> Add new parameter last_stmt_p.
> (vectorizable_conversion): Enhance NARROW FLOAT_EXPR
> vectorization by truncating to lower precision.
> * tree-vectorizer.h (vect_get_range_info): New declare.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr108804.c: New test.
> ---
>  gcc/testsuite/gcc.target/i386/pr108804.c |  15 
>  gcc/tree-vect-patterns.cc|   2 +-
>  gcc/tree-vect-stmts.cc   | 106 ++-
>  gcc/tree-vectorizer.h|   1 +
>  4 files changed, 100 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr108804.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr108804.c 
> b/gcc/testsuite/gcc.target/i386/pr108804.c
> new file mode 100644
> index 000..2a43c1e1848
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr108804.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -Ofast -fdump-tree-vect-details" } */
> +/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 1 "vect" } } 
> */
> +
> +typedef unsigned long long uint64_t;
> +uint64_t d[512];
> +float f[1024];
> +
> +void foo() {
> +for (int i=0; i<512; ++i) {
> +uint64_t k = d[i];
> +f[i]=(k & 0x3F30);
> +}
> +}
> +
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index a49b0953977..dd546b488a4 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -61,7 +61,7 @@ along with GCC; see the file COPYING3.  If not see
>  /* Return true if we have a useful VR_RANGE range for VAR, storing it
> in *MIN_VALUE and *MAX_VALUE if so.  Note the range in the dump files.  */
>
> -static bool
> +bool
>  vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
>  {
>value_range vr;
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 6b7dbfd4a23..d79a1409d24 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "internal-fn.h"
>  #include "tree-vector-builder.h"
>  #include "vec-perm-indices.h"
> +#include "gimple-range.h"
>  #include "tree-ssa-loop-niter.h"
>  #include "gimple-fold.h"
>  #include "regs.h"
> @@ -4799,7 +4800,8 @@ vect_create_vectorized_demotion_stmts (vec_info *vinfo, 
> vec *vec_oprnds,
>stmt_vec_info stmt_info,
>vec &vec_dsts,
>gimple_stmt_iterator *gsi,
> -  slp_tree slp_node, enum tree_code code)
> +  slp_tree slp_node, enum tree_code code,
> +  bool last_stmt_p)

Can you please document this new parameter?

>  {
>unsigned int i;
>tree vop0, vop1, new_tmp, vec_dest;
> @@ -4815,9 +4817,9 @@ vect_create_vectorized_demotion_stmts (vec_info *vinfo, 
> vec *vec_oprnds,
>new_tmp = make_ssa_name (vec_dest, new_stmt);
>gimple_assign_set_lhs (new_stmt, new_tmp);
>vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
> -
> -  if (multi_step_cvt)
> -   /* Store the resulting vector for next recursive call.  */
> +  if (multi_step_cvt || !last_stmt_p)
> +   /* Store the resulting vector for next recursive call,
> +  or return the resulting vector_tmp for NARROW FLOAT_EXPR.  */
> (*vec_oprnds)[i/2] = new_tmp;
>else
> {
> @@ -4843,7 +4845,8 @@ vect_create_vectorized_demotion_stmts (vec_info *vinfo, 
> vec *vec_oprnds,
>vect_create_vectorized_demotion_stmts (vinfo, vec_oprnds,
>  multi_step_cvt - 1,
>  stmt_info, vec_dsts, gsi,
> -slp_node, VEC_PACK_TRUNC_EXPR);
> +slp_node, VEC_PACK_TRUNC_EXPR,
> +last_stmt_p);
>  }
>
>vec_dsts.quick_push (vec_dest);
> @@ -5248,22 +5251,53 @@ vectorizable_conversion (vec_info *vinfo,
>&interm_types))
> break;
>
> -  if (code != FIX_TRUNC_EXPR
> - || GET_MODE_SIZE

[PATCH] v2: Implement range-op entry for sin/cos

2023-04-27 Thread Jakub Jelinek via Gcc-patches

Hi!

On Tue, Apr 18, 2023 at 03:12:50PM +0200, Aldy Hernandez wrote:
> [I don't know why I keep poking at floats.  I must really like the pain.
> Jakub, are you OK with this patch for trunk?]
> 
> This is the range-op entry for sin/cos.  It is meant to serve as an
> example of what we can do for glibc math functions.  It is by no means
> exhaustive, just a stub to restrict the return range from sin/cos to
> [-1.0, 1.0] with appropriate smarts of NANs.
> 
> As can be seen in the testcase, we see sin() as well as
> __builtin_sin() in the IL, and can resolve the resulting range
> accordingly.
> 
> gcc/ChangeLog:
> 
>   * gimple-range-op.cc (class cfn_sincos): New.
>   (gimple_range_op_handler::maybe_builtin_call): Add case for sin/cos.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/range-sincos.c: New test.

Here is an updated version of the patch on top of the
v2: Add targetm.libm_function_max_error
patch with all my comments incorporated into your patch (but still no
handling of sin/cos ranges shorter than 2*M_PI).

So far tested on the new testcase with
RUNTESTFLAGS='--target_board=unix\{,-frounding-math} 
tree-ssa.exp=range-sincos.c'
where expectedly without -frounding-math it PASSes and with it fails
(as the hooks return 4ulps in that case for the boundaries).

Ok for trunk if it passes bootstrap/regtest?

I'll defer tweaks to frange_nextafter for ulps or whatever you want to do
with it.

2023-04-27  Aldy Hernandez  
Jakub Jelinek  

* value-range.h (frange_nextafter): Declare.
* gimple-range-op.cc (class cfn_sincos): New.
(op_cfn_sin, op_cfn_cos): New variables.
(gimple_range_op_handler::maybe_builtin_call): Handle
CASE_CFN_{SIN,COS}{,_FN}.

* gcc.dg/tree-ssa/range-sincos.c: New test.

--- gcc/value-range.h.jj2023-04-27 10:17:46.504485376 +0200
+++ gcc/value-range.h   2023-04-27 12:07:09.891147869 +0200
@@ -1388,4 +1388,7 @@ frange::nan_signbit_p (bool &signbit) co
   return true;
 }
 
+void frange_nextafter (enum machine_mode, REAL_VALUE_TYPE &,
+  const REAL_VALUE_TYPE &);
+
 #endif // GCC_VALUE_RANGE_H
--- gcc/gimple-range-op.cc.jj   2023-04-22 10:23:26.942812958 +0200
+++ gcc/gimple-range-op.cc  2023-04-27 11:57:09.865879982 +0200
@@ -400,6 +400,89 @@ public:
   }
 } op_cfn_copysign;
 
+class cfn_sincos : public range_operator_float
+{
+public:
+  using range_operator_float::fold_range;
+  using range_operator_float::op1_range;
+  cfn_sincos (combined_fn cfn) { m_cfn = cfn; }
+  virtual bool fold_range (frange &r, tree type,
+  const frange &lh, const frange &,
+  relation_trio) const final override
+  {
+if (lh.undefined_p ())
+  return false;
+if (lh.known_isnan () || lh.known_isinf ())
+  {
+   r.set_nan (type);
+   return true;
+  }
+unsigned bulps = targetm.libm_function_max_error (m_cfn, TYPE_MODE (type),
+ true);
+if (bulps == ~0U)
+  r.set_varying (type);
+else if (bulps == 0)
+  r.set (type, dconstm1, dconst1);
+else
+  {
+   REAL_VALUE_TYPE boundmin, boundmax;
+   boundmax = dconst1;
+   while (bulps--)
+ frange_nextafter (TYPE_MODE (type), boundmax, dconstinf);
+   real_arithmetic (&boundmin, NEGATE_EXPR, &boundmax, NULL);
+   r.set (type, boundmin, boundmax);
+  }
+if (!lh.maybe_isnan () && !lh.maybe_isinf ())
+  r.clear_nan ();
+return true;
+  }
+  virtual bool op1_range (frange &r, tree type,
+ const frange &lhs, const frange &,
+ relation_trio) const final override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+// A known NAN means the input is [-INF,-INF][+INF,+INF] U +-NAN,
+// which we can't currently represent.
+if (lhs.known_isnan ())
+  {
+   r.set_varying (type);
+   return true;
+  }
+
+// Results outside of [-1.0, +1.0] are impossible.
+REAL_VALUE_TYPE lb = lhs.lower_bound ();
+REAL_VALUE_TYPE ub = lhs.upper_bound ();
+if (real_less (&ub, &dconstm1) || real_less (&dconst1, &lb))
+  {
+   if (!lhs.maybe_isnan ())
+ r.set_undefined ();
+   else
+ /* If lhs could be NAN and finite result is impossible,
+the range is like lhs.known_isnan () above,
+[-INF,-INF][+INF,+INF] U +-NAN.  */
+ r.set_varying (type);
+   return true;
+  }
+
+if (!lhs.maybe_isnan ())
+  {
+   // If NAN is not valid result, the input cannot include either
+   // a NAN nor a +-INF.
+   lb = real_min_representable (type);
+   ub = real_max_representable (type);
+   r.set (type, lb, ub, nan_state (false, false));
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+private:
+  combined_fn m_cfn;
+} op_cfn_sin (CFN_SIN), op_cfn_cos (CFN_COS);
+
 // Implement range operator for CFN

[PATCH] v2: Add targetm.libm_function_max_error

2023-04-27 Thread Jakub Jelinek via Gcc-patches

Hi!

On Thu, Apr 27, 2023 at 10:34:59AM +, Richard Biener wrote:
> OK. As said the patch itself looks good to me, let's go ahead.  We
> have plenty of time to backtrack until GCC 14.

Thanks.  Unfortunately when I started using it, I've discovered that the
CASE_CFN_xxx_ALL macros don't include the CFN_xxx cases, just
CFN_BUILT_IN_xxx* cases.

So here is an updated version of the patch I'll bootstrap/regtest tonight
which instead uses CASE_CFN_xxx: CASE_CFN_xxx_FN:

2023-04-27  Jakub Jelinek  

* target.def (libm_function_max_error): New target hook.
* doc/tm.texi.in (TARGET_LIBM_FUNCTION_MAX_ERROR): Add.
* doc/tm.texi: Regenerated.
* targhooks.h (default_libm_function_max_error,
glibc_linux_libm_function_max_error): Declare.
* targhooks.cc: Include case-cfn-macros.h.
(default_libm_function_max_error,
glibc_linux_libm_function_max_error): New functions.
* config/linux.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/linux-protos.h (linux_libm_function_max_error): Declare.
* config/linux.cc: Include target.h and targhooks.h.
(linux_libm_function_max_error): New function.
* config/arc/arc.cc: Include targhooks.h and case-cfn-macros.h.
(arc_libm_function_max_error): New function.
(TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/i386/i386.cc (ix86_libc_has_fast_function): Formatting fix.
(ix86_libm_function_max_error): New function.
(TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/rs6000/rs6000-protos.h
(rs6000_linux_libm_function_max_error): Declare.
* config/rs6000/rs6000-linux.cc: Include target.h, targhooks.h, tree.h
and case-cfn-macros.h.
(rs6000_linux_libm_function_max_error): New function.
* config/rs6000/linux.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/rs6000/linux64.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/or1k/or1k.cc: Include targhooks.h and case-cfn-macros.h.
(or1k_libm_function_max_error): New function.
(TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.

--- gcc/target.def.jj   2023-04-27 10:17:32.598686398 +0200
+++ gcc/target.def  2023-04-27 10:26:58.361490211 +0200
@@ -2670,6 +2670,23 @@ DEFHOOK
  bool, (int fcode),
  default_libc_has_fast_function)

+DEFHOOK
+(libm_function_max_error,
+ "This hook determines expected maximum errors for math functions measured\n\
+in ulps (units of the last place).  0 means 0.5ulps precision (correctly\n\
+rounded).  ~0U means unknown errors.  The @code{combined_fn} @var{cfn}\n\
+argument should identify just which math built-in function it is rather than\n\
+its variant, @var{mode} the variant in terms of floating-point machine mode.\n\
+The hook should also take into account @code{flag_rounding_math} whether it\n\
+is maximum error just in default rounding mode, or in all possible rounding\n\
+modes.  @var{boundary_p} is @code{true} for maximum errors on intrinsic math\n\
+boundaries of functions rather than errors inside of the usual result ranges\n\
+of the functions.  E.g.@ the sin/cos function finite result is in between\n\
+-1.0 and 1.0 inclusive, with @var{boundary_p} true the function returns how\n\
+many ulps below or above those boundaries result could be.",
+ unsigned, (unsigned cfn, machine_mode mode, bool boundary_p),
+ default_libm_function_max_error)
+
 /* True if new jumps cannot be created, to replace existing ones or
not, at the current point in the compilation.  */
 DEFHOOK
--- gcc/doc/tm.texi.in.jj   2023-04-27 10:17:32.596686427 +0200
+++ gcc/doc/tm.texi.in  2023-04-27 10:26:58.362490196 +0200
@@ -4004,6 +4004,8 @@ macro, a reasonable default is used.

 @hook TARGET_LIBC_HAS_FAST_FUNCTION

+@hook TARGET_LIBM_FUNCTION_MAX_ERROR
+
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
 by default.  This calling convention involves passing the object, the selector
--- gcc/doc/tm.texi.jj  2023-04-27 10:17:32.593686470 +0200
+++ gcc/doc/tm.texi 2023-04-27 10:26:58.364490167 +0200
@@ -5760,6 +5760,21 @@ This hook determines whether a function
 @code{(enum function_class)}@var{fcode} has a fast implementation.
 @end deftypefn

+@deftypefn {Target Hook} unsigned TARGET_LIBM_FUNCTION_MAX_ERROR (unsigned 
@var{cfn}, machine_mode @var{mode}, bool @var{boundary_p})
+This hook determines expected maximum errors for math functions measured
+in ulps (units of the last place).  0 means 0.5ulps precision (correctly
+rounded).  ~0U means unknown errors.  The @code{combined_fn} @var{cfn}
+argument should identify just which math built-in function it is rather than
+its variant, @var{mode} the variant in terms of floating-point machine mode.
+The hook should also take into account @code{flag_rounding_math} whether it
+is maximum error just in default rounding mode, or in all possible rounding
+modes.  @var{boundary_p} is @code{true} f

Re: [PATCH] Add targetm.libm_function_max_error

2023-04-27 Thread Jakub Jelinek via Gcc-patches

On Thu, Apr 27, 2023 at 10:59:47AM +0200, Jakub Jelinek via Gcc-patches wrote:
> I guess I'll need to look at the IBM double double sinl/cosl results,
> either it is some bug in my tester or the libm functions are useless.
> But apart from the MODE_COMPOSITE_P cases, I think all the numbers are
> within what the patch returns.
> Even the sqrtl tonearest IBM double double case is larger than the libm ulps
> (2.5 vs. 1).

The first really large error I see is for sinl with
x/2gx &val
0x748160ed90d9425b  0xefd8b811d6293294
i.e.
1.5926552660973502228303666578452949e+253
with most significant double being
1.5926552660973502e+253
and low double
-5.9963639272208416e+230
Now, 0x748 - 0x6fd is 75, which is much larger than
53, so the number has precision larger than 106 bits.
given is
-0.4025472157704263326278375983156912
and expected (mpfr computed)
-0.46994008859023245970759964236618727
But if I try on x86_64:
#define _GNU_SOURCE
#include <math.h>

int
main ()
{
  _Float128 f, f2, f3, f4;
  double d, d2;
  f = 1.5926552660973502228303666578452949e+253f128;
  d = 1.5926552660973502e+253;
  f2 = d;
  f2 += -5.9963639272208416e+230;
  f3 = sinf128 (f);
  f4 = sinf128 (f2);
  d2 = sin (d);
  return 0;
}
where I think f2 is what matches most closely the 106 bit precision value,
(gdb) p f
$7 = 1.5926552660973502228303666578452949e+253
(gdb) p f2
$8 = 1.59265526609735022283036665784527174e+253
(gdb) p f3
$9 = -0.277062522218693980443596385112227247
(gdb) p f4
$10 = -0.402547215770426332627837598315693221
and f4 is much closer to the given than to expected.

On the other side, GCC will really work only with the assumption
the numbers have 106-bit precision, so shouldn't care much about
exact precision in between the range boundaries.

I think I'll for now just trust for IBM double double
the ulps files rather than mpfr.

Jakub
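
The exponent arithmetic in the message above can be checked mechanically.
This is a sketch with hypothetical helper names; it assumes the two words
printed by gdb are the IEEE 754 binary64 bit patterns of the most
significant and the low double, in that order.

```cpp
#include <cassert>
#include <cstdint>

// Biased exponent field of an IEEE 754 binary64 bit pattern
// (bits 52..62; bit 63 is the sign and is masked off).
static int
biased_exponent (std::uint64_t bits)
{
  return static_cast<int> ((bits >> 52) & 0x7ff);
}

// Distance in binary orders of magnitude between the two halves of a
// double-double value.
static int
exponent_gap (std::uint64_t high, std::uint64_t low)
{
  return biased_exponent (high) - biased_exponent (low);
}
```

For 0x748160ed90d9425b and 0xefd8b811d6293294 the fields are 0x748 and
0x6fd (the leading 0xe of the low word carries the sign bit), giving the
gap of 75 quoted above.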

Re: [PATCH] Add support for vrange streaming.

2023-04-27 Thread Richard Biener via Gcc-patches

On Tue, Apr 18, 2023 at 2:48 PM Aldy Hernandez  wrote:
>
>
>
> On 4/18/23 11:06, Aldy Hernandez wrote:
> > I think it's time for the ranger folk to start owning range streaming
> > instead of passes (IPA, etc) doing their own thing.  I have plans for
> > overhauling the IPA code later this cycle to support generic ranges,
> > and I'd like to start cleaning up the streaming and hashing interface.
> >
> > This patch adds generic streaming support for vrange.
> >
> > I'd appreciate another set of eyes.
> >
> > Thoughts?
>
> We recently added support for querying and storing an frange's NAN
> without the need to be friends with the class.
>
> Adjusted patch in testing...

I think this is reasonable once you find use for it.

Thanks,
Richard.

> Aldy

Re: [PATCH 2/7] PHIOPT: Rename tree_ssa_phiopt_worker to pass_phiopt::execute

2023-04-27 Thread Richard Biener via Gcc-patches

On Mon, Apr 24, 2023 at 11:34 PM Andrew Pinski via Gcc-patches wrote:
>
> Now that store elimination and phiopt do not
> share outer code, we can move tree_ssa_phiopt_worker
> directly into pass_phiopt::execute and remove
> many declarations (prototypes) from the file.

OK.

> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (two_value_replacement): Remove
> prototype.
> (match_simplify_replacement): Likewise.
> (factor_out_conditional_conversion): Likewise.
> (value_replacement): Likewise.
> (minmax_replacement): Likewise.
> (spaceship_replacement): Likewise.
> (cond_removal_in_builtin_zero_pattern): Likewise.
> (hoist_adjacent_loads): Likewise.
> (tree_ssa_phiopt_worker): Move into ...
> (pass_phiopt::execute): this.
> ---
>  gcc/tree-ssa-phiopt.cc | 385 +++--
>  1 file changed, 181 insertions(+), 204 deletions(-)
>
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 7f47b32576b..d232fd9b551 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -55,27 +55,10 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-propagate.h"
>  #include "tree-ssa-dce.h"
>
> -static bool two_value_replacement (basic_block, basic_block, edge, gphi *,
> -  tree, tree);
> -static bool match_simplify_replacement (basic_block, basic_block, 
> basic_block,
> -   edge, edge, gphi *, tree, tree, bool, 
> bool);
> -static gphi *factor_out_conditional_conversion (edge, edge, gphi *, tree, 
> tree,
> -   gimple *);
> -static int value_replacement (basic_block, basic_block,
> - edge, edge, gphi *, tree, tree);
> -static bool minmax_replacement (basic_block, basic_block, basic_block,
> -   edge, edge, gphi *, tree, tree, bool);
> -static bool spaceship_replacement (basic_block, basic_block,
> -  edge, edge, gphi *, tree, tree);
> -static bool cond_removal_in_builtin_zero_pattern (basic_block, basic_block,
> - edge, edge, gphi *,
> - tree, tree);
>  static bool cond_store_replacement (basic_block, basic_block, edge, edge,
> hash_set *);
>  static bool cond_if_else_store_replacement (basic_block, basic_block, 
> basic_block);
>  static hash_set * get_non_trapping ();
> -static void hoist_adjacent_loads (basic_block, basic_block,
> - basic_block, basic_block);
>
>  /* Return the singleton PHI in the SEQ of PHIs for edges E0 and E1. */
>
> @@ -104,188 +87,6 @@ single_non_singleton_phi_for_edges (gimple_seq seq, edge 
> e0, edge e1)
>return phi;
>  }
>
> -/* The core routine of phi optimizations.
> -   DO_HOIST_LOADS is true when we want to hoist adjacent loads out
> -   of diamond control flow patterns, false otherwise.  */
> -static unsigned int
> -tree_ssa_phiopt_worker (bool do_hoist_loads, bool early_p)
> -{
> -  basic_block bb;
> -  basic_block *bb_order;
> -  unsigned n, i;
> -  bool cfgchanged = false;
> -
> -  calculate_dominance_info (CDI_DOMINATORS);
> -
> -  /* Search every basic block for COND_EXPR we may be able to optimize.
> -
> - We walk the blocks in order that guarantees that a block with
> - a single predecessor is processed before the predecessor.
> - This ensures that we collapse inner ifs before visiting the
> - outer ones, and also that we do not try to visit a removed
> - block.  */
> -  bb_order = single_pred_before_succ_order ();
> -  n = n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS;
> -
> -  for (i = 0; i < n; i++)
> -{
> -  gphi *phi;
> -  basic_block bb1, bb2;
> -  edge e1, e2;
> -  tree arg0, arg1;
> -  bool diamond_p = false;
> -
> -  bb = bb_order[i];
> -
> -  /* Check to see if the last statement is a GIMPLE_COND.  */
> -  gcond *cond_stmt = safe_dyn_cast <gcond *> (*gsi_last_bb (bb));
> -  if (!cond_stmt)
> -   continue;
> -
> -  e1 = EDGE_SUCC (bb, 0);
> -  bb1 = e1->dest;
> -  e2 = EDGE_SUCC (bb, 1);
> -  bb2 = e2->dest;
> -
> -  /* We cannot do the optimization on abnormal edges.  */
> -  if ((e1->flags & EDGE_ABNORMAL) != 0
> -  || (e2->flags & EDGE_ABNORMAL) != 0)
> -   continue;
> -
> -  /* If either bb1's succ or bb2 or bb2's succ is non NULL.  */
> -  if (EDGE_COUNT (bb1->succs) == 0
> - || EDGE_COUNT (bb2->succs) == 0)
> -   continue;
> -
> -  /* Find the bb which is the fall through to the other.  */
> -  if (EDGE_SUCC (bb1, 0)->dest == bb2)
> -;
> -  else if (EDGE_SUCC (bb2, 0)->dest == bb1)
> -{
> - std::swap (bb1, bb2);
> - std::swap (e1, e2);
> -   }
> -  else if (EDGE_SUCC (bb1, 0)->dest == E

Re: [PATCH 7/7] MATCH: Add patterns from phiopt's minmax_replacement

2023-04-27 Thread Richard Biener via Gcc-patches

On Mon, Apr 24, 2023 at 11:33 PM Andrew Pinski via Gcc-patches wrote:
>
> This adds a few patterns from phiopt's minmax_replacement
> for (A CMP B) ? MIN/MAX<A, C> : MIN/MAX<B, C>.
> It is progress to remove minmax_replacement from phiopt.
> There are still some more cases dealing with constants on the
> edges (0/INT_MAX) to handle in match.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.
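
The identity behind the new pattern can be sanity-checked on plain integers.
This is a sketch for the simple `a < b` case only, with hypothetical
function names; the actual pattern goes through minmax_from_comparison and
covers more comparison codes.

```cpp
#include <algorithm>
#include <cassert>

// Source form with min in both arms: (a < b) ? min (a, c) : min (b, c)
static int
before_min (int a, int b, int c)
{
  return a < b ? std::min (a, c) : std::min (b, c);
}

// Folded form: min (min (a, b), c)
static int
after_min (int a, int b, int c)
{
  return std::min (std::min (a, b), c);
}

// Source form with max in both arms: (a < b) ? max (a, c) : max (b, c)
static int
before_max (int a, int b, int c)
{
  return a < b ? std::max (a, c) : std::max (b, c);
}

// Folded form: the comparison still selects the smaller of a and b,
// so the inner operation is min: max (min (a, b), c)
static int
after_max (int a, int b, int c)
{
  return std::max (std::min (a, b), c);
}

// Exhaustively compare both forms over a small cube of inputs.
static bool
forms_agree (int lo, int hi)
{
  for (int a = lo; a <= hi; ++a)
    for (int b = lo; b <= hi; ++b)
      for (int c = lo; c <= hi; ++c)
        if (before_min (a, b, c) != after_min (a, b, c)
            || before_max (a, b, c) != after_max (a, b, c))
          return false;
  return true;
}
```

The folded forms have no branch, which is why this kind of rewrite was
previously done on PHI nodes in phiopt.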

> gcc/ChangeLog:
>
> * match.pd: Add patterns for
> "(A CMP B) ? MIN/MAX<A, C> : MIN/MAX<B, C>".
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/minmax-16.c: Update testcase slightly.
> * gcc.dg/tree-ssa/split-path-1.c: Also disable tree-loop-if-convert
> as that now does the combining.
> ---
>  gcc/match.pd | 16 
>  gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c| 10 --
>  gcc/testsuite/gcc.dg/tree-ssa/split-path-1.c |  3 ++-
>  3 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 6d3aaf45a93..5d5aae24509 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4843,6 +4843,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (convert @c0
>  #endif
>
> +/* These was part of minmax phiopt.  */
> +/* Optimize (a CMP b) ? minmax<a, c> : minmax<b, c>
> +   to minmax<min/max<a, b>, c> */
> +(for minmax (min max)
> + (for cmp (lt le gt ge)
> +  (simplify
> +   (cond (cmp @1 @3) (minmax:c @1 @4) (minmax:c @2 @4))
> +   (with
> +{
> +  tree_code code = minmax_from_comparison (cmp, @1, @2, @1, @3);
> +}
> +(if (code == MIN_EXPR)
> + (minmax (min @1 @2) @4)
> + (if (code == MAX_EXPR)
> +  (minmax (max @1 @2) @4)))
> +
>  /* X != C1 ? -X : C2 simplifies to -X when -C1 == C2.  */
>  (simplify
>   (cond (ne @0 INTEGER_CST@1) (negate@3 @0) INTEGER_CST@2)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c
> index 4febd092d83..623b12b3f74 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-options "-O -fdump-tree-phiopt -g" } */
> +/* { dg-options "-O -fdump-tree-phiopt -fdump-tree-optimized -g" } */
>
>  #include 
>
> @@ -25,5 +25,11 @@ main (void)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "MIN_EXPR" 3 "phiopt1" } } */
> +/* After phiopt1, there really should be only 3 MIN_EXPR in the IR 
> (including debug statements).
> +   But the way phiopt does not cleanup the CFG all the time, the PHI might 
> still reference the
> +   alternative bb's moved statement.
> +   Note in the end, we do dce the statement and other debug statements to 
> end up with only 2 MIN_EXPR.
> +   So check that too. */
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 4 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "optimized" } } */
>  /* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "phiopt1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/split-path-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/split-path-1.c
> index 902dde44a50..b670dee8d10 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/split-path-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/split-path-1.c
> @@ -1,5 +1,6 @@
>  /* { dg-do run } */
> -/* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details --param 
> max-jump-thread-duplication-stmts=20 -fno-ssa-phiopt" } */
> +/* Note both PHI-OPT and the loop if conversion pass converts the inner if 
> to be branchless using min/max. */
> +/* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details --param 
> max-jump-thread-duplication-stmts=20 -fno-ssa-phiopt 
> -fno-tree-loop-if-convert" } */
>
>  #include 
>  #include 
> --
> 2.39.1
>

Re: [PATCH 1/7] PHIOPT: Split out store elimination from phiopt

2023-04-27 Thread Richard Biener via Gcc-patches

On Mon, Apr 24, 2023 at 11:31 PM Andrew Pinski via Gcc-patches wrote:
>
> Since the last cleanups, it has become easier to see
> that we should split out the store elimination
> worker from tree_ssa_phiopt_worker function.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove
> do_store_elim argument and split that part out to ...
> (store_elim_worker): This new function.
> (pass_cselim::execute): Call store_elim_worker.
> (pass_phiopt::execute): Update call to tree_ssa_phiopt_worker.
> ---
>  gcc/tree-ssa-phiopt.cc | 180 -
>  1 file changed, 126 insertions(+), 54 deletions(-)
>
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 4a3ab8efb71..7f47b32576b 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -104,27 +104,19 @@ single_non_singleton_phi_for_edges (gimple_seq seq, 
> edge e0, edge e1)
>return phi;
>  }
>
> -/* The core routine of conditional store replacement and normal
> -   phi optimizations.  Both share much of the infrastructure in how
> -   to match applicable basic block patterns.  DO_STORE_ELIM is true
> -   when we want to do conditional store replacement, false otherwise.
> +/* The core routine of phi optimizations.
> DO_HOIST_LOADS is true when we want to hoist adjacent loads out
> of diamond control flow patterns, false otherwise.  */
>  static unsigned int
> -tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads, bool early_p)
> +tree_ssa_phiopt_worker (bool do_hoist_loads, bool early_p)
>  {
>basic_block bb;
>basic_block *bb_order;
>unsigned n, i;
>bool cfgchanged = false;
> -  hash_set *nontrap = 0;
>
>calculate_dominance_info (CDI_DOMINATORS);
>
> -  if (do_store_elim)
> -/* Calculate the set of non-trapping memory accesses.  */
> -nontrap = get_non_trapping ();
> -
>/* Search every basic block for COND_EXPR we may be able to optimize.
>
>   We walk the blocks in order that guarantees that a block with
> @@ -148,7 +140,7 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads, bool early_p)
>/* Check to see if the last statement is a GIMPLE_COND.  */
>gcond *cond_stmt = safe_dyn_cast <gcond *> (*gsi_last_bb (bb));
>if (!cond_stmt)
> -continue;
> +   continue;
>
>e1 = EDGE_SUCC (bb, 0);
>bb1 = e1->dest;
> @@ -158,12 +150,12 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads, bool early_p)
>/* We cannot do the optimization on abnormal edges.  */
>if ((e1->flags & EDGE_ABNORMAL) != 0
>|| (e2->flags & EDGE_ABNORMAL) != 0)
> -   continue;
> +   continue;
>
>/* If either bb1's succ or bb2 or bb2's succ is non NULL.  */
>if (EDGE_COUNT (bb1->succs) == 0
>   || EDGE_COUNT (bb2->succs) == 0)
> -continue;
> +   continue;
>
>/* Find the bb which is the fall through to the other.  */
>if (EDGE_SUCC (bb1, 0)->dest == bb2)
> @@ -192,39 +184,6 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads, bool early_p)
>   || (e1->flags & EDGE_FALLTHRU) == 0)
> continue;
>
> -  if (do_store_elim)
> -   {
> - if (diamond_p)
> -   {
> - basic_block bb3 = e1->dest;
> -
> - /* Only handle sinking of store from 2 bbs only,
> -The middle bbs don't need to come from the
> -if always since we are sinking rather than
> -hoisting. */
> - if (EDGE_COUNT (bb3->preds) != 2)
> -   continue;
> - if (cond_if_else_store_replacement (bb1, bb2, bb3))
> -   cfgchanged = true;
> - continue;
> -   }
> -
> - /* Also make sure that bb1 only have one predecessor and that it
> -is bb.  */
> - if (!single_pred_p (bb1)
> - || single_pred (bb1) != bb)
> -   continue;
> -
> - /* bb1 is the middle block, bb2 the join block, bb the split block,
> -e1 the fallthrough edge from bb1 to bb2.  We can't do the
> -optimization if the join block has more than two predecessors.  */
> - if (EDGE_COUNT (bb2->preds) > 2)
> -   continue;
> - if (cond_store_replacement (bb1, bb2, e1, e2, nontrap))
> -   cfgchanged = true;
> - continue;
> -   }
> -
>if (diamond_p)
> {
>   basic_block bb3 = e1->dest;
> @@ -322,18 +281,132 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads, bool early_p)
>
>free (bb_order);
>
> -  if (do_store_elim)
> -delete nontrap;
> +  if (cfgchanged)
> +return TODO_cleanup_cfg;
> +  return 0;
> +}
> +
> +/* The core routine of conditional store replacement.  */
> +static unsigned int
> +store_elim_worker (void)
> +{
> +  basic

Re: [PATCH 5/7] PHIOPT: Allow MIN/MAX to have up to 2 MIN/MAX expressions for early phiopt

2023-04-27 Thread Richard Biener via Gcc-patches

On Mon, Apr 24, 2023 at 11:33 PM Andrew Pinski via Gcc-patches
 wrote:
>
> In the early PHIOPT mode, the original minmax_replacement would
> replace a PHI node with up to 2 min/max expressions in some cases;
> this patch allows for that too.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (phiopt_early_allow): Allow for
> up to 2 min/max expressions in the sequence/match code.
> ---
>  gcc/tree-ssa-phiopt.cc | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index fb2d2c9fc1a..de1896aa91a 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -542,9 +542,23 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op)
>  return false;
>tree_code code = (tree_code)op.code;
>
> -  /* For non-empty sequence, only allow one statement.  */
> +  /* For non-empty sequence, only allow one statement
> + except for MIN/MAX, allow max 2 statements,
> + each with MIN/MAX.  */
>if (!gimple_seq_empty_p (seq))
>  {
> +  if (code == MIN_EXPR || code == MAX_EXPR)
> +   {
> + if (!gimple_seq_singleton_p (seq))
> +   return false;
> +
> + gimple *stmt = gimple_seq_first_stmt (seq);
> + /* Only allow assignments.  */
> + if (!is_gimple_assign (stmt))
> +   return false;
> + code = gimple_assign_rhs_code (stmt);
> + return code == MIN_EXPR || code == MAX_EXPR;
> +   }
>/* Check to make sure op was already a SSA_NAME.  */
>if (code != SSA_NAME)
> return false;
> --
> 2.39.1
>
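
To illustrate the kind of source that can end up as two nested MIN/MAX expressions, consider a hand-written clamp. This is only a minimal sketch: `clamp_branchy` and `clamp_minmax` are hypothetical names that do not appear in the patch, and whether early phiopt actually rewrites the first form into the second depends on the GCC version and options in use.

```cpp
#include <algorithm>

// Clamp written with two conditionals. In SSA form the join after each
// `if` is a PHI node that selects between the old and the new value of r.
int clamp_branchy(int x, int lo, int hi)
{
  int r = x;
  if (r < lo)
    r = lo;
  if (r > hi)
    r = hi;
  return r;
}

// The same computation expressed as two MIN/MAX operations:
// MIN (MAX (x, lo), hi).
int clamp_minmax(int x, int lo, int hi)
{
  return std::min(std::max(x, lo), hi);
}
```

Both functions return the same value for every input, which is why a sequence of two MIN/MAX statements is a reasonable thing to accept for this pattern.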

Re: [PATCH 3/7] PHIOPT: Move store_elim_worker into pass_cselim::execute

2023-04-27 Thread Richard Biener via Gcc-patches

On Mon, Apr 24, 2023 at 11:31 PM Andrew Pinski via Gcc-patches
 wrote:
>
> This simple patch moves the body of store_elim_worker
> directly into pass_cselim::execute.
>
> It also removes the now-unneeded prototypes.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (cond_store_replacement): Remove
> prototype.
> (cond_if_else_store_replacement): Likewise.
> (get_non_trapping): Likewise.
> (store_elim_worker): Move into ...
> (pass_cselim::execute): This.
> ---
>  gcc/tree-ssa-phiopt.cc | 250 -
>  1 file changed, 119 insertions(+), 131 deletions(-)
>
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index d232fd9b551..fb2d2c9fc1a 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -55,11 +55,6 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-propagate.h"
>  #include "tree-ssa-dce.h"
>
> -static bool cond_store_replacement (basic_block, basic_block, edge, edge,
> -   hash_set *);
> -static bool cond_if_else_store_replacement (basic_block, basic_block, basic_block);
> -static hash_set * get_non_trapping ();
> -
>  /* Return the singleton PHI in the SEQ of PHIs for edges E0 and E1. */
>
>  static gphi *
> @@ -87,130 +82,6 @@ single_non_singleton_phi_for_edges (gimple_seq seq, edge e0, edge e1)
>return phi;
>  }
>
> -/* The core routine of conditional store replacement.  */
> -static unsigned int
> -store_elim_worker (void)
> -{
> -  basic_block bb;
> -  basic_block *bb_order;
> -  unsigned n, i;
> -  bool cfgchanged = false;
> -  hash_set *nontrap = 0;
> -
> -  calculate_dominance_info (CDI_DOMINATORS);
> -
> -  /* Calculate the set of non-trapping memory accesses.  */
> -  nontrap = get_non_trapping ();
> -
> -  /* Search every basic block for COND_EXPR we may be able to optimize.
> -
> - We walk the blocks in order that guarantees that a block with
> - a single predecessor is processed before the predecessor.
> - This ensures that we collapse inner ifs before visiting the
> - outer ones, and also that we do not try to visit a removed
> - block.  */
> -  bb_order = single_pred_before_succ_order ();
> -  n = n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS;
> -
> -  for (i = 0; i < n; i++)
> -{
> -  basic_block bb1, bb2;
> -  edge e1, e2;
> -  bool diamond_p = false;
> -
> -  bb = bb_order[i];
> -
> -  /* Check to see if the last statement is a GIMPLE_COND.  */
> -  gcond *cond_stmt = safe_dyn_cast <gcond *> (*gsi_last_bb (bb));
> -  if (!cond_stmt)
> -   continue;
> -
> -  e1 = EDGE_SUCC (bb, 0);
> -  bb1 = e1->dest;
> -  e2 = EDGE_SUCC (bb, 1);
> -  bb2 = e2->dest;
> -
> -  /* We cannot do the optimization on abnormal edges.  */
> -  if ((e1->flags & EDGE_ABNORMAL) != 0
> - || (e2->flags & EDGE_ABNORMAL) != 0)
> -   continue;
> -
> -  /* If either bb1's succ or bb2 or bb2's succ is non NULL.  */
> -  if (EDGE_COUNT (bb1->succs) == 0
> - || EDGE_COUNT (bb2->succs) == 0)
> -   continue;
> -
> -  /* Find the bb which is the fall through to the other.  */
> -  if (EDGE_SUCC (bb1, 0)->dest == bb2)
> -   ;
> -  else if (EDGE_SUCC (bb2, 0)->dest == bb1)
> -   {
> - std::swap (bb1, bb2);
> - std::swap (e1, e2);
> -   }
> -  else if (EDGE_SUCC (bb1, 0)->dest == EDGE_SUCC (bb2, 0)->dest
> -  && single_succ_p (bb2))
> -   {
> - diamond_p = true;
> - e2 = EDGE_SUCC (bb2, 0);
> - /* Make sure bb2 is just a fall through. */
> - if ((e2->flags & EDGE_FALLTHRU) == 0)
> -   continue;
> -   }
> -  else
> -   continue;
> -
> -  e1 = EDGE_SUCC (bb1, 0);
> -
> -  /* Make sure that bb1 is just a fall through.  */
> -  if (!single_succ_p (bb1)
> - || (e1->flags & EDGE_FALLTHRU) == 0)
> -   continue;
> -
> -  if (diamond_p)
> -   {
> - basic_block bb3 = e1->dest;
> -
> - /* Only handle sinking of store from 2 bbs only,
> -The middle bbs don't need to come from the
> -if always since we are sinking rather than
> -hoisting. */
> - if (EDGE_COUNT (bb3->preds) != 2)
> -   continue;
> - if (cond_if_else_store_replacement (bb1, bb2, bb3))
> -   cfgchanged = true;
> - continue;
> -   }
> -
> -  /* Also make sure that bb1 only have one predecessor and that it
> -is bb.  */
> -  if (!single_pred_p (bb1)
> - || single_pred (bb1) != bb)
> -   continue;
> -
> -  /* bb1 is the middle block, bb2 the join block, bb the split block,
> -e1 the fallthrough edge from bb1 to bb2.  We can't do the
> -optimization if the join block has more than two predecessors.  */
> -  if (EDGE_COUNT (bb2->preds) > 2)
> -

Re: [PATCH 4/7] MIN/MAX should be treated similar as comparisons for trapping

2023-04-27 Thread Richard Biener via Gcc-patches

On Mon, Apr 24, 2023 at 11:31 PM Andrew Pinski via Gcc-patches
 wrote:
>
> While looking into moving optimizations from minmax_replacement
> in phiopt to match.pd, I noticed that min/max were considered
> trapping even if -ffinite-math-only was being used. This changes
> those expressions to be treated like comparisons so that they are
> not considered trapping if -ffinite-math-only is on.
>
> OK? Bootstrapped and tested with no regressions on x86_64-linux-gnu.

OK.

> gcc/ChangeLog:
>
> * rtlanal.cc (may_trap_p_1): Treat SMIN/SMAX similar to
> COMPARISON.
> * tree-eh.cc (operation_could_trap_helper_p): Treat
> MIN_EXPR/MAX_EXPR similar to other comparisons.
> ---
>  gcc/rtlanal.cc | 3 +++
>  gcc/tree-eh.cc | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
> index c96a88cebf1..b7948ecfad1 100644
> --- a/gcc/rtlanal.cc
> +++ b/gcc/rtlanal.cc
> @@ -3204,6 +3204,9 @@ may_trap_p_1 (const_rtx x, unsigned flags)
>  case LT:
>  case LTGT:
>  case COMPARE:
> +/* Treat min/max similar as comparisons.  */
> +case SMIN:
> +case SMAX:
>/* Some floating point comparisons may trap.  */
>if (!flag_trapping_math)
> break;
> diff --git a/gcc/tree-eh.cc b/gcc/tree-eh.cc
> index 41cf57d2b30..dbaa27d95c5 100644
> --- a/gcc/tree-eh.cc
> +++ b/gcc/tree-eh.cc
> @@ -2490,6 +2490,9 @@ operation_could_trap_helper_p (enum tree_code op,
>  case GT_EXPR:
>  case GE_EXPR:
>  case LTGT_EXPR:
> +/* MIN/MAX similar as LT/LE/GT/GE. */
> +case MIN_EXPR:
> +case MAX_EXPR:
>/* Some floating point comparisons may trap.  */
>return honor_nans;
>
> --
> 2.39.1
>
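
The reasoning behind treating MIN/MAX like comparisons can be sketched in plain C++. This is an illustration under stated assumptions, not code from the patch: `max_as_compare` and `compare_may_see_nan` are hypothetical helpers, and the sketch only models the IEEE rule that an ordered floating point comparison can raise an exception when an operand is a NaN.

```cpp
#include <cmath>

// MAX written as the comparison plus select that it is equivalent to
// when neither operand is a NaN.
double max_as_compare(double a, double b)
{
  return a < b ? b : a;
}

// True when the comparison inside a min/max could see a NaN operand.
// With -ffinite-math-only the compiler assumes this never happens, so
// the operation can be treated as non-trapping.
bool compare_may_see_nan(double a, double b)
{
  return std::isnan(a) || std::isnan(b);
}
```

Since the only trapping hazard of the min/max comes from its embedded comparison, it makes sense for both to share the same case label.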

Re: [PATCH] Add targetm.libm_function_max_error

2023-04-27 Thread Richard Biener via Gcc-patches

On Thu, 27 Apr 2023, Jakub Jelinek wrote:

> On Thu, Apr 27, 2023 at 07:18:52AM +, Richard Biener wrote:
> > Humm.  Is it worth the trouble?  I think if we make use of this it needs
> 
> I think so.  Without that, frange is half blind, using at least most common
> libm functions in floating point code is extremely common and without
> knowing anything about what those functions can or can't return frange will
> be mostly VARYING.  And simply assuming all libm implementations are perfect
> 0.5ulps precise for all inputs would be very risky when we know it clearly
> is not the case.
> 
> Of course, by improving frange further, we run more and more into the
> already reported (and only slightly worked around) bug where statements
> that generate floating point exceptions are optimized away, and we need to
> decide what to do for that case.
> 
> > to be with -funsafe-math-optimizations (or a new switch?).  I'll note
> 
> Why?  If we know or can reasonably assume that on the boundary values
> some function is always precise (say sqrt always in [-0.,+Inf] U NAN,
> sin/cos always in [-1.,1.] U NAN etc.), it isn't an unsafe math
> optimization to assume that is the case.  If we know it is a few ulps away
> from that, we can just widen the range; if we don't know anything or the
> function implementation is uselessly buggy, we can punt.
> Whether something is a known math library function or just some floating
> point arithmetic which we already handle in 13.1 shouldn't make much
> difference.

OK, fair enough.

> > Should we, when simplifying say
> > 
> >   x = sin (y);
> >   if (x <= 1.)
> > 
> > simplify it to
> > 
> >   x = sin (y);
> >   x = min (x, 1.);
> > 
> > for extra safety?
> 
> Why?  If we don't know anything about y, x could be NAN, so we can't fold
> it, but if we know it will not be NAN, it is always true and we are
> back to the exceptions case (plus errno, but that makes the function
> non-const, doesn't it?).
> 
> > That said - what kind of code do we expect to optimize when producing
> > ranges for math function [operands]?  Isn't it just defensive programming
> > that we'd "undo"?  Are there any missed-optimization PRs around this?
> 
> I strongly doubt real-world code has such defensive programming checks.
> The intent isn't to optimize those away, but to propagate range
> information in general, so that we know, say, that a sqrt result isn't
> negative (except for a possible -0. or negative NaN), or that
> sin(x)^2+cos(y)^2 will never be > 2., etc.
> It can then e.g. help with expansion of other possibly error generating
> functions, e.g. where cdce transforms library function calls into an inline
> fast hw instruction vs. a slow libm function for error cases; if we can prove
> those error cases will never happen or will always happen, we can create
> smaller/faster code.

OK.  As I said, the patch itself looks good to me; let's go ahead.  We
have plenty of time to backtrack until GCC 14.

Richard.
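
The "widen the range by a few ulps" idea discussed above can be sketched as follows. This is a hedged illustration only: `Range` and `widen_by_ulps` are hypothetical names, and the code is not the frange implementation nor the new target hook; it merely shows what widening an ideal result range such as [-1., 1.] by a known maximum error looks like.

```cpp
#include <cmath>
#include <limits>

// A closed floating point interval [lo, hi].
struct Range
{
  double lo;
  double hi;
};

// Widen R outwards by ULPS representable values on each side, modelling a
// libm function whose result may be off by that many ulps at the boundaries.
Range widen_by_ulps(Range r, unsigned ulps)
{
  const double inf = std::numeric_limits<double>::infinity();
  for (unsigned i = 0; i < ulps; ++i)
    {
      r.lo = std::nextafter(r.lo, -inf);
      r.hi = std::nextafter(r.hi, inf);
    }
  return r;
}
```

With a maximum error of zero the range is returned unchanged, which corresponds to the case where the function is assumed to be precise on its boundary values.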

[committed] libstdc++: Remove obsolete options from Doxygen config

2023-04-27 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Docs tested on Fedora 37 with Doxygen 1.9.7
from current git master.

Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in (FORMULA_TRANSPARENT, DOT_FONTNAME)
(DOT_FONTSIZE, DOT_TRANSPARENT): Remove obsolete options.
---
 libstdc++-v3/doc/doxygen/user.cfg.in | 40 
 1 file changed, 40 deletions(-)

diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in b/libstdc++-v3/doc/doxygen/user.cfg.in
index 9d814af8614..75108604a07 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -1728,17 +1728,6 @@ HTML_FORMULA_FORMAT= png
 
 FORMULA_FONTSIZE   = 10
 
-# Use the FORMULA_TRANSPARENT tag to determine whether or not the images
-# generated for formulas are transparent PNGs. Transparent PNGs are not
-# supported properly for IE 6.0, but are supported on all modern browsers.
-#
-# Note that when changing this option you need to delete any form_*.png files in
-# the HTML output directory before the changes have effect.
-# The default value is: YES.
-# This tag requires that the tag GENERATE_HTML is set to YES.
-
-FORMULA_TRANSPARENT= YES
-
 # The FORMULA_MACROFILE can contain LaTeX \newcommand and \renewcommand commands
 # to create new LaTeX commands to be used in formulas as building blocks. See
 # the section "Including formulas" for details.
@@ -2512,23 +2501,6 @@ HAVE_DOT   = YES
 
 DOT_NUM_THREADS= 0
 
-# When you want a differently looking font in the dot files that doxygen
-# generates you can specify the font name using DOT_FONTNAME. You need to make
-# sure dot is able to find the font, which can be done by putting it in a
-# standard location or by setting the DOTFONTPATH environment variable or by
-# setting DOT_FONTPATH to the directory containing the font.
-# The default value is: Helvetica.
-# This tag requires that the tag HAVE_DOT is set to YES.
-
-DOT_FONTNAME   =
-
-# The DOT_FONTSIZE tag can be used to set the size (in points) of the font of
-# dot graphs.
-# Minimum value: 4, maximum value: 24, default value: 10.
-# This tag requires that the tag HAVE_DOT is set to YES.
-
-DOT_FONTSIZE   = 9
-
 # By default doxygen will tell dot to use the default font as specified with
 # DOT_FONTNAME. If you specify a different font using DOT_FONTNAME you can set
 # the path where dot can find it using this tag.
@@ -2741,18 +2713,6 @@ DOT_GRAPH_MAX_NODES= 50
 
 MAX_DOT_GRAPH_DEPTH= 0
 
-# Set the DOT_TRANSPARENT tag to YES to generate images with a transparent
-# background. This is disabled by default, because dot on Windows does not seem
-# to support this out of the box.
-#
-# Warning: Depending on the platform used, enabling this option may lead to
-# badly anti-aliased labels on the edges of a graph (i.e. they become hard to
-# read).
-# The default value is: NO.
-# This tag requires that the tag HAVE_DOT is set to YES.
-
-DOT_TRANSPARENT= NO
-
 # Set the DOT_MULTI_TARGETS tag to YES to allow dot to generate multiple output
 # files in one run (i.e. multiple -o and -T options on the command line). This
 # makes dot run faster, but since only newer versions of dot (>1.8.10) support
-- 
2.40.0

[committed] libstdc++: Reduce Doxygen output for PDF

2023-04-27 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Docs tested on Fedora 37 with Doxygen 1.9.7
from current git master.

Pushed to trunk.  I'll backport this too.

-- >8 --

Including the header source code in the doxygen-generated PDF file makes
it too large, and causes pdflatex to run out of memory. If we only set
SOURCE_BROWSER=YES for the HTML docs then we won't include the sources
in the PDF file.

There are several macros defined for std::valarray that are only used to
generate repetitive code and then #undef'd. Those aren't useful in the
doxygen docs, especially the ones that reuse the same name in different
files. Omitting them avoids warnings about duplicate labels in the
refman.tex file.
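
As a minimal sketch of the `@cond` technique described above (the names `Value` and `DEFINE_VALUE_OPERATOR` are hypothetical and not taken from libstdc++), a helper macro and its expansions can be hidden from Doxygen like this:

```cpp
/// A small value wrapper used to demonstrate the technique.
struct Value
{
  int v;
};

/// @cond undocumented
// Helper macro that stamps out one operator per use. Doxygen skips
// everything between @cond and @endcond unless the "undocumented"
// section is listed in ENABLED_SECTIONS.
#define DEFINE_VALUE_OPERATOR(Op)                 \
  inline Value operator Op (Value a, Value b)     \
  { return Value{a.v Op b.v}; }

DEFINE_VALUE_OPERATOR(+)
DEFINE_VALUE_OPERATOR(-)

#undef DEFINE_VALUE_OPERATOR
/// @endcond
```

The compiler ignores the Doxygen commands, so the generated operators still exist; only the generated documentation changes.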

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in (SOURCE_BROWSER): Only set to YES for
HTML docs.
* include/bits/gslice_array.h (_DEFINE_VALARRAY_OPERATOR): Omit
from doxygen docs.
* include/bits/indirect_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/bits/mask_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/bits/slice_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/std/valarray (_DEFINE_VALARRAY_UNARY_OPERATOR)
(_DEFINE_VALARRAY_AUGMENTED_ASSIGNMENT)
(_DEFINE_VALARRAY_EXPR_AUGMENTED_ASSIGNMENT)
(_DEFINE_BINARY_OPERATOR): Likewise.
---
 libstdc++-v3/doc/doxygen/user.cfg.in   | 2 +-
 libstdc++-v3/include/bits/gslice_array.h   | 2 ++
 libstdc++-v3/include/bits/indirect_array.h | 2 ++
 libstdc++-v3/include/bits/mask_array.h | 2 ++
 libstdc++-v3/include/bits/slice_array.h| 2 ++
 libstdc++-v3/include/std/valarray  | 2 ++
 6 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in b/libstdc++-v3/doc/doxygen/user.cfg.in
index 31613f51517..9d814af8614 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -1217,7 +1217,7 @@ USE_MDFILE_AS_MAINPAGE =
 # also VERBATIM_HEADERS is set to NO.
 # The default value is: NO.
 
-SOURCE_BROWSER = YES
+SOURCE_BROWSER = @do_html@
 
 # Setting the INLINE_SOURCES tag to YES will include the body of functions,
 # classes and enums directly into the documentation.
diff --git a/libstdc++-v3/include/bits/gslice_array.h b/libstdc++-v3/include/bits/gslice_array.h
index f117a172678..6a48d418477 100644
--- a/libstdc++-v3/include/bits/gslice_array.h
+++ b/libstdc++-v3/include/bits/gslice_array.h
@@ -183,6 +183,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _Array(_M_index));
   }
 
+  /// @cond undocumented
 #undef _DEFINE_VALARRAY_OPERATOR
 #define _DEFINE_VALARRAY_OPERATOR(_Op, _Name)  \
   template   \
@@ -214,6 +215,7 @@ _DEFINE_VALARRAY_OPERATOR(<<, __shift_left)
 _DEFINE_VALARRAY_OPERATOR(>>, __shift_right)
 
 #undef _DEFINE_VALARRAY_OPERATOR
+  /// @endcond
 
   /// @} group numeric_arrays
 
diff --git a/libstdc++-v3/include/bits/indirect_array.h b/libstdc++-v3/include/bits/indirect_array.h
index deeed99893c..8d34a365799 100644
--- a/libstdc++-v3/include/bits/indirect_array.h
+++ b/libstdc++-v3/include/bits/indirect_array.h
@@ -174,6 +174,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   indirect_array<_Tp>::operator=(const _Expr<_Dom, _Tp>& __e) const
   { std::__valarray_copy(__e, _M_sz, _M_array, _M_index); }
 
+  /// @cond undocumented
 #undef _DEFINE_VALARRAY_OPERATOR
 #define _DEFINE_VALARRAY_OPERATOR(_Op, _Name)  \
   template   \
@@ -203,6 +204,7 @@ _DEFINE_VALARRAY_OPERATOR(<<, __shift_left)
 _DEFINE_VALARRAY_OPERATOR(>>, __shift_right)
 
 #undef _DEFINE_VALARRAY_OPERATOR
+  /// @endcond
 
   /// @} group numeric_arrays
 
diff --git a/libstdc++-v3/include/bits/mask_array.h b/libstdc++-v3/include/bits/mask_array.h
index d4112a9d0a3..a3174dd7074 100644
--- a/libstdc++-v3/include/bits/mask_array.h
+++ b/libstdc++-v3/include/bits/mask_array.h
@@ -181,6 +181,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
std::__valarray_copy(__e, __e.size(), _M_array, _M_mask);
   }
 
+  /// @cond undocumented
 #undef _DEFINE_VALARRAY_OPERATOR
 #define _DEFINE_VALARRAY_OPERATOR(_Op, _Name)  \
   template   \
@@ -213,6 +214,7 @@ _DEFINE_VALARRAY_OPERATOR(<<, __shift_left)
 _DEFINE_VALARRAY_OPERATOR(>>, __shift_right)
 
 #undef _DEFINE_VALARRAY_OPERATOR
+  /// @endcond
 
   /// @} group numeric_arrays
 
diff --git a/libstdc++-v3/include/bits/slice_array.h b/libstdc++-v3/include/bits/slice_array.h
index 571e372c292..42b136d7dce 100644
--- a/libstdc++-v3/include/bits/slice_array.h
+++ b/libstdc++-v3/include/bits/slice_array.h
@@ -245,6 +245,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 slice_array<_Tp>::operator=(const _Expr<_Dom,_Tp>& __e) const
 { std::__valarray_copy(__e, _M_sz, _M_array, _M_stride); }
 
+  /// @cond undocumented
 #undef

[committed] libstdc++: Fix typos in doxygen comments

2023-04-27 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Docs tested on Fedora 37 with Doxygen 1.9.7
from current git master.

Pushed to trunk.  I'll backport this too.

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/mofunc_impl.h: Fix typo in doxygen comment.
* include/std/format: Likewise.
---
 libstdc++-v3/include/bits/mofunc_impl.h | 3 +--
 libstdc++-v3/include/std/format | 4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/bits/mofunc_impl.h b/libstdc++-v3/include/bits/mofunc_impl.h
index 47e1e506306..318a55e618f 100644
--- a/libstdc++-v3/include/bits/mofunc_impl.h
+++ b/libstdc++-v3/include/bits/mofunc_impl.h
@@ -51,14 +51,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @headerfile functional
*
*  The `std::move_only_function` class template is a call wrapper similar
-   *  to *  `std::function`, but does not require the stored target function
+   *  to `std::function`, but does not require the stored target function
*  to be copyable.
*
*  It also supports const-qualification, ref-qualification, and
*  no-throw guarantees. The qualifications and exception-specification
*  of the `move_only_function::operator()` member function are respected
*  when invoking the target function.
-   *
*/
   template
 class move_only_function<_Res(_ArgTypes...) _GLIBCXX_MOF_CV
diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index e4ef4f9b6d9..6edc3208afa 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -185,7 +185,7 @@ namespace __format
   __failed_to_parse_format_spec()
   { __throw_format_error("format error: failed to parse format-spec"); }
 } // namespace __format
-/// @endcond
+  /// @endcond
 
   // [format.parse.ctx], class template basic_format_parse_context
   template class basic_format_parse_context;
@@ -3870,7 +3870,7 @@ namespace __format
 };
 #endif
 } // namespace __format
-/// @@endcond
+/// @endcond
 
   template
 [[nodiscard]]
-- 
2.40.0

[committed] libstdc++: Improve doxygen docs for

2023-04-27 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Docs tested on Fedora 37 with Doxygen 1.9.7
from current git master.

Pushed to trunk.  I'll probably backport this too.

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/memory_resource.h: Improve doxygen comments.
* include/std/memory_resource: Likewise.
---
 libstdc++-v3/include/bits/memory_resource.h | 12 
 libstdc++-v3/include/std/memory_resource| 63 +
 2 files changed, 75 insertions(+)

diff --git a/libstdc++-v3/include/bits/memory_resource.h b/libstdc++-v3/include/bits/memory_resource.h
index 1b9e51ddbbd..f12555d4215 100644
--- a/libstdc++-v3/include/bits/memory_resource.h
+++ b/libstdc++-v3/include/bits/memory_resource.h
@@ -53,6 +53,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 namespace pmr
 {
   /// Class memory_resource
+  /**
+   * @ingroup pmr
+   * @headerfile memory_resource
+   * @since C++17
+   */
   class memory_resource
   {
 static constexpr size_t _S_max_align = alignof(max_align_t);
@@ -104,6 +109,13 @@ namespace pmr
 #endif
 
   // C++17 23.12.3 Class template polymorphic_allocator
+
+  /// Class template polymorphic_allocator
+  /**
+   * @ingroup pmr
+   * @headerfile memory_resource
+   * @since C++17
+   */
   template
 class polymorphic_allocator
 {
diff --git a/libstdc++-v3/include/std/memory_resource b/libstdc++-v3/include/std/memory_resource
index 00859262470..fdfc23c95ed 100644
--- a/libstdc++-v3/include/std/memory_resource
+++ b/libstdc++-v3/include/std/memory_resource
@@ -24,6 +24,9 @@
 
 /** @file include/memory_resource
  *  This is a Standard C++ Library header.
+ *
+ *  This header declares the @ref pmr (std::pmr) memory resources.
+ *  @ingroup pmr
  */
 
 #ifndef _GLIBCXX_MEMORY_RESOURCE
@@ -35,6 +38,25 @@
 
 #if __cplusplus >= 201703L
 
+/**
+ * @defgroup pmr Polymorphic memory resources
+ *
+ * @anchor pmr
+ * @ingroup memory
+ * @since C++17
+ *
+ * Memory resources are classes that implement the `std::pmr::memory_resource`
+ * interface for allocating and deallocating memory. Unlike traditional C++
+ * allocators, memory resources are not value types and are used via pointers
+ * to the abstract base class. They are only responsible for allocating and
+ * deallocating, not for construction and destruction of objects. As a result,
+ * memory resources just allocate raw memory as type `void*` and are not
+ * templates that allocate/deallocate and construct/destroy a specific type.
+ *
+ * The class template `std::pmr::polymorphic_allocator` is an allocator that
+ * uses a memory resource for its allocations.
+ */
+
 #include 
 #include   // vector
 #include // shared_mutex
@@ -63,6 +85,11 @@ namespace pmr
   // Global memory resources
 
   /// A pmr::memory_resource that uses `new` to allocate memory
+  /**
+   * @ingroup pmr
+   * @headerfile memory_resource
+   * @since C++17
+   */
   [[nodiscard, __gnu__::__returns_nonnull__, __gnu__::__const__]]
   memory_resource*
   new_delete_resource() noexcept;
@@ -91,6 +118,11 @@ namespace pmr
   class monotonic_buffer_resource;
 
   /// Parameters for tuning a pool resource's behaviour.
+  /**
+   * @ingroup pmr
+   * @headerfile memory_resource
+   * @since C++17
+   */
   struct pool_options
   {
 /** @brief Upper limit on number of blocks in a chunk.
@@ -152,6 +184,11 @@ namespace pmr
 
 #ifdef _GLIBCXX_HAS_GTHREADS
   /// A thread-safe memory resource that manages pools of fixed-size blocks.
+  /**
+   * @ingroup pmr
+   * @headerfile memory_resource
+   * @since C++17
+   */
   class synchronized_pool_resource : public memory_resource
   {
   public:
@@ -218,6 +255,11 @@ namespace pmr
 #endif
 
   /// A non-thread-safe memory resource that manages pools of fixed-size blocks.
+  /**
+   * @ingroup pmr
+   * @headerfile memory_resource
+   * @since C++17
+   */
   class unsynchronized_pool_resource : public memory_resource
   {
   public:
@@ -275,6 +317,27 @@ namespace pmr
 _Pool* _M_pools = nullptr;
   };
 
+  /// A memory resource that allocates from a fixed-size buffer.
+  /**
+   * The main feature of a `pmr::monotonic_buffer_resource` is that its
+   * `do_deallocate` does nothing. This makes it very fast because there is no
+   * need to manage a free list, and every allocation simply returns a new
+   * block of memory, rather than searching for a suitably-sized free block.
+   * Because deallocating is a no-op, the amount of memory used by the resource
+   * only grows until `release()` (or the destructor) is called to return all
+   * memory to upstream.
+   *
+   * A `monotonic_buffer_resource` can be initialized with a buffer that
+   * will be used to satisfy all allocation requests, until the buffer is full.
+   * After that a new buffer will be allocated from the upstream resource.
+   * By using a stack buffer and `pmr::null_memory_resource()` as the upstream
+   * you can get a memory resource that only uses the stack and never
+   * dynamically allocates.
+   *
+   * @
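
The stack-only configuration described in the comment above can be sketched as follows. This is a hedged example rather than anything from the patch: `fill_from_stack` and the 256-byte buffer size are assumptions chosen for illustration, and the code requires C++17.

```cpp
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Fill a pmr::vector using only a stack buffer. Returns false if the
// buffer is exhausted, because the null_memory_resource() upstream
// throws std::bad_alloc instead of allocating dynamically.
bool fill_from_stack(std::size_t count)
{
  std::byte buffer[256];
  std::pmr::monotonic_buffer_resource pool(buffer, sizeof(buffer),
                                           std::pmr::null_memory_resource());
  try
    {
      std::pmr::vector<int> v(&pool);
      v.reserve(count);  // a single request for count * sizeof(int) bytes
      for (std::size_t i = 0; i < count; ++i)
        v.push_back(static_cast<int>(i));
      return v.size() == count;
    }
  catch (const std::bad_alloc&)
    {
      return false;
    }
}
```

A small request such as `fill_from_stack(8)` is served entirely from the stack buffer, while a request that does not fit falls through to the upstream resource and fails, which shows that the resource never touches the heap.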

[committed] libstdc++: Add @headerfile and @since to doxygen comments [PR40380]

2023-04-27 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Docs tested on Fedora 37 with Doxygen 1.9.7
from current git master.

Pushed to trunk.  I'll probably backport this too.

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/40380
* include/bits/basic_string.h: Improve doxygen comments.
* include/bits/cow_string.h: Likewise.
* include/bits/forward_list.h: Likewise.
* include/bits/fs_dir.h: Likewise.
* include/bits/fs_path.h: Likewise.
* include/bits/quoted_string.h: Likewise.
* include/bits/stl_bvector.h: Likewise.
* include/bits/stl_map.h: Likewise.
* include/bits/stl_multimap.h: Likewise.
* include/bits/stl_multiset.h: Likewise.
* include/bits/stl_set.h: Likewise.
* include/bits/stl_vector.h: Likewise.
* include/bits/unordered_map.h: Likewise.
* include/bits/unordered_set.h: Likewise.
* include/std/filesystem: Likewise.
* include/std/iomanip: Likewise.
---
 libstdc++-v3/include/bits/basic_string.h  |  2 ++
 libstdc++-v3/include/bits/cow_string.h|  2 ++
 libstdc++-v3/include/bits/forward_list.h  |  2 ++
 libstdc++-v3/include/bits/fs_dir.h| 35 +--
 libstdc++-v3/include/bits/fs_path.h   | 18 +++-
 libstdc++-v3/include/bits/quoted_string.h | 12 +---
 libstdc++-v3/include/bits/stl_bvector.h   |  2 ++
 libstdc++-v3/include/bits/stl_map.h   |  2 ++
 libstdc++-v3/include/bits/stl_multimap.h  |  2 ++
 libstdc++-v3/include/bits/stl_multiset.h  |  3 +-
 libstdc++-v3/include/bits/stl_set.h   |  2 ++
 libstdc++-v3/include/bits/stl_vector.h|  2 ++
 libstdc++-v3/include/bits/unordered_map.h |  4 +++
 libstdc++-v3/include/bits/unordered_set.h |  4 +++
 libstdc++-v3/include/std/filesystem   |  2 ++
 libstdc++-v3/include/std/iomanip  |  1 +
 16 files changed, 87 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index 5d040e2897d..8247ee6bdc6 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -69,6 +69,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*
*  @ingroup strings
*  @ingroup sequences
+   *  @headerfile string
+   *  @since C++98
*
*  @tparam _CharT  Type of character
*  @tparam _Traits  Traits for character type, defaults to
diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h
index b6024365d4f..e5f094fd13e 100644
--- a/libstdc++-v3/include/bits/cow_string.h
+++ b/libstdc++-v3/include/bits/cow_string.h
@@ -54,6 +54,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*
*  @ingroup strings
*  @ingroup sequences
+   *  @headerfile string
+   *  @since C++98
*
*  @tparam _CharT  Type of character
*  @tparam _Traits  Traits for character type, defaults to
diff --git a/libstdc++-v3/include/bits/forward_list.h b/libstdc++-v3/include/bits/forward_list.h
index e1e68bd7e04..72b1ef46d14 100644
--- a/libstdc++-v3/include/bits/forward_list.h
+++ b/libstdc++-v3/include/bits/forward_list.h
@@ -406,6 +406,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*  and fixed time insertion/deletion at any point in the sequence.
*
*  @ingroup sequences
+   *  @headerfile forward_list
+   *  @since C++11
*
*  @tparam _Tp  Type of element.
*  @tparam _Alloc  Allocator type, defaults to allocator<_Tp>.
diff --git a/libstdc++-v3/include/bits/fs_dir.h 
b/libstdc++-v3/include/bits/fs_dir.h
index b4adf49e94a..9dd0f896e46 100644
--- a/libstdc++-v3/include/bits/fs_dir.h
+++ b/libstdc++-v3/include/bits/fs_dir.h
@@ -52,6 +52,10 @@ namespace filesystem
*/
 
   /// Information about a file's type and permissions.
+  /**
+   * @headerfile filesystem
+   * @since C++17
+   */
   class file_status
   {
   public:
@@ -94,6 +98,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   class recursive_directory_iterator;
 
   /// The value type used by directory iterators
+  /**
+   * @headerfile filesystem
+   * @since C++17
+   */
   class directory_entry
   {
   public:
@@ -354,7 +362,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 file_type  _M_type = file_type::none;
   };
 
+  /// @cond undocumented
+
   /// Proxy returned by post-increment on directory iterators.
+  /**
+   * @headerfile filesystem
+   * @since C++17
+   */
   struct __directory_iterator_proxy
   {
 const directory_entry& operator*() const& noexcept { return _M_entry; }
@@ -370,8 +384,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
 directory_entry _M_entry;
   };
+  /// @endcond
 
   /// Iterator type for traversing the entries in a single directory.
+  /**
+   * @headerfile filesystem
+   * @since C++17
+   */
   class directory_iterator
   {
   public:
@@ -451,7 +470,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 std::__shared_ptr<_Dir> _M_dir;
   };
 
-  /// @relates std::filesystem::directory_iterator @{
+  /** @relates std::filesystem::directory_iterator
+   *  @headerfile filesystem
+   *  @since C++17
+   *  @{

[committed] libstdc++: Make std::random_device throw std::system_error [PR105081]

2023-04-27 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

This changes std::random_device constructors to throw std::system_error
(with EINVAL as the error code) when the constructor argument is
invalid. We can also throw std::system_error when read(2) fails so that
the exception includes the additional information provided by errno.

As noted in the PR, this is consistent with libc++, and doesn't break
any existing code which catches std::runtime_error, because those
handlers will still catch std::system_error.
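
To illustrate the compatibility claim, here is a minimal sketch. It is not
the libstdc++ code: open_device is a hypothetical stand-in that throws the
same kind of exception the patched constructor would throw for a bad token.
It shows that an existing std::runtime_error handler still fires, while a
newer handler can also inspect the error code.

```cpp
#include <cassert>
#include <cerrno>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical stand-in for a constructor that rejects an unsupported token.
// Only the exception type and error code mimic the behaviour described above.
void open_device(const std::string& token)
{
  if (token != "default")
    throw std::system_error(EINVAL, std::generic_category(),
                            "open_device: unsupported token");
}

// Existing-style handler: keeps working because std::system_error
// is derived from std::runtime_error.
bool rejected_via_runtime_error(const std::string& token)
{
  try {
    open_device(token);
    return false;
  } catch (const std::runtime_error&) {
    return true;
  }
}

// Newer-style handler: returns the error value carried by the exception,
// or 0 if no exception was thrown.
int error_value(const std::string& token)
{
  try {
    open_device(token);
    return 0;
  } catch (const std::system_error& e) {
    return e.code().value();
  }
}

// Small self-check tying the two handlers together.
void demo_handlers()
{
  assert(rejected_via_runtime_error("bogus"));
  assert(!rejected_via_runtime_error("default"));
  assert(error_value("bogus") == EINVAL);
}
```

Code that wants the extra detail can catch std::system_error first and fall
back to std::runtime_error for everything else.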

libstdc++-v3/ChangeLog:

PR libstdc++/105081
* src/c++11/random.cc (__throw_syserr): New function.
(random_device::_M_init, random_device::_M_init_pretr1): Use new
function for bad tokens.
(random_device::_M_getval): Use new function for read errors.
* testsuite/util/testsuite_random.h (random_device_available):
Change catch handler to use std::system_error.
---
 libstdc++-v3/src/c++11/random.cc   | 18 --
 libstdc++-v3/testsuite/util/testsuite_random.h |  3 ++-
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/src/c++11/random.cc b/libstdc++-v3/src/c++11/random.cc
index ed2db4aef57..daf934808cc 100644
--- a/libstdc++-v3/src/c++11/random.cc
+++ b/libstdc++-v3/src/c++11/random.cc
@@ -26,6 +26,7 @@
 #define _CRT_RAND_S // define this before including <stdlib.h> to get rand_s
 
 #include 
+#include <system_error>
 
 #ifdef  _GLIBCXX_USE_C99_STDINT_TR1
 
@@ -94,6 +95,11 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
   namespace
   {
+[[noreturn]]
+inline void
+__throw_syserr([[maybe_unused]] int e, [[maybe_unused]] const char* msg)
+{ _GLIBCXX_THROW_OR_ABORT(system_error(e, std::generic_category(), msg)); }
+
 #if USE_RDRAND
 unsigned int
 __attribute__ ((target("rdrnd")))
@@ -365,9 +371,9 @@ namespace std _GLIBCXX_VISIBILITY(default)
   which = prng;
 #endif
 else
-  std::__throw_runtime_error(
- __N("random_device::random_device(const std::string&):"
- " unsupported token"));
+  std::__throw_syserr(EINVAL, __N("random_device::random_device"
+ "(const std::string&):"
+ " unsupported token"));
 
 #ifdef _GLIBCXX_USE_CRT_RAND_S
 if (which & rand_s)
@@ -508,8 +514,8 @@ namespace std _GLIBCXX_VISIBILITY(default)
char* endptr;
seed = std::strtoul(nptr, &endptr, 0);
if (*nptr == '\0' || *endptr != '\0')
- std::__throw_runtime_error(__N("random_device::_M_init_pretr1"
-"(const std::string&)"));
+ std::__throw_syserr(EINVAL, __N("random_device::_M_init_pretr1"
+ "(const std::string&)"));
   }
 _M_mt.seed(seed);
 #else
@@ -582,7 +588,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
p = static_cast(p) + e;
  }
else if (e != -1 || errno != EINTR)
- __throw_runtime_error(__N("random_device could not be read"));
+ __throw_syserr(errno, __N("random_device could not be read"));
   }
 while (n > 0);
 #else // USE_POSIX_FILE_IO
diff --git a/libstdc++-v3/testsuite/util/testsuite_random.h 
b/libstdc++-v3/testsuite/util/testsuite_random.h
index 840294d01e1..763707bbfac 100644
--- a/libstdc++-v3/testsuite/util/testsuite_random.h
+++ b/libstdc++-v3/testsuite/util/testsuite_random.h
@@ -26,6 +26,7 @@
 
 #include 
 #include 
+#include <system_error>
 #include 
 
 namespace __gnu_test
@@ -204,7 +205,7 @@ namespace __gnu_test
 try {
   std::random_device dev(token);
   return true;
-} catch (...) {
+} catch (const std::system_error& /* See PR libstdc++/105081 */) {
   return false;
 }
   }
-- 
2.40.0
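
For readers who want to see the read-error path in isolation, the following
is a rough sketch under stated assumptions, not the libstdc++ source.
read_fully is a hypothetical helper that retries on EINTR and otherwise
throws std::system_error carrying errno, which is the pattern the
_M_getval hunk above switches to. It assumes a POSIX system.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <system_error>
#include <unistd.h>

// Hypothetical helper: fill buf with exactly n bytes read from fd.
void read_fully(int fd, void* buf, std::size_t n)
{
  char* p = static_cast<char*>(buf);
  while (n > 0)
    {
      const ssize_t e = ::read(fd, p, n);
      if (e > 0)
        {
          n -= static_cast<std::size_t>(e);
          p += e;
        }
      else if (e != -1 || errno != EINTR)
        // Same condition as in the hunk above. Caveat: on end-of-file
        // (e == 0) read does not set errno, so the code may be stale.
        throw std::system_error(errno, std::generic_category(),
                                "read_fully: could not read");
      // Otherwise the call was interrupted (EINTR): loop and retry.
    }
}

// Self-check: one successful read through a pipe, one failing read.
void demo_read_fully()
{
  char out[4] = {};
  int fds[2];
  const int rc = ::pipe(fds);
  assert(rc == 0);
  const ssize_t written = ::write(fds[1], "abcd", 4);
  assert(written == 4);
  read_fully(fds[0], out, sizeof out);
  assert(out[0] == 'a' && out[3] == 'd');
  ::close(fds[0]);
  ::close(fds[1]);

  bool got_ebadf = false;
  try {
    read_fully(-1, out, sizeof out);  // invalid descriptor
  } catch (const std::system_error& e) {
    got_ebadf = (e.code().value() == EBADF);
  }
  assert(got_ebadf);
  (void) rc;
  (void) written;
}
```

A caller can compare e.code() against std::errc values to distinguish
failure causes, which a plain std::runtime_error did not allow.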

Re: [PATCH] ipa/109607 - properly gimplify conversions introduced by IPA param manipulation

2023-04-27 Thread Martin Jambor

Hi,

On Thu, Apr 27 2023, Richard Biener wrote:
> The following addresses IPA param manipulation (through IPA SRA)
> replacing
>
>   BIT_FIELD_REF <*this_8(D), 8, 56>
>
> with
>
>   BIT_FIELD_REF (ISRA.814), 8, 56>
>
> which is supposed to be invalid GIMPLE (ISRA.814 is a register).
> There's currently insufficient checking in place to catch this in the
> IL verifier but I am working on that as part of fixing PR109594.
>
> The solution for the particular testcase I am running into this is
> to split the conversion to a separate stmt.  Generally the modification
> phase is set up for this but the extra_stmts sequence isn't passed
> around everywhere.  The following passes it to modify_expression
> from modify_assignment when rewriting the RHS.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK for trunk?

Yes, thank you!

Martin


>
> Thanks,
> Richard.
>
>   PR ipa/109607
>   * ipa-param-manipulation.h
>   (ipa_param_body_adjustments::modify_expression): Add extra_stmts
>   argument.
>   * ipa-param-manipulation.cc
>   (ipa_param_body_adjustments::modify_expression): Likewise.
>   When we need a conversion and the replacement is a register
>   split the conversion out.
>   (ipa_param_body_adjustments::modify_assignment): Pass
>   extra_stmts to RHS modify_expression.
>
>   * g++.dg/torture/pr109607.C: New testcase.

Re: [PATCH v2 #1/2] enable adjustment of return_pc debug attrs

2023-04-27 Thread Alexandre Oliva via Gcc-patches

On Apr 14, 2023, Alexandre Oliva wrote:

> On Mar 23, 2023, Alexandre Oliva wrote:
>> This patch introduces infrastructure for targets to add an offset to
>> the label issued after the call_insn to set the call_return_pc
>> attribute.  This will be used on rs6000, that sometimes issues another
>> instruction after the call proper as part of a call insn.

> Ping?
> https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614452.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614453.html

Ping?

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about
