Re: [PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-11 Thread Jiufu Guo via Gcc-patches
Richard Biener  writes:

> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>
>> 
>> Hi,
>> 
>> Richard Biener  writes:
>> 
>> > On Fri, 9 Jun 2023, Jiufu Guo wrote:
>> >
>> >> 
>> >> Hi,
>> >> 
>> >> Richard Biener  writes:
>> >> 
>> >> > On Fri, 9 Jun 2023, Richard Sandiford wrote:
>> >> >
>> >> >> guojiufu  writes:
>> >> >> > Hi,
>> >> >> >
>> >> >> > On 2023-06-09 16:00, Richard Biener wrote:
>> >> >> >> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>> >> >> >> 
>> >> >> >>> Hi,
>> >> >> >>> 
>> ...
>> >> >> >>> 
>> >> >> >>> This patch is raised when drafting below one.
>> >> >> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
>> >> >> >>> With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
>> >> >> >>> try_const_anchors, and hits the assert/ice.
>> >> >> >>> 
>> >> >> >>> Bootstrap and regtest pass on ppc64{,le} and x86_64.
>> >> >> >>> Is this ok for trunk?
>> >> >> >> 
>> >> >> >> Iff the correct fix at all (how can a CONST_INT have BLKmode?) then
>> >> >> >> I suggest to instead fix try_const_anchors to change
>> >> >> >> 
>> >> >> >>   /* CONST_INT is used for CC modes, but we should leave those alone.  */
>> >> >> >>   if (GET_MODE_CLASS (mode) == MODE_CC)
>> >> >> >>     return NULL_RTX;
>> >> >> >> 
>> >> >> >>   gcc_assert (SCALAR_INT_MODE_P (mode));
>> >> >> >> 
>> >> >> >> to
>> >> >> >> 
>> >> >> >>   /* CONST_INT is used for CC modes, leave any non-scalar-int mode alone.  */
>> >> >> >>   if (!SCALAR_INT_MODE_P (mode))
>> >> >> >>     return NULL_RTX;
>> >> >> >> 
>> >> >> >
>> >> >> > This also fixes the issue.  There is a "Punt on CC modes" patch
>> >> >> > to return NULL_RTX in try_const_anchors.
>> >> >> >
>> >> >> >> but as said I wonder how we arrive at a BLKmode CONST_INT and whether
>> >> >> >> we should have fended this off earlier.  Can you share more complete
>> >> >> >> RTL of that stack_tie?
>> >> >> >
>> >> >> >
>> >> >> > (insn 15 14 16 3 (parallel [
>> >> >> >  (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
>> >> >> >  (const_int 0 [0]))
>> >> >> >  ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
>> >> >> >   (nil))
>> >> >> >
>> >> >> > It is "(set (mem/c:BLK (reg/f:DI 1 1)) (const_int 0 [0]))".
>> >> >> 
>> >> >> I'm not convinced this is correct RTL.  (unspec:BLK [(const_int 0)] ...)
>> >> >> would be though.  It's arguably more accurate too, since the effect
>> >> >> on the stack locations is unspecified rather than predictable.
>> >> >
>> >> > powerpc seems to be the only port with a stack_tie that's not
>> >> > using an UNSPEC RHS.
>> >> In rs6000.md, it is
>> >> 
>> >> ; This is to explain that changes to the stack pointer should
>> >> ; not be moved over loads from or stores to stack memory.
>> >> (define_insn "stack_tie"
>> >>   [(match_parallel 0 "tie_operand"
>> >>  [(set (mem:BLK (reg 1)) (const_int 0))])]
>> >>   ""
>> >>   ""
>> >>   [(set_attr "length" "0")])
>> >> 
>> >> This is just a placeholder insn and acts like a comment.
>> >> Using an UNSPEC would work, as on other targets.  However, I'm wondering
>> >> what the concern is with "(set (mem:BLK (reg 1)) (const_int 0))".
>> >> Is it the mismatch of modes between SET_DEST and SET_SRC?
>> >
>> > I don't think the issue is the mode but the issue is that
>> > the pattern as-is says some memory is zeroed while that's not
>> > actually true (not specifying a size means we can't really do
>> > anything with this MEM, but still).  Using an UNSPEC avoids
>> > implying anything for the stored value.
>> >
>> > Of course I think a MEM SET_DEST without a specified size is bogus
>> > as well, but there's larger precedent for this...
>> 
>> Thanks for your kind comments!
>> "(set (mem:BLK (reg 1)) (const_int 0))" may be used here because this
>> insn does not generate anything real (not a real store and no asm code);
>> it may act like a barrier.
>> 
>> I agree, though, that using UNSPEC may be clearer and avoids misreading.
>
> Btw, another way to avoid the issue in CSE is to make it not process
> (aka record anything for optimization) for SET from MEMs with
> !MEM_SIZE_KNOWN_P

Thanks! Yes, this would make sense.
So there are two ideas (patches) to handle this issue: checking
SCALAR_INT_MODE_P before invoking try_const_anchors, or skipping SETs to
MEMs with !MEM_SIZE_KNOWN_P in CSE.
Which one would be preferable?  This one (from the compile-time aspect)?

Also, changing the rs6000 stack_tie to use an unspec could be a
standalone enhancement, separate from the cse patch.
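
To make the two options concrete, here is a minimal sketch in plain C.
It is not GCC code: the toy_* names, the enum and the struct are
hypothetical stand-ins for machine-mode classes and for a SET destination,
and only model the two checks discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for machine-mode classes; not GCC's real enum.  */
enum toy_mode_class { TOY_MODE_INT, TOY_MODE_CC, TOY_MODE_BLK };

/* Toy description of a SET destination (hypothetical).  */
struct toy_dest
{
  bool is_mem;          /* Destination is a memory reference.  */
  bool mem_size_known;  /* Plays the role of MEM_SIZE_KNOWN_P.  */
  enum toy_mode_class mode_class;
};

/* Idea 1: punt on every mode that is not a scalar integer instead of
   asserting.  Returns true when constant anchors may be tried.  */
bool
toy_may_try_const_anchors (enum toy_mode_class mode_class)
{
  return mode_class == TOY_MODE_INT;
}

/* Idea 2: do not record a SET whose destination is a MEM of unknown
   size, so such a SET never reaches the anchor code at all.  */
bool
toy_should_record_set (const struct toy_dest *dest)
{
  assert (dest != NULL);
  return !(dest->is_mem && !dest->mem_size_known);
}
```

In this model a stack_tie-like destination (a BLKmode MEM without a known
size) is rejected by either check, which is why either patch avoids the ICE.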

Thanks for comments!

BR,
Jeff (Jiufu Guo)

 patch 1
diff --git a/gcc/cse.cc b/gcc/cse.cc
index 2bb63ac4105..06ecdadecbc 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -4271,6 +4271,8 @@ find_sets_in_insn (rtx_insn *insn, vec *psets)
 someplace else, so it isn't worth cse'ing.  */
   else if (GET_CODE (SET_SRC (x)) == CALL)
;
+  else if (MEM_P (SET_DEST (x)) && !MEM_SIZE_KNOWN_P (SET_DEST (x)))
+   ;
   else if (GET_CODE (SET_SRC (x)) == CONST_VECTOR
   && GET_MODE_CLASS (GET_MODE (SET_SRC (x))) != 

[PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control

2023-06-11 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Targets like ARM SVE in GCC have an elegant way to handle both loop control
and flow control simultaneously:

loop_control_mask = WHILE_ULT
flow_control_mask = comparison
control_mask = loop_control_mask & flow_control_mask;
MASK_LOAD (control_mask)
MASK_STORE (control_mask)

However, targets like RVV (RISC-V Vector) cannot use this approach in
auto-vectorization since RVV uses length for loop control.

This patch adds LEN_MASK_ LOAD/STORE to support flow control for targets
like RISC-V that use length for loop control.
Loads/stores are normalized into LEN_MASK_ LOAD/STORE as long as either length
or mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR.
Mask is the outcome of a comparison.

LEN_MASK_ LOAD/STORE format is defined as follows:
1). LEN_MASK_LOAD (ptr, align, length, mask).
2). LEN_MASK_STORE (ptr, align, length, mask, vec).
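
As an illustration only, the two operations can be modelled in scalar C.
This sketch assumes that a lane is active only when its index is below
length and its mask element is set; the toy_* names and the explicit vf
parameter are hypothetical, the align operand is omitted, and inactive
lanes of a load are simply left untouched, which is a choice of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scalar model of LEN_MASK_LOAD (ptr, align, length, mask).
   Lane i is loaded only if i < length and mask[i] is set.  */
void
toy_len_mask_load (int *dst, const int *ptr, size_t length,
                   const bool *mask, size_t vf)
{
  assert (length <= vf);
  for (size_t i = 0; i < vf; i++)
    if (i < length && mask[i])
      dst[i] = ptr[i];
}

/* Scalar model of LEN_MASK_STORE (ptr, align, length, mask, vec).
   Lane i is stored only if i < length and mask[i] is set.  */
void
toy_len_mask_store (int *ptr, size_t length, const bool *mask,
                    const int *vec, size_t vf)
{
  assert (length <= vf);
  for (size_t i = 0; i < vf; i++)
    if (i < length && mask[i])
      ptr[i] = vec[i];
}
```

With length equal to vf the model degenerates to a pure mask operation, and
with an all-true mask it degenerates to a pure length operation.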

Consider the following 4 cases:

VLA: Variable-length auto-vectorization
VLS: Specific-length auto-vectorization

Case 1 (VLS): -mrvv-vector-bits=128     IR (Does not use LEN_MASK_*):
Code:                                     v1 = MEM (...)
  for (int i = 0; i < 4; i++)             v2 = MEM (...)
    a[i] = b[i] + c[i];                   v3 = v1 + v2
                                          MEM[...] = v3

Case 2 (VLS): -mrvv-vector-bits=128     IR (LEN_MASK_* with length = VF, mask = comparison):
Code:                                     mask = comparison
  for (int i = 0; i < 4; i++)             v1 = LEN_MASK_LOAD (length = VF, mask)
    if (cond[i])                          v2 = LEN_MASK_LOAD (length = VF, mask)
      a[i] = b[i] + c[i];                 v3 = v1 + v2
                                          LEN_MASK_STORE (length = VF, mask, v3)

Case 3 (VLA):
Code:                                     loop_len = SELECT_VL or MIN
  for (int i = 0; i < n; i++)             v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
    a[i] = b[i] + c[i];                   v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
                                          v3 = v1 + v2
                                          LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)

Case 4 (VLA):
Code:                                     loop_len = SELECT_VL or MIN
  for (int i = 0; i < n; i++)             mask = comparison
    if (cond[i])                          v1 = LEN_MASK_LOAD (length = loop_len, mask)
      a[i] = b[i] + c[i];                 v2 = LEN_MASK_LOAD (length = loop_len, mask)
                                          v3 = v1 + v2
                                          LEN_MASK_STORE (length = loop_len, mask, v3)
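
For readers who prefer code, Case 4 can be emulated in scalar C as below.
This is only a sketch under assumptions: TOY_VF is a hypothetical
vectorization factor, loop_len is computed with the MIN form rather than
SELECT_VL, and each lane is processed by an inner loop instead of a real
vector instruction.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_VF 4  /* Hypothetical vectorization factor.  */

/* Scalar emulation of Case 4:
     for (i = 0; i < n; i++) if (cond[i]) a[i] = b[i] + c[i];
   Each outer iteration handles one "vector" of up to TOY_VF lanes.  */
void
toy_case4 (int *a, const int *b, const int *c, const int *cond, size_t n)
{
  assert (n == 0 || (a != NULL && b != NULL && c != NULL && cond != NULL));
  for (size_t base = 0; base < n; base += TOY_VF)
    {
      /* loop_len = MIN (remaining iterations, VF).  */
      size_t remaining = n - base;
      size_t loop_len = remaining < TOY_VF ? remaining : TOY_VF;

      for (size_t lane = 0; lane < TOY_VF; lane++)
        {
          /* A lane is active only if it is below loop_len AND its mask
             (the comparison) is true.  The length test comes first so
             cond[] is never read past n.  */
          if (lane < loop_len && cond[base + lane] != 0)
            a[base + lane] = b[base + lane] + c[base + lane];
        }
    }
}
```

The last outer iteration shows why both controls are needed: loop_len cuts
off the tail lanes while the mask still selects among the remaining ones.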

More features:
1. Support gimple fold simplification for LEN_MASK_ LOAD/STORE:
   LEN_MASK_STORE (length = vf, mask = {-1,-1,...}, v) ===> MEM [...] = V
2. Allow DSE for LEN_MASK_* LOAD/STORE.
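
The condition behind the fold in item 1 can be sketched as a small
predicate. This is an illustration in plain C, not the gimple-fold.cc
implementation; the function name and the boolean mask representation are
assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true when a LEN_MASK_STORE with LENGTH and MASK writes every
   one of the VF lanes, so it could be treated as a plain vector store.  */
bool
toy_store_is_unconditional (size_t length, const bool *mask, size_t vf)
{
  assert (mask != NULL);
  if (length != vf)
    return false;   /* Some tail lanes are cut off by the length.  */
  for (size_t i = 0; i < vf; i++)
    if (!mask[i])
      return false; /* Some lane is disabled by the mask.  */
  return true;
}
```

Only when both controls are trivially "all lanes" does the store carry no
more information than MEM [...] = V.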

Bootstrapped and regression-tested on x86 with no unexpected differences.

gcc/ChangeLog:

* doc/md.texi: Add LEN_MASK_ LOAD/STORE.
* genopinit.cc (main): Ditto.
(CMP_NAME): Ditto.
* gimple-fold.cc (arith_overflowed_p): Ditto.
(gimple_fold_partial_load_store_mem_ref): Ditto.
(gimple_fold_partial_store): Ditto.
(gimple_fold_call): Ditto.
* internal-fn.cc (len_maskload_direct): Ditto.
(len_maskstore_direct): Ditto.
(expand_partial_load_optab_fn): Ditto.
(expand_len_maskload_optab_fn): Ditto.
(expand_partial_store_optab_fn): Ditto.
(expand_len_maskstore_optab_fn): Ditto.
(direct_len_maskload_optab_supported_p): Ditto.
(direct_len_maskstore_optab_supported_p): Ditto.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
* internal-fn.def (LEN_MASK_LOAD): Ditto.
(LEN_MASK_STORE): Ditto.
* optabs-query.cc (can_vec_len_mask_load_store_p): Ditto.
* optabs-query.h (can_vec_len_mask_load_store_p): Ditto.
* optabs.def (OPTAB_CD): Ditto.
* tree-data-ref.cc (get_references_in_stmt): Ditto.
* tree-if-conv.cc (ifcvt_can_use_mask_load_store): Ditto.
* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Ditto.
(call_may_clobber_ref_p_1): Ditto.
* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Ditto.
(dse_optimize_stmt): Ditto.
* tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Ditto.
(get_alias_ptr_type_for_ptr_address): Ditto.
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Ditto.
* tree-vect-data-refs.cc (can_group_stmts_p): Ditto.
(vect_find_stmt_data_reference): Ditto.
(vect_supportable_dr_alignment): Ditto.
* tree-vect-loop.cc (vect_verify_loop_lens): Ditto.
(optimize_mask_stores): Ditto.
* tree-vect-slp.cc (vect_get_operand_map): Ditto.
(vect_build_slp_tree_2): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): 

Re: [PATCH] In the pipeline, UNRECOG INSN is not executed in advance if it starts a live range.

2023-06-11 Thread Jin Ma via Gcc-patches
> On 5/29/23 04:51, Jin Ma wrote:
> >Unrecog insns (such as CLOBBER, USE) do not represent real instructions, but in the
> > process of pipeline optimization, they wait for issue in the ready list like
> > other insns, without considering resource conflicts and cycles. As a result, on a
> > multi-issue CPU architecture they can be issued at any time when other regular insns
> > have resource conflicts or cannot be launched for other reasons. Their position is
> > therefore advanced in the generated insn sequence, which affects register
> > allocation and often leads to more redundant mov instructions.
> > 
> > A simple example:
> > https://github.com/majin2020/gcc-test/blob/master/test.c
> > This is a function in the dhrystone benchmark.
> > 
> > https://github.com/majin2020/gcc-test/blob/0b08c1a13de9663d7d9aba7539b960ec0607ca24/test.c.299r.sched1
> > This is a log of the pass 'sched1' when issue_rate == 2. In it, insns 13 and 14 are
> > scheduled much earlier than expected, which risks generating redundant mov
> > instructions and seems unreasonable.
> > 
> > Therefore, I am submitting the patch again, based on the last review comments, to
> > try to solve this problem.
> > 
> > This is the new log of sched1 after the patch is applied.
> > https://github.com/majin2020/gcc-test/commit/efcb43e3369e771bde702955048bfe3f501263dd
> > 
> > gcc/ChangeLog:
> > 
> >  * haifa-sched.cc (unrecog_insn_for_forw_only_p): New.
> >  (prune_ready_list): UNRECOG INSN is not executed in advance if it
> >  starts a live range.
> > ---
> >   gcc/haifa-sched.cc | 44 +++-
> >   1 file changed, 39 insertions(+), 5 deletions(-)
> > 
> > diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
> > index 2c881ede0ec..205680a4936 100644
> > --- a/gcc/haifa-sched.cc
> > +++ b/gcc/haifa-sched.cc
> > @@ -765,6 +765,23 @@ real_insn_for_shadow (rtx_insn *insn)
> > return pair->i1;
> >   }
> >   
> > +/* Return true if INSN is unrecog that starts a live range.  */
> I would rewrite this as
> 
> /* Return TRUE if INSN (a USE or CLOBBER) starts a new live
> range, FALSE otherwise.  */

Ok.

> > +
> > +static bool
> > +unrecog_insn_for_forw_only_p (rtx_insn *insn)
> I would call this "use_or_clobber_starts_range_p" or something like that.

Ok.

> > +{
> > +  if (insn && !INSN_P (insn) && recog_memoized (insn) >= 0)
> > +return false;
> I would drop the test that INSN is not NULL in this test.  There's no 
> way it can ever be NULL here.
> 
> If you really want to check that, then I'd do something like
> 
> gcc_assert (INSN);
> 
> Instead of checking it in that condition.

Ok.

> > @@ -6320,11 +6337,28 @@ prune_ready_list (state_t temp_state, bool first_cycle_insn_p,
> > }
> >   else if (recog_memoized (insn) < 0)
> > {
> > - if (!first_cycle_insn_p
> > - && (GET_CODE (PATTERN (insn)) == ASM_INPUT
> > - || asm_noperands (PATTERN (insn)) >= 0))
> > -   cost = 1;
> > - reason = "asm";
> > + if (GET_CODE (PATTERN (insn)) == ASM_INPUT
> > + || asm_noperands (PATTERN (insn)) >= 0)
> > +   {
> > + reason = "asm";
> > + if (!first_cycle_insn_p)
> > +   cost = 1;
> > +   }
> > + else if (unrecog_insn_for_forw_only_p (insn))
> > +   {
> > + reason = "unrecog insn";
> > + if (!first_cycle_insn_p)
> > +   cost = 1;
> > + else
> > +   {
> > + int j = i;
> > + while (n > ++j)
> > +   if (!unrecog_insn_for_forw_only_p (ready_element (, j)))
> > + break;
> > +
> > + cost = (j == n) ? 0 : 1;
> > +   }
> Why do you need a different cost based on what's in the ready list? 
> Isn't the only property we're looking for whether or not the USE/CLOBBER 
> opens a live range?
> 
> Jeff

I found that if I only check whether the USE/CLOBBER opens a live range,
then when only USE/CLOBBERs are left in the ready list there will be an
infinite loop: we always postpone them to the next cycle (cost = 1), so
they are never emitted and always stay in the ready list.

So I think (I may not be correct) that when only USE/CLOBBERs are left in
the ready list, the cost should be set to 0 so that the USE/CLOBBER can be
emitted immediately.
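
To illustrate the point, here is a minimal model of that cost decision in
plain C. It is not the haifa-sched.cc code: the ready list is reduced to a
hypothetical array of flags saying whether each entry is a USE/CLOBBER that
opens a live range, and only the first_cycle_insn_p branch is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cost for ready-list entry I on the first cycle.
   opens_range[k] is true if entry k is a USE/CLOBBER that starts a
   live range.  Returns 1 to postpone the entry, 0 to issue it now.  */
int
toy_first_cycle_cost (const bool *opens_range, size_t n, size_t i)
{
  assert (i < n);
  if (!opens_range[i])
    return 0;   /* Not the case discussed here; simplified to "issue".  */

  /* Scan the entries after I, as the while loop in the patch does.  */
  for (size_t j = i + 1; j < n; j++)
    if (!opens_range[j])
      return 1; /* A regular insn is still waiting: postpone.  */

  /* Only USE/CLOBBERs remain: issue now, otherwise the entry would be
     postponed on every cycle and never leave the ready list.  */
  return 0;
}
```

Without the final "return 0" path every remaining entry would get cost 1 on
each cycle, which is the infinite loop described above.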

Maybe there's a better way?

RE: [PATCH v1] RISC-V: Support RVV FP16 MISC vlmul ext intrinsic API

2023-06-11 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito and Juzhe.

Pan

From: Kito Cheng
Sent: Monday, June 12, 2023 11:33 AM
To: 钟居哲
Cc: Li, Pan2; gcc-patches; rdapp.gcc; Jeff Law; Wang, Yanzhang
Subject: Re: [PATCH v1] RISC-V: Support RVV FP16 MISC vlmul ext intrinsic API

Lgtm too :)

钟居哲 <juzhe.zh...@rivai.ai> wrote on Monday, June 12, 2023 at 05:48:
LGTM



juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-06-11 08:33
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV FP16 MISC vlmul ext intrinsic API
From: Pan Li <pan2...@intel.com>

This patch supports the intrinsic API of FP16 ZVFHMIN vlmul ext, i.e.:

vfloat16*_t <==> vfloat16*_t.

From the user's perspective, it is reasonable to do type conversions
between vfloat16*_t and vfloat16*_t when only ZVFHMIN is enabled.

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add type to X2/X4/X8/X16/X32 vlmul ext ops.
(vfloat16mf2_t): Ditto.
(vfloat16m1_t): Ditto.
(vfloat16m2_t): Ditto.
(vfloat16m4_t): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add new test cases.
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Add new test cases.
---
.../riscv/riscv-vector-builtins-types.def | 15 ++
.../riscv/rvv/base/zvfh-over-zvfhmin.c| 18 +--
.../riscv/rvv/base/zvfhmin-intrinsic.c| 54 +--
3 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def b/gcc/config/riscv/riscv-vector-builtins-types.def
index 589ea532727..db8e61fea6a 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -978,6 +978,11 @@ DEF_RVV_X2_VLMUL_EXT_OPS (vuint32m4_t, 0)
DEF_RVV_X2_VLMUL_EXT_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_X2_VLMUL_EXT_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_X2_VLMUL_EXT_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16m2_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16m4_t, RVV_REQUIRE_ELEN_FP_16)
DEF_RVV_X2_VLMUL_EXT_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X2_VLMUL_EXT_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
DEF_RVV_X2_VLMUL_EXT_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
@@ -1014,6 +1019,10 @@ DEF_RVV_X4_VLMUL_EXT_OPS (vuint32m1_t, 0)
DEF_RVV_X4_VLMUL_EXT_OPS (vuint32m2_t, 0)
DEF_RVV_X4_VLMUL_EXT_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_X4_VLMUL_EXT_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_X4_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X4_VLMUL_EXT_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X4_VLMUL_EXT_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X4_VLMUL_EXT_OPS (vfloat16m2_t, RVV_REQUIRE_ELEN_FP_16)
DEF_RVV_X4_VLMUL_EXT_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X4_VLMUL_EXT_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
DEF_RVV_X4_VLMUL_EXT_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
@@ -1040,6 +1049,9 @@ DEF_RVV_X8_VLMUL_EXT_OPS (vuint16m1_t, 0)
DEF_RVV_X8_VLMUL_EXT_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X8_VLMUL_EXT_OPS (vuint32m1_t, 0)
DEF_RVV_X8_VLMUL_EXT_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_X8_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X8_VLMUL_EXT_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X8_VLMUL_EXT_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
DEF_RVV_X8_VLMUL_EXT_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X8_VLMUL_EXT_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
DEF_RVV_X8_VLMUL_EXT_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
@@ -1056,6 +1068,8 @@ DEF_RVV_X16_VLMUL_EXT_OPS (vuint8mf2_t, 0)
DEF_RVV_X16_VLMUL_EXT_OPS (vuint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X16_VLMUL_EXT_OPS (vuint16mf2_t, 0)
DEF_RVV_X16_VLMUL_EXT_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X16_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X16_VLMUL_EXT_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
DEF_RVV_X16_VLMUL_EXT_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X32_VLMUL_EXT_OPS (vint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
@@ -1064,6 +1078,7 @@ DEF_RVV_X32_VLMUL_EXT_OPS (vint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X32_VLMUL_EXT_OPS (vuint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X32_VLMUL_EXT_OPS (vuint8mf4_t, 0)
DEF_RVV_X32_VLMUL_EXT_OPS (vuint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X32_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | RVV_REQUIRE_MIN_VLEN_64)

RE: [PATCH v1] RISC-V: Add test cases for RVV FP16 undefined and vlmul trunc

2023-06-11 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito and Juzhe.

Pan

From: Kito Cheng
Sent: Monday, June 12, 2023 11:32 AM
To: 钟居哲
Cc: Li, Pan2; gcc-patches; Robin Dapp; jeffreyalaw; Wang, Yanzhang
Subject: Re: [PATCH v1] RISC-V: Add test cases for RVV FP16 undefined and vlmul trunc

LGTM

juzhe.zh...@rivai.ai wrote on Monday, June 12, 2023 at 10:58:
LGTM.



juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-06-12 10:57
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Add test cases for RVV FP16 undefined and vlmul trunc
From: Pan Li <pan2...@intel.com>

This patch adds more tests for RVV FP16 undefined and vlmul
trunc, i.e.:

__riscv_vundefined_f16*();
__riscv_vlmul_trunc_v_f16*_f16*();

From the user's perspective, it is reasonable to perform the above operations
when only ZVFHMIN is enabled. The new test cases make sure that the RVV FP16
vundefined and vlmul trunc intrinsics work as expected.

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add test cases.
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Ditto.
---
.../riscv/rvv/base/zvfh-over-zvfhmin.c| 28 ++--
.../riscv/rvv/base/zvfhmin-intrinsic.c| 66 +++
2 files changed, 78 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
index ff9e0156a68..c3ed4191a36 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
@@ -45,15 +45,33 @@ vfloat16m8_t test_vlmul_ext_v_f16mf4_f16m8(vfloat16mf4_t op1) {
   return __riscv_vlmul_ext_v_f16mf4_f16m8(op1);
}
+vfloat16mf4_t test_vlmul_trunc_v_f16mf2_f16mf4(vfloat16mf2_t op1) {
+  return __riscv_vlmul_trunc_v_f16mf2_f16mf4(op1);
+}
+
+vfloat16m4_t test_vlmul_trunc_v_f16m8_f16m4(vfloat16m8_t op1) {
+  return __riscv_vlmul_trunc_v_f16m8_f16m4(op1);
+}
+
+vfloat16mf4_t test_vundefined_f16mf4() {
+  return __riscv_vundefined_f16mf4();
+}
+
+vfloat16m8_t test_vundefined_f16m8() {
+  return __riscv_vundefined_f16m8();
+}
+
/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 3 } } */
/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m4,\s*t[au],\s*m[au]} 2 } } */
/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m8,\s*t[au],\s*m[au]} 1 } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 6 } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf2,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 8 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf2,\s*t[au],\s*m[au]} 2 } } */
/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*m8,\s*t[au],\s*m[au]} 1 } } */
/* { dg-final { scan-assembler-times {vfwcvt\.f\.f\.v\s+v[0-9]+,\s*v[0-9]+} 2 } } */
/* { dg-final { scan-assembler-times {vfncvt\.f\.f\.w\s+v[0-9]+,\s*v[0-9]+} 2 } } */
-/* { dg-final { scan-assembler-times {vle16\.v\s+v[0-9]+,\s*0\([0-9ax]+\)} 6 } } */
-/* { dg-final { scan-assembler-times {vse16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 4 } } */
+/* { dg-final { scan-assembler-times {vle16\.v\s+v[0-9]+,\s*0\([0-9ax]+\)} 7 } } */
+/* { dg-final { scan-assembler-times {vse16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 6 } } */
+/* { dg-final { scan-assembler-times {vl4re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
/* { dg-final { scan-assembler-times {vl8re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
-/* { dg-final { scan-assembler-times {vs8r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 4 } } */
+/* { dg-final { scan-assembler-times {vs4r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
+/* { dg-final { scan-assembler-times {vs8r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 5 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
index 68720e64926..8d39a2ed4c2 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
@@ -121,26 +121,70 @@ vfloat16m8_t test_vlmul_ext_v_f16mf4_f16m8(vfloat16mf4_t op1) {
   return __riscv_vlmul_ext_v_f16mf4_f16m8(op1);
}
+vfloat16mf4_t test_vlmul_trunc_v_f16mf2_f16mf4(vfloat16mf2_t op1) {
+  return __riscv_vlmul_trunc_v_f16mf2_f16mf4(op1);
+}
+
+vfloat16mf4_t test_vlmul_trunc_v_f16m1_f16mf4(vfloat16m1_t op1) {
+  return __riscv_vlmul_trunc_v_f16m1_f16mf4(op1);
+}
+
+vfloat16mf2_t test_vlmul_trunc_v_f16m1_f16mf2(vfloat16m1_t op1) {
+  return __riscv_vlmul_trunc_v_f16m1_f16mf2(op1);
+}
+
+vfloat16mf4_t 

Re: [PATCH v1] RISC-V: Support RVV FP16 MISC vlmul ext intrinsic API

2023-06-11 Thread Kito Cheng via Gcc-patches
Lgtm too :)

钟居哲 wrote on Monday, June 12, 2023 at 05:48:

> LGTM
>
>
>
> juzhe.zh...@rivai.ai
>
> From: pan2.li
> Date: 2023-06-11 08:33
> To: gcc-patches
> CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang;
> kito.cheng
> Subject: [PATCH v1] RISC-V: Support RVV FP16 MISC vlmul ext intrinsic API
> From: Pan Li 
>
> This patch supports the intrinsic API of FP16 ZVFHMIN vlmul ext, i.e.:
>
> vfloat16*_t <==> vfloat16*_t.
>
> From the user's perspective, it is reasonable to do type conversions
> between vfloat16*_t and vfloat16*_t when only ZVFHMIN is enabled.
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-types.def
> (vfloat16mf4_t): Add type to X2/X4/X8/X16/X32 vlmul ext ops.
> (vfloat16mf2_t): Ditto.
> (vfloat16m1_t): Ditto.
> (vfloat16m2_t): Ditto.
> (vfloat16m4_t): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add new test cases.
> * gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Add new test cases.
> ---
> .../riscv/riscv-vector-builtins-types.def | 15 ++
> .../riscv/rvv/base/zvfh-over-zvfhmin.c| 18 +--
> .../riscv/rvv/base/zvfhmin-intrinsic.c| 54 +--
> 3 files changed, 79 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def
> b/gcc/config/riscv/riscv-vector-builtins-types.def
> index 589ea532727..db8e61fea6a 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-types.def
> +++ b/gcc/config/riscv/riscv-vector-builtins-types.def
> @@ -978,6 +978,11 @@ DEF_RVV_X2_VLMUL_EXT_OPS (vuint32m4_t, 0)
> DEF_RVV_X2_VLMUL_EXT_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_64)
> DEF_RVV_X2_VLMUL_EXT_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
> DEF_RVV_X2_VLMUL_EXT_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
> +DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 |
> RVV_REQUIRE_MIN_VLEN_64)
> +DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
> +DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
> +DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16m2_t, RVV_REQUIRE_ELEN_FP_16)
> +DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16m4_t, RVV_REQUIRE_ELEN_FP_16)
> DEF_RVV_X2_VLMUL_EXT_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 |
> RVV_REQUIRE_MIN_VLEN_64)
> DEF_RVV_X2_VLMUL_EXT_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
> DEF_RVV_X2_VLMUL_EXT_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
> @@ -1014,6 +1019,10 @@ DEF_RVV_X4_VLMUL_EXT_OPS (vuint32m1_t, 0)
> DEF_RVV_X4_VLMUL_EXT_OPS (vuint32m2_t, 0)
> DEF_RVV_X4_VLMUL_EXT_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_64)
> DEF_RVV_X4_VLMUL_EXT_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
> +DEF_RVV_X4_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 |
> RVV_REQUIRE_MIN_VLEN_64)
> +DEF_RVV_X4_VLMUL_EXT_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
> +DEF_RVV_X4_VLMUL_EXT_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
> +DEF_RVV_X4_VLMUL_EXT_OPS (vfloat16m2_t, RVV_REQUIRE_ELEN_FP_16)
> DEF_RVV_X4_VLMUL_EXT_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 |
> RVV_REQUIRE_MIN_VLEN_64)
> DEF_RVV_X4_VLMUL_EXT_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
> DEF_RVV_X4_VLMUL_EXT_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
> @@ -1040,6 +1049,9 @@ DEF_RVV_X8_VLMUL_EXT_OPS (vuint16m1_t, 0)
> DEF_RVV_X8_VLMUL_EXT_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
> DEF_RVV_X8_VLMUL_EXT_OPS (vuint32m1_t, 0)
> DEF_RVV_X8_VLMUL_EXT_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_64)
> +DEF_RVV_X8_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 |
> RVV_REQUIRE_MIN_VLEN_64)
> +DEF_RVV_X8_VLMUL_EXT_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
> +DEF_RVV_X8_VLMUL_EXT_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
> DEF_RVV_X8_VLMUL_EXT_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 |
> RVV_REQUIRE_MIN_VLEN_64)
> DEF_RVV_X8_VLMUL_EXT_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
> DEF_RVV_X8_VLMUL_EXT_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
> @@ -1056,6 +1068,8 @@ DEF_RVV_X16_VLMUL_EXT_OPS (vuint8mf2_t, 0)
> DEF_RVV_X16_VLMUL_EXT_OPS (vuint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
> DEF_RVV_X16_VLMUL_EXT_OPS (vuint16mf2_t, 0)
> DEF_RVV_X16_VLMUL_EXT_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
> +DEF_RVV_X16_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 |
> RVV_REQUIRE_MIN_VLEN_64)
> +DEF_RVV_X16_VLMUL_EXT_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
> DEF_RVV_X16_VLMUL_EXT_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 |
> RVV_REQUIRE_MIN_VLEN_64)
> DEF_RVV_X32_VLMUL_EXT_OPS (vint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
> @@ -1064,6 +1078,7 @@ DEF_RVV_X32_VLMUL_EXT_OPS (vint16mf4_t,
> RVV_REQUIRE_MIN_VLEN_64)
> DEF_RVV_X32_VLMUL_EXT_OPS (vuint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
> DEF_RVV_X32_VLMUL_EXT_OPS (vuint8mf4_t, 0)
> DEF_RVV_X32_VLMUL_EXT_OPS (vuint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
> +DEF_RVV_X32_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 |
> RVV_REQUIRE_MIN_VLEN_64)
> DEF_RVV_X64_VLMUL_EXT_OPS (vint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
> DEF_RVV_X64_VLMUL_EXT_OPS (vuint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
> 

Re: [PATCH v1] RISC-V: Add test cases for RVV FP16 undefined and vlmul trunc

2023-06-11 Thread Kito Cheng via Gcc-patches
LGTM

juzhe.zh...@rivai.ai wrote on Monday, June 12, 2023 at 10:58:

> LGTM.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: pan2.li
> Date: 2023-06-12 10:57
> To: gcc-patches
> CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang;
> kito.cheng
> Subject: [PATCH v1] RISC-V: Add test cases for RVV FP16 undefined and
> vlmul trunc
> From: Pan Li 
>
> This patch adds more tests for RVV FP16 undefined and vlmul
> trunc, i.e.:
>
> __riscv_vundefined_f16*();
> __riscv_vlmul_trunc_v_f16*_f16*();
>
> From the user's perspective, it is reasonable to perform the above operations
> when only ZVFHMIN is enabled. The new test cases make sure that the RVV FP16
> vundefined and vlmul trunc intrinsics work as expected.
>
> Signed-off-by: Pan Li 
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add test cases.
> * gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Ditto.
> ---
> .../riscv/rvv/base/zvfh-over-zvfhmin.c| 28 ++--
> .../riscv/rvv/base/zvfhmin-intrinsic.c| 66 +++
> 2 files changed, 78 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
> b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
> index ff9e0156a68..c3ed4191a36 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
> @@ -45,15 +45,33 @@ vfloat16m8_t
> test_vlmul_ext_v_f16mf4_f16m8(vfloat16mf4_t op1) {
>return __riscv_vlmul_ext_v_f16mf4_f16m8(op1);
> }
> +vfloat16mf4_t test_vlmul_trunc_v_f16mf2_f16mf4(vfloat16mf2_t op1) {
> +  return __riscv_vlmul_trunc_v_f16mf2_f16mf4(op1);
> +}
> +
> +vfloat16m4_t test_vlmul_trunc_v_f16m8_f16m4(vfloat16m8_t op1) {
> +  return __riscv_vlmul_trunc_v_f16m8_f16m4(op1);
> +}
> +
> +vfloat16mf4_t test_vundefined_f16mf4() {
> +  return __riscv_vundefined_f16mf4();
> +}
> +
> +vfloat16m8_t test_vundefined_f16m8() {
> +  return __riscv_vundefined_f16m8();
> +}
> +
> /* { dg-final { scan-assembler-times
> {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 3 } } */
> /* { dg-final { scan-assembler-times
> {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m4,\s*t[au],\s*m[au]} 2 } } */
> /* { dg-final { scan-assembler-times
> {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m8,\s*t[au],\s*m[au]} 1 } } */
> -/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 6 } } */
> -/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf2,\s*t[au],\s*m[au]} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 8 } } */
> +/* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf2,\s*t[au],\s*m[au]} 2 } } */
> /* { dg-final { scan-assembler-times
> {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*m8,\s*t[au],\s*m[au]} 1 } } */
> /* { dg-final { scan-assembler-times
> {vfwcvt\.f\.f\.v\s+v[0-9]+,\s*v[0-9]+} 2 } } */
> /* { dg-final { scan-assembler-times
> {vfncvt\.f\.f\.w\s+v[0-9]+,\s*v[0-9]+} 2 } } */
> -/* { dg-final { scan-assembler-times
> {vle16\.v\s+v[0-9]+,\s*0\([0-9ax]+\)} 6 } } */
> -/* { dg-final { scan-assembler-times
> {vse16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 4 } } */
> +/* { dg-final { scan-assembler-times
> {vle16\.v\s+v[0-9]+,\s*0\([0-9ax]+\)} 7 } } */
> +/* { dg-final { scan-assembler-times
> {vse16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 6 } } */
> +/* { dg-final { scan-assembler-times
> {vl4re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
> /* { dg-final { scan-assembler-times
> {vl8re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
> -/* { dg-final { scan-assembler-times
> {vs8r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 4 } } */
> +/* { dg-final { scan-assembler-times
> {vs4r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vs8r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 5 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
> b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
> index 68720e64926..8d39a2ed4c2 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
> @@ -121,26 +121,70 @@ vfloat16m8_t
> test_vlmul_ext_v_f16mf4_f16m8(vfloat16mf4_t op1) {
>return __riscv_vlmul_ext_v_f16mf4_f16m8(op1);
> }
> +vfloat16mf4_t test_vlmul_trunc_v_f16mf2_f16mf4(vfloat16mf2_t op1) {
> +  return __riscv_vlmul_trunc_v_f16mf2_f16mf4(op1);
> +}
> +
> +vfloat16mf4_t test_vlmul_trunc_v_f16m1_f16mf4(vfloat16m1_t op1) {
> +  return __riscv_vlmul_trunc_v_f16m1_f16mf4(op1);
> +}
> +
> +vfloat16mf2_t test_vlmul_trunc_v_f16m1_f16mf2(vfloat16m1_t op1) {
> +  return __riscv_vlmul_trunc_v_f16m1_f16mf2(op1);
> +}
> +
> +vfloat16mf4_t test_vlmul_trunc_v_f16m2_f16mf4(vfloat16m2_t op1) {
> +  return __riscv_vlmul_trunc_v_f16m2_f16mf4(op1);
> +}
> +
> +vfloat16m1_t test_vlmul_trunc_v_f16m2_f16m1(vfloat16m2_t op1) {
> +  return __riscv_vlmul_trunc_v_f16m2_f16m1(op1);
> +}
> +

[r14-1624 Regression] FAIL: std/time/year_month_day_last/1.cc (test for excess errors) on Linux/x86_64

2023-06-11 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

28db36e2cfca1b7106adc8d371600fa3a325c4e2 is the first bad commit
commit 28db36e2cfca1b7106adc8d371600fa3a325c4e2
Author: Jason Merrill 
Date:   Wed Jun 7 05:15:02 2023 -0400

c++: allow NRV and non-NRV returns [PR58487]

caused

FAIL: 25_algorithms/minmax/constrained.cc (test for excess errors)
FAIL: g++.dg/cpp2a/spaceship-synth10.C  -std=gnu++20 (internal compiler error: 
Segmentation fault)
FAIL: g++.dg/cpp2a/spaceship-synth10.C  -std=gnu++20 (test for excess errors)
FAIL: g++.dg/cpp2a/spaceship-synth12.C  -std=c++20 (internal compiler error: 
Segmentation fault)
FAIL: g++.dg/cpp2a/spaceship-synth12.C  -std=c++20 (test for excess errors)
FAIL: g++.dg/cpp2a/spaceship-synth13.C  -std=c++20 (internal compiler error: 
Segmentation fault)
FAIL: g++.dg/cpp2a/spaceship-synth13.C  -std=c++20 (test for excess errors)
FAIL: g++.dg/cpp2a/spaceship-synth14.C  -std=c++20 (internal compiler error: 
Segmentation fault)
FAIL: g++.dg/cpp2a/spaceship-synth14.C  -std=c++20 (test for excess errors)
FAIL: g++.dg/cpp2a/spaceship-synth1a.C  -std=c++20 (internal compiler error: 
Segmentation fault)
FAIL: g++.dg/cpp2a/spaceship-synth1a.C  -std=c++20 (test for excess errors)
FAIL: g++.dg/cpp2a/spaceship-synth1.C  -std=c++20 (internal compiler error: 
Segmentation fault)
FAIL: g++.dg/cpp2a/spaceship-synth1.C  -std=c++20 (test for excess errors)
FAIL: g++.dg/cpp2a/spaceship-synth2a.C  -std=c++20 (internal compiler error: 
Segmentation fault)
FAIL: g++.dg/cpp2a/spaceship-synth2a.C  -std=c++20 (test for excess errors)
FAIL: g++.dg/cpp2a/spaceship-synth2b.C  -std=c++20 (internal compiler error: 
Segmentation fault)
FAIL: g++.dg/cpp2a/spaceship-synth2b.C  -std=c++20 (test for excess errors)
FAIL: g++.dg/cpp2a/spaceship-synth2.C  -std=c++20 (internal compiler error: 
Segmentation fault)
FAIL: g++.dg/cpp2a/spaceship-synth2.C  -std=c++20 (test for excess errors)
FAIL: g++.dg/cpp2a/spaceship-synth4.C  -std=c++20 (internal compiler error: 
Segmentation fault)
FAIL: g++.dg/cpp2a/spaceship-synth4.C  -std=c++20 (test for excess errors)
FAIL: g++.dg/cpp2a/spaceship-synth5.C  -std=c++20 (internal compiler error: 
Segmentation fault)
FAIL: g++.dg/cpp2a/spaceship-synth5.C  -std=c++20 (test for excess errors)
FAIL: g++.dg/cpp2a/spaceship-weak1.C  -std=c++20 (internal compiler error: 
Segmentation fault)
FAIL: g++.dg/cpp2a/spaceship-weak1.C  -std=c++20 (test for excess errors)
FAIL: std/time/month_day/1.cc (test for excess errors)
FAIL: std/time/month_day_last/1.cc (test for excess errors)
FAIL: std/time/year_month/1.cc (test for excess errors)
FAIL: std/time/year_month_day/1.cc (test for excess errors)
FAIL: std/time/year_month_day/4.cc (test for excess errors)
FAIL: std/time/year_month_day_last/1.cc (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-1624/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=25_algorithms/minmax/constrained.cc 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=25_algorithms/minmax/constrained.cc 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth10.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth10.C 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth12.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth12.C 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth13.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth13.C 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth14.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth14.C 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth1a.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth1a.C 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth1.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth1.C --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd 

RE: [PATCH v4] RISC-V: Add vector psabi checking.

2023-06-11 Thread Wang, Yanzhang via Gcc-patches
I can reproduce the failure too. It happens because get_arg_info returns
early for v-ext modes. I'll move the check to the beginning.

> -Original Message-
> From: Kito Cheng 
> Sent: Friday, June 9, 2023 5:52 PM
> To: Wang, Yanzhang 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com;
> Li, Pan2 
> Subject: Re: [PATCH v4] RISC-V: Add vector psabi checking.
> 
> Hmmm, I still saw some fail on testsuite after applying this patch, most
> are because the testcase has used vector type as argument or return value,
> but .. vector-abi-1.c should not fail I think?
> 
> For other fails, I would suggest you could just add -Wno-psabi to rvv.exp
> 
> === gcc: Unexpected fails for rv64imafdcv lp64d medlow ===
> FAIL: gcc.target/riscv/vector-abi-1.c   -O0   (test for warnings, line 7)
> FAIL: gcc.target/riscv/vector-abi-1.c   -O1   (test for warnings, line 7)
> FAIL: gcc.target/riscv/vector-abi-1.c   -O2   (test for warnings, line 7)
> FAIL: gcc.target/riscv/vector-abi-1.c   -O2 -flto
> -fno-use-linker-plugin -flto-partition=none   (test for warnings, line
> 7)
> FAIL: gcc.target/riscv/vector-abi-1.c   -O2 -flto -fuse-linker-plugin
> -fno-fat-lto-objects   (test for warnings, line 7)
> FAIL: gcc.target/riscv/vector-abi-1.c   -O3 -g   (test for warnings, line 7)
> FAIL: gcc.target/riscv/vector-abi-1.c   -Os   (test for warnings, line 7)
> FAIL: gcc.target/riscv/vector-abi-1.c  -Og -g   (test for warnings, line 7)
> FAIL: gcc.target/riscv/vector-abi-1.c  -Oz   (test for warnings, line 7)
> FAIL: gcc.target/riscv/rvv/base/binop_vx_constraint-120.c (test for excess
> errors)
> FAIL: gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c (test for
> excess errors)
> FAIL: gcc.target/riscv/rvv/base/mask_insn_shortcut.c (test for excess
> errors)
> FAIL: gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c (test for
> excess errors)
> FAIL: gcc.target/riscv/rvv/base/pr110109-2.c (test for excess errors)
> FAIL: gcc.target/riscv/rvv/base/scalar_move-9.c (test for excess errors)
> FAIL: gcc.target/riscv/rvv/base/vlmul_ext-1.c (test for excess errors)
> FAIL: gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c
> (test for excess errors)
> FAIL: gcc.target/riscv/rvv/base/zvfh-intrinsic.c (test for excess errors)
> FAIL: gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c (test for excess errors)
> FAIL: gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c (test for excess errors)
> 
>   = Summary of gcc testsuite =
>| # of unexpected case / # of unique unexpected
> case
>|  gcc |  g++ | gfortran |
> rv32imafdc/ ilp32d/ medlow |   20 /12 |0 / 0 |0 / 0 |
> rv32imafdcv/ ilp32d/ medlow |   25 /14 |   22 /22 |0 / 0 |
> rv64imafdc/  lp64d/ medlow |   20 /12 |0 / 0 |0 / 0 |
> rv64imafdcv/  lp64d/ medlow |   20 /12 |   21 /21 |0 / 0 |
> 
> On Fri, Jun 9, 2023 at 2:02 PM yanzhang.wang--- via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > From: Yanzhang Wang 
> >
> > This patch adds support to check function's argument or return is
> > vector type and throw warning if yes.
> >
> > There're two exceptions,
> >   - The vector_size attribute.
> >   - The intrinsic functions.
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv-protos.h (riscv_init_cumulative_args): Set
> >   warning flag if func is not builtin
> > * config/riscv/riscv.cc
> > (riscv_scalable_vector_type_p): Determine whether the type is
> scalable vector.
> > (riscv_arg_has_vector): Determine whether the arg is vector type.
> > (riscv_pass_in_vector_p): Check the vector type param is passed
> by value.
> > (riscv_init_cumulative_args): The same as header.
> > (riscv_get_arg_info): Add the checking.
> > (riscv_function_value): Check the func return and set warning
> flag
> > * config/riscv/riscv.h (INIT_CUMULATIVE_ARGS): Add a flag to
> >   determine whether warning psabi or not.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/vector-abi-1.c: New test.
> > * gcc.target/riscv/vector-abi-2.c: New test.
> > * gcc.target/riscv/vector-abi-3.c: New test.
> > * gcc.target/riscv/vector-abi-4.c: New test.
> > * gcc.target/riscv/vector-abi-5.c: New test.
> > * gcc.target/riscv/vector-abi-6.c: New test.
> >
> > Signed-off-by: Yanzhang Wang 
> > Co-authored-by: Kito Cheng 
> > ---
> >  gcc/config/riscv/riscv-protos.h   |   2 +
> >  gcc/config/riscv/riscv.cc | 112 +-
> >  gcc/config/riscv/riscv.h  |   5 +-
> >  gcc/testsuite/gcc.target/riscv/vector-abi-1.c |  14 +++
> > gcc/testsuite/gcc.target/riscv/vector-abi-2.c |  15 +++
> > gcc/testsuite/gcc.target/riscv/vector-abi-3.c |  14 +++
> > gcc/testsuite/gcc.target/riscv/vector-abi-4.c |  16 +++
> > 

Re: [PATCH v1] RISC-V: Add test cases for RVV FP16 undefined and vlmul trunc

2023-06-11 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-12 10:57
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Add test cases for RVV FP16 undefined and vlmul 
trunc
From: Pan Li 
 
This patch would like to add more tests for RVV FP16 undef and vlmul
trunc, aka
 
__riscv_vundefined_f16*();
__riscv_vlmul_trunc_v_f16*_f16*();
 
From the user's perspective, it is reasonable to do above operation
when only ZVFHMIN is enabled. This patch would like to add new test
cases to make sure the RVV FP16 vreinterpret works well as expected.
 
Signed-off-by: Pan Li 
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add test cases.
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Ditto.
---
.../riscv/rvv/base/zvfh-over-zvfhmin.c| 28 ++--
.../riscv/rvv/base/zvfhmin-intrinsic.c| 66 +++
2 files changed, 78 insertions(+), 16 deletions(-)
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
index ff9e0156a68..c3ed4191a36 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
@@ -45,15 +45,33 @@ vfloat16m8_t test_vlmul_ext_v_f16mf4_f16m8(vfloat16mf4_t 
op1) {
   return __riscv_vlmul_ext_v_f16mf4_f16m8(op1);
}
+vfloat16mf4_t test_vlmul_trunc_v_f16mf2_f16mf4(vfloat16mf2_t op1) {
+  return __riscv_vlmul_trunc_v_f16mf2_f16mf4(op1);
+}
+
+vfloat16m4_t test_vlmul_trunc_v_f16m8_f16m4(vfloat16m8_t op1) {
+  return __riscv_vlmul_trunc_v_f16m8_f16m4(op1);
+}
+
+vfloat16mf4_t test_vundefined_f16mf4() {
+  return __riscv_vundefined_f16mf4();
+}
+
+vfloat16m8_t test_vundefined_f16m8() {
+  return __riscv_vundefined_f16m8();
+}
+
/* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 3 } } */
/* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m4,\s*t[au],\s*m[au]} 2 } } */
/* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m8,\s*t[au],\s*m[au]} 1 } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 6 } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf2,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 8 } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf2,\s*t[au],\s*m[au]} 2 } } */
/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*m8,\s*t[au],\s*m[au]} 1 } } */
/* { dg-final { scan-assembler-times {vfwcvt\.f\.f\.v\s+v[0-9]+,\s*v[0-9]+} 2 } 
} */
/* { dg-final { scan-assembler-times {vfncvt\.f\.f\.w\s+v[0-9]+,\s*v[0-9]+} 2 } 
} */
-/* { dg-final { scan-assembler-times {vle16\.v\s+v[0-9]+,\s*0\([0-9ax]+\)} 6 } 
} */
-/* { dg-final { scan-assembler-times {vse16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 
4 } } */
+/* { dg-final { scan-assembler-times {vle16\.v\s+v[0-9]+,\s*0\([0-9ax]+\)} 7 } 
} */
+/* { dg-final { scan-assembler-times {vse16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 
6 } } */
+/* { dg-final { scan-assembler-times 
{vl4re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
/* { dg-final { scan-assembler-times {vl8re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 
1 } } */
-/* { dg-final { scan-assembler-times {vs8r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 4 
} } */
+/* { dg-final { scan-assembler-times {vs4r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 
} } */
+/* { dg-final { scan-assembler-times {vs8r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 5 
} } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
index 68720e64926..8d39a2ed4c2 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
@@ -121,26 +121,70 @@ vfloat16m8_t test_vlmul_ext_v_f16mf4_f16m8(vfloat16mf4_t 
op1) {
   return __riscv_vlmul_ext_v_f16mf4_f16m8(op1);
}
+vfloat16mf4_t test_vlmul_trunc_v_f16mf2_f16mf4(vfloat16mf2_t op1) {
+  return __riscv_vlmul_trunc_v_f16mf2_f16mf4(op1);
+}
+
+vfloat16mf4_t test_vlmul_trunc_v_f16m1_f16mf4(vfloat16m1_t op1) {
+  return __riscv_vlmul_trunc_v_f16m1_f16mf4(op1);
+}
+
+vfloat16mf2_t test_vlmul_trunc_v_f16m1_f16mf2(vfloat16m1_t op1) {
+  return __riscv_vlmul_trunc_v_f16m1_f16mf2(op1);
+}
+
+vfloat16mf4_t test_vlmul_trunc_v_f16m2_f16mf4(vfloat16m2_t op1) {
+  return __riscv_vlmul_trunc_v_f16m2_f16mf4(op1);
+}
+
+vfloat16m1_t test_vlmul_trunc_v_f16m2_f16m1(vfloat16m2_t op1) {
+  return __riscv_vlmul_trunc_v_f16m2_f16m1(op1);
+}
+
+vfloat16mf4_t test_vlmul_trunc_v_f16m4_f16mf4(vfloat16m4_t op1) {
+  return __riscv_vlmul_trunc_v_f16m4_f16mf4(op1);
+}
+
+vfloat16m2_t test_vlmul_trunc_v_f16m4_f16m2(vfloat16m4_t op1) {
+  return __riscv_vlmul_trunc_v_f16m4_f16m2(op1);
+}
+
+vfloat16mf4_t 

[PATCH v1] RISC-V: Add test cases for RVV FP16 undefined and vlmul trunc

2023-06-11 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch would like to add more tests for RVV FP16 undef and vlmul
trunc, aka

__riscv_vundefined_f16*();
__riscv_vlmul_trunc_v_f16*_f16*();

From the user's perspective, it is reasonable to perform the above operations
when only ZVFHMIN is enabled. This patch adds new test cases to make sure
the RVV FP16 vreinterpret works as expected.

Signed-off-by: Pan Li 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add test cases.
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Ditto.
---
 .../riscv/rvv/base/zvfh-over-zvfhmin.c| 28 ++--
 .../riscv/rvv/base/zvfhmin-intrinsic.c| 66 +++
 2 files changed, 78 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
index ff9e0156a68..c3ed4191a36 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
@@ -45,15 +45,33 @@ vfloat16m8_t test_vlmul_ext_v_f16mf4_f16m8(vfloat16mf4_t 
op1) {
   return __riscv_vlmul_ext_v_f16mf4_f16m8(op1);
 }
 
+vfloat16mf4_t test_vlmul_trunc_v_f16mf2_f16mf4(vfloat16mf2_t op1) {
+  return __riscv_vlmul_trunc_v_f16mf2_f16mf4(op1);
+}
+
+vfloat16m4_t test_vlmul_trunc_v_f16m8_f16m4(vfloat16m8_t op1) {
+  return __riscv_vlmul_trunc_v_f16m8_f16m4(op1);
+}
+
+vfloat16mf4_t test_vundefined_f16mf4() {
+  return __riscv_vundefined_f16mf4();
+}
+
+vfloat16m8_t test_vundefined_f16m8() {
+  return __riscv_vundefined_f16m8();
+}
+
 /* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 3 } } */
 /* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m4,\s*t[au],\s*m[au]} 2 } } */
 /* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m8,\s*t[au],\s*m[au]} 1 } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 6 } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf2,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 8 } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf2,\s*t[au],\s*m[au]} 2 } } */
 /* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*m8,\s*t[au],\s*m[au]} 1 } } */
 /* { dg-final { scan-assembler-times {vfwcvt\.f\.f\.v\s+v[0-9]+,\s*v[0-9]+} 2 
} } */
 /* { dg-final { scan-assembler-times {vfncvt\.f\.f\.w\s+v[0-9]+,\s*v[0-9]+} 2 
} } */
-/* { dg-final { scan-assembler-times {vle16\.v\s+v[0-9]+,\s*0\([0-9ax]+\)} 6 } 
} */
-/* { dg-final { scan-assembler-times {vse16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 
4 } } */
+/* { dg-final { scan-assembler-times {vle16\.v\s+v[0-9]+,\s*0\([0-9ax]+\)} 7 } 
} */
+/* { dg-final { scan-assembler-times {vse16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 
6 } } */
+/* { dg-final { scan-assembler-times 
{vl4re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
 /* { dg-final { scan-assembler-times 
{vl8re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
-/* { dg-final { scan-assembler-times {vs8r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 4 
} } */
+/* { dg-final { scan-assembler-times {vs4r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 
} } */
+/* { dg-final { scan-assembler-times {vs8r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 5 
} } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
index 68720e64926..8d39a2ed4c2 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
@@ -121,26 +121,70 @@ vfloat16m8_t test_vlmul_ext_v_f16mf4_f16m8(vfloat16mf4_t 
op1) {
   return __riscv_vlmul_ext_v_f16mf4_f16m8(op1);
 }
 
+vfloat16mf4_t test_vlmul_trunc_v_f16mf2_f16mf4(vfloat16mf2_t op1) {
+  return __riscv_vlmul_trunc_v_f16mf2_f16mf4(op1);
+}
+
+vfloat16mf4_t test_vlmul_trunc_v_f16m1_f16mf4(vfloat16m1_t op1) {
+  return __riscv_vlmul_trunc_v_f16m1_f16mf4(op1);
+}
+
+vfloat16mf2_t test_vlmul_trunc_v_f16m1_f16mf2(vfloat16m1_t op1) {
+  return __riscv_vlmul_trunc_v_f16m1_f16mf2(op1);
+}
+
+vfloat16mf4_t test_vlmul_trunc_v_f16m2_f16mf4(vfloat16m2_t op1) {
+  return __riscv_vlmul_trunc_v_f16m2_f16mf4(op1);
+}
+
+vfloat16m1_t test_vlmul_trunc_v_f16m2_f16m1(vfloat16m2_t op1) {
+  return __riscv_vlmul_trunc_v_f16m2_f16m1(op1);
+}
+
+vfloat16mf4_t test_vlmul_trunc_v_f16m4_f16mf4(vfloat16m4_t op1) {
+  return __riscv_vlmul_trunc_v_f16m4_f16mf4(op1);
+}
+
+vfloat16m2_t test_vlmul_trunc_v_f16m4_f16m2(vfloat16m4_t op1) {
+  return __riscv_vlmul_trunc_v_f16m4_f16m2(op1);
+}
+
+vfloat16mf4_t test_vlmul_trunc_v_f16m8_f16mf4(vfloat16m8_t op1) {
+  return __riscv_vlmul_trunc_v_f16m8_f16mf4(op1);
+}
+
+vfloat16m4_t test_vlmul_trunc_v_f16m8_f16m4(vfloat16m8_t op1) {
+  return __riscv_vlmul_trunc_v_f16m8_f16m4(op1);
+}
+

[PATCH] RISC-V: Add RVV narrow shift right lowering auto-vectorization

2023-06-11 Thread juzhe . zhong
From: Juzhe-Zhong 

Optimize the following auto-vectorized code:
void foo (int16_t * __restrict a, int32_t * __restrict b, int32_t c, int n)
{
for (int i = 0; i < n; i++)
  a[i] = b[i] >> c;
}

Before this patch:
foo:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e32,m1,ta,ma
vle32.v v1,0(a1)
vsetvli a4,zero,e32,m1,ta,ma
vsra.vx v1,v1,a2
vsetvli zero,zero,e16,mf2,ta,ma
slli a7,a5,2
vncvt.x.x.w v1,v1
slli a6,a5,1
vsetvli zero,a5,e16,mf2,ta,ma
sub a3,a3,a5
vse16.v v1,0(a0)
add a1,a1,a7
add a0,a0,a6
bne a3,zero,.L3
.L5:
ret

After this patch:
foo:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e32,m1,ta,ma
vle32.v v1,0(a1)
vsetvli a7,zero,e16,mf2,ta,ma
slli a6,a5,2
vnsra.wx v1,v1,a2
slli a4,a5,1
vsetvli zero,a5,e16,mf2,ta,ma
sub a3,a3,a5
vse16.v v1,0(a0)
add a1,a1,a6
add a0,a0,a4
bne a3,zero,.L3
.L5:
ret
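
For illustration only (this is not one of the tests in the patch), the
per-element operation that the single vnsra.wx now performs can be sketched
in scalar C. The names narrow_sra_ref, foo_ref and check_narrow_shift are
hypothetical, and the sketch assumes a shift amount in the range 0..31:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for one element: shift the 32-bit source right,
   then keep the low 16 bits.  Hypothetical helper, not from the patch.  */
int16_t narrow_sra_ref(int32_t x, int32_t c)
{
  assert(c >= 0 && c < 32);  /* shifts outside this range are undefined in C */
  return (int16_t)(x >> c);
}

/* Same loop shape as foo in the commit message, built on the helper.  */
void foo_ref(int16_t *restrict a, const int32_t *restrict b, int32_t c, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = narrow_sra_ref(b[i], c);
}

/* Returns 1 when foo_ref gives the expected values for a small input.
   Only non-negative inputs whose results fit in 16 bits are used, so no
   implementation-defined behaviour is involved.  */
int check_narrow_shift(void)
{
  const int32_t b[4] = {0x12345678, 256, 0x7fff0000, 0};
  int16_t a[4];

  foo_ref(a, b, 16, 4);
  return a[0] == 0x1234 && a[1] == 0 && a[2] == 0x7fff && a[3] == 0;
}
```

A reference of this kind could be compared against the vectorized loop in a
run test; whether the narrow_run-*.c tests are structured this way is not
shown here.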

gcc/ChangeLog:

* config/riscv/autovec-opt.md 
(*vtrunc): New pattern.
(*trunc): Ditto.
* config/riscv/autovec.md (3): Change to 
define_insn_and_split.
(v3): Ditto.
(trunc2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/narrow-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow_run-3.c: New test.

---
 gcc/config/riscv/autovec-opt.md   | 46 +
 gcc/config/riscv/autovec.md   | 43 ++--
 .../riscv/rvv/autovec/binop/narrow-1.c| 31 
 .../riscv/rvv/autovec/binop/narrow-2.c| 32 
 .../riscv/rvv/autovec/binop/narrow-3.c| 31 
 .../riscv/rvv/autovec/binop/narrow_run-1.c| 50 +++
 .../riscv/rvv/autovec/binop/narrow_run-2.c| 46 +
 .../riscv/rvv/autovec/binop/narrow_run-3.c| 46 +
 8 files changed, 311 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow_run-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow_run-3.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 7bb93eed220..aef28e445e1 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -330,3 +330,49 @@
   }
   [(set_attr "type" "viwmuladd")
(set_attr "mode" "")])
+
+;; -
+;;  [INT] Binary narrow shifts.
+;; -
+;; Includes:
+;; - vnsrl.wv/vnsrl.wx/vnsrl.wi
+;; - vnsra.wv/vnsra.wx/vnsra.wi
+;; -
+
+(define_insn_and_split "*vtrunc"
+  [(set (match_operand: 0 "register_operand"   "=vr,vr")
+(truncate:
+  (any_shiftrt:VWEXTI
+(match_operand:VWEXTI 1 "register_operand" " vr,vr")
+   (any_extend:VWEXTI
+  (match_operand: 2 "vector_shift_operand" " 
vr,vk")]
+  "TARGET_VECTOR"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_narrow (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, operands);
+  DONE;
+}
+ [(set_attr "type" "vnshift")
+  (set_attr "mode" "")])
+
+(define_insn_and_split "*trunc"
+  [(set (match_operand: 0 "register_operand" "=vr")
+(truncate:
+  (any_shiftrt:VWEXTI
+(match_operand:VWEXTI 1 "register_operand"   " vr")
+   (match_operand: 2 "csr_operand" " rK"]
+  "TARGET_VECTOR"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+{
+  operands[2] = gen_lowpart (Pmode, operands[2]);
+  insn_code icode = code_for_pred_narrow_scalar (, 
mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, operands);
+  DONE;
+}
+ [(set_attr "type" "vnshift")
+  (set_attr "mode" "")])
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b7070099f29..eadc2c5b595 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -150,18 +150,23 @@
 ;; - vsll.vi/vsra.vi/vsrl.vi
 ;; 

Re: [PATCH V2] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-06-11 Thread Jiufu Guo via Gcc-patches


Hi,

Thanks for your comments!

Segher Boessenkool  writes:

> Hi!
>
> On Wed, Jun 07, 2023 at 04:21:11PM +0800, Jiufu Guo wrote:
>> This patch tries to optimize "(X - N * M) / N" to "X / N - M".
>> For C code, "/" towards zero (trunc_div), and "X - N * M" maybe
>> wrap/overflow/underflow. So, it is valid that "X - N * M" does
>> not cross zero and does not wrap/overflow/underflow.
>
> Is it ever valid semi-generally when N does not divide X?

It is valid only if there is no wrap/overflow/underflow, and the signs
of "X" and "X-N*M" are the same.  Under this condition, N, M and X can be
any value.

>
> Say X=5, N=2, M=3.  Then the original expression evaluates to 0, but the
> new one to -1.  Whenever one of the divisions rounds up and the other
> rounds down you have this problem.

You are right.  Since '/' always rounds towards zero, 'X' and 'X-N*M'
must have the same sign bit.  Otherwise, one division rounds up while the
other rounds down, and the transform is invalid.
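
As a small worked check of the example above, here is a C sketch (not code
from the patch; div_before, div_after and same_sign are hypothetical names,
and overflow checking is deliberately left out):

```c
#include <assert.h>

/* Original form: (X - N * M) / N, using C's truncating division.  */
long div_before(long x, long n, long m)
{
  assert(n != 0);
  return (x - n * m) / n;
}

/* Transformed form: X / N - M.  */
long div_after(long x, long n, long m)
{
  assert(n != 0);
  return x / n - m;
}

/* The sign-bit condition only; the wrap/overflow checks are omitted.  */
int same_sign(long x, long n, long m)
{
  return (x < 0) == (x - n * m < 0);
}
```

With X=5, N=2, M=3 the sign condition fails (5 versus -1) and the two forms
give 0 and -1.  With X=7, N=2, M=3 the signs agree (7 versus 1) and both
forms give 0.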

BR,
Jeff (Jiufu Guo)
>
>
> Segher


[PATCHv2, rs6000] Add two peephole2 patterns for mr. insn

2023-06-11 Thread HAO CHEN GUI via Gcc-patches
Hi,
  This patch adds two peephole2 patterns which help convert certain insn
sequences to "mr." instruction. These insn sequences can't be combined in
combine pass.

  Compared to the last version, it adds a new mode iterator "Q", which should
be used for dot instructions. With "-m32/-mpowerpc64" set, the dot
instruction should compare DImode with 0, not SImode.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen


ChangeLog
rs6000: Add two peephole patterns for "mr." insn

While investigating the issue mentioned in PR87871#c30 (whether the compare
and move pattern is beneficial before RA), I checked the assembly generated
for SPEC2017 and found that certain insn sequences aren't converted to
"mr." instructions.
The following two sequences are never combined into the "mr." pattern because
there is no register link between them. This patch adds two peephole2
patterns to convert them to "mr." instructions.

cmp 0,3,0
mr 4,3

mr 4,3
cmp 0,3,0

The patch also creates a new mode iterator which is decided by
TARGET_POWERPC64.  This mode iterator is used in "mr." and its split
pattern.  The original P iterator is wrong when -m32/-mpowerpc64 is set.
In this situation, "mr." should compare the whole 64-bit register
with 0 rather than only the low 32 bits.
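
To illustrate why the compare width matters, here is a plain C sketch (it
models only the arithmetic, not the generated code; cmp0_di, cmp0_si and
widths_disagree are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

/* Signed compare of the full 64-bit value with 0: returns -1, 0 or 1.  */
int cmp0_di(int64_t reg)
{
  return (reg > 0) - (reg < 0);
}

/* Signed compare of only the low 32 bits with 0: returns -1, 0 or 1.  */
int cmp0_si(int64_t reg)
{
  uint32_t low = (uint32_t)(uint64_t)reg;

  if (low == 0)
    return 0;
  return (low & 0x80000000u) ? -1 : 1;
}

/* Returns 1 when the two compare widths give different answers.  */
int widths_disagree(int64_t reg)
{
  int di = cmp0_di(reg);
  int si = cmp0_si(reg);

  assert(di >= -1 && di <= 1 && si >= -1 && si <= 1);
  return di != si;
}
```

For a register holding 0x100000000, the 64-bit compare reports "greater
than zero" while the 32-bit compare reports "equal", so a record form that
looked only at the low half would set the wrong condition bits.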

gcc/
* config/rs6000/rs6000.md (peephole2 for compare_and_move): New.
(peephole2 for move_and_compare): New.
(mode_iterator Q): New.  Set the mode to SI/DImode by
TARGET_POWERPC64.
(*mov_internal2): Change the mode iterator from P to Q.
(split pattern for compare_and_move): Likewise.

gcc/testsuite/
* gcc.dg/rtl/powerpc/move_compare_peephole_32.c: New.
* gcc.dg/rtl/powerpc/move_compare_peephole_64.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..fdb5b6ed22a 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -491,6 +491,7 @@ (define_mode_iterator SDI [SI DI])
 ; The size of a pointer.  Also, the size of the value that a record-condition
 ; (one with a '.') will compare; and the size used for arithmetic carries.
 (define_mode_iterator P [(SI "TARGET_32BIT") (DI "TARGET_64BIT")])
+(define_mode_iterator Q [(SI "!TARGET_POWERPC64") (DI "TARGET_POWERPC64")])

 ; Iterator to add PTImode along with TImode (TImode can go in VSX registers,
 ; PTImode is GPR only)
@@ -7879,9 +7880,9 @@ (define_split

 (define_insn "*mov_internal2"
   [(set (match_operand:CC 2 "cc_reg_operand" "=y,x,?y")
-   (compare:CC (match_operand:P 1 "gpc_reg_operand" "0,r,r")
+   (compare:CC (match_operand:Q 1 "gpc_reg_operand" "0,r,r")
(const_int 0)))
-   (set (match_operand:P 0 "gpc_reg_operand" "=r,r,r") (match_dup 1))]
+   (set (match_operand:Q 0 "gpc_reg_operand" "=r,r,r") (match_dup 1))]
   ""
   "@
cmpi %2,%0,0
@@ -7891,11 +7892,41 @@ (define_insn "*mov_internal2"
(set_attr "dot" "yes")
(set_attr "length" "4,4,8")])

+(define_peephole2
+  [(set (match_operand:CC 2 "cc_reg_operand" "")
+   (compare:CC (match_operand:Q 1 "int_reg_operand" "")
+   (const_int 0)))
+   (set (match_operand:Q 0 "int_reg_operand" "")
+   (match_dup 1))]
+  "!cc_reg_not_cr0_operand (operands[2], CCmode)"
+  [(parallel [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+  (compare:CC (match_operand:Q 1 "int_reg_operand" "r")
+  (const_int 0)))
+ (set (match_operand:Q 0 "int_reg_operand" "=r")
+  (match_dup 1))])]
+  ""
+)
+
+(define_peephole2
+  [(set (match_operand:Q 0 "int_reg_operand" "")
+   (match_operand:Q 1 "int_reg_operand" ""))
+   (set (match_operand:CC 2 "cc_reg_operand" "")
+   (compare:CC (match_dup 1)
+   (const_int 0)))]
+  "!cc_reg_not_cr0_operand (operands[2], CCmode)"
+  [(parallel [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+  (compare:CC (match_operand:GPR 1 "int_reg_operand" "r")
+  (const_int 0)))
+ (set (match_operand:Q 0 "int_reg_operand" "=r")
+  (match_dup 1))])]
+  ""
+)
+
 (define_split
   [(set (match_operand:CC 2 "cc_reg_not_cr0_operand")
-   (compare:CC (match_operand:P 1 "gpc_reg_operand")
+   (compare:CC (match_operand:Q 1 "gpc_reg_operand")
(const_int 0)))
-   (set (match_operand:P 0 "gpc_reg_operand") (match_dup 1))]
+   (set (match_operand:Q 0 "gpc_reg_operand") (match_dup 1))]
   "reload_completed"
   [(set (match_dup 0) (match_dup 1))
(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.dg/rtl/powerpc/move_compare_peephole_32.c 
b/gcc/testsuite/gcc.dg/rtl/powerpc/move_compare_peephole_32.c
new file mode 100644
index 000..29234dea7c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/powerpc/move_compare_peephole_32.c
@@ -0,0 +1,60 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-skip-if "" { has_arch_ppc64 } } */
+/* { dg-options "-O2 -mregnames" } */
+
+/* Following 

Re: [PATCH] Add MinGW option -mcrtdll= for choosing C RunTime DLL library

2023-06-11 Thread LIU Hao via Gcc-patches

On 2023/6/12 07:08, Jonathan Yong wrote:

+preprocessor is done. MinGW import library @code{msvcrt} is just a
+symlink (or file copy) to the other MinGW CRT import library 


I suggest a change to this line:

   symlink to (or a copy of) another MinGW CRT import library


Also, as discussed earlier, linking against a CRT version different from the value of 
`__MSVCRT_VERSION__` in _mingw.h is not officially supported, and users should be warned about it. So 
maybe we can append a paragraph to the documentation:


   Generally speaking, changing the CRT DLL requires recompiling
   the entire MinGW CRT. This option is for experimental and testing
   purposes only.
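
To make the concern concrete, here is a minimal sketch in C of how user code
could observe the macros that the proposed CPP_SPEC passes for -mcrtdll=.
The helper names crt_flavor and crt_version are hypothetical and are not part
of the patch or of MinGW; only the macros _UCRT, __CRTDLL__ and
__MSVCRT_VERSION__ come from the patch. On a toolchain that defines none of
them the sketch simply reports "unknown".

```c
/* Hypothetical helper: name the CRT family selected at preprocess time,
   based only on the macros that the proposed CPP_SPEC defines.  */
const char *
crt_flavor (void)
{
#if defined (_UCRT)
  return "ucrt";
#elif defined (__CRTDLL__)
  return "crtdll";
#elif defined (__MSVCRT_VERSION__)
  return "msvcrt";
#else
  return "unknown";
#endif
}

/* Hypothetical helper: return __MSVCRT_VERSION__ if it is defined,
   otherwise -1.  */
long
crt_version (void)
{
#ifdef __MSVCRT_VERSION__
  return (long) __MSVCRT_VERSION__;
#else
  return -1L;
#endif
}
```

A mismatch between the value reported here and the CRT that the MinGW runtime
itself was built against is the situation the suggested paragraph warns about.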



--
Best regards,
LIU Hao





[PATCH, AIX] Debugging does not require a stack frame.

2023-06-11 Thread David Edelsohn via Gcc-patches
The rs6000 port has allocated a stack frame when debugging is enabled
on AIX since the earliest versions of the port.  Apparently the
earliest versions of the debuggers for AIX had difficulty with
stackless frames.

Both AIX DBX and GDB support stackless frames on AIX, and IBM XLC,
OpenXL and LLVM for AIX do not generate an extraneous stack frame when
debugging is enabled.  This patch updates the rs6000 stack info
function to not set the stack frame flag when debugging is enabled for
AIX.
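
As a rough illustration of the resulting decision, the following C sketch
models only the two branches that remain visible in the hunk below. The
function name model_push_p and its parameters are hypothetical; the real
rs6000_stack_info has earlier branches that are not shown in this mail and
are not modelled here.

```c
#include <stdbool.h>

/* Simplified model of the tail of the push_p decision after the patch.
   Only the branches visible in the posted hunk are represented:
   a needed frame pointer forces a frame, otherwise the frame is pushed
   only when the non-fixed size exceeds 220 (32-bit) or 288 (64-bit).  */
bool
model_push_p (bool frame_pointer_needed, bool target_32bit,
              long non_fixed_size)
{
  if (frame_pointer_needed)
    return true;

  /* Before the patch, an extra branch here forced a frame on XCOFF
     targets whenever debug info was requested.  */
  return non_fixed_size > (target_32bit ? 220 : 288);
}
```

With this model, a small leaf function no longer gets a frame just because
debugging is enabled; only the size threshold and the frame pointer matter.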

Bootstrapped on powerpc-ibm-aix7.2.5.0

Committed.

Thanks, David

* gcc/config/rs6000/rs6000-logue.cc (rs6000_stack_info):
Do not require a stack frame when debugging is enabled for AIX.

index bc6b153b59f..98846f781ec 100644
--- a/gcc/config/rs6000/rs6000-logue.cc
+++ b/gcc/config/rs6000/rs6000-logue.cc
@@ -928,9 +928,6 @@ rs6000_stack_info (void)
   else if (frame_pointer_needed)
 info->push_p = 1;

-  else if (TARGET_XCOFF && write_symbols != NO_DEBUG && !flag_compare_debug)
-info->push_p = 1;
-
   else
 info->push_p = non_fixed_size > (TARGET_32BIT ? 220 : 288);


Re: [PATCH] Add MinGW option -mcrtdll= for choosing C RunTime DLL library

2023-06-11 Thread Jonathan Yong via Gcc-patches

On 5/27/23 10:14, Pali Rohár wrote:

It adjusts the preprocess, compile and link flags, which allows replacing
the default -lmsvcrt library with another one provided by the MinGW runtime.

gcc/
  * config/i386/mingw-w64.h (CPP_SPEC): Adjust for -mcrtdll=.
  (REAL_LIBGCC_SPEC): New define.
  * config/i386/mingw.opt: Add mcrtdll=
  * config/i386/mingw32.h (CPP_SPEC): Adjust for -mcrtdll=.
  (REAL_LIBGCC_SPEC): Adjust for -mcrtdll=.
  (STARTFILE_SPEC): Adjust for -mcrtdll=.
  * doc/invoke.texi: Add mcrtdll= documentation.
---
  gcc/config/i386/mingw-w64.h | 22 +-
  gcc/config/i386/mingw.opt   |  4 
  gcc/config/i386/mingw32.h   | 28 
  gcc/doc/invoke.texi | 21 -
  4 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/mingw-w64.h b/gcc/config/i386/mingw-w64.h
index 3a21cec3f8cd..0146ed4f793e 100644
--- a/gcc/config/i386/mingw-w64.h
+++ b/gcc/config/i386/mingw-w64.h
@@ -25,7 +25,27 @@ along with GCC; see the file COPYING3.  If not see
  #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT} " \
 "%{municode:-DUNICODE} " \
 "%{" SPEC_PTHREAD1 ":-D_REENTRANT} " \
-"%{" SPEC_PTHREAD2 ":-U_REENTRANT} "
+"%{" SPEC_PTHREAD2 ":-U_REENTRANT} " \
+"%{mcrtdll=crtdll*:-U__MSVCRT__ -D__CRTDLL__} " \
+"%{mcrtdll=msvcrt10*:-D__MSVCRT_VERSION__=0x100} " \
+"%{mcrtdll=msvcrt20*:-D__MSVCRT_VERSION__=0x200} " \
+"%{mcrtdll=msvcrt40*:-D__MSVCRT_VERSION__=0x400} " \
+"%{mcrtdll=msvcrt-os*:-D__MSVCRT_VERSION__=0x700} " \
+"%{mcrtdll=msvcr70*:-D__MSVCRT_VERSION__=0x700} " \
+"%{mcrtdll=msvcr71*:-D__MSVCRT_VERSION__=0x701} " \
+"%{mcrtdll=msvcr80*:-D__MSVCRT_VERSION__=0x800} " \
+"%{mcrtdll=msvcr90*:-D__MSVCRT_VERSION__=0x900} " \
+"%{mcrtdll=msvcr100*:-D__MSVCRT_VERSION__=0xA00} " \
+"%{mcrtdll=msvcr110*:-D__MSVCRT_VERSION__=0xB00} " \
+"%{mcrtdll=msvcr120*:-D__MSVCRT_VERSION__=0xC00} " \
+"%{mcrtdll=ucrt*:-D_UCRT} "
+
+#undef REAL_LIBGCC_SPEC
+#define REAL_LIBGCC_SPEC \
+  "%{mthreads:-lmingwthrd} -lmingw32 \
+   " SHARED_LIBGCC_SPEC " \
+   -lmingwex %{!mcrtdll=*:-lmsvcrt} %{mcrtdll=*:-l%*} \
+   -lkernel32 " MCFGTHREAD_SPEC
  
  #undef STARTFILE_SPEC

  #define STARTFILE_SPEC "%{shared|mdll:dllcrt2%O%s} \
diff --git a/gcc/config/i386/mingw.opt b/gcc/config/i386/mingw.opt
index 0ae026a66bd6..dd66a50aec00 100644
--- a/gcc/config/i386/mingw.opt
+++ b/gcc/config/i386/mingw.opt
@@ -18,6 +18,10 @@
  ; along with GCC; see the file COPYING3.  If not see
  ; .
  
+mcrtdll=

+Target RejectNegative Joined
+Preprocess, compile or link with specified C RunTime DLL library.
+
  pthread
  Driver
  
diff --git a/gcc/config/i386/mingw32.h b/gcc/config/i386/mingw32.h

index 6a55baaa4587..a1ee001983a7 100644
--- a/gcc/config/i386/mingw32.h
+++ b/gcc/config/i386/mingw32.h
@@ -89,7 +89,20 @@ along with GCC; see the file COPYING3.  If not see
  #undef CPP_SPEC
  #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT} " \
 "%{" SPEC_PTHREAD1 ":-D_REENTRANT} " \
-"%{" SPEC_PTHREAD2 ": } "
+"%{" SPEC_PTHREAD2 ": } " \
+"%{mcrtdll=crtdll*:-U__MSVCRT__ -D__CRTDLL__} " \
+"%{mcrtdll=msvcrt10*:-D__MSVCRT_VERSION__=0x100} " \
+"%{mcrtdll=msvcrt20*:-D__MSVCRT_VERSION__=0x200} " \
+"%{mcrtdll=msvcrt40*:-D__MSVCRT_VERSION__=0x400} " \
+"%{mcrtdll=msvcrt-os*:-D__MSVCRT_VERSION__=0x700} " \
+"%{mcrtdll=msvcr70*:-D__MSVCRT_VERSION__=0x700} " \
+"%{mcrtdll=msvcr71*:-D__MSVCRT_VERSION__=0x701} " \
+"%{mcrtdll=msvcr80*:-D__MSVCRT_VERSION__=0x800} " \
+"%{mcrtdll=msvcr90*:-D__MSVCRT_VERSION__=0x900} " \
+"%{mcrtdll=msvcr100*:-D__MSVCRT_VERSION__=0xA00} " \
+"%{mcrtdll=msvcr110*:-D__MSVCRT_VERSION__=0xB00} " \
+"%{mcrtdll=msvcr120*:-D__MSVCRT_VERSION__=0xC00} " \
+"%{mcrtdll=ucrt*:-D_UCRT} "
  
  /* For Windows applications, include more libraries, but always include

 kernel32.  */
@@ -184,11 +197,18 @@ along with GCC; see the file COPYING3.  If not see
  #define REAL_LIBGCC_SPEC \
"%{mthreads:-lmingwthrd} -lmingw32 \
 " SHARED_LIBGCC_SPEC " \
-   -lmoldname -lmingwex -lmsvcrt -lkernel32 " MCFGTHREAD_SPEC
+   %{mcrtdll=crtdll*:-lcoldname} %{!mcrtdll=crtdll*:-lmoldname} \
+   -lmingwex %{!mcrtdll=*:-lmsvcrt} %{mcrtdll=*:-l%*} \
+   -lkernel32 " MCFGTHREAD_SPEC
  
  #undef STARTFILE_SPEC

-#define STARTFILE_SPEC "%{shared|mdll:dllcrt2%O%s} \
-  %{!shared:%{!mdll:crt2%O%s}} %{pg:gcrt2%O%s} \
+#define STARTFILE_SPEC " \
+  %{shared|mdll:%{mcrtdll=crtdll*:dllcrt1%O%s}} \
+  

[PATCH] VECT: Add LEN_MASK_ LOAD/STORE to support flow control for length loop control

2023-06-11 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Targets like ARM SVE in GCC have an elegant way to handle both loop control
and flow control simultaneously:

loop_control_mask = WHILE_ULT
flow_control_mask = comparison
control_mask = loop_control_mask & flow_control_mask;
MASK_LOAD (control_mask)
MASK_STORE (control_mask)

However, targets like RVV (RISC-V Vector) cannot use this approach in
auto-vectorization since RVV uses length in loop control.

This patch adds LEN_MASK_ LOAD/STORE to support flow control for targets
like RISC-V that use length in loop control.
Normalize load/store into LEN_MASK_ LOAD/STORE as long as either length
or mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR.
Mask is the outcome of comparison.

LEN_MASK_ LOAD/STORE format is defined as follows:
1). LEN_MASK_LOAD (ptr, align, length, mask).
2). LEN_MASK_STORE (ptr, align, length, mask, vec).
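
As a scalar sketch of the intended semantics (my reading of the description
above, not code from the patch): lane i is active only when i < length and
mask[i] is set. The names model_len_mask_load, model_len_mask_store and the
fixed VF of 4 are hypothetical, the align operand is omitted, and leaving
inactive lanes of the load result unchanged is an assumption made only to
keep the model deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VF = 4 };   /* Hypothetical vectorization factor for the model.  */

/* Scalar model of LEN_MASK_LOAD (ptr, align, length, mask).
   A lane is loaded only if it is below LENGTH and its mask element is
   nonzero.  Inactive lanes of DEST are left as they were (assumption).  */
void
model_len_mask_load (int32_t *dest, const int32_t *ptr, size_t length,
                     const int8_t *mask)
{
  assert (length <= VF);
  for (size_t i = 0; i < VF; i++)
    if (i < length && mask[i] != 0)
      dest[i] = ptr[i];
}

/* Scalar model of LEN_MASK_STORE (ptr, align, length, mask, vec).
   Memory is written only for active lanes.  */
void
model_len_mask_store (int32_t *ptr, size_t length, const int8_t *mask,
                      const int32_t *vec)
{
  assert (length <= VF);
  for (size_t i = 0; i < VF; i++)
    if (i < length && mask[i] != 0)
      ptr[i] = vec[i];
}
```

In this model the length plays the role of the loop control and the mask the
role of the flow control, which is the combination the four cases below walk
through.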

Consider the following four cases:

VLA: Variable-length auto-vectorization
VLS: Specific-length auto-vectorization

Case 1 (VLS): -mrvv-vector-bits=128
Code:
  for (int i = 0; i < 4; i++)
    a[i] = b[i] + c[i];
IR (Does not use LEN_MASK_*):
  v1 = MEM (...)
  v2 = MEM (...)
  v3 = v1 + v2
  MEM[...] = v3

Case 2 (VLS): -mrvv-vector-bits=128
Code:
  for (int i = 0; i < 4; i++)
    if (cond[i])
      a[i] = b[i] + c[i];
IR (LEN_MASK_* with length = VF, mask = comparison):
  mask = comparison
  v1 = LEN_MASK_LOAD (length = VF, mask)
  v2 = LEN_MASK_LOAD (length = VF, mask)
  v3 = v1 + v2
  LEN_MASK_STORE (length = VF, mask, v3)

Case 3 (VLA):
Code:
  for (int i = 0; i < n; i++)
    a[i] = b[i] + c[i];
IR:
  loop_len = SELECT_VL or MIN
  v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
  v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
  v3 = v1 + v2
  LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)

Case 4 (VLA):
Code:
  for (int i = 0; i < n; i++)
    if (cond[i])
      a[i] = b[i] + c[i];
IR:
  loop_len = SELECT_VL or MIN
  mask = comparison
  v1 = LEN_MASK_LOAD (length = loop_len, mask)
  v2 = LEN_MASK_LOAD (length = loop_len, mask)
  v3 = v1 + v2
  LEN_MASK_STORE (length = loop_len, mask, v3)

More features:
1. Support a simplifying gimple fold for LEN_MASK_ LOAD/STORE:
   LEN_MASK_STORE (length = vf, mask = {-1,-1,...}, v) ===> MEM [...] = V
2. Allow DSE for LEN_MASK_* LOAD/STORE.
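
The condition behind the fold in item 1 can be sketched in C as follows. This
is only a scalar model of the stated rule (length equal to the vectorization
factor and an all-ones mask); the name model_folds_to_plain_mem is
hypothetical and the actual check in gimple-fold.cc operates on trees, which
is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the fold precondition: a LEN_MASK_ LOAD/STORE touches every
   lane, and so behaves like a plain vector MEM access, when the length
   equals VF and every mask element is -1 (all ones).  */
bool
model_folds_to_plain_mem (size_t length, size_t vf, const int8_t *mask)
{
  if (length != vf)
    return false;

  for (size_t i = 0; i < vf; i++)
    if (mask[i] != -1)
      return false;

  return true;
}
```

A partial length or any cleared mask element leaves some lane inactive, so the
access has to stay in its LEN_MASK_ form.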

Bootstrapped and regression-tested on x86 with no unexpected differences.

gcc/ChangeLog:

* doc/md.texi: Add LEN_MASK_ LOAD/STORE.
* genopinit.cc (main): Ditto.
(CMP_NAME): Ditto.
* gimple-fold.cc (arith_overflowed_p): Ditto.
(gimple_fold_partial_load_store_mem_ref): Ditto.
(gimple_fold_partial_store): Ditto.
(gimple_fold_call): Ditto.
* internal-fn.cc (len_maskload_direct): Ditto.
(len_maskstore_direct): Ditto.
(expand_partial_load_optab_fn): Ditto.
(expand_len_maskload_optab_fn): Ditto.
(expand_partial_store_optab_fn): Ditto.
(expand_len_maskstore_optab_fn): Ditto.
(direct_len_maskload_optab_supported_p): Ditto.
(direct_len_maskstore_optab_supported_p): Ditto.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
* internal-fn.def (LEN_MASK_LOAD): Ditto.
(LEN_MASK_STORE): Ditto.
* optabs-query.cc (can_vec_len_mask_load_store_p): Ditto.
* optabs-query.h (can_vec_len_mask_load_store_p): Ditto.
* optabs.def (OPTAB_CD): Ditto.
* tree-data-ref.cc (get_references_in_stmt): Ditto.
* tree-if-conv.cc (ifcvt_can_use_mask_load_store): Ditto.
* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Ditto.
(call_may_clobber_ref_p_1): Ditto.
* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Ditto.
(dse_optimize_stmt): Ditto.
* tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Ditto.
(get_alias_ptr_type_for_ptr_address): Ditto.
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Ditto.
* tree-vect-data-refs.cc (can_group_stmts_p): Ditto.
(vect_find_stmt_data_reference): Ditto.
(vect_supportable_dr_alignment): Ditto.
* tree-vect-loop.cc (vect_verify_loop_lens): Ditto.
(optimize_mask_stores): Ditto.
* tree-vect-slp.cc (vect_get_operand_map): Ditto.
(vect_build_slp_tree_2): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): 

Re: [PATCH v1] RISC-V: Support RVV FP16 MISC vlmul ext intrinsic API

2023-06-11 Thread 钟居哲
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-11 08:33
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV FP16 MISC vlmul ext intrinsic API
From: Pan Li 
 
This patch supports the intrinsic API of FP16 ZVFHMIN vlmul ext. Aka:
 
vfloat16*_t <==> vfloat16*_t.
 
From the user's perspective, it is reasonable to do some type conversion
between vfloat16*_t and vfloat16*_t when only ZVFHMIN is enabled.
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add type to X2/X4/X8/X16/X32 vlmul ext ops.
(vfloat16mf2_t): Ditto.
(vfloat16m1_t): Ditto.
(vfloat16m2_t): Ditto.
(vfloat16m4_t): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add new test cases.
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Add new test cases.
---
.../riscv/riscv-vector-builtins-types.def | 15 ++
.../riscv/rvv/base/zvfh-over-zvfhmin.c| 18 +--
.../riscv/rvv/base/zvfhmin-intrinsic.c| 54 +--
3 files changed, 79 insertions(+), 8 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
b/gcc/config/riscv/riscv-vector-builtins-types.def
index 589ea532727..db8e61fea6a 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -978,6 +978,11 @@ DEF_RVV_X2_VLMUL_EXT_OPS (vuint32m4_t, 0)
DEF_RVV_X2_VLMUL_EXT_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_X2_VLMUL_EXT_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_X2_VLMUL_EXT_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | 
RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16m2_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16m4_t, RVV_REQUIRE_ELEN_FP_16)
DEF_RVV_X2_VLMUL_EXT_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X2_VLMUL_EXT_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
DEF_RVV_X2_VLMUL_EXT_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
@@ -1014,6 +1019,10 @@ DEF_RVV_X4_VLMUL_EXT_OPS (vuint32m1_t, 0)
DEF_RVV_X4_VLMUL_EXT_OPS (vuint32m2_t, 0)
DEF_RVV_X4_VLMUL_EXT_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_X4_VLMUL_EXT_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_X4_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | 
RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X4_VLMUL_EXT_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X4_VLMUL_EXT_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X4_VLMUL_EXT_OPS (vfloat16m2_t, RVV_REQUIRE_ELEN_FP_16)
DEF_RVV_X4_VLMUL_EXT_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X4_VLMUL_EXT_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
DEF_RVV_X4_VLMUL_EXT_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
@@ -1040,6 +1049,9 @@ DEF_RVV_X8_VLMUL_EXT_OPS (vuint16m1_t, 0)
DEF_RVV_X8_VLMUL_EXT_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X8_VLMUL_EXT_OPS (vuint32m1_t, 0)
DEF_RVV_X8_VLMUL_EXT_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_X8_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | 
RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X8_VLMUL_EXT_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X8_VLMUL_EXT_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
DEF_RVV_X8_VLMUL_EXT_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X8_VLMUL_EXT_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
DEF_RVV_X8_VLMUL_EXT_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
@@ -1056,6 +1068,8 @@ DEF_RVV_X16_VLMUL_EXT_OPS (vuint8mf2_t, 0)
DEF_RVV_X16_VLMUL_EXT_OPS (vuint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X16_VLMUL_EXT_OPS (vuint16mf2_t, 0)
DEF_RVV_X16_VLMUL_EXT_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X16_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | 
RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X16_VLMUL_EXT_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
DEF_RVV_X16_VLMUL_EXT_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X32_VLMUL_EXT_OPS (vint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
@@ -1064,6 +1078,7 @@ DEF_RVV_X32_VLMUL_EXT_OPS (vint16mf4_t, 
RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X32_VLMUL_EXT_OPS (vuint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X32_VLMUL_EXT_OPS (vuint8mf4_t, 0)
DEF_RVV_X32_VLMUL_EXT_OPS (vuint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X32_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | 
RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X64_VLMUL_EXT_OPS (vint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X64_VLMUL_EXT_OPS (vuint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
index d5bcdd5156a..ff9e0156a68 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
@@ -37,13 

Re: [PATCH] rs6000: Guard __builtin_{un, }pack_vector_int128 with vsx [PR109932]

2023-06-11 Thread David Edelsohn via Gcc-patches
On Tue, Jun 6, 2023 at 5:19 AM Kewen.Lin  wrote:

> Hi,
>
> As PR109932 shows, builtins __builtin_{un,}pack_vector_int128
> should be guarded under vsx rather than power7, as their
> corresponding bif patterns have the conditions TARGET_VSX
> and VECTOR_MEM_ALTIVEC_OR_VSX_P (V1TImode).  This patch is to
> move __builtin_{un,}pack_vector_int128 to stanza vsx to ensure
> they are supported.
>
> Bootstrapped and regtested on powerpc64-linux-gnu P7/P8/P9 and
> powerpc64le-linux-gnu P9 and P10.
>
> I'll push this next week if no objections.
>
> BR,
> Kewen
> -
> PR target/109932
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000-builtins.def (__builtin_pack_vector_int128,
> __builtin_unpack_vector_int128): Move from stanza power7 to vsx.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/pr109932-1.c: New test.
> * gcc.target/powerpc/pr109932-2.c: New test.
>

This is okay.

Thanks, David


> ---
>  gcc/config/rs6000/rs6000-builtins.def | 14 +++---
>  gcc/testsuite/gcc.target/powerpc/pr109932-1.c | 16 
>  gcc/testsuite/gcc.target/powerpc/pr109932-2.c | 16 
>  3 files changed, 39 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr109932-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr109932-2.c
>
> diff --git a/gcc/config/rs6000/rs6000-builtins.def
> b/gcc/config/rs6000/rs6000-builtins.def
> index 92d9b46e1b9..a38184b0ef9 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -2009,6 +2009,13 @@
>const vsll __builtin_vsx_xxspltd_2di (vsll, const int<1>);
>  XXSPLTD_V2DI vsx_xxspltd_v2di {}
>
> +  const vsq __builtin_pack_vector_int128 (unsigned long long, \
> +  unsigned long long);
> +PACK_V1TI packv1ti {}
> +
> +  const unsigned long __builtin_unpack_vector_int128 (vsq, const int<1>);
> +UNPACK_V1TI unpackv1ti {}
> +
>
>  ; Power7 builtins (ISA 2.06).
>  [power7]
> @@ -2030,16 +2037,9 @@
>const unsigned int __builtin_divweu (unsigned int, unsigned int);
>  DIVWEU diveu_si {}
>
> -  const vsq __builtin_pack_vector_int128 (unsigned long long, \
> -  unsigned long long);
> -PACK_V1TI packv1ti {}
> -
>void __builtin_ppc_speculation_barrier ();
>  SPECBARR speculation_barrier {}
>
> -  const unsigned long __builtin_unpack_vector_int128 (vsq, const int<1>);
> -UNPACK_V1TI unpackv1ti {}
> -
>
>  ; Power7 builtins requiring 64-bit GPRs (even with 32-bit addressing).
>  [power7-64]
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr109932-1.c
> b/gcc/testsuite/gcc.target/powerpc/pr109932-1.c
> new file mode 100644
> index 000..3e3f9eaa65e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr109932-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-options "-maltivec -mno-vsx" } */
> +
> +/* Verify there is no ICE but one expected error message instead.  */
> +
> +#include 
> +
> +extern vector signed __int128 res_vslll;
> +extern unsigned long long aull[2];
> +
> +void
> +testVectorInt128Pack ()
> +{
> +  res_vslll = __builtin_pack_vector_int128 (aull[0], aull[1]); /* {
> dg-error "'__builtin_pack_vector_int128' requires the '-mvsx' option" } */
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr109932-2.c
> b/gcc/testsuite/gcc.target/powerpc/pr109932-2.c
> new file mode 100644
> index 000..3e3f9eaa65e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr109932-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-options "-maltivec -mno-vsx" } */
> +
> +/* Verify there is no ICE but one expected error message instead.  */
> +
> +#include 
> +
> +extern vector signed __int128 res_vslll;
> +extern unsigned long long aull[2];
> +
> +void
> +testVectorInt128Pack ()
> +{
> +  res_vslll = __builtin_pack_vector_int128 (aull[0], aull[1]); /* {
> dg-error "'__builtin_pack_vector_int128' requires the '-mvsx' option" } */
> +}
> +
> --
> 2.25.1
>


[PATCH] Avoid duplicate vector initializations during RTL expansion.

2023-06-11 Thread Roger Sayle

This middle-end patch avoids some redundant RTL for vector initialization
during RTL expansion.  For the simple test case:

typedef __int128 v1ti __attribute__ ((__vector_size__ (16)));
__int128 key;

v1ti foo() {
return (v1ti){key};
}

the middle-end currently expands:

(set (reg:V1TI 85) (const_vector:V1TI [ (const_int 0) ]))

(set (reg:V1TI 85) (mem/c:V1TI (symbol_ref:DI ("key"

where we create a dead instruction that initializes the vector to zero,
immediately followed by a set of the entire vector.  This patch skips
this zeroing instruction when the vector has only a single element.
It also updates the code to indicate when we've cleared the vector,
so that we don't need to initialize zero elements.
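
The new guard can be modelled in plain C as below. This is only a sketch of
the condition from the hunk further down, with the poly_int comparison
maybe_gt (n_elts, 1u) replaced by an ordinary integer comparison; the function
name model_emit_zeroing and its parameters are hypothetical.

```c
#include <stdbool.h>

/* Model of the patched condition in store_constructor: the zeroing move
   is emitted only for a register target that has not been cleared yet,
   is not handled by the vector-init path, and has more than one element.
   Returns true when the move would be emitted; in that case the real
   code also sets CLEARED.  */
bool
model_emit_zeroing (bool cleared, bool vector, bool target_is_reg,
                    unsigned n_elts)
{
  return !cleared && !vector && target_is_reg && n_elts > 1u;
}
```

For the V1TI test case above n_elts is 1, so the dead zeroing instruction is
no longer generated.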

Interestingly, this code is very similar to my patch from April 2006:
https://gcc.gnu.org/pipermail/gcc-patches/2006-April/192861.html


This patch has been tested on x86_64-pc-linux-gnu with a make bootstrap
and make -k check, both with and without --target_board=unix{-m32}, with
no new failures.  Ok for mainline?


2023-06-11  Roger Sayle  

gcc/ChangeLog
* expr.cc (store_constructor) : Don't bother
clearing vectors with only a single element.  Set CLEARED if the
vector was initialized to zero.


Thanks,
Roger
--

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 868fa6e..62cd8fa 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -7531,8 +7531,11 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
  }
 
/* Inform later passes that the old value is dead.  */
-   if (!cleared && !vector && REG_P (target))
- emit_move_insn (target, CONST0_RTX (mode));
+   if (!cleared && !vector && REG_P (target) && maybe_gt (n_elts, 1u))
+ {
+   emit_move_insn (target, CONST0_RTX (mode));
+   cleared = 1;
+ }
 
 if (MEM_P (target))
  alias = MEM_ALIAS_SET (target);


Re: [RFC] Add stdckdint.h header for C23

2023-06-11 Thread Martin Uecker via Gcc-patches


Hi Jakub,

two comments which may or may not be helpful:

Clang extended _Generic in a similar way:
https://github.com/llvm/llvm-project/commit/12728e144994efe84715f4e5dbb8c3104e9f0b5a

Although for _Generic you can achieve the same by checking
for compatibility of a pointer to the type, and I do not think
this helps with the classification problem.


If I am not missing something, you should be able to check
for an enumerated type using _Generic by checking that the
type is not compatible with another enum type:

enum type_check { _X = 1 };

#define type_is_enum(x) \
_Generic(x, unsigned int: _Generic(x, enum type_check: 0, default: 1), \
default: 0)

https://godbolt.org/z/j6z4a4Mdn
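
A self-contained version of the trick, as a sketch: it assumes a compiler
(such as GCC on typical targets) that makes an enum with only non-negative
enumerators compatible with unsigned int, which is implementation-defined.
The enumerator is renamed to TYPE_CHECK_DUMMY to stay out of the reserved
identifier space, and enum color and the demo_* functions are hypothetical
names used only for illustration.

```c
/* Helper enum whose compatible integer type is assumed to be
   unsigned int.  */
enum type_check { TYPE_CHECK_DUMMY = 1 };

/* 1 if X has an enum type compatible with unsigned int other than
   enum type_check itself; 0 for unsigned int and all other types.  */
#define type_is_enum(x) \
  _Generic ((x), \
            unsigned int: _Generic ((x), enum type_check: 0, default: 1), \
            default: 0)

enum color { RED, GREEN, BLUE };

/* An object of enum type: matches unsigned int in the outer selection,
   but not enum type_check in the inner one.  */
int
demo_enum (void)
{
  enum color c = GREEN;
  return type_is_enum (c);
}

/* A plain unsigned int is compatible with enum type_check, so the inner
   selection yields 0.  */
int
demo_unsigned (void)
{
  unsigned int u = 1u;
  return type_is_enum (u);
}

/* Any other type falls through to the outer default.  */
int
demo_int (void)
{
  int i = 1;
  return type_is_enum (i);
}
```

Note that the argument has to be an object of the enum type; a bare
enumeration constant such as GREEN has an integer type and is classified
accordingly. Enums with negative enumerators are not detected by this form.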

For C23 with fixed underlying type this may become more
complicated. Maybe this becomes too messy.

Martin





Re: [PATCH] Fortran: add Fortran 2018 IEEE_{MIN,MAX} functions

2023-06-11 Thread FX Coudert via Gcc-patches
Hi,

> Running
> nohup make -j7 check-fortran 
> RUNTESTFLAGS="--target_board=unix/-mabi=ieeelongdouble/-mcpu=power9"&
> from the gcc subdirectory yielded only a single failure:

I dug more into the code and I understand why all tests are running: since 
db630423a97ec6690a8eb0e5c3cb186c91e3740d and 
0c2d6aa1be2ea85e751852834986ae52d58134d3 all IEEE functions manipulating real 
or complex arguments are actually expanded fully inline (we retain functions in 
libgfortran for backward compatibility).

The only IEEE functions that depend on libgfortran runtime are the 
“IEEE_SUPPORT_*” functions.

FX

Re: libgfortran: remove support for --enable-intermodule

2023-06-11 Thread FX Coudert via Gcc-patches
> OK, thanks.

Committed at 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ecc96eb5d2a0e5dd93365ef76a58d7f754273934

Re: [pushed] diagnostics: ensure that .sarif files are UTF-8 encoded [PR109098]

2023-06-11 Thread Lewis Hyatt via Gcc-patches
On Fri, Mar 24, 2023 at 9:04 PM David Malcolm via Gcc-patches
 wrote:
>
> PR analyzer/109098 notes that the SARIF spec mandates that .sarif
> files are UTF-8 encoded, but -fdiagnostics-format=sarif-file naively
> assumes that the source files are UTF-8 encoded when quoting source
> artefacts in the .sarif output, which can lead to us writing out
> .sarif files with non-UTF-8 bytes in them (which break my reporting
> scripts).
>
> The root cause is that sarif_builder::maybe_make_artifact_content_object
> was using maybe_read_file to load the file content as bytes, and
> assuming they were UTF-8 encoded.
>
> This patch reworks both overloads of this function (one used for the
> whole file, the other for snippets of quoted lines) so that they go
> through input.cc's file cache, which attempts to decode the input files
> according to the input charset, and then encode as UTF-8.  They also
> check that the result actually is UTF-8, for cases where the input
> charset is missing, or incorrectly specified, and omit the quoted
> source for such awkward cases.
>
> Doing so fixes all of the cases I've encountered.
>
> The patch adds a new:
>   { dg-final { verify-sarif-file } }
> directive to all SARIF test cases in the test suite, which verifies
> that the output is UTF-8 encoded, and is valid JSON.  In particular
> it verifies that when we complain about encoding problems, the .sarif
> report we emit is itself correctly encoded.
>
> Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
> Integration testing shows no regressions, and a fix for the case
> seen in haproxy-2.7.1.
> Pushed to trunk as r13-6861-gd495ea2b232f3e.
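
For readers unfamiliar with what "check that the result actually is UTF-8"
involves, here is a minimal standalone validator in C. It is a sketch written
for this note, not the code GCC uses; the name model_is_valid_utf8 is
hypothetical. It rejects stray or missing continuation bytes, overlong
encodings, surrogates and code points above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if the N bytes at S form well-formed UTF-8.  */
bool
model_is_valid_utf8 (const unsigned char *s, size_t n)
{
  size_t i = 0;

  while (i < n)
    {
      unsigned char c = s[i];
      size_t extra;          /* Number of continuation bytes expected.  */
      unsigned long cp;      /* Decoded code point.  */
      unsigned long min;     /* Smallest code point for this length.  */

      if (c < 0x80)
        {
          i++;
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { extra = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { extra = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { extra = 3; cp = c & 0x07; min = 0x10000; }
      else
        return false;        /* Stray continuation or invalid lead byte.  */

      if (n - i <= extra)
        return false;        /* Sequence is truncated.  */

      for (size_t k = 1; k <= extra; k++)
        {
          unsigned char cc = s[i + k];
          if ((cc & 0xC0) != 0x80)
            return false;    /* Not a continuation byte.  */
          cp = (cp << 6) | (cc & 0x3F);
        }

      if (cp < min)
        return false;        /* Overlong encoding.  */
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;        /* UTF-16 surrogate.  */
      if (cp > 0x10FFFF)
        return false;        /* Beyond the Unicode range.  */

      i += extra + 1;
    }

  return true;
}
```

A source file in Latin-1, for example, typically fails such a check on the
first accented character, which is the kind of input for which the quoted
source is omitted from the .sarif output.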

Hi David-

Regarding the patch series I had about _Pragma locations (most
recently https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609472.html
and https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609473.html):
that one will need some work now in order to apply on top of these
changes to input.cc. I am happy to do that, but I thought I had better check in
first to see if you had any feedback on the new approach to
input.cc that's in the v2 patch. Do you think it's a worthwhile
feature, or would you rather I just drop it? Thanks!

-Lewis


[avr,committed] Tidy code for inverted bit insertions

2023-06-11 Thread Georg-Johann Lay
Applied this no-op change that tidies up the code for inverted bit 
insertions.


Johann

--

Use canonical form for reversed single-bit insertions after reload.

We now split almost all insns after reload in order to add a clobber of
REG_CC.

If insns are coming from the insn combiner and there is no canonical form for
the respective arithmetic (like for reversed bit insertions), there is
no need to keep all these different representations after reload:
Instead of splitting such patterns to their clobber-REG_CC analogue, we can
split to a canonical representation, which is insv_notbit for the
present case.

This is a no-op change.
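
For reference, the operation that insv_notbit stands for ($0.$1 = ~$2.$3,
as described in the patch comments) can be written as a small C model. The
function name model_insv_notbit is hypothetical; the real insn is emitted as
a BST / BLD sequence plus an inversion, which is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "$0.$1 = ~$2.$3" on 8-bit registers: return DST with bit
   DST_BIT replaced by the inverse of bit SRC_BIT of SRC.  */
uint8_t
model_insv_notbit (uint8_t dst, unsigned dst_bit, uint8_t src,
                   unsigned src_bit)
{
  assert (dst_bit < 8 && src_bit < 8);

  /* Extract the source bit and invert it.  */
  unsigned inv = ((src >> src_bit) & 1u) ^ 1u;

  /* Clear the destination bit, then insert the inverted bit.  */
  return (uint8_t) ((dst & ~(1u << dst_bit)) | (inv << dst_bit));
}
```

All the combiner patterns listed in the ChangeLog below compute this same
function, which is why they can share one post-reload representation.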

gcc/
* config/avr/avr.md (adjust_len) [insv_notbit_0, insv_notbit_7]:
Remove attribute values.
(insv_notbit): New post-reload insn.
(*insv.not-shiftrt_split, *insv.xor1-bit.0_split)
(*insv.not-bit.0_split, *insv.not-bit.7_split)
(*insv.xor-extract_split): Split to insv_notbit.
(*insv.not-shiftrt, *insv.xor1-bit.0, *insv.not-bit.0, *insv.not-bit.7)
(*insv.xor-extract): Remove post-reload insns.
* config/avr/avr.cc (avr_out_insert_notbit) [bitno]: Remove parameter.
(avr_adjust_insn_length): Adjust call of avr_out_insert_notbit.
[ADJUST_LEN_INSV_NOTBIT_0, ADJUST_LEN_INSV_NOTBIT_7]: Remove cases.
* config/avr/avr-protos.h (avr_out_insert_notbit): Adjust prototype.


diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h
index a10d91d186f..5c1343f0df8 100644
--- a/gcc/config/avr/avr-protos.h
+++ b/gcc/config/avr/avr-protos.h
@@ -57,7 +57,7 @@ extern const char *avr_out_compare64 (rtx_insn *, 
rtx*, int*);

 extern const char *ret_cond_branch (rtx x, int len, int reverse);
 extern const char *avr_out_movpsi (rtx_insn *, rtx*, int*);
 extern const char *avr_out_sign_extend (rtx_insn *, rtx*, int*);
-extern const char *avr_out_insert_notbit (rtx_insn *, rtx*, rtx, int*);
+extern const char *avr_out_insert_notbit (rtx_insn *, rtx*, int*);
 extern const char *avr_out_extr (rtx_insn *, rtx*, int*);
 extern const char *avr_out_extr_not (rtx_insn *, rtx*, int*);
 extern const char *avr_out_plus_set_ZN (rtx*, int*);
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index b02f5e2..ef6872a3f55 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -8995,20 +8995,15 @@ avr_out_addto_sp (rtx *op, int *plen)
 }


-/* Output instructions to insert an inverted bit into OPERANDS[0]:
-   $0.$1 = ~$2.$3  if XBITNO = NULL
-   $0.$1 = ~$2.XBITNO  if XBITNO != NULL.
+/* Output instructions to insert an inverted bit into OP[0]: $0.$1 = 
~$2.$3.

If PLEN = NULL then output the respective instruction sequence which
is a combination of BST / BLD and some instruction(s) to invert the 
bit.
If PLEN != NULL then store the length of the sequence (in words) in 
*PLEN.

Return "".  */

 const char*
-avr_out_insert_notbit (rtx_insn *insn, rtx operands[], rtx xbitno, int 
*plen)

+avr_out_insert_notbit (rtx_insn *insn, rtx op[], int *plen)
 {
-  rtx op[4] = { operands[0], operands[1], operands[2],
-xbitno == NULL_RTX ? operands [3] : xbitno };
-
   if (INTVAL (op[1]) == 7
   && test_hard_reg_class (LD_REGS, op[0]))
 {
@@ -10038,15 +10033,7 @@ avr_adjust_insn_length (rtx_insn *insn, int len)
 case ADJUST_LEN_INSERT_BITS: avr_out_insert_bits (op, ); break;
 case ADJUST_LEN_ADD_SET_ZN: avr_out_plus_set_ZN (op, ); break;

-case ADJUST_LEN_INSV_NOTBIT:
-  avr_out_insert_notbit (insn, op, NULL_RTX, );
-  break;
-case ADJUST_LEN_INSV_NOTBIT_0:
-  avr_out_insert_notbit (insn, op, const0_rtx, );
-  break;
-case ADJUST_LEN_INSV_NOTBIT_7:
-  avr_out_insert_notbit (insn, op, GEN_INT (7), );
-  break;
+case ADJUST_LEN_INSV_NOTBIT: avr_out_insert_notbit (insn, op, 
); break;


 default:
   gcc_unreachable();
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index eadc482da15..83dd15040b0 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -163,7 +163,7 @@ (define_attr "adjust_len"
ashlhi, ashrhi, lshrhi,
ashlsi, ashrsi, lshrsi,
ashlpsi, ashrpsi, lshrpsi,
-   insert_bits, insv_notbit, insv_notbit_0, insv_notbit_7,
+   insert_bits, insv_notbit,
add_set_ZN, cmp_uext, cmp_sext,
no"
   (const_string "no"))
@@ -9151,6 +9151,21 @@ (define_insn "*insv.shiftrt"
   [(set_attr "length" "2")])

 ;; Same, but with a NOT inverting the source bit.
+;; Insert bit ~$2.$3 into $0.$1
+(define_insn "insv_notbit"
+  [(set (zero_extract:QI (match_operand:QI 0 "register_operand" 
   "+r")

+ (const_int 1)
+ (match_operand:QI 1 "const_0_to_7_operand" 
"n"))
+(not:QI (zero_extract:QI (match_operand:QI 2 "register_operand" 
"r")

+ (const_int 1)
+ (match_operand:QI 3 
"const_0_to_7_operand" "n"

+   (clobber (reg:CC REG_CC))]
+  "reload_completed"
+  {
+  

Re: [PATCH] Fortran: add Fortran 2018 IEEE_{MIN,MAX} functions

2023-06-11 Thread Thomas Koenig via Gcc-patches

Hi FX,

>> The KIND=17 is a bit of a kludge.  It is not visible for
>> user programs, they use KIND=16, but this is then translated
>> to library calls as if it was KIND=17 if the IEEE 128-bit floats
>> are selected
>
> Can you check what the IEEE test results are when
> -mabi=ieeelongdouble is enabled?

Running

nohup make -j7 check-fortran 
RUNTESTFLAGS="--target_board=unix/-mabi=ieeelongdouble/-mcpu=power9"&


from the gcc subdirectory yielded only a single failure:

grep ^FAIL nohup.out
FAIL: gfortran.dg/gomp/target-update-1.f90   -O   scan-tree-dump gimple 
"#pragma omp target update to\\(c \\[len: [0-9]+\\]\\) to\\(present:a 
\\[len: [0-9]+\\]\\) to\\(e \\[len: [0-9]+\\]\\) from\\(present:b 
\\[len: [0-9]+\\]\\) from\\(d \\[len: [0-9]+\\]\\)


and also ran the correct tests, as seen from gfortran.log; for example:

Executing on host: gfortran 
/home/tkoenig/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90 
-mabi=ieeelongdouble -mcpu=power9   -fdiagnostics-plain-output 
-fdiagnostics-plain-output-O0   -pedantic-errors 
-fintrinsic-modules-path 
/home/tkoenig/gcc-bin/powerpc64le-unknown-linux-gnu/./libgfortran/ 
-fno-unsafe-math-optimizations -frounding-math -fsignaling-nans-lm 
-o ./large_2.exe(timeout = 300)
spawn -ignore SIGHUP gfortran 
/home/tkoenig/gcc/gcc/testsuite/gfortran.dg/ieee/large_2.f90 
-mabi=ieeelongdouble -mcpu=power9 -fdiagnostics-plain-output 
-fdiagnostics-plain-output -O0 -pedantic-errors -fintrinsic-modules-path 
/home/tkoenig/gcc-bin/powerpc64le-unknown-linux-gnu/./libgfortran/ 
-fno-unsafe-math-optimizations -frounding-math -fsignaling-nans -lm -o 
./large_2.exe^M

PASS: gfortran.dg/ieee/large_2.f90   -O0  (test for excess errors)
Setting LD_LIBRARY_PATH to 
.:/home/tkoenig/lib/../lib64:.:/home/tkoenig/lib/../lib64

Execution timeout is: 300
spawn [open ...]^M
PASS: gfortran.dg/ieee/large_2.f90   -O0  execution test

This is a test that fails on POWER with the IBM long double format,
so things look ok.  It also works when compiled individually.

So, this is looking good.

By the way, if you or any other gfortran maintainer would like an
account on the POWER virtual machine in question, that would be
no problem. I would ask the cluster administrators for permission
and then create the account (I have admin privileges on that
virtual machine).

> It’s not even clear to me what the IEEE kinds selected should be, in
> this case, depending on -mabi=ieeelongdouble


The "KIND=17" stuff should only be visible inside the library.

>
>> Regarding FX's patch: I am not quite sure that I am
>> actually testing the right thing if running the testsuite
>> there, so POWER should not hold up this patch.  If it turns
>> out that POWER needs additional work on IEEE, we can always
>> add that later.
>
> Actually, it sounds like the situation is: the same target can
> have two ABIs based on a compile-time flag. That sounds like a job
> for multilib, i.e., we should compile libgfortran twice, once for
> each ABI. I am sure this was considered and rejected, do you
> remember what was the rationale?

I don't remember discussing multilib in this context, sorry.

Best regards

Thomas


Re: libgfortran: remove support for --enable-intermodule

2023-06-11 Thread Mikael Morin

On 10/06/2023 at 22:28, FX Coudert via Fortran wrote:

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109373
I don’t believe it is widely used, and it was removed from everywhere else in 
gcc.

Bootstrapped and regtested on x86_64-pc-linux-gnu.
OK to commit?

FX


OK, thanks.