[r14-2786 Regression] FAIL: g++.target/i386/pr98218-1.C -std=gnu++98 scan-assembler-times cmpltps 3 on Linux/x86_64

2023-07-27 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

ade30fad6669e5f34ca4c587c724d74ecc953175 is the first bad commit
commit ade30fad6669e5f34ca4c587c724d74ecc953175
Author: Uros Bizjak 
Date:   Wed Jul 26 11:10:46 2023 +0200

i386: Clear upper half of XMM register for V2SFmode operations [PR110762]

caused

FAIL: gcc.target/i386/pr106910-1.c scan-assembler-times roundps 9
FAIL: g++.target/i386/pr98218-1.C  -std=gnu++14  scan-assembler-times cmpltps 3
FAIL: g++.target/i386/pr98218-1.C  -std=gnu++14  scan-assembler-times pcmpgtd 2
FAIL: g++.target/i386/pr98218-1.C  -std=gnu++17  scan-assembler-times cmpltps 3
FAIL: g++.target/i386/pr98218-1.C  -std=gnu++17  scan-assembler-times pcmpgtd 2
FAIL: g++.target/i386/pr98218-1.C  -std=gnu++20  scan-assembler-times cmpltps 3
FAIL: g++.target/i386/pr98218-1.C  -std=gnu++20  scan-assembler-times pcmpgtd 2
FAIL: g++.target/i386/pr98218-1.C  -std=gnu++98  scan-assembler-times cmpltps 3
FAIL: g++.target/i386/pr98218-1.C  -std=gnu++98  scan-assembler-times pcmpgtd 2

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2786/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr106910-1.c --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=g++.target/i386/pr98218-1.C --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you run into problems related to cascadelake, disabling AVX512F on the command 
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-27 Thread Robin Dapp via Gcc-patches
>> Why do we appear to return a different mode here?  We already request
>> FRM_MODE_DYN_CALL in mode_needed.  It looks like in the whole function
>> we do not change the mode so we could just always return the incoming
>> mode?
> 
> Because we need to emit 2 insn when meet a call. One before the call,
> we must return DYN_CALL when needed, then the emit part is able to
> know the mode switch to DYN_CALL and restore. One after the call, we
> must return DYN_CALL when after, then the next insn emit part is able
> to know the prev_mode is DYN_CALL and backup.

My question was not about DYN_CALL in general but rather - mode switching
will switch to the mode requested by mode_needed.  The mode_after hook
allows us to specify if we want to change a mode afterwards but it doesn't look
like we ever do.

mode_needed -> CALL_P -> DYN_CALL
mode_sw switches to DYN_CALL
mode_after -> DYN_CALL

so there is no need to appear to change the mode but we can just pass it
through, possibly same for DYN?  Or to put it differently, can we start
with "return mode" in riscv_frm_mode_after and then only add the conditions
that are strictly necessary?

I also noticed an unused bb in riscv_frm_adjust_mode_after_call that
we want to remove.  Also, if (mode != prev_mode) in mode_set is unnecessary
as mode_sw already checks that.

Regards
 Robin
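
As a rough illustration of the pass-through idea above, here is a toy model in
C++. It is not GCC code: the enum, the struct and the function names are
invented for the sketch, and the real hooks take different arguments. It only
shows that when mode_needed already requests DYN_CALL for a call, a mode_after
that returns its incoming mode yields the same sequence of modes.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for the FRM modes discussed in the thread (illustrative only).
enum FrmMode { FRM_NONE, FRM_DYN, FRM_DYN_CALL };

// Hypothetical, heavily simplified instruction description.
struct Insn
{
  bool is_call;
  bool uses_dynamic_frm;
};

// What the instruction needs on entry: a call requests DYN_CALL.
FrmMode mode_needed (const Insn &insn)
{
  if (insn.is_call)
    return FRM_DYN_CALL;
  if (insn.uses_dynamic_frm)
    return FRM_DYN;
  return FRM_NONE;
}

// Pass-through variant: the mode after the insn is the mode on entry.
FrmMode mode_after (FrmMode mode, const Insn &)
{
  return mode;
}

// Walk a straight-line block and record the mode in effect after each insn.
std::vector<FrmMode> modes_after_each (const std::vector<Insn> &insns)
{
  std::vector<FrmMode> result;
  FrmMode current = FRM_NONE;
  for (const Insn &insn : insns)
    {
      FrmMode need = mode_needed (insn);
      if (need != FRM_NONE)
        current = need;               // the pass switches to the requested mode
      current = mode_after (current, insn);
      assert (need == FRM_NONE || current == need);
      result.push_back (current);
    }
  return result;
}
```

In this model the call already leaves DYN_CALL in effect, so a later insn can
see DYN_CALL as its previous mode without mode_after computing anything.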



Re: [PATCH] tree-optimization/106081 - elide redundant permute

2023-07-27 Thread Richard Biener via Gcc-patches
On Wed, 26 Jul 2023, Jeff Law wrote:

> 
> 
> On 7/26/23 07:27, Richard Biener via Gcc-patches wrote:
> > The following patch makes sure to elide a redundant permute that
> > can be merged with existing splats represented as load permutations
> > as we now do for non-grouped SLP loads.  This is the last bit
> > missing to fix this PR where the main fix was already done by
> > r14-2117-gdd86a5a69cbda4
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> > 
> >  PR tree-optimization/106081
> >  * tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
> >  Assign layout -1 to splats.
> > 
> >  * gcc.dg/vect/pr106081.c: New testcase.
> :-)  Glad to see how easy this ended up being after the work you put 
> into pushing permutes around a couple years ago.

And of course Richard S. refactoring and improving this a lot after
that.

Richard.


RE: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-27 Thread Li, Pan2 via Gcc-patches
> ../../../.././gcc/libstdc++-v3/libsupc++/eh_personality.cc:805:1: internal 
> compiler error: in insert_insn_on_edge, at cfgrtl.cc:1976.

This error comes from an assert in insert_insn_on_edge: the edge cannot be both 
ABNORMAL and CRITICAL.

So I tried to filter such edges out the way gcse.cc:2168 does, as below, but still hit 
another assert, regno < FIRST_PSEUDO_REGISTER in find_oldest_value_reg.

if (eg->flags & EDGE_ABNORMAL)
  insert_insn_end_basic_block
else
  insert_insn_on_edge

I will continue to figure it out.

Pan
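
A small self-contained sketch of the filter described above, written as a toy
model rather than GCC code. The structs, the flag value and the function names
are invented for illustration; only the rule itself (an edge that is both
abnormal and critical cannot take an insertion, so abnormal edges fall back to
the end of the source block) comes from the message.

```cpp
#include <cassert>

// Illustrative flag bit; GCC's real EDGE_ABNORMAL has its own value.
enum { TOY_EDGE_ABNORMAL = 1 };

struct ToyBlock
{
  int num_succs;
  int num_preds;
};

struct ToyEdge
{
  const ToyBlock *src;
  const ToyBlock *dest;
  int flags;
};

// An edge is critical when its source has several successors and its
// destination has several predecessors.
bool toy_edge_critical_p (const ToyEdge &e)
{
  return e.src->num_succs >= 2 && e.dest->num_preds >= 2;
}

// Mirrors the condition of the assert mentioned in the message.
bool toy_can_insert_on_edge (const ToyEdge &e)
{
  return !((e.flags & TOY_EDGE_ABNORMAL) && toy_edge_critical_p (e));
}

enum InsertPlace { INSERT_ON_EDGE, INSERT_AT_BLOCK_END };

// The filter from the message: abnormal edges get the insn at the block end.
InsertPlace toy_choose_insert_place (const ToyEdge &e)
{
  if (e.flags & TOY_EDGE_ABNORMAL)
    return INSERT_AT_BLOCK_END;
  assert (toy_can_insert_on_edge (e));
  return INSERT_ON_EDGE;
}
```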

From: Li, Pan2
Sent: Thursday, July 27, 2023 9:38 AM
To: 钟居哲 ; Jeff Law ; rdapp.gcc 
; kito.cheng 
Cc: gcc-patches ; Wang, Yanzhang 

Subject: RE: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
rounding

Thanks Juzhe and Jeff for the suggestions. An approach like emit_insn_before_noloc 
results in the ICE below.

../../../.././gcc/libstdc++-v3/libsupc++/new_opant.cc:42:1: error: flow control 
insn inside a basic block.
../../../.././gcc/libstdc++-v3/libsupc++/new_opant.cc:42:1: internal compiler 
error: in rtl_verify_bb_insns, at cfgrtl.cc:2796

Then I tried the approach below, but it also ICEs:

../../../.././gcc/libstdc++-v3/libsupc++/eh_personality.cc:805:1: internal 
compiler error: in insert_insn_on_edge, at cfgrtl.cc:1976.

insert_insn_end_basic_block has some special handling when the end of the bb is a CALL.

Pan

From: 钟居哲 <juzhe.zh...@rivai.ai>
Sent: Thursday, July 27, 2023 6:56 AM
To: Jeff Law <jeffreya...@gmail.com>; rdapp.gcc <rdapp@gmail.com>; 
kito.cheng <kito.ch...@sifive.com>; Li, Pan2 <pan2...@intel.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>; 
Wang, Yanzhang <yanzhang.w...@intel.com>
Subject: Re: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
rounding

Thanks Jeff.

Hi, Pan:
Please try putting the 'frrm' on the edge instead of inserting it at the end 
of the block, to see whether it works
(I tried this once, but I don't remember what happened).

Try it with the following code:
edge eg;
edge_iterator ei;
FOR_EACH_EDGE (eg, ei, bb->succs)
  {
    start_sequence ();
    emit_insn (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)));
    rtx_insn *backup_insn = get_insns ();
    end_sequence ();
    insert_insn_on_edge (backup_insn, eg);
  }

and see how it goes.

I am not sure whether this is correct; Jeff could comment on that.

Thanks.

juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2023-07-27 06:46
To: 钟居哲; rdapp.gcc; 
kito.cheng; pan2.li
CC: gcc-patches; 
yanzhang.wang
Subject: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
rounding


On 7/26/23 16:21, 钟居哲 wrote:
> Hi, Jeff.
>
> insert_insn_end_basic_block is to handle the following case:
>
> bb 1:
> ...
> CALL.  ---> BB_END of bb
> bb 2:
> vfadd rne
>
> You can see there are no instructions after the CALL.
>
> So we use insert_insn_end_basic_block to insert a "frrm" at the end of
> bb 1.
>
> I know it's typically better to insert an edge between bb 1 and bb 2,
> then put the "frrm" on that edge.
> However, it causes an ICE.
We'd need to know the reason for the ICE.

>
> If we really need to follow this approach, it seems that we need to
> modify the "mode_sw" PASS?
> Currently, we are avoiding changing the codes of PASS.
Generally wise, but sometimes we do need to change generic bits.  Let's
dive a bit into this.

We have more freedom here to loosen the profitability constraints since
it's a target specific pass, but let's at least understand what's
going on with the ICE, then make some decisions about the best way forward.

jeff



RE: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-27 Thread Li, Pan2 via Gcc-patches
> so there is no need to appear to change the mode but we can just pass it
> through, possibly same for DYN?  Or to put it differently, can we start
> with "return mode" in riscv_frm_mode_after and then only add the condition
> that are strictly necessary?

I see, you mean at the beginning of frm_after, we can just return the incoming 
mode as is?

if (CALL_P (insn))
  return mode; // Given we are aware the mode is DYN_CALL already.

> I also noticed an unused bb in riscv_frm_adjust_mode_after_call that
> we want to remove.  Also, if (mode != prev_mode) in mode_set is unnecessary
> as mode_sw already checks that.

Thanks, I will clean this up in v8. AFAIK, optimize_mode_switching only 
checks ptr->mode != no_mode before emitting;
not sure if I missed something.

Pan





Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-27 Thread Robin Dapp via Gcc-patches


> I see, you mean at the beginning of frm_after, we can just return the 
> incoming mode as is?
> 
> if (CALL_P (insn))
>   return mode; // Given we are aware the mode is DYN_CALL already.

Yes, potentially similar for all the other ifs but I didn't check
all of them.

> Thank and will cleanup this in v8. AFAIK, the optimize_mode_switching
> only check ptr->mode != no_mode before emit, not sure if I missed
> something.

  if (mode != no_mode && mode != last_mode)
    {

Shouldn't this cover us?  I didn't run the testsuite or so but it looks
like it.

Regards
 Robin
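
To make the effect of that condition concrete, here is a toy model in C++. It
is a deliberate simplification and not the real optimize_mode_switching logic:
modes are plain integers, NO_MODE is an invented sentinel, and the function
only counts how many mode-set sequences such a guard would let through.

```cpp
#include <cassert>
#include <vector>

// Invented sentinel meaning "this point has no mode requirement".
const int NO_MODE = -1;

// Count the mode sets emitted for a sequence of requested modes when
// emission is guarded by (mode != no_mode && mode != last_mode).
int count_mode_sets (const std::vector<int> &needed)
{
  int last_mode = NO_MODE;
  int emitted = 0;
  for (int mode : needed)
    {
      if (mode != NO_MODE && mode != last_mode)
        {
          ++emitted;          // a mode set would be emitted here
          last_mode = mode;
        }
    }
  return emitted;
}
```

Under this model a second "mode != prev_mode" test inside the emit hook never
fires, because requests equal to the last mode are filtered out earlier.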



Re: Re: [PATCH 0/5] Recognize Zicond extension

2023-07-27 Thread Xiao Zeng
On Wed, Jul 26, 2023 at 01:51:00 AM  Jeff Law  wrote:
>
>
>
>On 7/19/23 04:11, Xiao Zeng wrote:
>> Hi all RISC-V folks:
>>
>> This series of patches completes support for the riscv architecture's
>> Zicond standard extension instruction set.
>>
>> Currently, Zicond is in a frozen state.
>>
>> See the Zicond specification for details:
>> https://github.com/riscv/riscv-zicond/releases/download/v1.0-rc2/riscv-zicond-v1.0-rc2.pdf
>>
>> Prior to this, other community members have also done related work, as shown 
>> in:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611767.html
>> https://sourceware.org/pipermail/binutils/2023-January/125773.html
>>
>> Xiao Zeng (5):
>>    [RISC-V] Recognize Zicond extension
>>    [RISC-V] Generate Zicond instruction for basic semantics
>>    [RISC-V] Generate Zicond instruction for select pattern with condition
>>  eq or neq to 0
>>    [RISC-V] Generate Zicond instruction for select pattern with condition
>>  eq or neq to non-zero
>>    [RISC-V] Generate Zicond instruction for conditional execution
>[ ... ]
>So what I'm thinking for the overall kit is to stage it in a bit
>differently given we have some bits which clearly can go forward as-is
>or with very minor changes and others that are going to need some
>iteration/refinement.
>
>So I'm going to suggest a few changes so that bits which are non
>controversial can move forward immediately.
>
>1/5 looked fine as-is.
>
>I would split 2/5.  The first two patterns you added are
>non-controversial and could go in immediately.  The other 4 patterns
>(which require some operand matching) will likely need at least one
>round of iteration and should be a distinct patch.
>
>
>I would split 3/5 as well.  3a would be the costing which I think just
>needs to use COSTS_N_INSNS (1) rather than 0 for the cost of a
>conditional move and could then move forward immediately.  The bits to
>wire everything up into the conditional move pattern would be a distinct
>patch.  We did something similar internally in Ventana and I'd like to
>take the time to make sure the issues we ran into are addressed in your
>version then do an evaluation of the two approaches.
>
>I think patch 4 is probably going to need some work too.  I *think* what
>we did internally at Ventana will work better (utilizing scc for a
>non-trivial condition).
>
>Let's defer patch #5 initially as well.  It's going to get tangled up in
>a whole bunch of changes I think we need to make to ifcvt.cc.
>
>The point being that with the bits from #1, #2 and #3 we can get some
>initial support in immediately.  eswincomputing and ventana can both
>reduce our divergence from the trunk and work together on the rest of
>the bits.
>
>Does that work for you?
>
>jeff 

1. Thanks Jeff for your code review feedback.

2. Following your comments, I have modified the code, but out of caution
for upstream, I ran a complete regression test on patch V2, which took
some time, so I was unable to reply to emails and upload patch V2 in a timely 
manner.

3. After you and other maintainers made minor modifications to my patch[1/5] 
and patch[2/5], they have been merged into master, so I will no longer upload 
patch V2.

4. patch[1/5] and patch[2/5], which have been merged into master, only
provide basic support for Zicond, and further optimization work remains.
These further optimizations are covered by my patch[3/5],
patch[4/5] and patch[5/5].

5. As you mentioned in your previous email 
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625427.html
"eswincomputing and ventana can both reduce our divergence from the trunk
and work together on the rest of the bits...". I will reorganize patch[3/5], 
patch[4/5] and patch[5/5], provide more detailed explanations, and submit them 
as an alternative solution for further optimization of Zicond.

Does that work for you?

Xiao Zeng
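
For readers unfamiliar with Zicond, the select patterns named in the series
build on two conditional-zero instructions. The C++ sketch below models their
semantics as defined by the Zicond specification and shows one way a select
can be composed from them. The helper names are invented, and the XOR
reduction for the non-zero comparison is just one possible lowering chosen for
illustration, not necessarily what the patches generate.

```cpp
#include <cassert>
#include <cstdint>

// czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1
uint64_t czero_eqz (uint64_t rs1, uint64_t rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

// czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1
uint64_t czero_nez (uint64_t rs1, uint64_t rs2)
{
  return rs2 != 0 ? 0 : rs1;
}

// (cond != 0) ? a : b, using two conditional zeroes and an OR.
uint64_t select_ne_zero (uint64_t cond, uint64_t a, uint64_t b)
{
  return czero_eqz (a, cond) | czero_nez (b, cond);
}

// (x == y) ? a : b, by first reducing the comparison to a test against zero.
uint64_t select_eq (uint64_t x, uint64_t y, uint64_t a, uint64_t b)
{
  uint64_t diff = x ^ y;      // zero exactly when x == y
  return czero_nez (a, diff) | czero_eqz (b, diff);
}
```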

Re: Re: [PATCH V2] RISC-V: Enable basic VLS modes support

2023-07-27 Thread juzhe.zh...@rivai.ai
Address comments.
V3:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625618.html 




juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-07-27 14:52
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH V2] RISC-V: Enable basic VLS modes support
Hi Juzhe-Zhong:
 
Only
 
> diff --git a/gcc/config/riscv/autovec-vls.md b/gcc/config/riscv/autovec-vls.md
> new file mode 100644
> index 000..c67ff386e50
> --- /dev/null
> +++ b/gcc/config/riscv/autovec-vls.md
> +(define_insn_and_split "mov<mode>"
> +  [(set (match_operand:VLS_AVL_IMM 0 "reg_or_mem_operand" "=vr, m, vr")
> +   (match_operand:VLS_AVL_IMM 1 "general_operand"     "  m,vr, vr"))]
> +  "TARGET_VECTOR"
> +  "@
> +   #
> +   #
> +   vmv%m1r.v\t%0,%1"
> +  "&& (!register_operand (operands[0], <MODE>mode)
> +   || !register_operand (operands[1], <MODE>mode))"
> +  [(const_int 0)]
> +  {
> +unsigned size = GET_MODE_BITSIZE (<MODE>mode).to_constant ();
> +if (size <= MAX_BITS_PER_WORD
> +&& MEM_P (operands[0]) && MEM_P (operands[1]))
 
I am wondering whether it is possible to split this part into a standalone
define_split pattern that is guarded with can_create_pseudo_p ()?
 
This should not work after RA, and it will also cause some
suboptimal code for modes larger than MAX_BITS_PER_WORD.
 
Consider this case: the mode size is larger than MAX_BITS_PER_WORD,
and the move gets combined into (set (mem) (mem)), but RA doesn't find a
suitable constraint, so it goes to reload.
 
Then we can add a check to this mov pattern to require at least one
register operand, like other mov patterns.
 
> +
> +(define_insn_and_split "*mov<mode>_lra"
> +  [(set (match_operand:VLS_AVL_REG 0 "reg_or_mem_operand" "=vr, m,vr")
> +   (match_operand:VLS_AVL_REG 1 "reg_or_mem_operand" "  m,vr,vr"))
> +   (clobber (match_scratch:P 2 "=&r,&r,X"))]
> +  "TARGET_VECTOR && (lra_in_progress || reload_completed)"
 
Add && (register_operand (operands[0], <MODE>mode) || register_operand
(operands[1], <MODE>mode))
 


Re: [PATCH V3] RISC-V: Enable basic VLS modes support

2023-07-27 Thread Kito Cheng via Gcc-patches
Last minor thing :)

> +(define_insn_and_split "*mov<mode>"
> +  [(set (match_operand:VLS_AVL_IMM 0 "reg_or_mem_operand" "=vr, m, vr")
> +   (match_operand:VLS_AVL_IMM 1 "reg_or_mem_operand" "  m,vr, vr"))]
> +  "TARGET_VECTOR"

Reject (set (mem) (mem)) by adding the check:

"TARGET_VECTOR
 && (register_operand (operands[0], <MODE>mode)
     || register_operand (operands[1], <MODE>mode))"


[PATCH] Remove recursive post-dominator traversal in sinking

2023-07-27 Thread Richard Biener via Gcc-patches
The following turns the recursive post-dominator traversal in GIMPLE
code sinking to a worklist.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-ssa-sink.cc (sink_code_in_bb): Remove recursion, instead
use a worklist ...
(pass_sink_code::execute): ... in the caller.
---
 gcc/tree-ssa-sink.cc | 27 ---
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 74dae0fa3c0..ab1cbffb32a 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -673,7 +673,6 @@ sink_common_stores_to_bb (basic_block bb)
 static unsigned
 sink_code_in_bb (basic_block bb)
 {
-  basic_block son;
   gimple_stmt_iterator gsi;
   edge_iterator ei;
   edge e;
@@ -686,12 +685,12 @@ sink_code_in_bb (basic_block bb)
   /* If this block doesn't dominate anything, there can't be any place to sink
  the statements to.  */
   if (first_dom_son (CDI_DOMINATORS, bb) == NULL)
-goto earlyout;
+return todo;
 
   /* We can't move things across abnormal edges, so don't try.  */
   FOR_EACH_EDGE (e, ei, bb->succs)
 if (e->flags & EDGE_ABNORMAL)
-  goto earlyout;
+  return todo;
 
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
 {
@@ -765,13 +764,6 @@ sink_code_in_bb (basic_block bb)
gsi_prev (&gsi);
 
 }
- earlyout:
-  for (son = first_dom_son (CDI_POST_DOMINATORS, bb);
-   son;
-   son = next_dom_son (CDI_POST_DOMINATORS, son))
-{
-  todo |= sink_code_in_bb (son);
-}
 
   return todo;
 }
@@ -861,7 +853,20 @@ pass_sink_code::execute (function *fun)
   memset (&sink_stats, 0, sizeof (sink_stats));
   calculate_dominance_info (CDI_DOMINATORS);
   calculate_dominance_info (CDI_POST_DOMINATORS);
-  todo |= sink_code_in_bb (EXIT_BLOCK_PTR_FOR_FN (fun));
+
+  auto_vec worklist;
+  worklist.quick_push (EXIT_BLOCK_PTR_FOR_FN (fun));
+  do
+{
+  basic_block bb = worklist.pop ();
+  todo |= sink_code_in_bb (bb);
+  for (basic_block son = first_dom_son (CDI_POST_DOMINATORS, bb);
+  son;
+  son = next_dom_son (CDI_POST_DOMINATORS, son))
+   worklist.safe_push (son);
+}
+  while (!worklist.is_empty ());
+
   statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
   statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);
   free_dominance_info (CDI_POST_DOMINATORS);
-- 
2.35.3
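
The transformation in this patch follows a common pattern: a recursive walk
over a tree is replaced by an explicit worklist so that the depth of the tree
no longer translates into call-stack depth. Below is a minimal standalone
sketch of that pattern in C++. It uses a plain std::vector in place of GCC's
auto_vec, and the ToyNode type and function names are invented for the
example.

```cpp
#include <cassert>
#include <vector>

// Toy tree: each node lists the indices of its sons.
struct ToyNode
{
  std::vector<int> sons;
};

// Recursive walk: visit the node, then recurse into each son.
void visit_recursive (const std::vector<ToyNode> &tree, int n,
                      std::vector<int> &order)
{
  order.push_back (n);
  for (int son : tree[n].sons)
    visit_recursive (tree, son, order);
}

// Worklist walk: same set of nodes, no recursion.
std::vector<int> visit_worklist (const std::vector<ToyNode> &tree, int root)
{
  std::vector<int> order;
  std::vector<int> worklist;
  worklist.push_back (root);
  do
    {
      int n = worklist.back ();
      worklist.pop_back ();
      order.push_back (n);
      for (int son : tree[n].sons)
        worklist.push_back (son);
    }
  while (!worklist.empty ());
  return order;
}
```

Both walks visit every node once and always visit a node before its sons.
Because the worklist pops the most recently pushed son first, siblings are
visited in the opposite order from the recursive version.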


Re: Re: [PATCH V3] RISC-V: Enable basic VLS modes support

2023-07-27 Thread juzhe.zh...@rivai.ai
Hi, kito.
I tried to reject mem->mem in this pattern:
(define_insn_and_split "*mov<mode>"
  [(set (match_operand:VLS_AVL_IMM 0 "reg_or_mem_operand" "=vr, m, vr")
        (match_operand:VLS_AVL_IMM 1 "reg_or_mem_operand" "  m,vr, vr"))]
  "TARGET_VECTOR
   && (register_operand (operands[0], <MODE>mode)
       || register_operand (operands[1], <MODE>mode))"
  "@
   #
   #
   vmv%m1r.v\t%0,%1"
  "&& reload_completed
   && (!register_operand (operands[0], <MODE>mode)
       || !register_operand (operands[1], <MODE>mode))"
  [(const_int 0)]
  {
    bool ok_p = riscv_vector::legitimize_move (operands[0], operands[1]);
    gcc_assert (ok_p);
    DONE;
  }
)


It causes an ICE in the regression tests during "vregs" (before RA).
[jzzhong@server1:/work/home/jzzhong/work/insn]$~/work/rvv-opensource/output/gcc-rv64/bin/riscv64-rivai-elf-gcc
 -march=rv64gc_zve32f -mabi=lp64d -O3 -S 
--param=riscv-autovec-preference=scalable -fdump-rtl-all auto.c
auto.c: In function 'foo0':
auto.c:15:1: error: unrecognizable insn:
   15 | }
  | ^
(insn 35 34 36 6 (set (mem:V8QI (reg/f:DI 154 [ _64 ]) [0 MEM  [(int8_t *)_64]+0 S8 A64])
(mem/u/c:V8QI (reg/f:DI 185) [0  S8 A64])) "auto.c":11:20 -1
 (nil))
during RTL pass: vregs
dump file: auto.c.259r.vregs


It seems that we need a placeholder pattern to hold mem->mem ?

Could you help me with that ?


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-07-27 17:19
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH V3] RISC-V: Enable basic VLS modes support
Last minor thing :)
 
> +(define_insn_and_split "*mov"
> +  [(set (match_operand:VLS_AVL_IMM 0 "reg_or_mem_operand" "=vr, m, vr")
> +   (match_operand:VLS_AVL_IMM 1 "reg_or_mem_operand" "  m,vr, vr"))]
> +  "TARGET_VECTOR"
 
Reject (set (mem) (mem)) by adding the check:
 
TARGET_VECTOR
&& (register_operand (operands[0], mode)
|| register_operand (operands[1], mode))"
 


[PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative

2023-07-27 Thread demin . han
When the split2 pass starts, which_alternative holds whatever value an
earlier pass last set, so it is effectively random.

Even if it were initialized, the generated move is redundant:
the move can be generated by the assembly output template.

Signed-off-by: demin.han 

gcc/ChangeLog:

* config/riscv/autovec.md: Delete which_alternative use in split

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/madd-split2-1.c: New test.

---
 gcc/config/riscv/autovec.md | 12 
 .../gcc.target/riscv/rvv/autovec/madd-split2-1.c| 13 +
 2 files changed, 13 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index d899922586a..b7ea3101f5a 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1012,8 +1012,6 @@ (define_insn_and_split "*fma"
   [(const_int 0)]
   {
 riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
 rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
 riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus 
(mode),
   riscv_vector::RVV_TERNOP, ops, 
operands[4]);
@@ -1058,8 +1056,6 @@ (define_insn_and_split "*fnma"
   [(const_int 0)]
   {
 riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
 rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
 riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul 
(mode),
   riscv_vector::RVV_TERNOP, ops, 
operands[4]);
@@ -1102,8 +1098,6 @@ (define_insn_and_split "*fma"
   [(const_int 0)]
   {
 riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
 rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
 riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (PLUS, 
mode),
  riscv_vector::RVV_TERNOP, ops, 
operands[4]);
@@ -1148,8 +1142,6 @@ (define_insn_and_split "*fnma"
   [(const_int 0)]
   {
 riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
 rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
 riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul_neg (PLUS, 
mode),
  riscv_vector::RVV_TERNOP, ops, 
operands[4]);
@@ -1194,8 +1186,6 @@ (define_insn_and_split "*fms"
   [(const_int 0)]
   {
 riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
 rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
 riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (MINUS, 
mode),
  riscv_vector::RVV_TERNOP, ops, 
operands[4]);
@@ -1242,8 +1232,6 @@ (define_insn_and_split "*fnms"
   [(const_int 0)]
   {
 riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
 rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
 riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul_neg (MINUS, 
mode),
  riscv_vector::RVV_TERNOP, ops, 
operands[4]);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
new file mode 100644
index 000..14a9802667e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl256b -O3 -fno-cprop-registers -fno-dce 
--param riscv-autovec-preference=scalable" } */
+
+long
+foo (long *__restrict a, long *__restrict b, long n)
+{
+  long i;
+  for (i = 0; i < n; ++i)
+a[i] = b[i] + i * 8;
+  return a[1];
+}
+
+/* { dg-final { scan-assembler-times {\tvmv1r\.v} 1 } } */
-- 
2.41.0
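
The hazard described in the commit message is the usual one with global state
that is only valid right after the step that set it. The toy C++ model below
illustrates it. It is not GCC code: toy_which_alternative, ToyInsn and the
helper functions are invented names, and the real which_alternative is set by
GCC's operand-constraining machinery, not by a function like this.

```cpp
#include <cassert>

// Toy global, standing in for state left behind by the last constrained insn.
static int toy_which_alternative = -1;

struct ToyInsn
{
  int matched_alternative;
};

// Toy version of "constrain this insn": records which alternative matched.
void toy_constrain (const ToyInsn &insn)
{
  toy_which_alternative = insn.matched_alternative;
}

// Splitter that trusts the global without refreshing it for this insn.
bool split_sees_alternative_2_stale (const ToyInsn &)
{
  return toy_which_alternative == 2;
}

// Splitter that refreshes the global for the insn it is actually splitting.
bool split_sees_alternative_2_fresh (const ToyInsn &insn)
{
  toy_constrain (insn);
  return toy_which_alternative == 2;
}
```

In the stale variant the answer depends on whichever insn was constrained
last, not on the insn being split.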



Re: Re: [PATCH V3] RISC-V: Enable basic VLS modes support

2023-07-27 Thread Kito Cheng via Gcc-patches
Hmmm, does it mean we'll have (set (mem) (mem)) after legitimize_move???

Or maybe try to use define_insn_and_split rather than define_split for the
(set (mem) (mem))

On Thu, Jul 27, 2023 at 5:50 PM juzhe.zh...@rivai.ai
 wrote:
>
> Hi, kito.
> I tried to reject mem->mem in this pattern:
> (define_insn_and_split "*mov"
>   [(set (match_operand:VLS_AVL_IMM 0 "reg_or_mem_operand" "=vr, m, vr")
> (match_operand:VLS_AVL_IMM 1 "reg_or_mem_operand" "  m,vr, vr"))]
>   "TARGET_VECTOR
>&& (register_operand (operands[0], mode)
>|| register_operand (operands[1], mode))"
>   "@
>#
>#
>vmv%m1r.v\t%0,%1"
>   "&& reload_completed
>&& (!register_operand (operands[0], mode)
>|| !register_operand (operands[1], mode))"
>   [(const_int 0)]
>   {
> bool ok_p = riscv_vector::legitimize_move (operands[0], operands[1]);
> gcc_assert (ok_p);
> DONE;
>   }
> )
>
>
> It cause ICE in regression during "vregs" (Before RA).
> [jzzhong@server1:/work/home/jzzhong/work/insn]$~/work/rvv-opensource/output/gcc-rv64/bin/riscv64-rivai-elf-gcc
>  -march=rv64gc_zve32f -mabi=lp64d -O3 -S 
> --param=riscv-autovec-preference=scalable -fdump-rtl-all auto.c
> auto.c: In function 'foo0':
> auto.c:15:1: error: unrecognizable insn:
>15 | }
>   | ^
> (insn 35 34 36 6 (set (mem:V8QI (reg/f:DI 154 [ _64 ]) [0 MEM  signed char> [(int8_t *)_64]+0 S8 A64])
> (mem/u/c:V8QI (reg/f:DI 185) [0  S8 A64])) "auto.c":11:20 -1
>  (nil))
> during RTL pass: vregs
> dump file: auto.c.259r.vregs
>
>
> It seems that we need a placeholder pattern to hold mem->mem ?
>
> Could you help me with that ?
> 
> juzhe.zh...@rivai.ai
>
>
> From: Kito Cheng
> Date: 2023-07-27 17:19
> To: Juzhe-Zhong
> CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
> Subject: Re: [PATCH V3] RISC-V: Enable basic VLS modes support
> Last minor thing :)
>
> > +(define_insn_and_split "*mov"
> > +  [(set (match_operand:VLS_AVL_IMM 0 "reg_or_mem_operand" "=vr, m, vr")
> > +   (match_operand:VLS_AVL_IMM 1 "reg_or_mem_operand" "  m,vr, vr"))]
> > +  "TARGET_VECTOR"
>
> Reject (set (mem) (mem)) by adding the check:
>
> TARGET_VECTOR
> && (register_operand (operands[0], mode)
> || register_operand (operands[1], mode))"
>


Re: Re: [PATCH V3] RISC-V: Enable basic VLS modes support

2023-07-27 Thread juzhe.zh...@rivai.ai
I changed it as follows:
(define_insn_and_split "*mov<mode>_mem_to_mem"
  [(set (match_operand:VLS_AVL_IMM 0 "memory_operand")
        (match_operand:VLS_AVL_IMM 1 "memory_operand"))]
  "TARGET_VECTOR && can_create_pseudo_p ()"
  "#"
  "&& 1"
  [(const_int 0)]
  {
    if (GET_MODE_BITSIZE (<MODE>mode).to_constant () <= MAX_BITS_PER_WORD)
      {
        /* Optimize the following case:

           typedef int8_t v2qi __attribute__ ((vector_size (2)));
           v2qi v = *(v2qi*)in;
           *(v2qi*)out = v;

           We prefer scalar load/store instead of vle.v/vse.v when
           the VLS mode size is smaller than the scalar mode.  */
        machine_mode mode;
        unsigned size = GET_MODE_BITSIZE (<MODE>mode).to_constant ();
        if (FLOAT_MODE_P (<MODE>mode))
          mode = mode_for_size (size, MODE_FLOAT, 0).require ();
        else
          mode = mode_for_size (size, MODE_INT, 0).require ();
        emit_move_insn (gen_lowpart (mode, operands[0]),
                        gen_lowpart (mode, operands[1]));
        DONE;
      }
    else
      operands[1] = force_reg (<MODE>mode, operands[1]);
  }
)

(define_insn_and_split "*mov<mode>"
  [(set (match_operand:VLS_AVL_IMM 0 "reg_or_mem_operand" "=vr, m, vr")
        (match_operand:VLS_AVL_IMM 1 "reg_or_mem_operand" "  m,vr, vr"))]
  "TARGET_VECTOR
   && (register_operand (operands[0], <MODE>mode)
       || register_operand (operands[1], <MODE>mode))"
  "@
   #
   #
   vmv%m1r.v\t%0,%1"
  "&& reload_completed
   && (!register_operand (operands[0], <MODE>mode)
       || !register_operand (operands[1], <MODE>mode))"
  [(const_int 0)]
  {
    bool ok_p = riscv_vector::legitimize_move (operands[0], operands[1]);
    gcc_assert (ok_p);
    DONE;
  }
)

Is it reasonable to you?
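
For reference, the source-level effect of the small-mode case in the first
pattern can be pictured with the following standalone C++ sketch. It is only
an analogy written for this note, not compiler code: a two-byte vector is
copied through a single 16-bit scalar instead of element by element, and the
function name is invented.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy a two-element int8_t "vector" through one 16-bit scalar value,
// mirroring the idea of moving a small VLS mode as an integer of equal size.
void copy_v2qi_as_scalar (const int8_t *in, int8_t *out)
{
  uint16_t tmp;
  std::memcpy (&tmp, in, sizeof tmp);    // one 16-bit load
  std::memcpy (out, &tmp, sizeof tmp);   // one 16-bit store
}
```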


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-07-27 17:57
To: juzhe.zh...@rivai.ai
CC: gcc-patches; kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: Re: [PATCH V3] RISC-V: Enable basic VLS modes support
Hmmm, does it mean we'll have (set (mem) (mem)) after legitimize_move???
 
Or maybe try to
 
use define_insn_and_split rather than define_split for the (set (mem) (mem))
 
On Thu, Jul 27, 2023 at 5:50 PM juzhe.zh...@rivai.ai
 wrote:
>
> Hi, kito.
> I tried to reject mem->mem in this pattern:
> (define_insn_and_split "*mov"
>   [(set (match_operand:VLS_AVL_IMM 0 "reg_or_mem_operand" "=vr, m, vr")
> (match_operand:VLS_AVL_IMM 1 "reg_or_mem_operand" "  m,vr, vr"))]
>   "TARGET_VECTOR
>&& (register_operand (operands[0], mode)
>|| register_operand (operands[1], mode))"
>   "@
>#
>#
>vmv%m1r.v\t%0,%1"
>   "&& reload_completed
>&& (!register_operand (operands[0], mode)
>|| !register_operand (operands[1], mode))"
>   [(const_int 0)]
>   {
> bool ok_p = riscv_vector::legitimize_move (operands[0], operands[1]);
> gcc_assert (ok_p);
> DONE;
>   }
> )
>
>
> It cause ICE in regression during "vregs" (Before RA).
> [jzzhong@server1:/work/home/jzzhong/work/insn]$~/work/rvv-opensource/output/gcc-rv64/bin/riscv64-rivai-elf-gcc
>  -march=rv64gc_zve32f -mabi=lp64d -O3 -S 
> --param=riscv-autovec-preference=scalable -fdump-rtl-all auto.c
> auto.c: In function 'foo0':
> auto.c:15:1: error: unrecognizable insn:
>15 | }
>   | ^
> (insn 35 34 36 6 (set (mem:V8QI (reg/f:DI 154 [ _64 ]) [0 MEM  signed char> [(int8_t *)_64]+0 S8 A64])
> (mem/u/c:V8QI (reg/f:DI 185) [0  S8 A64])) "auto.c":11:20 -1
>  (nil))
> during RTL pass: vregs
> dump file: auto.c.259r.vregs
>
>
> It seems that we need a placeholder pattern to hold mem->mem ?
>
> Could you help me with that ?
> 
> juzhe.zh...@rivai.ai
>
>
> From: Kito Cheng
> Date: 2023-07-27 17:19
> To: Juzhe-Zhong
> CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
> Subject: Re: [PATCH V3] RISC-V: Enable basic VLS modes support
> Last minor thing :)
>
> > +(define_insn_and_split "*mov<mode>"
> > +  [(set (match_operand:VLS_AVL_IMM 0 "reg_or_mem_operand" "=vr, m, vr")
> > +   (match_operand:VLS_AVL_IMM 1 "reg_or_mem_operand" "  m,vr, vr"))]
> > +  "TARGET_VECTOR"
>
> Reject (set (mem) (mem)) by adding the check:
>
> "TARGET_VECTOR
> && (register_operand (operands[0], <MODE>mode)
> || register_operand (operands[1], <MODE>mode))"
>
 


Re: [PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative

2023-07-27 Thread juzhe.zh...@rivai.ai
Oh, YES.

Thanks for fixing it. It makes sense, since the ternary operations in "vector.md"
generate "vmv.v.v" according to RA.

@kito: Could you confirm it? If it's OK with you, please commit it for Han (I am
too lazy to commit patches :).



juzhe.zh...@rivai.ai
 
From: demin.han
Date: 2023-07-27 17:48
To: gcc-patches@gcc.gnu.org
CC: kito.ch...@gmail.com; juzhe.zh...@rivai.ai
Subject: [PATCH] RISC-V: Fix uninitialized and redundant use of 
which_alternative
When pass split2 starts, which_alternative holds whatever value an earlier
pass last set, so it is effectively uninitialized at that point.
 
Even if it were initialized, the generated move is redundant: the move can
be generated by the assembly output template.
 
Signed-off-by: demin.han 
 
gcc/ChangeLog:
 
* config/riscv/autovec.md: Delete which_alternative use in split
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/madd-split2-1.c: New test.
 
---
gcc/config/riscv/autovec.md | 12 
.../gcc.target/riscv/rvv/autovec/madd-split2-1.c| 13 +
2 files changed, 13 insertions(+), 12 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index d899922586a..b7ea3101f5a 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1012,8 +1012,6 @@ (define_insn_and_split "*fma"
   [(const_int 0)]
   {
 riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
 rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
 riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus 
(mode),
   riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1058,8 +1056,6 @@ (define_insn_and_split "*fnma"
   [(const_int 0)]
   {
 riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
 rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
 riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul 
(mode),
riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1102,8 +1098,6 @@ (define_insn_and_split "*fma"
   [(const_int 0)]
   {
 riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
 rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
 riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (PLUS, 
mode),
  riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1148,8 +1142,6 @@ (define_insn_and_split "*fnma"
   [(const_int 0)]
   {
 riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
 rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
 riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul_neg (PLUS, 
mode),
  riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1194,8 +1186,6 @@ (define_insn_and_split "*fms"
   [(const_int 0)]
   {
 riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
 rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
 riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (MINUS, 
mode),
  riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1242,8 +1232,6 @@ (define_insn_and_split "*fnms"
   [(const_int 0)]
   {
 riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
 rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
 riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul_neg (MINUS, 
mode),
  riscv_vector::RVV_TERNOP, ops, operands[4]);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
new file mode 100644
index 000..14a9802667e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl256b -O3 -fno-cprop-registers -fno-dce --param riscv-autovec-preference=scalable" } */
+
+long
+foo (long *__restrict a, long *__restrict b, long n)
+{
+  long i;
+  for (i = 0; i < n; ++i)
+a[i] = b[i] + i * 8;
+  return a[1];
+}
+
+/* { dg-final { scan-assembler-times {\tvmv1r\.v} 1 } } */
-- 
2.41.0
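
The hazard described in the commit message above can be sketched with a toy model. This is not GCC code: the global, the helper names, and the "number of extra moves" return value are all hypothetical stand-ins, assuming only what the message states, namely that the splitter read a value left over from an earlier pass.

```c
#include <assert.h>

/* Toy stand-in for a global like which_alternative; -1 means "never set".  */
static int toy_which_alternative = -1;

/* Toy constraint matcher: as a side effect it records which alternative
   matched for the insn that was examined most recently.  */
static void
toy_match_constraints (int matched_alternative)
{
  assert (matched_alternative >= 0);
  toy_which_alternative = matched_alternative;
}

/* Splitter before the fix: trusts the global without recomputing it for
   the insn being split.  Returns how many extra register moves it emits.  */
static int
toy_split_using_stale_global (void)
{
  return toy_which_alternative == 2 ? 1 : 0;
}

/* Splitter after the fix: never emits the extra move, so its result no
   longer depends on leftover state.  */
static int
toy_split_without_global (void)
{
  return 0;
}
```

In this model the first splitter's output changes with whatever unrelated insn was matched last, while the second is deterministic, which is the property the patch is after.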
 
 


RE: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-27 Thread Li, Pan2 via Gcc-patches
> Yes, potentially similar for all the other ifs but I didn't check
> all of them.

Thanks and sure thing, will clean up this in v8.

> if (mode != no_mode && mode != last_mode)
>  {

As I mentioned, this comes from the "after" part, not the emit part. I am not
quite familiar with this part, nor with whether the "after" part affects emit.

But I would like to say that the if (prev_mode != mode) check can be considered
defensive code in the backend (i.e. the consumer of the mode switching
framework). I don't think it is a good idea to rely on implementation details
of the mode switching framework.

The hook is an interface provided by the mode switching framework, and the
RISC-V backend is the user of that framework. The implementation of the hook in
RISC-V should only follow the hook's semantics and should not depend on how the
framework itself is implemented. Otherwise we may end up with many implicit
dependencies across components, and may shoot ourselves in the foot one day.

Sorry for the disturbance; just sharing my two cents.

Pan
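
To make the trade-off in this exchange concrete, here is a small model of the two checks being discussed. It is only a sketch: the loop is modelled on the quoted `mode != no_mode && mode != last_mode` condition, and every name in it (`TOY_NO_MODE`, `toy_backend_emit`, `toy_count_emitted`) is hypothetical rather than taken from GCC.

```c
#include <assert.h>

#define TOY_NO_MODE (-1)

/* Backend hook with its own defensive check: it emits a mode-set only when
   the mode really changes.  Returns 1 if an instruction would be emitted.  */
static int
toy_backend_emit (int mode, int prev_mode)
{
  if (prev_mode == mode)
    return 0;
  return 1;
}

/* Framework-side walk over the modes needed at each point.  When
   FRAMEWORK_FILTERS is nonzero, the framework itself skips unchanged modes;
   otherwise it only skips TOY_NO_MODE and leaves the rest to the hook.  */
static int
toy_count_emitted (const int *needed, int n, int framework_filters)
{
  int last_mode = TOY_NO_MODE;
  int emitted = 0;

  assert (n >= 0);
  for (int i = 0; i < n; i++)
    {
      int mode = needed[i];
      if (mode == TOY_NO_MODE)
        continue;
      if (framework_filters && mode == last_mode)
        continue;
      emitted += toy_backend_emit (mode, last_mode);
      last_mode = mode;
    }
  return emitted;
}
```

With the defensive check in the hook, the number of emitted mode-sets is the same whether or not the framework filters unchanged modes, so the hook does not depend on that framework detail.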

-----Original Message-----
From: Robin Dapp  
Sent: Thursday, July 27, 2023 4:42 PM
To: Li, Pan2 ; Kito Cheng 
Cc: rdapp@gmail.com; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, 
Yanzhang 
Subject: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
rounding


> I see, you mean at the beginning of frm_after, we can just return the 
> incoming mode as is?
> 
> If (CALL_P (insn))
>   return mode; // Given we aware the mode is DYN_CALL already.

Yes, potentially similar for all the other ifs but I didn't check
all of them.

> Thank and will cleanup this in v8. AFAIK, the optimize_mode_switching
> only check ptr->mode != no_mode before emit, not sure if I missed
> something.

 if (mode != no_mode && mode != last_mode)
{

Shouldn't this cover us?  I didn't run the testsuite or so but it looks
like it.

Regards
 Robin



Re: Re: [PATCH V3] RISC-V: Enable basic VLS modes support

2023-07-27 Thread juzhe.zh...@rivai.ai
>> Hmmm, does it mean we'll have (set (mem) (mem)) after legitimize_move???
The answer is yes. It's odd, I know.

I also found the regression failure. For the mem-to-mem pattern, I changed it to:
(define_insn_and_split "*mov<mode>_mem_to_mem"
  [(set (match_operand:VLS_AVL_IMM 0 "memory_operand")
        (match_operand:VLS_AVL_IMM 1 "memory_operand"))]
  "TARGET_VECTOR && can_create_pseudo_p ()"
  "#"
  "&& 1"
  [(const_int 0)]
  {
    if (GET_MODE_BITSIZE (<MODE>mode).to_constant () <= MAX_BITS_PER_WORD)
      {
        /* Optimize the following case:

             typedef int8_t v2qi __attribute__ ((vector_size (2)));
             v2qi v = *(v2qi*)in;
             *(v2qi*)out = v;

           We prefer scalar load/store instead of vle.v/vse.v when
           the VLS mode size does not exceed the scalar mode size.  */
        machine_mode mode;
        unsigned size = GET_MODE_BITSIZE (<MODE>mode).to_constant ();
        if (FLOAT_MODE_P (<MODE>mode))
          mode = mode_for_size (size, MODE_FLOAT, 0).require ();
        else
          mode = mode_for_size (size, MODE_INT, 0).require ();
        emit_move_insn (gen_lowpart (mode, operands[0]),
                        gen_lowpart (mode, operands[1]));
      }
    else
      {
        operands[1] = force_reg (<MODE>mode, operands[1]);
        emit_move_insn (operands[0], operands[1]);
      }
    DONE;
  }
)

Now the regression tests pass.
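
The decision the pattern above encodes can be summarized in a small model. This is a sketch only, not GCC code: the enums, `TOY_MAX_BITS_PER_WORD` (assumed to be 64, as on an RV64-like target), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_operand_kind { TOY_REG, TOY_MEM };
enum toy_move_strategy { TOY_DIRECT, TOY_SCALAR, TOY_VIA_VECTOR_REG };

#define TOY_MAX_BITS_PER_WORD 64 /* assumption: RV64-like word size */

/* Condition Kito suggested for the plain move pattern: at least one operand
   must be a register, so (set (mem) (mem)) is rejected there.  */
static bool
toy_plain_mov_ok (enum toy_operand_kind dest, enum toy_operand_kind src)
{
  return dest == TOY_REG || src == TOY_REG;
}

/* Strategy for a VLS move of MODE_BITS bits.  Mem-to-mem moves that fit in
   a scalar word use a scalar load/store; larger ones are first loaded into
   a vector register and then stored.  */
static enum toy_move_strategy
toy_choose_move (enum toy_operand_kind dest, enum toy_operand_kind src,
                 unsigned mode_bits)
{
  assert (mode_bits > 0);
  if (toy_plain_mov_ok (dest, src))
    return TOY_DIRECT;
  if (mode_bits <= TOY_MAX_BITS_PER_WORD)
    return TOY_SCALAR;
  return TOY_VIA_VECTOR_REG;
}
```

For the v2qi case in the comment (16 bits), the model picks the scalar path; a 128-bit mem-to-mem move goes through a vector register.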




juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-07-27 17:57
To: juzhe.zh...@rivai.ai
CC: gcc-patches; kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: Re: [PATCH V3] RISC-V: Enable basic VLS modes support
Hmmm, does it mean we'll have (set (mem) (mem)) after legitimize_move???
 
Or maybe try to
 
use define_insn_and_split rather than define_split for the (set (mem) (mem))
 
On Thu, Jul 27, 2023 at 5:50 PM juzhe.zh...@rivai.ai
 wrote:
>
> Hi, kito.
> I tried to reject mem->mem in this pattern:
> (define_insn_and_split "*mov<mode>"
>   [(set (match_operand:VLS_AVL_IMM 0 "reg_or_mem_operand" "=vr, m, vr")
>         (match_operand:VLS_AVL_IMM 1 "reg_or_mem_operand" "  m,vr, vr"))]
>   "TARGET_VECTOR
>    && (register_operand (operands[0], <MODE>mode)
>        || register_operand (operands[1], <MODE>mode))"
>   "@
>    #
>    #
>    vmv%m1r.v\t%0,%1"
>   "&& reload_completed
>    && (!register_operand (operands[0], <MODE>mode)
>        || !register_operand (operands[1], <MODE>mode))"
>   [(const_int 0)]
>   {
>     bool ok_p = riscv_vector::legitimize_move (operands[0], operands[1]);
>     gcc_assert (ok_p);
>     DONE;
>   }
> )
>
>
> It causes an ICE in regression testing during "vregs" (before RA).
> [jzzhong@server1:/work/home/jzzhong/work/insn]$~/work/rvv-opensource/output/gcc-rv64/bin/riscv64-rivai-elf-gcc
>  -march=rv64gc_zve32f -mabi=lp64d -O3 -S 
> --param=riscv-autovec-preference=scalable -fdump-rtl-all auto.c
> auto.c: In function 'foo0':
> auto.c:15:1: error: unrecognizable insn:
>15 | }
>   | ^
> (insn 35 34 36 6 (set (mem:V8QI (reg/f:DI 154 [ _64 ]) [0 MEM <vector(8) signed char> [(int8_t *)_64]+0 S8 A64])
> (mem/u/c:V8QI (reg/f:DI 185) [0  S8 A64])) "auto.c":11:20 -1
>  (nil))
> during RTL pass: vregs
> dump file: auto.c.259r.vregs
>
>
> It seems that we need a placeholder pattern to hold mem->mem ?
>
> Could you help me with that ?
> 
> juzhe.zh...@rivai.ai
>
>
> From: Kito Cheng
> Date: 2023-07-27 17:19
> To: Juzhe-Zhong
> CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
> Subject: Re: [PATCH V3] RISC-V: Enable basic VLS modes support
> Last minor thing :)
>
> > +(define_insn_and_split "*mov<mode>"
> > +  [(set (match_operand:VLS_AVL_IMM 0 "reg_or_mem_operand" "=vr, m, vr")
> > +   (match_operand:VLS_AVL_IMM 1 "reg_or_mem_operand" "  m,vr, vr"))]
> > +  "TARGET_VECTOR"
>
> Reject (set (mem) (mem)) by adding the check:
>
> "TARGET_VECTOR
> && (register_operand (operands[0], <MODE>mode)
> || register_operand (operands[1], <MODE>mode))"
>
 


[PATCH] XFAIL parts broken deliberately by r13-1762-gf9d4c3b45c5ed5

2023-07-27 Thread Richard Biener via Gcc-patches
The following XFAILs the part of the testcase that expects a complex store
to be recognized as memset.

Tested on x86_64-unknown-linux-gnu, pushed to trunk and branch.

PR tree-optimization/110829
* gcc.dg/pr56837.c: XFAIL part of the testcase.
---
 gcc/testsuite/gcc.dg/pr56837.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr56837.c b/gcc/testsuite/gcc.dg/pr56837.c
index 0bb958f0b0d..6466b119489 100644
--- a/gcc/testsuite/gcc.dg/pr56837.c
+++ b/gcc/testsuite/gcc.dg/pr56837.c
@@ -62,5 +62,5 @@ fv (void)
 /* { dg-final { scan-tree-dump-times "memset ..d, 68, 8192.;" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "memset ..l, 124, 8192.;" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "memset ..b, 1, 1024.;" 1 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "memset ..c, 68, 16384.;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "memset ..c, 68, 16384.;" 1 "optimized" { xfail *-*-* } } } */
 /* { dg-final { scan-tree-dump-times "memset ..v, 18, 16384.;" 1 "optimized" } } */
-- 
2.35.3


[PATCH v1] RISC-V: Remove unnecessary vread_csr/vwrite_csr intrinsic.

2023-07-27 Thread Pan Li via Gcc-patches
From: Pan Li 

According to the RVV doc below, the related intrinsics are no longer needed.

https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/249

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv_vector.h (enum RVV_CSR): Removed.
(vread_csr): Ditto.
(vwrite_csr): Ditto.
---
 gcc/config/riscv/riscv_vector.h | 51 -
 1 file changed, 51 deletions(-)

diff --git a/gcc/config/riscv/riscv_vector.h b/gcc/config/riscv/riscv_vector.h
index ff54b6be863..3366fd972b5 100644
--- a/gcc/config/riscv/riscv_vector.h
+++ b/gcc/config/riscv/riscv_vector.h
@@ -35,57 +35,6 @@
 extern "C" {
 #endif
 
-enum RVV_CSR {
-  RVV_VSTART = 0,
-  RVV_VXSAT,
-  RVV_VXRM,
-  RVV_VCSR,
-};
-
-__extension__ extern __inline unsigned long
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vread_csr(enum RVV_CSR csr)
-{
-  unsigned long rv = 0;
-  switch (csr)
-{
-case RVV_VSTART:
-  __asm__ __volatile__ ("csrr\t%0,vstart" : "=r"(rv) : : "memory");
-  break;
-case RVV_VXSAT:
-  __asm__ __volatile__ ("csrr\t%0,vxsat" : "=r"(rv) : : "memory");
-  break;
-case RVV_VXRM:
-  __asm__ __volatile__ ("csrr\t%0,vxrm" : "=r"(rv) : : "memory");
-  break;
-case RVV_VCSR:
-  __asm__ __volatile__ ("csrr\t%0,vcsr" : "=r"(rv) : : "memory");
-  break;
-}
-  return rv;
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vwrite_csr(enum RVV_CSR csr, unsigned long value)
-{
-  switch (csr)
-{
-case RVV_VSTART:
-  __asm__ __volatile__ ("csrw\tvstart,%z0" : : "rJ"(value) : "memory");
-  break;
-case RVV_VXSAT:
-  __asm__ __volatile__ ("csrw\tvxsat,%z0" : : "rJ"(value) : "memory");
-  break;
-case RVV_VXRM:
-  __asm__ __volatile__ ("csrw\tvxrm,%z0" : : "rJ"(value) : "memory");
-  break;
-case RVV_VCSR:
-  __asm__ __volatile__ ("csrw\tvcsr,%z0" : : "rJ"(value) : "memory");
-  break;
-}
-}
-
 /* NOTE: This implementation of riscv_vector.h is intentionally short.  It does
not define the RVV types and intrinsic functions directly in C and C++
code, but instead uses the following pragma to tell GCC to insert the
-- 
2.34.1



Re: [PATCH v1] RISC-V: Remove unnecessary vread_csr/vwrite_csr intrinsic.

2023-07-27 Thread Kito Cheng via Gcc-patches
Ok, thanks:)

On Thu, Jul 27, 2023 at 18:45, Pan Li via Gcc-patches wrote:

> From: Pan Li 
>
> According to below RVV doc, the related intrinsic is not longer needed.
>
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/249
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv_vector.h (enum RVV_CSR): Removed.
> (vread_csr): Ditto.
> (vwrite_csr): Ditto.
> ---
>  gcc/config/riscv/riscv_vector.h | 51 -
>  1 file changed, 51 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv_vector.h
> b/gcc/config/riscv/riscv_vector.h
> index ff54b6be863..3366fd972b5 100644
> --- a/gcc/config/riscv/riscv_vector.h
> +++ b/gcc/config/riscv/riscv_vector.h
> @@ -35,57 +35,6 @@
>  extern "C" {
>  #endif
>
> -enum RVV_CSR {
> -  RVV_VSTART = 0,
> -  RVV_VXSAT,
> -  RVV_VXRM,
> -  RVV_VCSR,
> -};
> -
> -__extension__ extern __inline unsigned long
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -vread_csr(enum RVV_CSR csr)
> -{
> -  unsigned long rv = 0;
> -  switch (csr)
> -{
> -case RVV_VSTART:
> -  __asm__ __volatile__ ("csrr\t%0,vstart" : "=r"(rv) : : "memory");
> -  break;
> -case RVV_VXSAT:
> -  __asm__ __volatile__ ("csrr\t%0,vxsat" : "=r"(rv) : : "memory");
> -  break;
> -case RVV_VXRM:
> -  __asm__ __volatile__ ("csrr\t%0,vxrm" : "=r"(rv) : : "memory");
> -  break;
> -case RVV_VCSR:
> -  __asm__ __volatile__ ("csrr\t%0,vcsr" : "=r"(rv) : : "memory");
> -  break;
> -}
> -  return rv;
> -}
> -
> -__extension__ extern __inline void
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -vwrite_csr(enum RVV_CSR csr, unsigned long value)
> -{
> -  switch (csr)
> -{
> -case RVV_VSTART:
> -  __asm__ __volatile__ ("csrw\tvstart,%z0" : : "rJ"(value) :
> "memory");
> -  break;
> -case RVV_VXSAT:
> -  __asm__ __volatile__ ("csrw\tvxsat,%z0" : : "rJ"(value) : "memory");
> -  break;
> -case RVV_VXRM:
> -  __asm__ __volatile__ ("csrw\tvxrm,%z0" : : "rJ"(value) : "memory");
> -  break;
> -case RVV_VCSR:
> -  __asm__ __volatile__ ("csrw\tvcsr,%z0" : : "rJ"(value) : "memory");
> -  break;
> -}
> -}
> -
>  /* NOTE: This implementation of riscv_vector.h is intentionally short.
> It does
> not define the RVV types and intrinsic functions directly in C and C++
> code, but instead uses the following pragma to tell GCC to insert the
> --
> 2.34.1
>
>


Re: [PATCH] fix pdp11_expand_epilogue (PR target/107841)

2023-07-27 Thread Maciej W. Rozycki
On Thu, 13 Jul 2023, Jeff Law via Gcc-patches wrote:

> > Question for the experts: how is this handled?  Do I need to apply this
> > change to my workspace and commit it, with Mikael as the change author?
> That's what I usually do for someone without write access.  commit it locally
> using the --author flag, then push.

 FWIW `git am' taking the e-mail containing a change on stdin is I believe 
the most straightforward way, as long as the submission has been properly 
formatted.

 The command automatically sets the author correctly, picking the `From:' 
entry from the e-mail headers or the first line of the e-mail body as 
appropriate, sets the author date from the `Date:' header, picks up the 
contents of the `Subject:' header as the change heading and the body of 
the e-mail sans any `From:' entry as the change description, and strips 
any discussion between the non-commit delimiter (`--') and the patch 
itself.

 The command was designed for maintainers to import changes submitted by 
e-mail, so it does what is expected.  Any issues with an imported change can 
then be addressed with `git commit --amend', etc.

 HTH,

  Maciej


Re: [PATCH v2] RISC-V: testsuite: Add vector_hw and zvfh_hw checks.

2023-07-27 Thread Kito Cheng via Gcc-patches
LGTM. I just found that this patch is still on the list. I mostly test with
qemu, so I didn't think this was a problem before, but I now realize it is a
problem when we run on a real board that does not support those
extensions.

On Sun, Jun 18, 2023 at 6:07 AM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 6/15/23 09:06, Robin Dapp wrote:
> > Hi,
> >
> > Changes from v1:
> >   - Revamped the target selectors again.
> >   - Fixed some syntax as well as caching errors that were still present.
> >   - Adjusted some test cases I missed.
> >
> > The current situation with target selectors is improvable at best.
> > We definitely need to discern between being able to build a
> > test with the current configuration and running the test on the
> > current target which this patch attempts to do.  There might
> > be a need for more fine-grained checks in the future that could
> > also go into our target-specific riscv.exp in the subdirectories
> > but for now I think we're good.
> >
> > A bit more detail is in the patch description below.  The testsuite
> > is as clean as before for the configurations I tried: default, rv64gcv,
> > rv64gcv_zfhmin, rv64gc, rv64gc_zfh, rv64gc_zfhmin.  I hope I didn't
> > overlook tests that appear unsupported now but shouldn't be.
> >
> > @Pan: No need to check the old version anymore, thanks.  This patch
> > is preferred.
> >
> > Regards
> >   Robin
> >
> >
> > This introduces new checks for run tests.  Currently we have
> > riscv_vector as well as rv32 and rv64 which all check if GCC (with the
> > current configuration) can build the respective tests.
> >
> > Many tests specify e.g. a different -march for vector which
> > makes the check fail even though we could build as well as run
> > those tests.
> >
> > vector_hw now tries to compile, link and execute a simple vector example
> > file.  If this succeeds the respective test can run.
> >
> > Similarly we introduce a zvfh_hw check which will be used in the
> > upcoming floating-point unop/binop tests as well as rv32_hw and
> > rv64_hw checks that are currently unused.
> >
> > To conclude:
> >   - If we want a testcase to only compile when the current configuration
> > has vector support we use {riscv_vector}.
> >   - If we want a testcase to run when the current target can supports
> > executing vector instructions we use {riscv_vector_hw}.
> > It still needs to be ensured that we can actually build the test
> > which can be achieved by either
> > (1) compiling with e.g. -march=rv64gcv or
> > (2) only enabling the test when the current configuration supports
> >   vector via {riscsv_vector}.
> >
> > The same principle applies for zfh, zfhmin and zvfh but we do not yet
> > have all target selectors.  In the meanwhile we need to make sure to
> > specify the proper -march flags like in (1).
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/rvv/autovec/binop/shift-run.c: Use
> >   riscv_vector_hw.
> >   * gcc.target/riscv/rvv/autovec/binop/shift-scalar-run.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/binop/vand-run.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/binop/vor-run.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/binop/vrem-run.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/binop/vxor-run.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/cmp/vcond_run-4.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c:
> >   Dito.
> >   * gcc.target/riscv/rvv/autovec/conversions/vncvt-run.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/conversions/vsext-run.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/conversions/vzext-run.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c:
> >   Dito.
> >   * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c:
> >   Dito.
> >   * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c:
> >   Dito.
> >   * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-4.c:
> >   Dito.
> >   * gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-1.c:
> >   Dito.
> >   * gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: Dito.
> >   * gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: Dito.
> >   * gcc.target/riscv/rvv/auto

Re: [PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative

2023-07-27 Thread Kito Cheng via Gcc-patches
My first impression was that those emit_insn (gen_rtx_SET ()) calls seemed
necessary, but I got the point after I checked vector.md :P

Committed to trunk, thanks :)


On Thu, Jul 27, 2023 at 6:23 PM juzhe.zh...@rivai.ai
 wrote:
>
> Oh, YES.
>
> Thanks for fixing it. It makes sense since the ternary operations in 
> "vector.md"
> generate "vmv.v.v" according to RA.
>
> Thanks for fixing it.
>
> @kito: Could you confirm it? If it's ok to you, commit it for Han (I am lazy 
> to commit patches :).
>
>
>
> juzhe.zh...@rivai.ai
>
> From: demin.han
> Date: 2023-07-27 17:48
> To: gcc-patches@gcc.gnu.org
> CC: kito.ch...@gmail.com; juzhe.zh...@rivai.ai
> Subject: [PATCH] RISC-V: Fix uninitialized and redundant use of 
> which_alternative
> When pass split2 starts, which_alternative is random depending on
> last set of certain pass.
>
> Even initialized, the generated movement is redundant.
> The movement can be generated by assembly output template.
>
> Signed-off-by: demin.han 
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md: Delete which_alternative use in split
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/madd-split2-1.c: New test.
>
> ---
> gcc/config/riscv/autovec.md | 12 
> .../gcc.target/riscv/rvv/autovec/madd-split2-1.c| 13 +
> 2 files changed, 13 insertions(+), 12 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index d899922586a..b7ea3101f5a 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1012,8 +1012,6 @@ (define_insn_and_split "*fma"
>[(const_int 0)]
>{
>  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
> -if (which_alternative == 2)
> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
> operands[0]};
>  riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus 
> (mode),
>riscv_vector::RVV_TERNOP, ops, operands[4]);
> @@ -1058,8 +1056,6 @@ (define_insn_and_split "*fnma"
>[(const_int 0)]
>{
>  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
> -if (which_alternative == 2)
> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
> operands[0]};
>  riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul 
> (mode),
> riscv_vector::RVV_TERNOP, ops, operands[4]);
> @@ -1102,8 +1098,6 @@ (define_insn_and_split "*fma"
>[(const_int 0)]
>{
>  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
> -if (which_alternative == 2)
> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
> operands[0]};
>  riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (PLUS, 
> mode),
>   riscv_vector::RVV_TERNOP, ops, operands[4]);
> @@ -1148,8 +1142,6 @@ (define_insn_and_split "*fnma"
>[(const_int 0)]
>{
>  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
> -if (which_alternative == 2)
> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
> operands[0]};
>  riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul_neg (PLUS, 
> mode),
>   riscv_vector::RVV_TERNOP, ops, operands[4]);
> @@ -1194,8 +1186,6 @@ (define_insn_and_split "*fms"
>[(const_int 0)]
>{
>  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
> -if (which_alternative == 2)
> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
> operands[0]};
>  riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (MINUS, 
> mode),
>   riscv_vector::RVV_TERNOP, ops, operands[4]);
> @@ -1242,8 +1232,6 @@ (define_insn_and_split "*fnms"
>[(const_int 0)]
>{
>  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
> -if (which_alternative == 2)
> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
> operands[0]};
>  riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul_neg (MINUS, 
> mode),
>   riscv_vector::RVV_TERNOP, ops, operands[4]);
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
> new file mode 100644
> index 000..14a9802667e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvl256b -O3 -fno-cprop-registers -fno-dce 
> --param riscv-autovec-preference=scalable" } */
> +
> +long
> +foo (long *__restrict a, long *__restrict b, long n)
> +{
> +  long i;
> +  for (i = 0; i < n; ++i)
> +a[i] = b[i] +

[PATCH] tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C

2023-07-27 Thread Richard Biener via Gcc-patches
The following fixes the lack of simplification of a vector shift
by an out-of-bounds shift value.  For scalars this is done both
by CCP and VRP but vectors are not handled there.  This results
in PR91838 differences in outcome dependent on whether a vector
shift ISA is available and thus vector lowering does or does not
expose scalar shifts here.

The following adds a match.pd pattern to catch uniform out-of-bound
shifts, simplifying them to zero when not sanitizing shift amounts.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

PR tree-optimization/91838
* match.pd (([rl]shift @0 out-of-bounds) -> zero): New pattern.
---
 gcc/match.pd | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index a443dc48634..eace7d635e7 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1059,6 +1059,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& tree_nop_conversion_p (type, TREE_TYPE (@1)))
(lshift @0 @2)))
 
+/* Shifts by precision or greater result in zero.  */
+(for shift (lshift rshift)
+ (simplify
+  (shift @0 uniform_integer_cst_p@1)
+  (if (!(flag_sanitize & SANITIZE_SHIFT_EXPONENT)
+   /* Use a signed compare to leave negative shift counts alone.  */
+   && wi::ges_p (wi::to_wide (uniform_integer_cst_p (@1)),
+element_precision (type)))
+   { build_zero_cst (type); })))
+
 /* Shifts by constants distribute over several binary operations,
hence (X << C) + (Y << C) can be simplified to (X + Y) << C.  */
 (for op (plus minus)
-- 
2.35.3


Re: [PATCH v2] RISC-V: testsuite: Add vector_hw and zvfh_hw checks.

2023-07-27 Thread Robin Dapp via Gcc-patches
> LGTM, I just found this patch still on the list, I mostly tested with
> qemu, so I don't think that is a problem before, but I realize it's a
> problem when we run on a real board that does not support those
> extensions.

I think we can skip this one as I needed to introduce vector_hw and
zvfh_hw with another patch anyway.  What I still intended to do is an
-march-ext=... switch but that might be superseded already by Jörn's patch
that I wanted to have a look at soon anyway.

Regards
 Robin


RE: [PATCH v1] RISC-V: Remove unnecessary vread_csr/vwrite_csr intrinsic.

2023-07-27 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

From: Kito Cheng 
Sent: Thursday, July 27, 2023 6:50 PM
To: Li, Pan2 
Cc: GCC Patches ; 钟居哲 ; Wang, 
Yanzhang 
Subject: Re: [PATCH v1] RISC-V: Remove unnecessary vread_csr/vwrite_csr 
intrinsic.

Ok, thanks:)

On Thu, Jul 27, 2023 at 18:45, Pan Li via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
From: Pan Li <pan2...@intel.com>

According to the RVV doc below, the related intrinsics are no longer needed.

https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/249

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv_vector.h (enum RVV_CSR): Removed.
(vread_csr): Ditto.
(vwrite_csr): Ditto.
---
 gcc/config/riscv/riscv_vector.h | 51 -
 1 file changed, 51 deletions(-)

diff --git a/gcc/config/riscv/riscv_vector.h b/gcc/config/riscv/riscv_vector.h
index ff54b6be863..3366fd972b5 100644
--- a/gcc/config/riscv/riscv_vector.h
+++ b/gcc/config/riscv/riscv_vector.h
@@ -35,57 +35,6 @@
 extern "C" {
 #endif

-enum RVV_CSR {
-  RVV_VSTART = 0,
-  RVV_VXSAT,
-  RVV_VXRM,
-  RVV_VCSR,
-};
-
-__extension__ extern __inline unsigned long
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vread_csr(enum RVV_CSR csr)
-{
-  unsigned long rv = 0;
-  switch (csr)
-{
-case RVV_VSTART:
-  __asm__ __volatile__ ("csrr\t%0,vstart" : "=r"(rv) : : "memory");
-  break;
-case RVV_VXSAT:
-  __asm__ __volatile__ ("csrr\t%0,vxsat" : "=r"(rv) : : "memory");
-  break;
-case RVV_VXRM:
-  __asm__ __volatile__ ("csrr\t%0,vxrm" : "=r"(rv) : : "memory");
-  break;
-case RVV_VCSR:
-  __asm__ __volatile__ ("csrr\t%0,vcsr" : "=r"(rv) : : "memory");
-  break;
-}
-  return rv;
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vwrite_csr(enum RVV_CSR csr, unsigned long value)
-{
-  switch (csr)
-{
-case RVV_VSTART:
-  __asm__ __volatile__ ("csrw\tvstart,%z0" : : "rJ"(value) : "memory");
-  break;
-case RVV_VXSAT:
-  __asm__ __volatile__ ("csrw\tvxsat,%z0" : : "rJ"(value) : "memory");
-  break;
-case RVV_VXRM:
-  __asm__ __volatile__ ("csrw\tvxrm,%z0" : : "rJ"(value) : "memory");
-  break;
-case RVV_VCSR:
-  __asm__ __volatile__ ("csrw\tvcsr,%z0" : : "rJ"(value) : "memory");
-  break;
-}
-}
-
 /* NOTE: This implementation of riscv_vector.h is intentionally short.  It does
not define the RVV types and intrinsic functions directly in C and C++
code, but instead uses the following pragma to tell GCC to insert the
--
2.34.1


Re: [PATCH] tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C

2023-07-27 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 27, 2023 at 12:00:56PM +, Richard Biener wrote:
> The following fixes the lack of simplification of a vector shift
> by an out-of-bounds shift value.  For scalars this is done both
> by CCP and VRP but vectors are not handled there.  This results
> in PR91838 differences in outcome dependent on whether a vector
> shift ISA is available and thus vector lowering does or does not
> expose scalar shifts here.
> 
> The following adds a match.pd pattern to catch uniform out-of-bound
> shifts, simplifying them to zero when not sanitizing shift amounts.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK?
> 
> Thanks,
> Richard.
> 
>   PR tree-optimization/91838
>   * match.pd (([rl]shift @0 out-of-bounds) -> zero): New pattern.

The !(flag_sanitize & SANITIZE_SHIFT_EXPONENT)
should be !sanitize_flags_p (SANITIZE_SHIFT_EXPONENT)
or maybe even
GIMPLE || !sanitize_flags_p (SANITIZE_SHIFT_EXPONENT)
because the shift ubsan instrumentation is done on GENERIC, so it can be
optimized on GIMPLE even with ubsan.
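
To make the suggested condition concrete, here is a minimal C++ model of when
the fold would fire.  It is only a sketch, not GCC code: FoldContext and
fold_uniform_shift are hypothetical names, and the two booleans merely stand in
for the GIMPLE predicate and for sanitize_flags_p (SANITIZE_SHIFT_EXPONENT).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for the two predicates discussed above.
struct FoldContext
{
  bool is_gimple;          // models the GIMPLE predicate in match.pd
  bool sanitize_shift_exp; // models sanitize_flags_p (SANITIZE_SHIFT_EXPONENT)
};

// Returns 0 when a shift by the uniform constant COUNT would be folded to
// zero, and std::nullopt when the shift is left alone.
std::optional<std::uint64_t>
fold_uniform_shift (const FoldContext &ctx, std::int64_t count,
                    unsigned precision)
{
  assert (precision > 0);
  // On GENERIC the shift has to survive so ubsan can still instrument it.
  if (!ctx.is_gimple && ctx.sanitize_shift_exp)
    return std::nullopt;
  // Signed compare: negative shift counts are deliberately left alone.
  if (count >= static_cast<std::int64_t> (precision))
    return std::uint64_t{0};
  return std::nullopt;
}
```

Under this model a shift by 32 on a 32-bit element folds on GIMPLE even when
shift sanitizing is on, but is kept on GENERIC in that case.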

Otherwise LGTM.

Jakub



Re: [PATCH] tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C

2023-07-27 Thread Richard Biener via Gcc-patches
On Thu, 27 Jul 2023, Jakub Jelinek wrote:

> On Thu, Jul 27, 2023 at 12:00:56PM +, Richard Biener wrote:
> > The following fixes the lack of simplification of a vector shift
> > by an out-of-bounds shift value.  For scalars this is done both
> > by CCP and VRP but vectors are not handled there.  This results
> > in PR91838 differences in outcome dependent on whether a vector
> > shift ISA is available and thus vector lowering does or does not
> > expose scalar shifts here.
> > 
> > The following adds a match.pd pattern to catch uniform out-of-bound
> > shifts, simplifying them to zero when not sanitizing shift amounts.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > 
> > OK?
> > 
> > Thanks,
> > Richard.
> > 
> > PR tree-optimization/91838
> > * match.pd (([rl]shift @0 out-of-bounds) -> zero): New pattern.
> 
> The !(flag_sanitize & SANITIZE_SHIFT_EXPONENT)
> should be !sanitize_flags_p (SANITIZE_SHIFT_EXPONENT)
> or maybe even
> GIMPLE || !sanitize_flags_p (SANITIZE_SHIFT_EXPONENT)
> because the shift ubsan instrumentation is done on GENERIC, so it can be
> optimized on GIMPLE even with ubsan.
> 
> Otherwise LGTM.

So like the following, will push after re-testing succeeded.

Thanks,
Richard.

From c6d348acdc2143fc4c2849e33075a3975fe29b26 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Thu, 27 Jul 2023 13:08:32 +0200
Subject: [PATCH] tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C
To: gcc-patches@gcc.gnu.org

The following fixes the lack of simplification of a vector shift
by an out-of-bounds shift value.  For scalars this is done both
by CCP and VRP but vectors are not handled there.  This results
in PR91838 differences in outcome dependent on whether a vector
shift ISA is available and thus vector lowering does or does not
expose scalar shifts here.

The following adds a match.pd pattern to catch uniform out-of-bound
shifts, simplifying them to zero when not sanitizing shift amounts.

PR tree-optimization/91838
* gimple-match-head.cc: Include attribs.h and asan.h.
* generic-match-head.cc: Likewise.
* match.pd (([rl]shift @0 out-of-bounds) -> zero): New pattern.
---
 gcc/generic-match-head.cc |  2 ++
 gcc/gimple-match-head.cc  |  2 ++
 gcc/match.pd  | 10 ++
 3 files changed, 14 insertions(+)

diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index b4b5bc88f4b..a71c0727b0b 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -41,6 +41,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "langhooks.h"
 #include "tree-pass.h"
+#include "attribs.h"
+#include "asan.h"
 
 /* Routine to determine if the types T1 and T2 are effectively
the same for GENERIC.  If T1 or T2 is not a type, the test
diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index d795066e53e..5d6d26d009b 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -47,6 +47,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm.h"
 #include "gimple-range.h"
 #include "langhooks.h"
+#include "attribs.h"
+#include "asan.h"
 
 tree do_valueize (tree, tree (*)(tree), bool &);
 tree do_valueize (tree (*)(tree), tree);
diff --git a/gcc/match.pd b/gcc/match.pd
index a443dc48634..fcb1a735507 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1059,6 +1059,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& tree_nop_conversion_p (type, TREE_TYPE (@1)))
(lshift @0 @2)))
 
+/* Shifts by precision or greater result in zero.  */
+(for shift (lshift rshift)
+ (simplify
+  (shift @0 uniform_integer_cst_p@1)
+  (if ((GIMPLE || !sanitize_flags_p (SANITIZE_SHIFT_EXPONENT))
+   /* Use a signed compare to leave negative shift counts alone.  */
+   && wi::ges_p (wi::to_wide (uniform_integer_cst_p (@1)),
+element_precision (type)))
+   { build_zero_cst (type); })))
+
 /* Shifts by constants distribute over several binary operations,
hence (X << C) + (Y << C) can be simplified to (X + Y) << C.  */
 (for op (plus minus)
-- 
2.35.3



Re: [PATCH] tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C

2023-07-27 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 27, 2023 at 01:07:58PM +, Richard Biener wrote:
> On Thu, 27 Jul 2023, Jakub Jelinek wrote:
> 
> > On Thu, Jul 27, 2023 at 12:00:56PM +, Richard Biener wrote:
> > > The following fixes the lack of simplification of a vector shift
> > > by an out-of-bounds shift value.  For scalars this is done both
> > > by CCP and VRP but vectors are not handled there.  This results
> > > in PR91838 differences in outcome dependent on whether a vector
> > > shift ISA is available and thus vector lowering does or does not
> > > expose scalar shifts here.
> > > 
> > > The following adds a match.pd pattern to catch uniform out-of-bound
> > > shifts, simplifying them to zero when not sanitizing shift amounts.
> > > 
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > 
> > > OK?
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > >   PR tree-optimization/91838
> > >   * match.pd (([rl]shift @0 out-of-bounds) -> zero): New pattern.
> > 
> > The !(flag_sanitize & SANITIZE_SHIFT_EXPONENT)
> > should be !sanitize_flags_p (SANITIZE_SHIFT_EXPONENT)
> > or maybe even
> > GIMPLE || !sanitize_flags_p (SANITIZE_SHIFT_EXPONENT)
> > because the shift ubsan instrumentation is done on GENERIC, so it can be
> > optimized on GIMPLE even with ubsan.
> > 
> > Otherwise LGTM.
> 
> So like the following, will push after re-testing succeeded.

Yes, thanks.

Jakub



[PATCH] [x86] Add UNSPEC_MASKOP to vpbroadcastm pattern.

2023-07-27 Thread liuhongt via Gcc-patches
Prevent rtl optimization of vec_duplicate + zero_extend to
vpbroadcastm since there could be an extra kmov after RA.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ready to push to trunk.

gcc/ChangeLog:

PR target/110788
* config/i386/sse.md (avx512cd_maskb_vec_dup): Add
UNSPEC_MASKOP.
(avx512cd_maskw_vec_dup): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110788.c: New test.
---
 gcc/config/i386/sse.md   |  8 ++--
 gcc/testsuite/gcc.target/i386/pr110788.c | 11 +++
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr110788.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 35fd66ed4aa..51961bbfc0b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -26778,11 +26778,14 @@ (define_insn "avx512dq_broadcast_1"
(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
+;; Use unspec to prevent rtl optimizer to optimize zero_extend + vec_duplicate
+;; to pbroadcastm, there could be an extra kmov after RA.
 (define_insn "avx512cd_maskb_vec_dup"
   [(set (match_operand:VI8_AVX512VL 0 "register_operand" "=v")
(vec_duplicate:VI8_AVX512VL
  (zero_extend:DI
-   (match_operand:QI 1 "register_operand" "k"]
+   (match_operand:QI 1 "register_operand" "k"
+   (unspec [(const_int 0)] UNSPEC_MASKOP)]
   "TARGET_AVX512CD"
   "vpbroadcastmb2q\t{%1, %0|%0, %1}"
   [(set_attr "type" "mskmov")
@@ -26793,7 +26796,8 @@ (define_insn "avx512cd_maskw_vec_dup"
   [(set (match_operand:VI4_AVX512VL 0 "register_operand" "=v")
(vec_duplicate:VI4_AVX512VL
  (zero_extend:SI
-   (match_operand:HI 1 "register_operand" "k"]
+   (match_operand:HI 1 "register_operand" "k"
+   (unspec [(const_int 0)] UNSPEC_MASKOP)]
   "TARGET_AVX512CD"
   "vpbroadcastmw2d\t{%1, %0|%0, %1}"
   [(set_attr "type" "mskmov")
diff --git a/gcc/testsuite/gcc.target/i386/pr110788.c b/gcc/testsuite/gcc.target/i386/pr110788.c
new file mode 100644
index 000..4cf1676ccb6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110788.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=cascadelake --param vect-partial-vector-usage=2" } */
+/* { dg-final { scan-assembler-not "vpbroadcastm" } } */
+
+double a[1024], b[1024];
+
+void foo (int n)
+{
+  for (int i = 0; i < n; ++i)
+a[i] = b[i] * 3.;
+}
-- 
2.39.1.388.g2fc9e9ca3c



Fix profile_count::apply_probability

2023-07-27 Thread Jan Hubicka via Gcc-patches
Hi,
profile_count::apply_probability lacks a check for an uninitialized
probability, which leads to completely random results when an
uninitialized probability is applied to an initialized count.  This can make
a difference when, e.g., inlining a -fno-guess-branch-probability function
into a -fguess-branch-probability one.
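
The intended ordering of the checks can be sketched with a small stand-alone
model.  This is not the GCC implementation: Count and Prob are hypothetical
aliases in which std::nullopt plays the role of "uninitialized", and the
scaling is plain rounding rather than GCC's fixed-point arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model types: std::nullopt means "uninitialized".
using Count = std::optional<std::uint64_t>;
using Prob = std::optional<double>; // a value in [0, 1] when initialized

Count
apply_probability (Count c, Prob p)
{
  assert (!p || (*p >= 0.0 && *p <= 1.0));
  // Zero counts and probability "always" leave the count untouched.
  if (c == Count (0) || p == Prob (1.0))
    return c;
  // Probability "never" gives a zero count.
  if (p == Prob (0.0))
    return Count (0);
  // The fix: an uninitialized probability also yields an uninitialized count.
  if (!c || !p)
    return std::nullopt;
  return static_cast<std::uint64_t> (*c * *p + 0.5);
}
```

In this model, applying an uninitialized probability to a nonzero count gives
an uninitialized result instead of an arbitrary number.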

Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:

* profile-count.h (profile_count::apply_probability): Fix
handling of uninitialized probabilities, optimize scaling
by probability 1.

diff --git a/gcc/profile-count.h b/gcc/profile-count.h
index bf1136782a3..e860c5db540 100644
--- a/gcc/profile-count.h
+++ b/gcc/profile-count.h
@@ -1129,11 +1132,11 @@ public:
   /* Scale counter according to PROB.  */
   profile_count apply_probability (profile_probability prob) const
 {
-  if (*this == zero ())
+  if (*this == zero () || prob == profile_probability::always ())
return *this;
   if (prob == profile_probability::never ())
return zero ();
-  if (!initialized_p ())
+  if (!initialized_p () || !prob.initialized_p ())
return uninitialized ();
   profile_count ret;
   uint64_t tmp;


Fix profile update in tree-ssa-loop-im.cc

2023-07-27 Thread Jan Hubicka via Gcc-patches
Hi,
this fixes two bugs in tree-ssa-loop-im.cc.  The first is that the cap
probability is not reliable, yet it is constructed with adjusted quality.
The second is that the conditional sometimes has the wrong joiner BB count.
This is visible on testsuite/gcc.dg/pr102385.c; however, the testcase
triggers another profile update bug in pcom, so I will update it in a
followup patch.

gcc/ChangeLog:

* tree-ssa-loop-im.cc (execute_sm_if_changed): Turn cap probability
to guessed; fix count of new_bb.

diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index f5b01e986ae..268f466bdc9 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -2059,7 +2059,8 @@ execute_sm_if_changed (edge ex, tree mem, tree tmp_var, tree flag,
nbbs++;
 }
 
-  profile_probability cap = profile_probability::always ().apply_scale (2, 3);
+  profile_probability cap
+ = profile_probability::guessed_always ().apply_scale (2, 3);
 
   if (flag_probability.initialized_p ())
 ;
@@ -2103,6 +2104,8 @@ execute_sm_if_changed (edge ex, tree mem, tree tmp_var, tree flag,
 
   old_dest = ex->dest;
   new_bb = split_edge (ex);
+  if (append_cond_position)
+new_bb->count += last_cond_fallthru->count ();
   then_bb = create_empty_bb (new_bb);
   then_bb->count = new_bb->count.apply_probability (flag_probability);
   if (irr)


Fix profile update in tree_transform_and_unroll_loop

2023-07-27 Thread Jan Hubicka via Gcc-patches
Hi,
This patch fixes the profile update in tree_transform_and_unroll_loop, which
is used by predictive commoning.  I started by attempting to fix
gcc.dg/tree-ssa/update-unroll-1.c, which I xfailed last week, but it turned
out to be a harder job.

Unrolling was never updated for the changes in
duplicate_loop_body_to_header_edge, which is now smarter about getting the
profile right when some exits are eliminated.  A lot of the manual profile
updating can thus now be done using existing infrastructure.

I also noticed that scale_dominated_blocks_in_loop does a job identical
to the loop I wrote in scale_loop_profile, and thus I commonized the
implementation and removed the recursion.
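
As an illustration of replacing the recursion with a worklist, here is a
self-contained sketch over a toy dominator tree.  Node and scale_dominated are
hypothetical names, counts are plain integers instead of profile_count, and
the guard for a zero denominator is simplified compared to the real function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy basic block: a count, a loop-membership flag and the indices of the
// blocks it immediately dominates.
struct Node
{
  std::uint64_t count;
  bool in_loop;
  std::vector<std::size_t> dom_sons;
};

// Scale the counts of all in-loop blocks strictly dominated by ROOT by
// NUM/DEN, walking the dominator tree with an explicit worklist.
void
scale_dominated (std::vector<Node> &nodes, std::size_t root,
                 std::uint64_t num, std::uint64_t den)
{
  assert (root < nodes.size ());
  if (den == 0)
    return; // simplified guard; the real code is more careful here
  std::vector<std::size_t> worklist{root};
  while (!worklist.empty ())
    {
      std::size_t bb = worklist.back ();
      worklist.pop_back ();
      for (std::size_t son : nodes[bb].dom_sons)
        {
          if (!nodes[son].in_loop)
            continue; // blocks outside the loop are skipped, subtree included
          nodes[son].count = nodes[son].count * num / den;
          worklist.push_back (son);
        }
    }
}
```

The root itself is never scaled, matching the "strictly dominated" wording in
the function comment of the patch.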

I also extended duplicate_loop_body_to_header_edge to handle flat profiles the
same way as we do in the vectorizer.  Without it we end up with an iteration
count of less than 0 in gcc.dg/tree-ssa/update-unroll-1.c (it is unrolled 32
times but predicted to iterate fewer times).  I also added missing code to
update loop_info.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* cfgloopmanip.cc (scale_dominated_blocks_in_loop): Move here from
tree-ssa-loop-manip.cc and avoid recursion.
(scale_loop_profile): Use scale_dominated_blocks_in_loop.
(duplicate_loop_body_to_header_edge): Add DLTHE_FLAG_FLAT_PROFILE
flag.
* cfgloopmanip.h (DLTHE_FLAG_FLAT_PROFILE): Define.
(scale_dominated_blocks_in_loop): Declare.
* predict.cc (dump_prediction): Do not ICE on uninitialized probability.
(change_edge_frequency): Remove.
* predict.h (change_edge_frequency): Remove.
* tree-ssa-loop-manip.cc (scale_dominated_blocks_in_loop): Move to
cfgloopmanip.cc.
(niter_for_unrolled_loop): Remove.
(tree_transform_and_unroll_loop): Fix profile update.

gcc/testsuite/ChangeLog:

* gcc.dg/pr102385.c: Check for no profile mismatches.
* gcc.dg/pr96931.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-1.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-2.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-3.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-4.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-5.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-7.c: Check for one profile mismatch.
* gcc.dg/tree-ssa/predcom-8.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-1.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-10.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-11.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-12.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-2.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-3.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-4.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-5.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-6.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-7.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-8.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-9.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/update-unroll-1.c: Unxfail.

diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index 3012a8d60f7..c3d292d0dd4 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -499,6 +499,32 @@ scale_loop_frequencies (class loop *loop, profile_probability p)
   free (bbs);
 }
 
+/* Scales the frequencies of all basic blocks in LOOP that are strictly
+   dominated by BB by NUM/DEN.  */
+
+void
+scale_dominated_blocks_in_loop (class loop *loop, basic_block bb,
+   profile_count num, profile_count den)
+{
+  basic_block son;
+
+  if (!den.nonzero_p () && !(num == profile_count::zero ()))
+return;
+  auto_vec  worklist;
+  worklist.safe_push (bb);
+
+  while (!worklist.is_empty ())
+for (son = first_dom_son (CDI_DOMINATORS, worklist.pop ());
+son;
+son = next_dom_son (CDI_DOMINATORS, son))
+  {
+   if (!flow_bb_inside_loop_p (loop, son))
+ continue;
+   son->count = son->count.apply_scale (num, den);
+   worklist.safe_push (son);
+  }
+}
+
 /* Scale profile in LOOP by P.
If ITERATION_BOUND is not -1, scale even further if loop is predicted
to iterate too many times.
@@ -649,19 +675,9 @@ scale_loop_profile (class loop *loop, profile_probability p,
   if (other_edge && other_edge->dest == loop->latch)
loop->latch->count -= new_exit_count - old_exit_count;
   else
-   {
- basic_block *body = get_loop_body (loop);
- profile_count new_count = exit_edge->src->count - new_exit_count;
- profile_count old_count = exit_edge->src->count - old_exit_count;
-
- for (unsigned int i = 0

Re: [PATCH 0/5] Recognize Zicond extension

2023-07-27 Thread Jeff Law via Gcc-patches




On 7/27/23 02:43, Xiao Zeng wrote:

> 2. According to your opinions, I have modified the code, but out of caution
> for upstream, I conducted a complete regression tests on patch V2, which took
> some time. I was unable to reply to emails and upload patch V2 in a timely
> manner.

Sorry to have wasted your time -- zicond/xventanacondops has lingered
for quite a while and I had a bit of free time yesterday.  I felt it was
most useful to try and move this stuff forward.

> 3 After you and other maintainers made minor modifications to my patch[1/5]
> and patch[2/5], it has been merged into the master, so I will no longer upload
> patch V2.

Agreed.

> 4 patch[1/5] and patch[2/5], which have been merged into the master, have only
> completed basic support for Zicond, and further optimization work needs to be
> completed. These further optimization reactions are reflected in my patch[3/5]
> patch[4/5] and patch[5/5].

Agreed.

> 5 As you mentioned in your previous email
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625427.html
> "eswincomputing and ventana can both reduce our divergence from the trunk
> and work together on the rest of the bits...". I will reorganize patch[3/5]
> patch[4/5] and patch[5/5], provide more detailed explanations, and submit
> them as an alternative solution for further optimization of Zicond.
>
> Does that work for you?

I'm going to look at 3/5 today pretty closely.  Exposing zicond to
movcc is something we had implemented inside Ventana and I want to
compare/contrast your work with ours.


What I like about yours is it keeps all the logic in riscv.cc rather 
than scattering it across riscv.cc and riscv.md.  What I like about the 
internal Ventana bits is its ability to support arbitrary comparisons by 
utilizing sCC if the original is not an eq/ne comparison.
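
For readers unfamiliar with the idea, the sCC approach can be sketched in C++
as follows.  This only models the instruction semantics (czero.eqz and
czero.nez zero their result when the condition register is zero or nonzero
respectively); select_lt is a hypothetical helper and the sequence is not
necessarily what either the Ventana or the eswincomputing code emits.

```cpp
#include <cstdint>

// Zicond semantics: czero.eqz yields 0 when COND is zero, czero.nez yields 0
// when COND is nonzero; otherwise both pass VALUE through unchanged.
constexpr std::uint64_t
czero_eqz (std::uint64_t value, std::uint64_t cond)
{
  return cond == 0 ? 0 : value;
}

constexpr std::uint64_t
czero_nez (std::uint64_t value, std::uint64_t cond)
{
  return cond != 0 ? 0 : value;
}

// Hypothetical helper computing x < y ? a : b.  The comparison is not eq/ne,
// so it is first materialized as a 0/1 flag (the sCC step, e.g. slt), and the
// flag then drives the two conditional-zero operations.
constexpr std::uint64_t
select_lt (std::int64_t x, std::int64_t y, std::uint64_t a, std::uint64_t b)
{
  std::uint64_t flag = x < y;
  return czero_eqz (a, flag) | czero_nez (b, flag);
}

static_assert (select_lt (1, 2, 10, 20) == 10, "condition true selects a");
static_assert (select_lt (2, 1, 10, 20) == 20, "condition false selects b");
```

Exactly one of the two operands survives the conditional-zero step, so the OR
acts as the final select.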


Ideally we'll be able to get the best of both.

Jeff



[committed] libstdc++: Fix std::format alternate form for floating-point [PR108046]

2023-07-27 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk. Backport to gcc-13 to follow.

-- >8 --

A decimal point was being added to the end of the string for {:#.0}
because the __expc character was not being set for the _Pres_none
presentation type, so __s.find(__expc) didn't find the 'e' in "1e+01" and
we created "1e+01." by appending the radix char to the end.

This can be fixed by ensuring that __expc='e' is set for the _Pres_none
case. I realized we can also set __expc='P' and __expc='E' when needed,
to save a call to std::toupper later.

For the {:#.0g} format, __expc='e' was being set and so the 'e' was
found in "1e+10" but then __z = __prec - __sigfigs would wrap around to
SIZE_MAX. That meant we would decide not to add a radix char because the
number of extra characters to insert would be 1+SIZE_MAX, i.e. zero.

This can be fixed by using __z == 0 when __prec == 0.
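
The wraparound can be reproduced in isolation with a few lines of C++.  This is
a simplified model rather than the libstdc++ code: extras_before and
extras_after are hypothetical helpers that keep only the unsigned arithmetic
described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of the old computation: z wraps to SIZE_MAX when prec < sigfigs, so
// adding 1 for the radix character wraps again and yields zero extras.
std::size_t
extras_before (std::size_t prec, std::size_t sigfigs, bool need_radix)
{
  std::size_t z = prec - sigfigs; // unsigned subtraction, may wrap
  return std::size_t (need_radix) + z;
}

// Model of the fix: no zeros are inserted when the precision is zero.
std::size_t
extras_after (std::size_t prec, std::size_t sigfigs, bool need_radix)
{
  assert (prec == 0 || prec >= sigfigs); // scope of this sketch
  std::size_t z = prec != 0 ? prec - sigfigs : 0;
  return std::size_t (need_radix) + z;
}
```

With prec == 0 and sigfigs == 1, the old model reports zero extra characters
while the fixed one reports one, which is the radix character.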

libstdc++-v3/ChangeLog:

PR libstdc++/108046
* include/std/format (__formatter_fp::format): Ensure __expc is
always set for all presentation types. Set __z correctly for
zero precision.
* testsuite/std/format/functions/format.cc: Check problem cases.
---
 libstdc++-v3/include/std/format | 17 +
 .../testsuite/std/format/functions/format.cc|  4 
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 0c6069b2681..1e0ef612ddd 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -1430,22 +1430,24 @@ namespace __format
  chars_format __fmt{};
  bool __upper = false;
  bool __trailing_zeros = false;
- char __expc = 0;
+ char __expc = 'e';
 
  switch (_M_spec._M_type)
  {
case _Pres_A:
  __upper = true;
+ __expc = 'P';
  [[fallthrough]];
case _Pres_a:
- __expc = 'p';
+ if (_M_spec._M_type != _Pres_A)
+   __expc = 'p';
  __fmt = chars_format::hex;
  break;
case _Pres_E:
  __upper = true;
+ __expc = 'E';
  [[fallthrough]];
case _Pres_e:
- __expc = 'e';
  __use_prec = true;
  __fmt = chars_format::scientific;
  break;
@@ -1455,10 +1457,10 @@ namespace __format
  break;
case _Pres_G:
  __upper = true;
+ __expc = 'E';
  [[fallthrough]];
case _Pres_g:
  __trailing_zeros = true;
- __expc = 'e';
  __use_prec = true;
  __fmt = chars_format::general;
  break;
@@ -1511,7 +1513,6 @@ namespace __format
{
  for (char* __p = __start; __p != __res.ptr; ++__p)
*__p = std::toupper(*__p);
- __expc = std::toupper(__expc);
}
 
  // Add sign for non-negative values.
@@ -1545,15 +1546,15 @@ namespace __format
  __p = __s.find(__expc);
  if (__p == __s.npos)
__p = __s.size();
- __d = __p;
+ __d = __p; // Position where '.' should be inserted.
  __sigfigs = __d;
}
 
- if (__trailing_zeros)
+ if (__trailing_zeros && __prec != 0)
{
  if (!__format::__is_xdigit(__s[0]))
--__sigfigs;
- __z = __prec - __sigfigs;
+ __z = __prec - __sigfigs; // Number of zeros to insert.
}
 
  if (size_t __extras = int(__d == __p) + __z)
diff --git a/libstdc++-v3/testsuite/std/format/functions/format.cc b/libstdc++-v3/testsuite/std/format/functions/format.cc
index 3485535e3cb..bd914df6d7c 100644
--- a/libstdc++-v3/testsuite/std/format/functions/format.cc
+++ b/libstdc++-v3/testsuite/std/format/functions/format.cc
@@ -152,6 +152,10 @@ test_alternate_forms()
 
   s = std::format("{:#.2g}", -0.0);
   VERIFY( s == "-0.0" );
+
+  // PR libstdc++/108046
+  s = std::format("{0:#.0} {0:#.1} {0:#.0g}", 10.0);
+  VERIFY( s == "1.e+01 1.e+01 1.e+01" );
 }
 
 struct euro_punc : std::numpunct
-- 
2.41.0



[committed] OpenMP/Fortran: Extend reject code between target + teams [PR71065, PR110725] (was: Re: [patch] OpenMP/Fortran: Reject declarations between target + teams (was: [Patch] OpenMP/Fortran: Rej

2023-07-27 Thread Tobias Burnus

Yet another omission, the flag was not properly set for deeply buried
'omp teams' as I stopped too early when walking up the stack.
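
The difference is easiest to see on a toy version of the parser state stack.
The sketch below is C++ and purely illustrative: State and
mark_enclosing_targets are hypothetical stand-ins for gfc_state_data and the
loop in decode_omp_directive, with the directive kind reduced to one flag.

```cpp
#include <cassert>

// Toy parser state: a link to the enclosing state plus two flags.
struct State
{
  State *previous;     // enclosing construct, or nullptr at the top
  bool is_target;      // models "tail->op is one of the EXEC_OMP_TARGET* codes"
  bool contains_teams; // models omp_clauses->contains_teams_construct
};

// Walk the whole stack above CURRENT instead of stopping at the first
// non-BLOCK state, so a TEAMS buried inside DO or BLOCK is still recorded.
void
mark_enclosing_targets (State *current)
{
  assert (current != nullptr);
  for (State *s = current->previous; s; s = s->previous)
    if (s->is_target)
      s->contains_teams = true;
}
```

With a stack of target, block, do, teams, the walk reaches the target state
even though two other constructs sit in between.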

Now fixed by commit r14-2826-g081e25d3cfd86c

* * *

This was found when 'repairing' the feature on the OG13
(devel/omp/gcc-13) branch for metadirectives, cf. the second attached
patch, applied after cherry-picking the mainline patch.

Tobias
commit 081e25d3cfd86c4094999ded0bbe99b91762013c
Author: Tobias Burnus 
Date:   Thu Jul 27 18:14:11 2023 +0200

OpenMP/Fortran: Extend reject code between target + teams [PR71065, PR110725]

The previous version failed to diagnose when the 'teams' was nested
more deeply inside the target region, e.g. inside a DO or some
block or structured block.

PR fortran/110725
PR middle-end/71065

gcc/fortran/ChangeLog:

* openmp.cc (resolve_omp_target): Minor cleanup.
* parse.cc (decode_omp_directive): Find TARGET statement
also higher in the stack.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/teams-6.f90: Extend.

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 52eeaf2d4da..2952cd300ac 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -10666,15 +10666,14 @@ resolve_omp_target (gfc_code *code)
 
   if (!code->ext.omp_clauses->contains_teams_construct)
 return;
+  gfc_code *c = code->block->next;
   if (code->ext.omp_clauses->target_first_st_is_teams
-  && ((GFC_IS_TEAMS_CONSTRUCT (code->block->next->op)
-	   && code->block->next->next == NULL)
-	  || (code->block->next->op == EXEC_BLOCK
-	  && code->block->next->next
-	  && GFC_IS_TEAMS_CONSTRUCT (code->block->next->next->op)
-	  && code->block->next->next->next == NULL)))
+  && ((GFC_IS_TEAMS_CONSTRUCT (c->op) && c->next == NULL)
+	  || (c->op == EXEC_BLOCK
+	  && c->next
+	  && GFC_IS_TEAMS_CONSTRUCT (c->next->op)
+	  && c->next->next == NULL)))
 return;
-  gfc_code *c = code->block->next;
   while (c && !GFC_IS_TEAMS_CONSTRUCT (c->op))
 c = c->next;
   if (c)
diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index aa6bb663def..e797402b59f 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -1318,32 +1318,27 @@ decode_omp_directive (void)
 case ST_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO:
 case ST_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
 case ST_OMP_TEAMS_LOOP:
-  if (gfc_state_stack->previous && gfc_state_stack->previous->tail)
-	{
-	  gfc_state_data *stk = gfc_state_stack;
-	  do {
-	   stk = stk->previous;
-	 } while (stk && stk->tail && stk->tail->op == EXEC_BLOCK);
-	  if (stk && stk->tail)
-	switch (stk->tail->op)
-	  {
-	  case EXEC_OMP_TARGET:
-	  case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE:
-	  case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_SIMD:
-	  case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO:
-	  case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
-	  case EXEC_OMP_TARGET_TEAMS_LOOP:
-	  case EXEC_OMP_TARGET_PARALLEL:
-	  case EXEC_OMP_TARGET_PARALLEL_DO:
-	  case EXEC_OMP_TARGET_PARALLEL_DO_SIMD:
-	  case EXEC_OMP_TARGET_PARALLEL_LOOP:
-	  case EXEC_OMP_TARGET_SIMD:
-		stk->tail->ext.omp_clauses->contains_teams_construct = 1;
-		break;
-	  default:
-	break;
-	  }
-	}
+  for (gfc_state_data *stk = gfc_state_stack->previous; stk;
+	   stk = stk->previous)
+	if (stk && stk->tail)
+	  switch (stk->tail->op)
+	{
+	case EXEC_OMP_TARGET:
+	case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE:
+	case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_SIMD:
+	case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO:
+	case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
+	case EXEC_OMP_TARGET_TEAMS_LOOP:
+	case EXEC_OMP_TARGET_PARALLEL:
+	case EXEC_OMP_TARGET_PARALLEL_DO:
+	case EXEC_OMP_TARGET_PARALLEL_DO_SIMD:
+	case EXEC_OMP_TARGET_PARALLEL_LOOP:
+	case EXEC_OMP_TARGET_SIMD:
+	  stk->tail->ext.omp_clauses->contains_teams_construct = 1;
+	  break;
+	default:
+	  break;
+	}
   break;
 case ST_OMP_ERROR:
   if (new_st.ext.omp_clauses->at != OMP_AT_EXECUTION)
diff --git a/gcc/testsuite/gfortran.dg/gomp/teams-6.f90 b/gcc/testsuite/gfortran.dg/gomp/teams-6.f90
index be453f27f40..0bd7735e738 100644
--- a/gcc/testsuite/gfortran.dg/gomp/teams-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/teams-6.f90
@@ -37,6 +37,16 @@ end block
   i = 5
   !$omp end teams
 !$omp end target
+
+
+!$omp target  ! { dg-error "OMP TARGET region at .1. with a nested TEAMS may not contain any other statement, declaration or directive outside of the single TEAMS construct" }
+block
+  do i = 5, 8
+!$omp teams
+block; end block
+ 

[PATCH 0/5] GCC _BitInt support [PR102989]

2023-07-27 Thread Jakub Jelinek via Gcc-patches

The following patch series introduces support for C23 bit-precise integer
types.  In short, they are similar to other integral types in many ways,
but aren't subject to integral promotions if smaller than int, and they can
have much wider precisions than ordinary integer types.

It is enabled only on targets which have agreed on a processor-specific
ABI for how to lay those out or pass them as function arguments/return values,
which currently is just x86-64, I believe.  It would be nice if target
maintainers helped to get agreement on psABI changes so that GCC 14 could
enable it on far more architectures than just one.

C23 says that <limits.h> defines the BITINT_MAXWIDTH macro, which is the
largest supported precision of the _BitInt types; the smallest allowed value
is the precision of unsigned long long (but due to lack of psABI agreement
we'll violate that on architectures which don't have the support done yet).
The following series uses for the time being just WIDE_INT_MAX_PRECISION as
that BITINT_MAXWIDTH, with the intent to increase it incrementally later
on.  WIDE_INT_MAX_PRECISION is 575 bits on x86_64, but will be even smaller
on lots of architectures.  This is the largest precision we can support
without changes of wide_int/widest_int representation (to make those non-POD
and allow use of some allocated buffer rather than the included fixed size
one).  Once that would be overcome, there is another internal enforced limit,
INTEGER_CST in current layout allows at most 255 64-bit limbs, which is
16320 bits as another cap.  And if that is overcome, then we have limitation
of TYPE_PRECISION being 16-bit, so 65535 as maximum precision.  Perhaps
we could make TYPE_PRECISION dependent on BITINT_TYPE vs. others and use
32-bit precision in that case later.  Latest Clang/LLVM I think supports
on paper up to 8388608 bits, but is hardly usable even with much shorter
precisions.
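
The caps quoted above follow from simple arithmetic, which can be checked at
compile time.  The constant and function names below are made up for this
sketch (they are not GCC identifiers), the 575-bit figure is simply the x86_64
value quoted in the text, and the message-less static_assert form assumes
C++17.

```cpp
#include <cstdint>

// All names here are hypothetical; only the numbers come from the text.
constexpr unsigned limb_bits = 64;
constexpr unsigned max_integer_cst_limbs = 255;
constexpr unsigned wide_int_cap_x86_64 = 575;

// INTEGER_CST cap: at most 255 limbs of 64 bits each.
constexpr unsigned integer_cst_cap = max_integer_cst_limbs * limb_bits;
// TYPE_PRECISION cap: the largest value a 16-bit field can hold.
constexpr unsigned type_precision_cap = (1u << 16) - 1;

// Number of 64-bit limbs needed to hold BITS bits.
constexpr unsigned
limbs_for (unsigned bits)
{
  return (bits + limb_bits - 1) / limb_bits;
}

static_assert (integer_cst_cap == 16320);
static_assert (type_precision_cap == 65535);
static_assert (wide_int_cap_x86_64 < integer_cst_cap
               && integer_cst_cap < type_precision_cap);
static_assert (limbs_for (wide_int_cap_x86_64) == 9);
static_assert (limbs_for (integer_cst_cap) == 255);
```

So each cap in the sequence is strictly larger than the previous one, and a
575-bit value already needs nine 64-bit limbs.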

Besides this hopefully temporary cap on supported precision and support
only on targets which buy into it, the support has the following limitations:

- _BitInt(N) bit-fields aren't supported yet (the patch rejects them); I'd like
  to enable those incrementally, but don't really see details on how such
  bit-fields should be laid-out in memory nor passed inside of function
  arguments; LLVM implements something, but it is a question if that is what
  the various ABIs want

- conversions between large/huge (see later) _BitInt and _Decimal{32,64,128}
  aren't supported and emit a sorry; I'm not familiar enough with DFP stuff
  to implement that

- _Complex _BitInt(N) isn't supported; again mainly because none of the psABIs
  mention how those should be passed/returned; in a limited way they are
  supported internally because the internal functions into which
  __builtin_{add,sub,mul}_overflow{,_p} is lowered return COMPLEX_TYPE as a
  hack to return 2 values without using references/pointers

- vectors of _BitInt(N) aren't supported, both because psABIs don't specify
  how that works and because I'm not really sure it would be useful given
  lack of hw support for anything but bit-precise integers with the same
  bit precision as standard integer types

Because the bit-precise types have different behavior both in the C FE
(e.g. the lack of promotion) and do or can have different behavior in type
layout and function argument passing/returning values, the patch introduces
a new integral type, BITINT_TYPE, so various spots which explicitly check
for INTEGER_TYPE and not say INTEGRAL_TYPE_P macro need to be adjusted.
Also the assumption that all integral types have scalar integer type mode
is no longer true, larger BITINT_TYPEs have BLKmode type.

The patch makes 4 different categories of _BitInt depending on the target hook
decisions and their precision.  The x86-64 psABI says that _BitInt which fit
into signed/unsigned char, short, int, long and long long are laid out and
passed as those types (with padding bits undefined if they don't have mode
precision).  Such smallest precision bit-precise integer types are categorized
as small, the target hook gives for specific precision a scalar integral mode
where a single such mode contains all the bits.  Such small _BitInt types are
generally kept in the IL until expansion into RTL, with minor tweaks during
expansion to avoid relying on the padding bit values.  All larger precision
_BitInt types are supposed to be handled as structure containing an array
of limbs or so, where a limb has some integral mode (for libgcc purposes
best if it has word-size) and the limbs have either little or big endian
ordering in the array.  The padding bits in the most significant limb if any
are either undefined or should be always sign/zero extended (but support for
this isn't in yet, we don't know if any psABI will require it).  As mentioned in
some psABI proposals, while currently there is just one limb mode, if the limb
ordering would follow normal target endianity, there is always a possibility
to have two limb modes,

RE: [PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog.

2023-07-27 Thread Roger Sayle

Hi Richard,

You're 100% right.  It's possible to significantly clean up this code, replacing
the body of the conditional with a call to force_reg and simplifying the
conditions under which it is called.  These improvements are implemented in the
patch below, which has been tested on x86_64-pc-linux-gnu, with a bootstrap and
make -k check, both with and without -m32, as usual.

Interestingly, the CONCAT clause afterwards is still required (I've learned
something new), as calling force_reg (or gen_reg_rtx) with HCmode actually
returns a CONCAT instead of a REG, so although the code looks dead, it's
required to build libgcc during a bootstrap.  But the remaining clean-up is
good, reducing the number of source lines and making the logic easier to
understand.

Ok for mainline?

2023-07-27  Roger Sayle  
Richard Biener  

gcc/ChangeLog
PR middle-end/28071
PR rtl-optimization/110587
* expr.cc (emit_group_load_1): Simplify logic for calling
force_reg on ORIG_SRC, to avoid making a copy if the source
is already in a pseudo register.

Roger
--

> -Original Message-
> From: Richard Biener 
> Sent: 25 July 2023 12:50
> 
> On Tue, Jul 25, 2023 at 1:31 PM Roger Sayle 
> wrote:
> >
> > This patch is the third in series of fixes for PR
> > rtl-optimization/110587, a compile-time regression with -O0, that
> > attempts to address the underlying cause.  As noted previously, the
> > pathological test case pr28071.c contains a large number of useless
> > register-to-register moves that can produce quadratic behaviour (in
> > LRA).  These moves are generated during RTL expansion in
> > emit_group_load_1, where the middle-end attempts to simplify the
> > source before calling extract_bit_field.  This is reasonable if the
> > source is a complex expression (from before the tree-ssa optimizers),
> > or a SUBREG, or a hard register, but it's not particularly useful to
> > copy a pseudo register into a new pseudo register.  This patch
> > eliminates that redundancy.
> >
> > The -fdump-tree-expand for pr28071.c compiled with -O0 currently
> > contains 777K lines, with this patch it contains 717K lines, i.e.
> > saving about 60K lines (admittedly of debugging text output, but it
> > makes the point).
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> > As always, I'm happy to revert this change quickly if there's a
> > problem, and investigate why this additional copy might (still) be
> > needed on other
> > non-x86 targets.
> 
> @@ -2622,6 +2622,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type,
>  be loaded directly into the destination.  */
>src = orig_src;
>if (!MEM_P (orig_src)
> + && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
>   && (!CONSTANT_P (orig_src)
>   || (GET_MODE (orig_src) != mode
>   && GET_MODE (orig_src) != VOIDmode)))
> 
> so that means the code guarded by the conditional could instead be transformed
> to
> 
>src = force_reg (mode, orig_src);
> 
> ?  Btw, the || (GET_MODE (orig_src) != mode && GET_MODE (orig_src) !=
> VOIDmode) case looks odd as in that case we'd use GET_MODE (orig_src) for the
> move ... that might also mean we have to use force_reg (GET_MODE (orig_src) ==
> VOIDmode ? mode : GET_MODE (orig_src), orig_src))
> 
> Otherwise I think this is OK, as said, using force_reg somehow would improve
> readability here I think.
> 
> I also wonder how the
> 
>   else if (GET_CODE (src) == CONCAT)
> 
> case will ever trigger with the current code.
> 
> Richard.
> 
> >
> > 2023-07-25  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR middle-end/28071
> > PR rtl-optimization/110587
> > * expr.cc (emit_group_load_1): Avoid copying a pseudo register into
> > a new pseudo register, i.e. only copy hard regs into a new pseudo.
> >
> >

diff --git a/gcc/expr.cc b/gcc/expr.cc
index fff09dc..174f8ac 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -2622,16 +2622,11 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type,
 be loaded directly into the destination.  */
   src = orig_src;
   if (!MEM_P (orig_src)
- && (!CONSTANT_P (orig_src)
- || (GET_MODE (orig_src) != mode
- && GET_MODE (orig_src) != VOIDmode)))
+ && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
+ && !CONSTANT_P (orig_src))
{
- if (GET_MODE (orig_src) == VOIDmode)
-   src = gen_reg_rtx (mode);
- else
-   src = gen_reg_rtx (GET_MODE (orig_src));
-
- emit_move_insn (src, orig_src);
+ gcc_assert (GET_MODE (orig_src) != VOIDmode);
+ src = force_reg (GET_MODE (orig_src), orig_src);
}
 
   /* Optimize the access just a bit.  */


[PATCH 2/5] libgcc _BitInt support [PR102989]

2023-07-27 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch adds the library helpers for multiplication, division + modulo
and casts from and to floating point.
As described in the intro, the first step is to try to reduce further the
passed in precision by skipping over most significant limbs with just zeros
or sign bit copies.  For multiplication and division I've implemented
a simple algorithm, using something smarter like Karatsuba or Toom N-Way
might be faster for very large _BitInts (which we don't support right now
anyway), but could mean more code in libgcc, which maybe isn't what people
are willing to accept.
For the to/from floating point conversions the patch uses soft-fp, because
it already has tons of handy macros which can be used for that.  In theory
it could be implemented using {,unsigned} long long or {,unsigned} __int128
to/from floating point conversions with some frexp before/after, but at that
point we already need to force it into integer registers and analyze it
anyway.  Plus, for 32-bit arches there is no __int128 that could be used
for XF/TF mode stuff.
I know that soft-fp is owned by glibc and I think the op-common.h change
should be propagated there, but the bitint stuff is really GCC specific
and IMHO doesn't belong into the glibc copy.
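
To make the two steps above concrete, here is a minimal sketch in plain C.  It
is not the libgcc2.c code from this patch: the limb type, the function names
and the fixed 32-bit limb width are all assumptions chosen for illustration.
The first function drops most significant limbs that are only zeros or sign
bit copies, the second is a simple schoolbook multiplication on unsigned
little-endian limb arrays.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limb type for this sketch; libgcc picks its own limb mode.  */
typedef uint32_t limb_t;

/* Return how many low-order limbs are needed to represent the signed
   little-endian value X[0..N-1] (N >= 1).  The most significant limb is
   dropped while it is all zeros or all ones and merely repeats the sign
   bit of the limb below it.  */
size_t
significant_limbs (const limb_t *x, size_t n)
{
  while (n > 1)
    {
      limb_t top = x[n - 1];
      limb_t sign_below = x[n - 2] >> 31;
      if ((top == 0 && sign_below == 0)
          || (top == UINT32_MAX && sign_below == 1))
        --n;
      else
        break;
    }
  return n;
}

/* Schoolbook multiplication of unsigned little-endian operands:
   R[0..UN+VN-1] = U[0..UN-1] * V[0..VN-1].  R must not overlap U or V.  */
void
schoolbook_mul (limb_t *r, const limb_t *u, size_t un,
                const limb_t *v, size_t vn)
{
  for (size_t i = 0; i < un + vn; ++i)
    r[i] = 0;
  for (size_t i = 0; i < un; ++i)
    {
      uint64_t carry = 0;
      for (size_t j = 0; j < vn; ++j)
        {
          /* limb * limb + limb + limb always fits in 64 bits.  */
          uint64_t t = (uint64_t) u[i] * v[j] + r[i + j] + carry;
          r[i + j] = (limb_t) t;
          carry = t >> 32;
        }
      r[i + vn] = (limb_t) carry;
    }
}
```

The quadratic loop is the kind of simple algorithm the text refers to;
Karatsuba or Toom-style splitting would replace schoolbook_mul for very large
operands at the cost of more code.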

2023-07-27  Jakub Jelinek  

PR c/102989
libgcc/
* config/aarch64/t-softfp (softfp_extras): Use += rather than :=.
* config/i386/64/t-softfp (softfp_extras): Likewise.
* config/i386/libgcc-glibc.ver (GCC_14.0.0): Export _BitInt support
routines.
* config/i386/t-softfp (softfp_extras): Add fixxfbitint and
bf, hf and xf mode floatbitint.
(CFLAGS-floatbitintbf.c, CFLAGS-floatbitinthf.c): Add -msse2.
* config/riscv/t-softfp32 (softfp_extras): Use += rather than :=.
* config/rs6000/t-e500v1-fp (softfp_extras): Likewise.
* config/rs6000/t-e500v2-fp (softfp_extras): Likewise.
* config/t-softfp (softfp_floatbitint_funcs): New.
(softfp_func_list): Add sf and df mode from and to _BitInt libcalls.
* config/t-softfp-sfdftf (softfp_extras): Add fixtfbitint and
floatbitinttf.
* config/t-softfp-tf (softfp_extras): Likewise.
* libgcc2.c (bitint_reduce_prec): New inline function.
(BITINT_INC, BITINT_END): Define.
(bitint_mul_1, bitint_addmul_1): New helper functions.
(__mulbitint3): New function.
(bitint_negate, bitint_submul_1): New helper functions.
(__divmodbitint4): New function.
* libgcc2.h (LIBGCC2_UNITS_PER_WORD): When building _BitInt support
libcalls, redefine depending on __LIBGCC_BITINT_LIMB_WIDTH__.
(__mulbitint3, __divmodbitint4): Declare.
* libgcc-std.ver.in (GCC_14.0.0): Export _BitInt support routines.
* Makefile.in (lib2funcs): Add _mulbitint3.
(LIB2_DIVMOD_FUNCS): Add _divmodbitint4.
* soft-fp/bitint.h: New file.
* soft-fp/fixdfbitint.c: New file.
* soft-fp/fixsfbitint.c: New file.
* soft-fp/fixtfbitint.c: New file.
* soft-fp/fixxfbitint.c: New file.
* soft-fp/floatbitintbf.c: New file.
* soft-fp/floatbitintdf.c: New file.
* soft-fp/floatbitinthf.c: New file.
* soft-fp/floatbitintsf.c: New file.
* soft-fp/floatbitinttf.c: New file.
* soft-fp/floatbitintxf.c: New file.
* soft-fp/op-common.h (_FP_FROM_INT): Add support for rsize up to
4 * _FP_W_TYPE_SIZE rather than just 2 * _FP_W_TYPE_SIZE.

--- libgcc/config/aarch64/t-softfp.jj   2023-03-13 00:11:52.330213322 +0100
+++ libgcc/config/aarch64/t-softfp  2023-07-14 12:38:30.764869473 +0200
@@ -3,7 +3,7 @@ softfp_int_modes := si di ti
 softfp_extensions := sftf dftf hftf bfsf
 softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
 softfp_exclude_libgcc2 := n
-softfp_extras := fixhfti fixunshfti floattihf floatuntihf \
+softfp_extras += fixhfti fixunshfti floattihf floatuntihf \
 floatdibf floatundibf floattibf floatuntibf
 
 TARGET_LIBGCC2_CFLAGS += -Wno-missing-prototypes
--- libgcc/config/i386/64/t-softfp.jj   2023-03-10 20:39:43.849687830 +0100
+++ libgcc/config/i386/64/t-softfp  2023-07-14 12:37:55.422344930 +0200
@@ -1,4 +1,4 @@
-softfp_extras := fixhfti fixunshfti floattihf floatuntihf \
+softfp_extras += fixhfti fixunshfti floattihf floatuntihf \
 floattibf floatuntibf
 
 CFLAGS-fixhfti.c += -msse2
--- libgcc/config/i386/libgcc-glibc.ver.jj  2023-07-11 13:39:49.760107863 +0200
+++ libgcc/config/i386/libgcc-glibc.ver 2023-07-17 09:45:43.128281615 +0200
@@ -226,3 +226,13 @@ GCC_13.0.0 {
   __truncxfbf2
   __trunchfbf2
 }
+
+%inherit GCC_14.0.0 GCC_13.0.0
+GCC_14.0.0 {
+  __PFX__fixxfbitint
+  __PFX__fixtfbitint
+  __PFX__floatbitintbf
+  __PFX__floatbitinthf
+  __PFX__floatbitintxf
+  __PFX__floatbitinttf
+}
--- libgcc/config/i386/t-softfp.jj  2022-10-14 09:35:56.268989311 +0200
+++ libgcc/config/i386/t-softfp 2023-07-17 09:38:43.575980078 +0200
@@ -10,7 +10,7 @@ sof

[PATCH 3/5] C _BitInt support [PR102989]

2023-07-27 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch adds the C FE support, c-family support, small libcpp change
so that 123wb and 42uwb suffixes are handled plus glimits.h change
to define BITINT_MAXWIDTH macro.

The previous two patches really do nothing without this, which enables
all the support.
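
As a rough illustration of what the front end accepts once this is in, here is
a small sketch.  The function names are made up for the example, and the
_BitInt code is guarded by __BITINT_MAXWIDTH__ (the macro the ChangeLog below
says is predefined when _BitInt is supported) so that it still compiles with
compilers lacking the feature.  It contrasts the usual integer promotion of
unsigned char with the lack of promotion for bit-precise types.

```c
#include <limits.h>

/* unsigned char is promoted to int before the addition, so the sum of
   255 and 1 is 256 rather than wrapping to 0.  */
int
uchar_sum (void)
{
  unsigned char c = 255;
  return c + 1;
}

/* Returns 1 if adding 1uwb to the largest unsigned _BitInt(7) value wraps
   to zero, 0 if it does not, and -1 when _BitInt is not available.
   127uwb has type unsigned _BitInt(7) and 1uwb has type unsigned
   _BitInt(1); bit-precise types are not promoted to int, so the addition
   is done in unsigned _BitInt(7) and wraps modulo 128.  */
int
bitint7_sum_wraps (void)
{
#ifdef __BITINT_MAXWIDTH__
  unsigned _BitInt(7) x = 127uwb;
  return (x + 1uwb) == 0;
#else
  return -1;
#endif
}
```

With a compiler that has the support, bitint7_sum_wraps is expected to return
1, while uchar_sum returns 256 everywhere.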

2023-07-27  Jakub Jelinek  

PR c/102989
gcc/
* glimits.h (BITINT_MAXWIDTH): Define if __BITINT_MAXWIDTH__ is
predefined.
gcc/c-family/
* c-common.cc (c_common_reswords): Add _BitInt as keyword.
(c_common_signed_or_unsigned_type): Handle BITINT_TYPE.
(check_builtin_function_arguments): Handle BITINT_TYPE like
INTEGER_TYPE.
(keyword_begins_type_specifier): Handle RID_BITINT.
* c-common.h (enum rid): Add RID_BITINT enumerator.
* c-cppbuiltin.cc (c_cpp_builtins): For C call
targetm.c.bitint_type_info and predefine __BITINT_MAXWIDTH__
and for -fbuilding-libgcc also __LIBGCC_BITINT_LIMB_WIDTH__ and
__LIBGCC_BITINT_ORDER__ macros if _BitInt is supported.
* c-lex.cc (interpret_integer): Handle CPP_N_BITINT.
* c-pretty-print.cc (c_pretty_printer::simple_type_specifier,
c_pretty_printer::direct_abstract_declarator): Handle BITINT_TYPE.
(pp_c_integer_constant): Handle printing of large precision wide_ints
which would buffer overflow digit_buffer.
* c-ubsan.cc (ubsan_instrument_shift): Use UBSAN_PRINT_FORCE_INT
for type0 type descriptor.
gcc/c/
* c-convert.cc (c_convert): Handle BITINT_TYPE like INTEGER_TYPE.
* c-decl.cc (declspecs_add_type): Formatting fixes.  Handle
cts_bitint.  Adjust for added union in *specs.  Handle RID_BITINT.
(finish_declspecs): Handle cts_bitint.  Adjust for added union in
*specs.
* c-parser.cc (c_keyword_starts_typename, c_token_starts_declspecs,
c_parser_declspecs, c_parser_gnu_attribute_any_word): Handle
RID_BITINT.
* c-tree.h (enum c_typespec_keyword): Mention _BitInt in comment.
Add cts_bitint enumerator.
(struct c_declspecs): Move int_n_idx and floatn_nx_idx into a union
and add bitint_prec there as well.
* c-typeck.cc (composite_type, c_common_type, comptypes_internal):
Handle BITINT_TYPE.
(build_array_ref, build_unary_op, build_conditional_expr,
convert_for_assignment, digest_init, build_binary_op): Likewise.
libcpp/
* expr.cc (interpret_int_suffix): Handle wb and WB suffixes.
* include/cpplib.h (CPP_N_BITINT): Define.

--- gcc/glimits.h.jj    2023-01-03 00:20:35.071086812 +0100
+++ gcc/glimits.h   2023-07-27 15:03:24.238234396 +0200
@@ -157,6 +157,11 @@ see the files COPYING3 and COPYING.RUNTI
 # undef BOOL_WIDTH
 # define BOOL_WIDTH 1
 
+# ifdef __BITINT_MAXWIDTH__
+#  undef BITINT_MAXWIDTH
+#  define BITINT_MAXWIDTH __BITINT_MAXWIDTH__
+# endif
+
 # define __STDC_VERSION_LIMITS_H__ 202311L
 #endif
 
--- gcc/c-family/c-common.cc.jj 2023-07-24 17:48:26.436041278 +0200
+++ gcc/c-family/c-common.cc    2023-07-27 15:03:24.276233865 +0200
@@ -349,6 +349,7 @@ const struct c_common_resword c_common_r
   { "_Alignas",RID_ALIGNAS,   D_CONLY },
   { "_Alignof",RID_ALIGNOF,   D_CONLY },
   { "_Atomic", RID_ATOMIC,D_CONLY },
+  { "_BitInt", RID_BITINT,D_CONLY },
   { "_Bool",   RID_BOOL,  D_CONLY },
   { "_Complex",RID_COMPLEX,0 },
   { "_Imaginary",  RID_IMAGINARY, D_CONLY },
@@ -2728,6 +2729,9 @@ c_common_signed_or_unsigned_type (int un
   || TYPE_UNSIGNED (type) == unsignedp)
 return type;
 
+  if (TREE_CODE (type) == BITINT_TYPE)
+return build_bitint_type (TYPE_PRECISION (type), unsignedp);
+
 #define TYPE_OK(node)  \
   (TYPE_MODE (type) == TYPE_MODE (node)                                \
&& TYPE_PRECISION (type) == TYPE_PRECISION (node))
@@ -6341,8 +6345,10 @@ check_builtin_function_arguments (locati
  code0 = TREE_CODE (TREE_TYPE (args[0]));
  code1 = TREE_CODE (TREE_TYPE (args[1]));
  if (!((code0 == REAL_TYPE && code1 == REAL_TYPE)
-   || (code0 == REAL_TYPE && code1 == INTEGER_TYPE)
-   || (code0 == INTEGER_TYPE && code1 == REAL_TYPE)))
+   || (code0 == REAL_TYPE
+   && (code1 == INTEGER_TYPE || code1 == BITINT_TYPE))
+   || ((code0 == INTEGER_TYPE || code0 == BITINT_TYPE)
+   && code1 == REAL_TYPE)))
{
  error_at (loc, "non-floating-point arguments in call to "
"function %qE", fndecl);
@@ -8402,6 +8408,7 @@ keyword_begins_type_specifier (enum rid
 case RID_FRACT:
 case RID_ACCUM:
 case RID_BOOL:
+case RID_BITINT:
 case RID_WCHAR:
 case RID_CHAR8:
 case RID_CHAR16:
--- gcc/c-family/c-common.h.jj  2023-06-26 09:27:04.276367532 +0200
+++ gcc/c-family/c-common.h 2023-

Re: [PATCH 4/5] testsuite part 1 for _BitInt support [PR102989]

2023-07-27 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 27, 2023 at 07:15:28PM +0200, Jakub Jelinek via Gcc-patches wrote:
> testcases, I've been using
> https://defuse.ca/big-number-calculator.htm
> tool, a randombitint tool I wrote (will post as a reply to this) plus
> LLVM trunk on godbolt and the WIP GCC support checking if both compilers
> agree on stuff (and in case of differences tried constant evaluation etc.).

So, the randombitint.c tool is attached, when invoked like
./randombitint 174
it prints pseudo random 174 bit integer in decimal, when invoked as
./randombitint 575 mask
it prints all ones number as decimal for the 575 bit precision, and
./randombitint 275 0x432445aebe435646567547567647
prints the given hexadecimal number as decimal (all using gmp).

In the tests I've often used
__attribute__((noipa)) void
printme (const void *p, int n)
{
  __builtin_printf ("0x");
  if ((n & 7) != 0)
    __builtin_printf ("%02x",
                      ((const unsigned char *) p)[n / 8] & ((1 << (n & 7)) - 1));
  for (int i = (n / 8) - 1; i >= 0; --i)
    __builtin_printf ("%02x", ((const unsigned char *) p)[i]);
  __builtin_printf ("\n");
}
function to print hexadecimal values (temporaries or finals) and then used
the third invocation of the tool to convert those to decimal.
For unsigned _BitInt just called the above like
  printme (&whatever, 575);
where 575 was the N from unsigned _BitInt(N) whatever, or
  _BitInt(575) x = ...
  if (x < 0)
    {
      __builtin_printf ("-");
      x = -x;
    }
  printme (&x, 575);
to print it signed.

Jakub
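/* randombitint.c.  The header names were lost in the archive; the five
   below are inferred from the functions used.  Link with -lgmp.  */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <gmp.h>

int
main (int argc, const char *argv[])
{
  int n = atoi (argv[1]);
  int m = (n + 7) / 8;
  char *p = __builtin_alloca (m * 2 + 1);
  const char *q;
  srandom (getpid ());
  /* Build M random bytes as a hex string, most significant byte first.  */
  for (int i = 0; i < m; ++i)
    {
      unsigned char v = random ();
      if (argc >= 3 && strcmp (argv[2], "mask") == 0)
        v = 0xff;
      /* Clear the bits above the requested precision in the top byte.  */
      if (i == 0 && (n & 7) != 0)
        v &= (1 << (n & 7)) - 1;
      sprintf (&p[2 * i], "%02x", v);
    }
  p[m * 2] = '\0';
  mpz_t a;
  /* The mpz_t has to be initialized before gmp_sscanf stores into it.  */
  mpz_init (a);
  if (argc >= 3 && strcmp (argv[2], "mask") != 0)
    {
      /* An explicit hexadecimal value was given; skip an optional 0x.  */
      q = argv[2];
      if (q[0] == '0' && q[1] == 'x')
        q += 2;
    }
  else
    q = p;
  gmp_sscanf (q, "%Zx", a);
  gmp_printf ("0x%s\n%Zd\n", q, a);
  return 0;
}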


[PATCH] bpf: correct pseudo-C template for add3 and sub3

2023-07-27 Thread David Faust via Gcc-patches
The pseudo-C output templates for these instructions were incorrectly
using operand 1 rather than operand 2 on the RHS, which led to some
very incorrect assembly generation with -masm=pseudoc.

Tested on bpf-unknown-none.
OK?

gcc/

* config/bpf/bpf.md (add3): Use %w2 instead of %w1
in pseudo-C dialect output template.
(sub3): Likewise.

gcc/testsuite/

* gcc.target/bpf/alu-2.c: New test.
* gcc.target/bpf/alu-pseudoc-2.c: Likewise.
---
 gcc/config/bpf/bpf.md|  4 ++--
 gcc/testsuite/gcc.target/bpf/alu-2.c | 12 
 gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c | 13 +
 3 files changed, 27 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/alu-2.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 2ffc4ebd17e..66436397bb7 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -131,7 +131,7 @@ (define_insn "add3"
 (plus:AM (match_operand:AM 1 "register_operand"   " 0,0")
  (match_operand:AM 2 "reg_or_imm_operand" " r,I")))]
   "1"
-  "{add\t%0,%2|%w0 += %w1}"
+  "{add\t%0,%2|%w0 += %w2}"
   [(set_attr "type" "")])
 
 ;;; Subtraction
@@ -144,7 +144,7 @@ (define_insn "sub3"
 (minus:AM (match_operand:AM 1 "register_operand" " 0")
   (match_operand:AM 2 "register_operand" " r")))]
   ""
-  "{sub\t%0,%2|%w0 -= %w1}"
+  "{sub\t%0,%2|%w0 -= %w2}"
   [(set_attr "type" "")])
 
 ;;; Negation
diff --git a/gcc/testsuite/gcc.target/bpf/alu-2.c b/gcc/testsuite/gcc.target/bpf/alu-2.c
new file mode 100644
index 000..0444a9bc68a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/alu-2.c
@@ -0,0 +1,12 @@
+/* Check add and sub instructions.  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+long foo (long x, long y)
+{
+  return y - x + 4;
+}
+
+/* { dg-final { scan-assembler-not {sub\t(%r.),\1\n} } } */
+/* { dg-final { scan-assembler {sub\t(\%r.),(\%r.)\n} } } */
+/* { dg-final { scan-assembler {add\t(\%r.),4\n} } } */
diff --git a/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c b/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c
new file mode 100644
index 000..751db2477c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c
@@ -0,0 +1,13 @@
+/* Check add and sub instructions (pseudoc asm dialect).  */
+/* { dg-do compile } */
+/* { dg-options "-masm=pseudoc" } */
+
+long foo (long x, long y)
+{
+  return y - x + 4;
+}
+
+/* { dg-final { scan-assembler-not {\t(r.) -= \1\n} } } */
+/* { dg-final { scan-assembler {\t(r.) -= (r.)\n} } } */
+/* { dg-final { scan-assembler {\t(r.) \+= 4\n} } } */
+
-- 
2.40.1



Fix profile update after RTL unrolling

2023-07-27 Thread Jan Hubicka via Gcc-patches
This patch fixes the profile update after RTL unrolling, which is now done the
same way as in the tree unroller.  We still produce a (slightly) corrupted
profile for multiple-exit loops, which I can try to fix incrementally.

I also updated testcases to look for profile mismatches so they do not creep
back in again.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* cfgloop.h (single_dom_exit): Declare.
* cfgloopmanip.h (update_exit_probability_after_unrolling): Declare.
* cfgrtl.cc (struct cfg_hooks): Fix comment.
* loop-unroll.cc (unroll_loop_constant_iterations): Update exit edge.
* tree-ssa-loop-ivopts.h (single_dom_exit): Do not declare it here.
* tree-ssa-loop-manip.cc (update_exit_probability_after_unrolling):
Break out from ...
(tree_transform_and_unroll_loop): ... here;

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/peel-1.c: Test for profile mismatches.
* gcc.dg/tree-prof/unroll-1.c: Test for profile mismatches.
* gcc.dg/tree-ssa/peel1.c: Test for profile mismatches.
* gcc.dg/unroll-1.c: Test for profile mismatches.
* gcc.dg/unroll-3.c: Test for profile mismatches.
* gcc.dg/unroll-4.c: Test for profile mismatches.
* gcc.dg/unroll-5.c: Test for profile mismatches.
* gcc.dg/unroll-6.c: Test for profile mismatches.

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 22293e1c237..c4622d4b853 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -921,6 +921,7 @@ extern bool get_estimated_loop_iterations (class loop *loop, widest_int *nit);
 extern bool get_max_loop_iterations (const class loop *loop, widest_int *nit);
 extern bool get_likely_max_loop_iterations (class loop *loop, widest_int *nit);
 extern int bb_loop_depth (const_basic_block);
+extern edge single_dom_exit (class loop *);
 
 /* Converts VAL to widest_int.  */
 
diff --git a/gcc/cfgloopmanip.h b/gcc/cfgloopmanip.h
index af6a29f70c4..dab7b31c1e7 100644
--- a/gcc/cfgloopmanip.h
+++ b/gcc/cfgloopmanip.h
@@ -68,5 +68,6 @@ class loop * loop_version (class loop *, void *,
 void adjust_loop_info_after_peeling (class loop *loop, int npeel, bool precise);
 void scale_dominated_blocks_in_loop (class loop *loop, basic_block bb,
 profile_count num, profile_count den);
+void update_exit_probability_after_unrolling (class loop *loop, edge new_exit);
 
 #endif /* GCC_CFGLOOPMANIP_H */
diff --git a/gcc/cfgrtl.cc b/gcc/cfgrtl.cc
index 36e43d0d737..abcb472e2a2 100644
--- a/gcc/cfgrtl.cc
+++ b/gcc/cfgrtl.cc
@@ -5409,7 +5409,7 @@ struct cfg_hooks cfg_layout_rtl_cfg_hooks = {
   rtl_flow_call_edges_add,
   NULL, /* execute_on_growing_pred */
   NULL, /* execute_on_shrinking_pred */
-  duplicate_loop_body_to_header_edge, /* duplicate loop for trees */
+  duplicate_loop_body_to_header_edge, /* duplicate loop for rtl */
   rtl_lv_add_condition_to_bb, /* lv_add_condition_to_bb */
   NULL, /* lv_adjust_loop_header_phi*/
   rtl_extract_cond_bb_edges, /* extract_cond_bb_edges */
diff --git a/gcc/loop-unroll.cc b/gcc/loop-unroll.cc
index 9d8ba11..bbfa6ccc770 100644
--- a/gcc/loop-unroll.cc
+++ b/gcc/loop-unroll.cc
@@ -487,6 +487,7 @@ unroll_loop_constant_iterations (class loop *loop)
   bool exit_at_end = loop_exit_at_end_p (loop);
   struct opt_info *opt_info = NULL;
   bool ok;
+  bool flat = maybe_flat_loop_profile (loop);
 
   niter = desc->niter;
 
@@ -603,9 +604,14 @@ unroll_loop_constant_iterations (class loop *loop)
   ok = duplicate_loop_body_to_header_edge (
 loop, loop_latch_edge (loop), max_unroll, wont_exit, desc->out_edge,
 &remove_edges,
-DLTHE_FLAG_UPDATE_FREQ | (opt_info ? DLTHE_RECORD_COPY_NUMBER : 0));
+DLTHE_FLAG_UPDATE_FREQ | (opt_info ? DLTHE_RECORD_COPY_NUMBER : 0)
+| (flat ? DLTHE_FLAG_FLAT_PROFILE : 0));
   gcc_assert (ok);
 
+  edge new_exit = single_dom_exit (loop);
+  if (new_exit)
+update_exit_probability_after_unrolling (loop, new_exit);
+
   if (opt_info)
 {
   apply_opt_in_copies (opt_info, max_unroll, true, true);
diff --git a/gcc/profile-count.h b/gcc/profile-count.h
index 88a6431c21a..e860c5db540 100644
--- a/gcc/profile-count.h
+++ b/gcc/profile-count.h
@@ -650,6 +650,9 @@ public:
   return *this;
 }
 
+  /* Compute n-th power.  */
+  profile_probability pow (int) const;
+
   /* Get the value of the count.  */
   uint32_t value () const { return m_val; }
 
diff --git a/gcc/testsuite/gcc.dg/tree-prof/peel-1.c b/gcc/testsuite/gcc.dg/tree-prof/peel-1.c
index 7245b68c1ee..32ecccb16da 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/peel-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/peel-1.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O3 -fdump-tree-cunroll-details -fno-unroll-loops -fpeel-loops" } */
+/* { dg-options "-O3 -fdump-tree-cunroll-details-blocks -fdump-tree-optimized-details-blocks -fno-unroll-loops -fpeel-loops" } */
 void abort();
 
 int a[1000];
@@ -21,3 +21,5 @@ main()
   return 0;
 }
 /* { dg-final-use { scan-tree-dump "Peeled loop ., 1 times" "cunr

Make store likely in optimize_mask_stores

2023-07-27 Thread Jan Hubicka via Gcc-patches
Hi,
as discussed with Richard, we want the store to be likely in
optimize_mask_stores.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* tree-vect-loop.cc (optimize_mask_stores): Make store
likely.

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 2561552fe6e..a83952aff60 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -11741,7 +11741,7 @@ optimize_mask_stores (class loop *loop)
   e->flags = EDGE_TRUE_VALUE;
   efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
   /* Put STORE_BB to likely part.  */
-  efalse->probability = profile_probability::unlikely ();
+  efalse->probability = profile_probability::likely ();
   e->probability = efalse->probability.invert ();
   store_bb->count = efalse->count ();
   make_single_succ_edge (store_bb, join_bb, EDGE_FALLTHRU);


Re: [PATCH 0/5] GCC _BitInt support [PR102989]

2023-07-27 Thread Joseph Myers
On Thu, 27 Jul 2023, Jakub Jelinek via Gcc-patches wrote:

> - _BitInt(N) bit-fields aren't supported yet (the patch rejects them); I'd
>   like to enable those incrementally, but don't really see details on how such
>   bit-fields should be laid-out in memory nor passed inside of function
>   arguments; LLVM implements something, but it is a question if that is what
>   the various ABIs want

So if the x86-64 ABI (or any other _BitInt ABI that already exists) 
doesn't specify this adequately then an issue should be filed (at 
https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues in the x86-64 case).

(Note that the language specifies that e.g. _BitInt(123):45 gets promoted 
to _BitInt(123) by the integer promotions, rather than left as a type with 
the bit-field width.)

> - conversions between large/huge (see later) _BitInt and _Decimal{32,64,128}
>   aren't supported and emit a sorry; I'm not familiar enough with DFP stuff
>   to implement that

Doing things incrementally might indicate first doing this only for BID 
(so sufficing for x86-64), with DPD support to be added when _BitInt 
support is added for an architecture using DPD, i.e. powerpc / s390.

This conversion is a mix of base conversion and things specific to DFP 
types.

For conversion *from DFP to _BitInt*, the DFP value needs to be 
interpreted (hopefully using existing libbid code) as the product of a 
sign, an integer and a power of 10, with appropriate truncation of the 
fractional part if there is one (and appropriate handling of infinity / 
NaN / values where the integer part obviously doesn't fit in the type as 
raising "invalid" and returning an arbitrary result).  Then it's just a 
matter of doing an integer multiplication and producing an appropriately 
signed result (which might itself overflow the range of representable 
values with the given sign, meaning "invalid" should be raised).  
Precomputed tables of powers of 10 in binary might speed up the 
multiplication process (don't know if various existing tables in libbid 
are usable for that).  It's unspecified whether "inexact" is raised for 
non-integer DFP values.

For conversion *from _BitInt to DFP*, the _BitInt value needs to be 
expressed in decimal.  In the absence of optimized multiplication / 
division for _BitInt, it seems reasonable enough to do this naively 
(repeatedly dividing by a power of 10 that fits in one limb to determine 
base 10^N digits from the least significant end, for example), modulo 
detecting obvious overflow cases up front (if the absolute value is at 
least 10^97, conversion to _Decimal32 definitely overflows in all rounding 
modes, for example, so you just need to do an overflowing computation that 
produces a result with the right sign in order to get the correct 
rounding-mode-dependent result and exceptions).  Probably it isn't 
necessary to convert most of those base 10^N digits into base 10 digits.  
Rather, it's enough to find the leading M (= precision of the DFP type in 
decimal digits) base 10 digits, plus to know whether what follows is 
exactly 0, exactly 0.5, between 0 and 0.5, or between 0.5 and 1.

Then adding two appropriate DFP values with the right sign produces the 
final DFP result.  Those DFP values would need to be produced from integer 
digits together with the relevant power of 10.  And there might be 
multiple possible choices for the DFP quantum exponent; the preferred 
exponent for exact results is 0, so the resulting exponent needs to be 
chosen to be as close to 0 as possible (which also produces correct 
results when the result is inexact).  (If the result is 0, note that 
quantum exponent of 0 is not the same as the zero from default 
initialization, which has the least exponent possible.)
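
The naive _BitInt-to-decimal step described above (repeated division by a
power of 10 that fits in one limb) could look roughly like the sketch below.
This is only an illustration under assumptions: it uses 32-bit limbs with
10^9 as the divisor so that plain 64-bit arithmetic suffices, handles only the
unsigned magnitude, and the function names are invented here rather than
taken from libbid or libgcc.

```c
#include <stddef.h>
#include <stdint.h>

/* Divide the unsigned little-endian number X[0..N-1] in place by D and
   return the remainder (short division, most significant limb first).  */
uint32_t
divmod_small (uint32_t *x, size_t n, uint32_t d)
{
  uint64_t rem = 0;
  for (size_t i = n; i-- > 0; )
    {
      /* rem < d, so cur / d always fits in one 32-bit limb.  */
      uint64_t cur = (rem << 32) | x[i];
      x[i] = (uint32_t) (cur / d);
      rem = cur % d;
    }
  return (uint32_t) rem;
}

/* Split X[0..N-1] into base-10^9 digits, least significant first, storing
   them in OUT (capacity MAX_OUT) and clobbering X.  Returns the number of
   digits produced, or 0 if OUT is too small.  Zero yields one digit 0.  */
size_t
to_base_1e9 (uint32_t *x, size_t n, uint32_t *out, size_t max_out)
{
  size_t count = 0;
  do
    {
      if (count == max_out)
        return 0;
      out[count++] = divmod_small (x, n, 1000000000u);
      /* Drop limbs that became zero so later divisions get shorter.  */
      while (n > 0 && x[n - 1] == 0)
        --n;
    }
  while (n > 0);
  return count;
}
```

A real conversion would stop as soon as the leading M decimal digits and the
rounding information are known, rather than expanding every base-10^9 digit.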

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] bpf: correct pseudo-C template for add3 and sub3

2023-07-27 Thread Jose E. Marchesi via Gcc-patches


> The pseudo-C output templates for these instructions were incorrectly
> using operand 1 rather than operand 2 on the RHS, which led to some
> very incorrect assembly generation with -masm=pseudoc.
>
> Tested on bpf-unknown-none.
> OK?

OK.  Thanks for spotting and fixing this!

>
> gcc/
>
>   * config/bpf/bpf.md (add3): Use %w2 instead of %w1
>   in pseudo-C dialect output template.
>   (sub3): Likewise.
>
> gcc/testsuite/
>
>   * gcc.target/bpf/alu-2.c: New test.
>   * gcc.target/bpf/alu-pseudoc-2.c: Likewise.
> ---
>  gcc/config/bpf/bpf.md|  4 ++--
>  gcc/testsuite/gcc.target/bpf/alu-2.c | 12 
>  gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c | 13 +
>  3 files changed, 27 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/alu-2.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c
>
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index 2ffc4ebd17e..66436397bb7 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -131,7 +131,7 @@ (define_insn "add3"
>  (plus:AM (match_operand:AM 1 "register_operand"   " 0,0")
>   (match_operand:AM 2 "reg_or_imm_operand" " r,I")))]
>"1"
> -  "{add\t%0,%2|%w0 += %w1}"
> +  "{add\t%0,%2|%w0 += %w2}"
>[(set_attr "type" "")])
>  
>  ;;; Subtraction
> @@ -144,7 +144,7 @@ (define_insn "sub3"
>  (minus:AM (match_operand:AM 1 "register_operand" " 0")
>(match_operand:AM 2 "register_operand" " r")))]
>""
> -  "{sub\t%0,%2|%w0 -= %w1}"
> +  "{sub\t%0,%2|%w0 -= %w2}"
>[(set_attr "type" "")])
>  
>  ;;; Negation
> diff --git a/gcc/testsuite/gcc.target/bpf/alu-2.c b/gcc/testsuite/gcc.target/bpf/alu-2.c
> new file mode 100644
> index 000..0444a9bc68a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/alu-2.c
> @@ -0,0 +1,12 @@
> +/* Check add and sub instructions.  */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +
> +long foo (long x, long y)
> +{
> +  return y - x + 4;
> +}
> +
> +/* { dg-final { scan-assembler-not {sub\t(%r.),\1\n} } } */
> +/* { dg-final { scan-assembler {sub\t(\%r.),(\%r.)\n} } } */
> +/* { dg-final { scan-assembler {add\t(\%r.),4\n} } } */
> diff --git a/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c b/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c
> new file mode 100644
> index 000..751db2477c0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c
> @@ -0,0 +1,13 @@
> +/* Check add and sub instructions (pseudoc asm dialect).  */
> +/* { dg-do compile } */
> +/* { dg-options "-masm=pseudoc" } */
> +
> +long foo (long x, long y)
> +{
> +  return y - x + 4;
> +}
> +
> +/* { dg-final { scan-assembler-not {\t(r.) -= \1\n} } } */
> +/* { dg-final { scan-assembler {\t(r.) -= (r.)\n} } } */
> +/* { dg-final { scan-assembler {\t(r.) \+= 4\n} } } */
> +


Re: [PATCH] Use substituted GDCFLAGS

2023-07-27 Thread Iain Buclaw via Gcc-patches
Excerpts from Andreas Schwab via Gcc-patches's message of July 24, 2023 11:15 am:
> Ping?
> 

OK from me.

Thanks,
Iain.


Re: [PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog.

2023-07-27 Thread Richard Biener via Gcc-patches



> On 27.07.2023 at 19:12, Roger Sayle wrote:
> 
> 
> Hi Richard,
> 
> You're 100% right.  It's possible to significantly clean up this code,
> replacing the body of the conditional with a call to force_reg and
> simplifying the conditions under which it is called.  These improvements
> are implemented in the patch below, which has been tested on
> x86_64-pc-linux-gnu, with a bootstrap and make -k check, both with and
> without -m32, as usual.
> 
> Interestingly, the CONCAT clause afterwards is still required (I've
> learned something new), as calling force_reg (or gen_reg_rtx) with HCmode
> actually returns a CONCAT instead of a REG,

Heh, interesting.

> so although the code looks dead, it's required to build libgcc during
> a bootstrap.  But the remaining clean-up is good, reducing the number of
> source lines and making the logic easier to understand.
> 
> Ok for mainline?

Ok.

Thanks,
Richard 

> 2023-07-27  Roger Sayle  
>Richard Biener  
> 
> gcc/ChangeLog
>PR middle-end/28071
>PR rtl-optimization/110587
>* expr.cc (emit_group_load_1): Simplify logic for calling
>force_reg on ORIG_SRC, to avoid making a copy if the source
>is already in a pseudo register.
> 
> Roger
> --
> 
>> -Original Message-
>> From: Richard Biener 
>> Sent: 25 July 2023 12:50
>> 
>>> On Tue, Jul 25, 2023 at 1:31 PM Roger Sayle 
>>> wrote:
>>> 
>>> This patch is the third in a series of fixes for PR
>>> rtl-optimization/110587, a compile-time regression with -O0, that
>>> attempts to address the underlying cause.  As noted previously, the
>>> pathological test case pr28071.c contains a large number of useless
>>> register-to-register moves that can produce quadratic behaviour (in
>>> LRA).  These moves are generated during RTL expansion in
>>> emit_group_load_1, where the middle-end attempts to simplify the
>>> source before calling extract_bit_field.  This is reasonable if the
>>> source is a complex expression (from before the tree-ssa optimizers),
>>> or a SUBREG, or a hard register, but it's not particularly useful to
>>> copy a pseudo register into a new pseudo register.  This patch
>>> eliminates that redundancy.
>>> 
>>> The -fdump-tree-expand for pr28071.c compiled with -O0 currently
>>> contains 777K lines, with this patch it contains 717K lines, i.e.
>>> saving about 60K lines (admittedly of debugging text output, but it
>>> makes the point).
>>> 
>>> 
>>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>>> and make -k check, both with and without --target_board=unix{-m32}
>>> with no new failures.  Ok for mainline?
>>> 
>>> As always, I'm happy to revert this change quickly if there's a
>>> problem, and investigate why this additional copy might (still) be
>>> needed on other non-x86 targets.
>> 
>> @@ -2622,6 +2622,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src,
>> tree type,
>> be loaded directly into the destination.  */
>>   src = orig_src;
>>   if (!MEM_P (orig_src)
>> + && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
>>  && (!CONSTANT_P (orig_src)
>>  || (GET_MODE (orig_src) != mode
>>  && GET_MODE (orig_src) != VOIDmode)))
>> 
>> so that means the code guarded by the conditional could instead be
>> transformed to
>> 
>>   src = force_reg (mode, orig_src);
>> 
>> ?  Btw, the || (GET_MODE (orig_src) != mode && GET_MODE (orig_src) !=
>> VOIDmode) case looks odd as in that case we'd use GET_MODE (orig_src) for
>> the move ... that might also mean we have to use
>> force_reg (GET_MODE (orig_src) == VOIDmode ? mode : GET_MODE (orig_src),
>> orig_src)
>> 
>> Otherwise I think this is OK; as said, using force_reg somehow would
>> improve readability here, I think.
>> 
>> I also wonder how the
>> 
>>  else if (GET_CODE (src) == CONCAT)
>> 
>> case will ever trigger with the current code.
>> 
>> Richard.
>> 
>>> 
>>> 2023-07-25  Roger Sayle  
>>> 
>>> gcc/ChangeLog
>>>PR middle-end/28071
>>>PR rtl-optimization/110587
>>>* expr.cc (emit_group_load_1): Avoid copying a pseudo register into
>>>a new pseudo register, i.e. only copy hard regs into a new pseudo.
>>> 
>>> 
> 
> 


Re: [PATCH 5/5] testsuite part 2 for _BitInt support [PR102989]

2023-07-27 Thread Joseph Myers
I think there should be tests for _Atomic _BitInt types.  Hopefully atomic 
compound assignment just works via the logic for compare-and-exchange 
loops, but does e.g. atomic_fetch_add work with _Atomic _BitInt types?

-- 
Joseph S. Myers
jos...@codesourcery.com
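
Note: the compare-and-exchange loop mentioned above can be sketched in
plain C11.  This is only an illustration under stated assumptions: it uses
`_Atomic unsigned` with a 7-bit mask as a stand-in for an
`_Atomic unsigned _BitInt(7)` object, and the names WIDTH_MASK and
atomic_add_assign are hypothetical, not taken from the patch or from GCC.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the value range of unsigned _BitInt(7).  */
#define WIDTH_MASK 0x7fu

/* Model of "obj += addend" on an atomic object, implemented as a
   compare-and-exchange loop.  Returns the new value, as a compound
   assignment expression would.  */
static unsigned
atomic_add_assign (_Atomic unsigned *obj, unsigned addend)
{
  unsigned old = atomic_load (obj);
  unsigned desired;
  do
    {
      /* Compute the result from the last observed value, wrapping to
	 the modelled bit width.  */
      desired = (old + addend) & WIDTH_MASK;
      /* On failure, OLD is refreshed with the current value and the
	 computation is retried.  */
    }
  while (!atomic_compare_exchange_weak (obj, &old, desired));
  return desired;
}
```

Whether atomic_fetch_add accepts _Atomic _BitInt operands directly is the
open question raised above; this sketch does not answer it.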


[r14-2797 Regression] FAIL: 23_containers/vector/bool/110807.cc (test for excess errors) on Linux/x86_64

2023-07-27 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

7931a1de9ec87b996d51d3d60786f5c81f63919f is the first bad commit
commit 7931a1de9ec87b996d51d3d60786f5c81f63919f
Author: Jonathan Wakely 
Date:   Wed Jul 26 14:09:24 2023 +0100

libstdc++: Avoid bogus overflow warnings in std::vector [PR110807]

caused

FAIL: 23_containers/vector/bool/110807.cc (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2797/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/bool/110807.cc 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/bool/110807.cc 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email.  For questions about this report, contact
me at haochen dot jiang at intel.com.)
(If you hit cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[PATCH] Fortran: do not pass hidden character length for TYPE(*) dummy [PR110825]

2023-07-27 Thread Harald Anlauf via Gcc-patches
Dear all,

when passing a character actual argument to an assumed-type dummy
(TYPE(*)), we should not pass the character length for that argument,
as otherwise other hidden arguments that are passed as part of the
gfortran ABI will not be interpreted correctly.  This is in line
with the current way the procedure decl is generated.

The attached patch fixes the caller and clarifies the behavior
in the documentation.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald
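
Note: the effect described above can be modelled in C.  This is a rough
sketch under stated assumptions, not gfortran's implementation: the struct
hidden_args and both function names are hypothetical, and the hidden
lengths are kept in an array instead of being passed as real trailing
arguments.  It models the call `sub (not_used, "123")` from the new test,
where the first dummy is TYPE(*) and the actual argument is
character(100).

```c
#include <stddef.h>

/* Hypothetical model of the hidden character lengths that the caller
   appends after the explicit arguments, in order.  */
struct hidden_args
{
  size_t len[4];
  size_t count;
};

/* Caller side: push the hidden lengths.  When SKIP_ASSUMED_TYPE is
   nonzero, no length is pushed for the actual argument that is passed
   to the TYPE(*) dummy (the behaviour after the fix).  */
static struct hidden_args
build_hidden_lengths (size_t assumed_type_actual_len, size_t print_this_len,
		      int skip_assumed_type)
{
  struct hidden_args h = { { 0 }, 0 };
  if (!skip_assumed_type)
    h.len[h.count++] = assumed_type_actual_len;
  h.len[h.count++] = print_this_len;
  return h;
}

/* Callee side: the procedure decl has exactly one hidden length, for
   the character(*) dummy, so it reads the first hidden slot.  */
static size_t
callee_len_of_print_this (const struct hidden_args *h)
{
  return h->len[0];
}
```

With the extra length pushed, the callee in this model sees 100 instead of
3 as the length of the second argument, which corresponds to the `stop 1`
check in the test.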

From 199e09c9862f5afe7e583839bc1b108c741a7efb Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 27 Jul 2023 21:30:26 +0200
Subject: [PATCH] Fortran: do not pass hidden character length for TYPE(*)
 dummy [PR110825]

gcc/fortran/ChangeLog:

	PR fortran/110825
	* gfortran.texi: Clarify argument passing convention.
	* trans-expr.cc (gfc_conv_procedure_call): Do not pass the character
	length as hidden argument when the declared dummy argument is
	assumed-type.

gcc/testsuite/ChangeLog:

	PR fortran/110825
	* gfortran.dg/assumed_type_18.f90: New test.
---
 gcc/fortran/gfortran.texi |  3 +-
 gcc/fortran/trans-expr.cc |  1 +
 gcc/testsuite/gfortran.dg/assumed_type_18.f90 | 52 +++
 3 files changed, 55 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/assumed_type_18.f90

diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 7786d23265f..f476a3719f5 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -3750,7 +3750,8 @@ front ends of GCC, e.g. to GCC's C99 compiler for @code{_Bool}
 or GCC's Ada compiler for @code{Boolean}.)

 For arguments of @code{CHARACTER} type, the character length is passed
-as a hidden argument at the end of the argument list.  For
+as a hidden argument at the end of the argument list, except when the
+corresponding dummy argument is declared as @code{TYPE(*)}.  For
 deferred-length strings, the value is passed by reference, otherwise
 by value.  The character length has the C type @code{size_t} (or
 @code{INTEGER(kind=C_SIZE_T)} in Fortran).  Note that this is
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index ef3e6d08f78..764565476af 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -7521,6 +7521,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 	  && !(fsym && fsym->ts.type == BT_DERIVED && fsym->ts.u.derived
 	   && fsym->ts.u.derived->intmod_sym_id == ISOCBINDING_PTR
 	   && fsym->ts.u.derived->from_intmod == INTMOD_ISO_C_BINDING )
+	  && !(fsym && fsym->ts.type == BT_ASSUMED)
 	  && !(fsym && UNLIMITED_POLY (fsym)))
 	vec_safe_push (stringargs, parmse.string_length);

diff --git a/gcc/testsuite/gfortran.dg/assumed_type_18.f90 b/gcc/testsuite/gfortran.dg/assumed_type_18.f90
new file mode 100644
index 000..a3d791919a2
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/assumed_type_18.f90
@@ -0,0 +1,52 @@
+! { dg-do run }
+! PR fortran/110825 - TYPE(*) and character actual arguments
+
+program foo
+  use iso_c_binding, only: c_loc, c_ptr, c_associated
+  implicit none
+  character(100):: not_used = ""
+  character(:), allocatable :: deferred
+  character :: c42(6,7) = "*"
+  call sub  (not_used,  "123")
+  call sub  ("0"  , "123")
+  deferred = "d"
+  call sub  (deferred , "123")
+  call sub2 ([1.0,2.0], "123")
+  call sub2 (["1","2"], "123")
+  call sub3 (c42  , "123")
+
+contains
+
+  subroutine sub (useless_var, print_this)
+type(*),  intent(in) :: useless_var
+character(*), intent(in) :: print_this
+if (len  (print_this) /= 3) stop 1
+if (len_trim (print_this) /= 3) stop 2
+  end
+
+  subroutine sub2 (a, c)
+type(*),  intent(in) :: a(:)
+character(*), intent(in) :: c
+if (len  (c) /= 3) stop 10
+if (len_trim (c) /= 3) stop 11
+if (size (a) /= 2) stop 12
+  end
+
+  subroutine sub3 (a, c)
+type(*),  intent(in), target, optional :: a(..)
+character(*), intent(in)   :: c
+type(c_ptr) :: cpt
+if (len  (c) /= 3) stop 20
+if (len_trim (c) /= 3) stop 21
+if (.not. present (a)) stop 22
+if (rank (a) /= 2) stop 23
+if (size (a)/= 42) stop 24
+if (any (shape  (a) /= [6,7])) stop 25
+if (any (lbound (a) /= [1,1])) stop 26
+if (any (ubound (a) /= [6,7])) stop 27
+if (.not. is_contiguous (a))   stop 28
+cpt = c_loc (a)
+if (.not. c_associated (cpt))  stop 29
+  end
+
+end
--
2.35.3



Re: [PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative

2023-07-27 Thread Patrick O'Neill

The newly added testcase fails on rv32 targets with this message:
FAIL: gcc.target/riscv/rvv/autovec/madd-split2-1.c -O3 -ftree-vectorize (test 
for excess errors)

verbose log:
compiler exited with status 1
output is:
cc1: error: ABI requires '-march=rv32'

Something like this appears to fix the issue:

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
index 14a9802667e..e10a9e9d0f5 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gcv_zvl256b -O3 -fno-cprop-registers -fno-dce --param 
riscv-autovec-preference=scalable" } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3
-fno-cprop-registers -fno-dce --param riscv-autovec-preference=scalable"
 } */
 
 long

 foo (long *__restrict a, long *__restrict b, long n)

On 7/27/23 04:57, Kito Cheng via Gcc-patches wrote:


My first impression was that those emit_insn (gen_rtx_SET ()) calls seemed
necessary, but I got the point after I checked vector.md :P

Committed to trunk, thanks :)


On Thu, Jul 27, 2023 at 6:23 pm juzhe.zh...@rivai.ai wrote:

Oh, YES.

Thanks for fixing it. It makes sense since the ternary operations in "vector.md"
generate "vmv.v.v" according to RA.

Thanks for fixing it.

@kito: Could you confirm it? If it's OK with you, please commit it for Han
(I am too lazy to commit patches :).



juzhe.zh...@rivai.ai

From: demin.han
Date: 2023-07-27 17:48
To: gcc-patches@gcc.gnu.org
CC: kito.ch...@gmail.com; juzhe.zh...@rivai.ai
Subject: [PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative
When the split2 pass starts, which_alternative holds whatever value an
earlier pass last set, so it is effectively random.

Even if it were initialized, the generated move is redundant.
The move can be generated by the assembly output template.

Signed-off-by: demin.han

gcc/ChangeLog:

* config/riscv/autovec.md: Delete which_alternative use in split

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/madd-split2-1.c: New test.

---
gcc/config/riscv/autovec.md | 12 
.../gcc.target/riscv/rvv/autovec/madd-split2-1.c| 13 +
2 files changed, 13 insertions(+), 12 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index d899922586a..b7ea3101f5a 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1012,8 +1012,6 @@ (define_insn_and_split "*fma"
[(const_int 0)]
{
  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
  riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus 
(mode),
riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1058,8 +1056,6 @@ (define_insn_and_split "*fnma"
[(const_int 0)]
{
  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
  riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul 
(mode),
 riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1102,8 +1098,6 @@ (define_insn_and_split "*fma"
[(const_int 0)]
{
  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
  riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (PLUS, 
mode),
   riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1148,8 +1142,6 @@ (define_insn_and_split "*fnma"
[(const_int 0)]
{
  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
  riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul_neg (PLUS, 
mode),
   riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1194,8 +1186,6 @@ (define_insn_and_split "*fms"
[(const_int 0)]
{
  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
  riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (MINUS, 
mode),
   riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1242,8 +1232,6 @@ (define_insn_and_split "*fnms"
[(const_int 0)]
{
  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_r

Re: [patch] OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect

2023-07-27 Thread Thomas Schwinge
Hi Tobias!

On 2023-07-25T23:45:54+0200, Tobias Burnus  wrote:
> The attached patch calls CUDA's cuMemcpy2D and cuMemcpy3D
> for omp_target_memcpy_rect[,_async] for dim=2/dim=3. This should
> speed up the data transfer for noncontiguous data.

ACK, thanks.

> While being there, I ended up adding support for device to other device
> copying; while potentially slow, it is still better than not being able to
> copy - and with shared-memory, it shouldn't be that bad.

Makes sense, I guess.

> Comments, suggestions, remarks?
> If there are none, will commit it...

You're so quick -- I'm so slow...  ;-)

I've not verified all the logic in here, but I've got a few comments.

> Disclaimer: While I have done correctness tests (system with two nvptx GPUs),
> I have not done any performance tests.

Well, we should, eventually.

> (I also tested it without offloading
> configured, but that's rather boring.)

> OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect
>
> When copying a 2D or 3D rectangular memory block, the performance is
> better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the
> data one by one. That's what this commit does.

So you've actually done some performance verification?
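
Note: for readers unfamiliar with rectangular copies, here is a host-only
sketch of what a 2D rectangle copy with pitches means.  It does not call
CUDA at all; struct rect2d and copy_rect2d are hypothetical names whose
fields loosely mirror the srcXInBytes/srcY/srcPitch, dstXInBytes/dstY/
dstPitch and WidthInBytes/Height members of CUDA_MEMCPY2D quoted below.
The row loop is the "piecewise" form; a single 2D-copy call describes the
same rectangle in one request.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side description of a 2D rectangle copy.  Pitches
   are the byte distance between the starts of consecutive rows.  */
struct rect2d
{
  size_t src_x_bytes, src_y, src_pitch;
  size_t dst_x_bytes, dst_y, dst_pitch;
  size_t width_bytes, height;
};

/* Copy the rectangle one row at a time.  Each row is contiguous, but
   the rows themselves are separated by the respective pitch.  */
static void
copy_rect2d (void *dst, const void *src, const struct rect2d *r)
{
  const unsigned char *s
    = (const unsigned char *) src + r->src_y * r->src_pitch + r->src_x_bytes;
  unsigned char *d
    = (unsigned char *) dst + r->dst_y * r->dst_pitch + r->dst_x_bytes;

  for (size_t row = 0; row < r->height; row++)
    memcpy (d + row * r->dst_pitch, s + row * r->src_pitch, r->width_bytes);
}
```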

> Additionally, it permits device-to-device copies, if necessary using a
> temporary variable on the host.

> --- a/include/cuda/cuda.h
> +++ b/include/cuda/cuda.h

I note that you're not actually using everything you're adding here.
(..., but I understand you're simply adding everything that relates to
these 'cuMemcpy[...]' routines -- OK as far as I'm concerned.)

> @@ -47,6 +47,7 @@ typedef void *CUevent;
>  typedef void *CUfunction;
>  typedef void *CUlinkState;
>  typedef void *CUmodule;
> +typedef void *CUarray;
>  typedef size_t (*CUoccupancyB2DSize)(int);
>  typedef void *CUstream;
>
> @@ -54,7 +55,10 @@ typedef enum {
>CUDA_SUCCESS = 0,
>CUDA_ERROR_INVALID_VALUE = 1,
>CUDA_ERROR_OUT_OF_MEMORY = 2,
> +  CUDA_ERROR_NOT_INITIALIZED = 3,
> +  CUDA_ERROR_DEINITIALIZED = 4,
>CUDA_ERROR_INVALID_CONTEXT = 201,
> +  CUDA_ERROR_INVALID_HANDLE = 400,
>CUDA_ERROR_NOT_FOUND = 500,
>CUDA_ERROR_NOT_READY = 600,
>CUDA_ERROR_LAUNCH_FAILED = 719,
> @@ -126,6 +130,75 @@ typedef enum {
>CU_LIMIT_MALLOC_HEAP_SIZE = 0x02,
>  } CUlimit;
>
> +typedef enum {
> +  CU_MEMORYTYPE_HOST = 0x01,
> +  CU_MEMORYTYPE_DEVICE = 0x02,
> +  CU_MEMORYTYPE_ARRAY = 0x03,
> +  CU_MEMORYTYPE_UNIFIED = 0x04
> +} CUmemorytype;
> +
> +typedef struct {
> +  size_t srcXInBytes, srcY;
> +  CUmemorytype srcMemoryType;
> +  const void *srcHost;
> +  CUdeviceptr srcDevice;
> +  CUarray srcArray;
> +  size_t srcPitch;
> +
> +  size_t dstXInBytes, dstY;
> +  CUmemorytype dstMemoryType;
> +  const void *dstHost;

That last one isn't 'const'.  ;-)

> +  CUdeviceptr dstDevice;
> +  CUarray dstArray;
> +  size_t dstPitch;
> +
> +  size_t WidthInBytes, Height;
> +} CUDA_MEMCPY2D;
> +
> +typedef struct {
> +  size_t srcXInBytes, srcY, srcZ;
> +  size_t srcLOD;
> +  CUmemorytype srcMemoryType;
> +  const void *srcHost;
> +  CUdeviceptr srcDevice;
> +  CUarray srcArray;
> +  void *dummy;

A 'cuda.h' that I looked at calls that last one 'reserved0', with comment
"Must be NULL".

> +  size_t srcPitch, srcHeight;
> +
> +  size_t dstXInBytes, dstY, dstZ;
> +  size_t dstLOD;
> +  CUmemorytype dstMemoryType;
> +  const void *dstHost;

Again, not 'const'.

> +  CUdeviceptr dstDevice;
> +  CUarray dstArray;
> +  void *dummy2;

Similar to above: 'reserved1', with comment "Must be NULL".

> +  size_t dstPitch, dstHeight;
> +
> +  size_t WidthInBytes, Height, Depth;
> +} CUDA_MEMCPY3D;
> +
> +typedef struct {
> +  size_t srcXInBytes, srcY, srcZ;
> +  size_t srcLOD;
> +  CUmemorytype srcMemoryType;
> +  const void *srcHost;
> +  CUdeviceptr srcDevice;
> +  CUarray srcArray;
> +  CUcontext srcContext;
> +  size_t srcPitch, srcHeight;
> +
> +  size_t dstXInBytes, dstY, dstZ;
> +  size_t dstLOD;
> +  CUmemorytype dstMemoryType;
> +  const void *dstHost;
> +  CUdeviceptr dstDevice;
> +  CUarray dstArray;
> +  CUcontext dstContext;
> +  size_t dstPitch, dstHeight;
> +
> +  size_t WidthInBytes, Height, Depth;
> +} CUDA_MEMCPY3D_PEER;
> +
>  #define cuCtxCreate cuCtxCreate_v2
>  CUresult cuCtxCreate (CUcontext *, unsigned, CUdevice);
>  #define cuCtxDestroy cuCtxDestroy_v2
> @@ -183,6 +256,18 @@ CUresult cuMemcpyDtoHAsync (void *, CUdeviceptr, size_t, 
> CUstream);
>  CUresult cuMemcpyHtoD (CUdeviceptr, const void *, size_t);
>  #define cuMemcpyHtoDAsync cuMemcpyHtoDAsync_v2
>  CUresult cuMemcpyHtoDAsync (CUdeviceptr, const void *, size_t, CUstream);
> +#define cuMemcpy2D cuMemcpy2D_v2
> +CUresult cuMemcpy2D (const CUDA_MEMCPY2D *);
> +#define cuMemcpy2DAsync cuMemcpy2DAsync_v2
> +CUresult cuMemcpy2DAsync (const CUDA_MEMCPY2D *, CUstream);
> +#define cuMemcpy2DUnaligned cuMemcpy2DUnaligned_v2
> +CUresult cuMemcpy2DUnaligned (const CUDA_MEMCPY2D *);
> +#define cuMemcpy3D cuMemcpy3D_v2
> +CUresult cuMemcpy3D (const CUDA_MEMCPY3D *);
> +

[PATCH] bpf: minor doc cleanup for command-line options

2023-07-27 Thread David Faust via Gcc-patches
This patch makes some minor cleanups to eBPF options documented in
invoke.texi:
 - Delete some vestigial docs for removed -mkernel option
 - Add -mbswap and -msdiv to the option summary
 - Note the negative versions of several options
 - Note that -mcpu=v4 also enables -msdiv.

gcc/

* doc/invoke.texi (Option Summary): Remove -mkernel eBPF option.
Add -mbswap and -msdiv eBPF options.
(eBPF Options): Remove -mkernel.  Add -mno-{jmpext, jmp32,
alu32, v3-atomics, bswap, sdiv}.  Document that -mcpu=v4 also
enables -msdiv.
---
 gcc/doc/invoke.texi | 48 ++---
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e0fd7bd5b72..91113dd5821 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -945,9 +945,10 @@ Objective-C and Objective-C++ Dialects}.
 -mmemory-latency=@var{time}}
 
 @emph{eBPF Options}
-@gccoptlist{-mbig-endian -mlittle-endian -mkernel=@var{version}
+@gccoptlist{-mbig-endian -mlittle-endian
 -mframe-limit=@var{bytes} -mxbpf -mco-re -mno-co-re -mjmpext
--mjmp32 -malu32 -mv3-atomics -mcpu=@var{version} -masm=@var{dialect}}
+-mjmp32 -malu32 -mv3-atomics -mbswap -msdiv -mcpu=@var{version}
+-masm=@var{dialect}}
 
 @emph{FR30 Options}
 @gccoptlist{-msmall-model  -mno-lsim}
@@ -24674,18 +24675,6 @@ the value that can be specified should be less than or 
equal to
 @samp{32767}.  Defaults to whatever limit is imposed by the version of
 the Linux kernel targeted.
 
-@opindex mkernel
-@item -mkernel=@var{version}
-This specifies the minimum version of the kernel that will run the
-compiled program.  GCC uses this version to determine which
-instructions to use, what kernel helpers to allow, etc.  Currently,
-@var{version} can be one of @samp{4.0}, @samp{4.1}, @samp{4.2},
-@samp{4.3}, @samp{4.4}, @samp{4.5}, @samp{4.6}, @samp{4.7},
-@samp{4.8}, @samp{4.9}, @samp{4.10}, @samp{4.11}, @samp{4.12},
-@samp{4.13}, @samp{4.14}, @samp{4.15}, @samp{4.16}, @samp{4.17},
-@samp{4.18}, @samp{4.19}, @samp{4.20}, @samp{5.0}, @samp{5.1},
-@samp{5.2}, @samp{latest} and @samp{native}.
-
 @opindex mbig-endian
 @item -mbig-endian
 Generate code for a big-endian target.
@@ -24696,30 +24685,38 @@ Generate code for a little-endian target.  This is 
the default.
 
 @opindex mjmpext
 @item -mjmpext
-Enable generation of extra conditional-branch instructions.
+@itemx -mno-jmpext
+Enable or disable generation of extra conditional-branch instructions.
 Enabled for CPU v2 and above.
 
 @opindex mjmp32
 @item -mjmp32
-Enable 32-bit jump instructions. Enabled for CPU v3 and above.
+@itemx -mno-jmp32
+Enable or disable generation of 32-bit jump instructions.
+Enabled for CPU v3 and above.
 
 @opindex malu32
 @item -malu32
-Enable 32-bit ALU instructions. Enabled for CPU v3 and above.
+@itemx -mno-alu32
+Enable or disable generation of 32-bit ALU instructions.
+Enabled for CPU v3 and above.
+
+@opindex mv3-atomics
+@item -mv3-atomics
+@itemx -mno-v3-atomics
+Enable or disable instructions for general atomic operations introduced
+in CPU v3.  Enabled for CPU v3 and above.
 
 @opindex mbswap
 @item -mbswap
-Enable byte swap instructions.  Enabled for CPU v4 and above.
+@itemx -mno-bswap
+Enable or disable byte swap instructions.  Enabled for CPU v4 and above.
 
 @opindex msdiv
 @item -msdiv
-Enable signed division and modulus instructions.  Enabled for CPU v4
-and above.
-
-@opindex mv3-atomics
-@item -mv3-atomics
-Enable instructions for general atomic operations introduced in CPU v3.
-Enabled for CPU v3 and above.
+@itemx -mno-sdiv
+Enable or disable signed division and modulus instructions.  Enabled for
+CPU v4 and above.
 
 @opindex mcpu
 @item -mcpu=@var{version}
@@ -24747,6 +24744,7 @@ All features of v2, plus:
 All features of v3, plus:
 @itemize @minus
 @item Byte swap instructions, as in @option{-mbswap}
+@item Signed division and modulus instructions, as in @option{-msdiv}
 @end itemize
 @end table
 
-- 
2.40.1



[PATCH] bpf: ISA V4 sign-extending move and load insns [PR110782, PR110784]

2023-07-27 Thread David Faust via Gcc-patches
BPF ISA V4 introduces sign-extending move and load operations.  This
patch makes the BPF backend generate those instructions, when enabled
and useful.

A new option, -m[no-]smov gates generation of these instructions, and is
enabled by default for -mcpu=v4 and above.  Tests for the new
instructions and documentation for the new options are included.
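
Note: as a rough illustration of what a sign-extending move computes, the
following C sketch widens the low 8, 16 or 32 bits of a value to 64 bits.
It is an assumption-laden model, not code from the patch: sign_extend is a
hypothetical helper, and the mapping to the movs/ldxs* templates below is
my reading of the operand widths (8, 16, 32) that appear in them.

```c
#include <stdint.h>

/* Sign-extend the low BITS bits of VALUE to 64 bits.  BITS must be in
   the range 1..63; the cases of interest here are 8, 16 and 32.  */
static int64_t
sign_extend (uint64_t value, unsigned bits)
{
  uint64_t mask = (UINT64_C (1) << bits) - 1;
  uint64_t sign = UINT64_C (1) << (bits - 1);
  uint64_t low = value & mask;

  /* If the sign bit of the narrow value is set, subtract 2^BITS to get
     the negative 64-bit result; otherwise the value is unchanged.  */
  if (low & sign)
    return (int64_t) low - (int64_t) mask - 1;
  return (int64_t) low;
}
```

For example, sign_extend (0xff, 8) yields -1 and sign_extend (0x7f, 8)
yields 127.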

Tested on bpf-unknown-none.
OK?

gcc/

* config/bpf/bpf.opt (msmov): New option.
* config/bpf/bpf.cc (bpf_option_override): Handle it here.
* config/bpf/bpf.md (*extendsidi2): New.
(extendhidi2): New.
(extendqidi2): New.
(extendsisi2): New.
(extendhisi2): New.
(extendqisi2): New.
* doc/invoke.texi (Option Summary): Add -msmov eBPF option.
(eBPF Options): Add -m[no-]smov.  Document that -mcpu=v4
also enables -msmov.

gcc/testsuite/

* gcc.target/bpf/sload-1.c: New test.
* gcc.target/bpf/sload-pseudoc-1.c: New test.
* gcc.target/bpf/smov-1.c: New test.
* gcc.target/bpf/smov-pseudoc-1.c: New test.
---
 gcc/config/bpf/bpf.cc |  3 ++
 gcc/config/bpf/bpf.md | 50 +++
 gcc/config/bpf/bpf.opt|  4 ++
 gcc/doc/invoke.texi   |  9 +++-
 gcc/testsuite/gcc.target/bpf/sload-1.c| 16 ++
 .../gcc.target/bpf/sload-pseudoc-1.c  | 16 ++
 gcc/testsuite/gcc.target/bpf/smov-1.c | 18 +++
 gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c | 18 +++
 8 files changed, 133 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/sload-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/sload-pseudoc-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/smov-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index 0e07b416add..b5b5674edbb 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -262,6 +262,9 @@ bpf_option_override (void)
   if (bpf_has_sdiv == -1)
 bpf_has_sdiv = (bpf_isa >= ISA_V4);
 
+  if (bpf_has_smov == -1)
+bpf_has_smov = (bpf_isa >= ISA_V4);
+
   /* Disable -fstack-protector as it is not supported in BPF.  */
   if (flag_stack_protect)
 {
diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 66436397bb7..a69a239b9d6 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -307,6 +307,56 @@ (define_expand "extendsidi2"
   DONE;
 })
 
+;; ISA V4 introduces sign-extending move and load operations.
+
+(define_insn "*extendsidi2"
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
+(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,q")))]
+  "bpf_has_smov"
+  "@
+   {movs\t%0,%1,32|%0 = (s32) %1}
+   {ldxsw\t%0,%1|%0 = *(s32 *) (%1)}"
+  [(set_attr "type" "alu,ldx")])
+
+(define_insn "extendhidi2"
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
+(sign_extend:DI (match_operand:HI 1 "nonimmediate_operand" "r,q")))]
+  "bpf_has_smov"
+  "@
+   {movs\t%0,%1,16|%0 = (s16) %1}
+   {ldxsh\t%0,%1|%0 = *(s16 *) (%1)}"
+  [(set_attr "type" "alu,ldx")])
+
+(define_insn "extendqidi2"
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
+(sign_extend:DI (match_operand:QI 1 "nonimmediate_operand" "r,q")))]
+  "bpf_has_smov"
+  "@
+   {movs\t%0,%1,8|%0 = (s8) %1}
+   {ldxsb\t%0,%1|%0 = *(s8 *) (%1)}"
+  [(set_attr "type" "alu,ldx")])
+
+(define_insn "extendsisi2"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(sign_extend:SI (match_operand:SI 1 "register_operand" "r")))]
+  "bpf_has_smov"
+  "{movs32\t%0,%1,32|%w0 = (s32) %w1}"
+  [(set_attr "type" "alu")])
+
+(define_insn "extendhisi2"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(sign_extend:SI (match_operand:HI 1 "register_operand" "r")))]
+  "bpf_has_smov"
+  "{movs32\t%0,%1,16|%w0 = (s16) %w1}"
+  [(set_attr "type" "alu")])
+
+(define_insn "extendqisi2"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(sign_extend:SI (match_operand:QI 1 "register_operand" "r")))]
+  "bpf_has_smov"
+  "{movs32\t%0,%1,8|%w0 = (s8) %w1}"
+  [(set_attr "type" "alu")])
+
  Data movement
 
 (define_mode_iterator MM [QI HI SI DI SF DF])
diff --git a/gcc/config/bpf/bpf.opt b/gcc/config/bpf/bpf.opt
index b21cfcab9ea..8e240d397e4 100644
--- a/gcc/config/bpf/bpf.opt
+++ b/gcc/config/bpf/bpf.opt
@@ -71,6 +71,10 @@ msdiv
 Target Var(bpf_has_sdiv) Init(-1)
 Enable signed division and modulus instructions.
 
+msmov
+Target Var(bpf_has_smov) Init(-1)
+Enable signed move and memory load instructions.
+
 mcpu=
 Target RejectNegative Joined Var(bpf_isa) Enum(bpf_isa) Init(ISA_V4)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 91113dd5821..e574acfd612 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -947,7 +947,7 @@ Objective-C and Objective-C++ Dialects}.
 @emph{eBPF Options}
 @gccoptlist{-mbig-endian -mlittle-endian
 -mframe-limit=@var{b

Re: [PATCH] Add -fsarif-time-report [PR109361]

2023-07-27 Thread David Malcolm via Gcc-patches
On Tue, 2023-04-11 at 08:43 +, Richard Biener wrote:
> On Tue, 4 Apr 2023, David Malcolm wrote:
> 
> > Richi, Jakub: I can probably self-approve this, but it's
> > technically a
> > new feature.  OK if I push this to trunk in stage 4?  I believe
> > it's
> > low risk, and is very useful for benchmarking -fanalyzer.
> 
> Please wait for stage1 at this point.  One comment on the patch
> below ...
> 
> > 
> > This patch adds support for embedding profiling information about
> > the
> > compiler itself into the SARIF output.
> > 
> > In an earlier version of this patch I extended -ftime-report so
> > that
> > as well as writing to stderr, it would embed the information in any
> > SARIF output.  This turned out to be awkward to use, in that I
> > found
> > myself needing to get the data in JSON form without also having it
> > emitted on stderr (which was affecting the output of the build).
> > 
> > Hence this version of the patch adds a new -fsarif-time-report,
> > similar
> > to the existing -ftime-report for requesting that GCC profile itself
> > using
> > the timevar machinery.
> > 
> > Specifically, if -fsarif-time-report is specified, the timing
> > information will be captured (as if -ftime-report were specified),
> > and
> > will be embedded in JSON form within any SARIF as a
> > "gcc/timeReport"
> > property within a property bag of the "invocation" object.
> > 
> > Here's an example of the output:
> > 
> >   "invocations": [
> >   {
> >   "executionSuccessful": true,
> >   "toolExecutionNotifications": [],
> >   "properties": {
> >   "gcc/timeReport": {
> >   "timevars": [
> >   {
> >   "name": "phase setup",
> >   "elapsed": {
> >   "user": 0.04,
> >   "sys": 0,
> >   "wall": 0.04,
> >   "ggc_mem": 1863472
> >   }
> >   },
> > 
> >   [...snip...]
> > 
> >   {
> >   "name": "analyzer: processing worklist",
> >   "elapsed": {
> >   "user": 0.06,
> >   "sys": 0,
> >   "wall": 0.06,
> >   "ggc_mem": 48
> >   }
> >   },
> >   {
> >   "name": "analyzer: emitting diagnostics",
> >   "elapsed": {
> >   "user": 0.01,
> >   "sys": 0,
> >   "wall": 0.01,
> >   "ggc_mem": 0
> >   }
> >   },
> >   {
> >   "name": "TOTAL",
> >   "elapsed": {
> >   "user": 0.21,
> >   "sys": 0.03,
> >   "wall": 0.24,
> >   "ggc_mem": 3368736
> >   }
> >   }
> >   ],
> >   "CHECKING_P": true,
> >   "flag_checking": true
> >   }
> >   }
> >   }
> >   ]
> > 
> > I have successfully used this in my analyzer integration tests to
> > get
> > timing information about which source files get slowed down by the
> > analyzer.  I've validated the generated .sarif files against the
> > SARIF
> > schema.
> > 
> > The documentation notes that the precise output format is subject
> > to change.
> > 
> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > 
> > gcc/ChangeLog:
> > PR analyzer/109361
> > * common.opt (fsarif-time-report): New option.
> 
> 'sarif' is currently used only with -fdiagnostics-format= it seems.
> We already have
> 
> ftime-report
> Common Var(time_report)
> Report the time taken by each compiler pass.
> 
> ftime-report-details
> Common Var(time_report_details)
> Record times taken by sub-phases separately. 
> 
> so -fsarif-time-report is not a) -ftime-report-sarif and b) it's
> unclear if it applies to -ftime-report or to both -ftime-report
> and -ftime-report-details?  (note -ftime-report-details needs
> -ftime-report to be effective)
> 
> I'd rather have a -ftime-report-format= (or -freport-format in
> case we want to cover -fmem-report, -fmem-report-wpa,
> -fpre-ipa-mem-report and -fpost-ipa-mem-report as well?)
> 
> ISTR there's a summer of code project in this area as well.
> 
> Thanks,
> Richard.

Revisiting this; sorry about the delay.

As I understand the status quo, we currently have:
* -ftime-report: enable capturing of timing information (with a slight
speed hit), and report it to stderr
* -ftime-report-details: tweak how that information is captured (if -
ftim

Re: [PATCH] bpf: minor doc cleanup for command-line options

2023-07-27 Thread Jose E. Marchesi via Gcc-patches


Hi David, thanks for the patch.
OK.


> This patch makes some minor cleanups to eBPF options documented in
> invoke.texi:
>  - Delete some vestigial docs for removed -mkernel option
>  - Add -mbswap and -msdiv to the option summary
>  - Note the negative versions of several options
>  - Note that -mcpu=v4 also enables -msdiv.
>
> gcc/
>
>   * doc/invoke.texi (Option Summary): Remove -mkernel eBPF option.
>   Add -mbswap and -msdiv eBPF options.
>   (eBPF Options): Remove -mkernel.  Add -mno-{jmpext, jmp32,
>   alu32, v3-atomics, bswap, sdiv}.  Document that -mcpu=v4 also
>   enables -msdiv.
> ---
>  gcc/doc/invoke.texi | 48 ++---
>  1 file changed, 23 insertions(+), 25 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index e0fd7bd5b72..91113dd5821 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -945,9 +945,10 @@ Objective-C and Objective-C++ Dialects}.
>  -mmemory-latency=@var{time}}
>  
>  @emph{eBPF Options}
> -@gccoptlist{-mbig-endian -mlittle-endian -mkernel=@var{version}
> +@gccoptlist{-mbig-endian -mlittle-endian
>  -mframe-limit=@var{bytes} -mxbpf -mco-re -mno-co-re -mjmpext
> --mjmp32 -malu32 -mv3-atomics -mcpu=@var{version} -masm=@var{dialect}}
> +-mjmp32 -malu32 -mv3-atomics -mbswap -msdiv -mcpu=@var{version}
> +-masm=@var{dialect}}
>  
>  @emph{FR30 Options}
>  @gccoptlist{-msmall-model  -mno-lsim}
> @@ -24674,18 +24675,6 @@ the value that can be specified should be less than 
> or equal to
>  @samp{32767}.  Defaults to whatever limit is imposed by the version of
>  the Linux kernel targeted.
>  
> -@opindex mkernel
> -@item -mkernel=@var{version}
> -This specifies the minimum version of the kernel that will run the
> -compiled program.  GCC uses this version to determine which
> -instructions to use, what kernel helpers to allow, etc.  Currently,
> -@var{version} can be one of @samp{4.0}, @samp{4.1}, @samp{4.2},
> -@samp{4.3}, @samp{4.4}, @samp{4.5}, @samp{4.6}, @samp{4.7},
> -@samp{4.8}, @samp{4.9}, @samp{4.10}, @samp{4.11}, @samp{4.12},
> -@samp{4.13}, @samp{4.14}, @samp{4.15}, @samp{4.16}, @samp{4.17},
> -@samp{4.18}, @samp{4.19}, @samp{4.20}, @samp{5.0}, @samp{5.1},
> -@samp{5.2}, @samp{latest} and @samp{native}.
> -
>  @opindex mbig-endian
>  @item -mbig-endian
>  Generate code for a big-endian target.
> @@ -24696,30 +24685,38 @@ Generate code for a little-endian target.  This is 
> the default.
>  
>  @opindex mjmpext
>  @item -mjmpext
> -Enable generation of extra conditional-branch instructions.
> +@itemx -mno-jmpext
> +Enable or disable generation of extra conditional-branch instructions.
>  Enabled for CPU v2 and above.
>  
>  @opindex mjmp32
>  @item -mjmp32
> -Enable 32-bit jump instructions. Enabled for CPU v3 and above.
> +@itemx -mno-jmp32
> +Enable or disable generation of 32-bit jump instructions.
> +Enabled for CPU v3 and above.
>  
>  @opindex malu32
>  @item -malu32
> -Enable 32-bit ALU instructions. Enabled for CPU v3 and above.
> +@itemx -mno-alu32
> +Enable or disable generation of 32-bit ALU instructions.
> +Enabled for CPU v3 and above.
> +
> +@opindex mv3-atomics
> +@item -mv3-atomics
> +@itemx -mno-v3-atomics
> +Enable or disable instructions for general atomic operations introduced
> +in CPU v3.  Enabled for CPU v3 and above.
>  
>  @opindex mbswap
>  @item -mbswap
> -Enable byte swap instructions.  Enabled for CPU v4 and above.
> +@itemx -mno-bswap
> +Enable or disable byte swap instructions.  Enabled for CPU v4 and above.
>  
>  @opindex msdiv
>  @item -msdiv
> -Enable signed division and modulus instructions.  Enabled for CPU v4
> -and above.
> -
> -@opindex mv3-atomics
> -@item -mv3-atomics
> -Enable instructions for general atomic operations introduced in CPU v3.
> -Enabled for CPU v3 and above.
> +@itemx -mno-sdiv
> +Enable or disable signed division and modulus instructions.  Enabled for
> +CPU v4 and above.
>  
>  @opindex mcpu
>  @item -mcpu=@var{version}
> @@ -24747,6 +24744,7 @@ All features of v2, plus:
>  All features of v3, plus:
>  @itemize @minus
>  @item Byte swap instructions, as in @option{-mbswap}
> +@item Signed division and modulus instructions, as in @option{-msdiv}
>  @end itemize
>  @end table
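
As an illustration of the documented defaults (not code from the patch), the
way -mcpu= and the individual -m/-mno- options combine can be sketched in C.
The struct and function names are hypothetical. The -1 "not given" convention
mirrors the Init(-1) option variables that bpf_option_override resolves for
-msdiv; the sketch assumes the other feature options follow the same pattern,
with the CPU levels taken from the documentation text above.

```c
/* Hypothetical model of how eBPF feature options get their defaults.  */
enum bpf_cpu { CPU_V1 = 1, CPU_V2, CPU_V3, CPU_V4 };

/* Each field: -1 = not given on the command line,
   0 = -mno-<feature>, 1 = -m<feature>.  */
struct bpf_features
{
  int jmpext, jmp32, alu32, v3_atomics, bswap, sdiv;
};

/* An explicit request wins; otherwise the CPU level decides.  */
static int
pick (int requested, int enabled_by_cpu)
{
  return requested == -1 ? enabled_by_cpu : requested;
}

struct bpf_features
resolve_features (struct bpf_features req, enum bpf_cpu cpu)
{
  struct bpf_features out;
  out.jmpext     = pick (req.jmpext,     cpu >= CPU_V2);
  out.jmp32      = pick (req.jmp32,      cpu >= CPU_V3);
  out.alu32      = pick (req.alu32,      cpu >= CPU_V3);
  out.v3_atomics = pick (req.v3_atomics, cpu >= CPU_V3);
  out.bswap      = pick (req.bswap,      cpu >= CPU_V4);
  out.sdiv       = pick (req.sdiv,       cpu >= CPU_V4);
  return out;
}
```

Under this model, -mcpu=v4 alone turns on both bswap and sdiv, while
-mcpu=v4 -mno-sdiv keeps bswap and turns sdiv off.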


Re: [PATCH] bpf: ISA V4 sign-extending move and load insns [PR110782,PR110784]

2023-07-27 Thread Jose E. Marchesi via Gcc-patches


Hi David.
Thanks for the patch.

> BPF ISA V4 introduces sign-extending move and load operations.  This
> patch makes the BPF backend generate those instructions, when enabled
> and useful.
>
> A new option, -m[no-]smov gates generation of these instructions, and is
> enabled by default for -mcpu=v4 and above.  Tests for the new
> instructions and documentation for the new options are included.
>
> Tested on bpf-unknown-none.
> OK?
>
> gcc/
>
>   * config/bpf/bpf.opt (msmov): New option.
>   * config/bpf/bpf.cc (bpf_option_override): Handle it here.
>   * config/bpf/bpf.md (*extendsidi2): New.
>   (extendhidi2): New.
>   (extendqidi2): New.
>   (extendsisi2): New.
>   (extendhisi2): New.
>   (extendqisi2): New.
>   * doc/invoke.texi (Option Summary): Add -msmov eBPF option.
>   (eBPF Options): Add -m[no-]smov.  Document that -mcpu=v4
>   also enables -msmov.
>
> gcc/testsuite/
>
>   * gcc.target/bpf/sload-1.c: New test.
>   * gcc.target/bpf/sload-pseudoc-1.c: New test.
>   * gcc.target/bpf/smov-1.c: New test.
>   * gcc.target/bpf/smov-pseudoc-1.c: New test.

Looks like you forgot to mention the bugzilla PR in the changelog
entries.  Would be nice to have them there so automatic updates happen
in the bugzillas.

Other than that, OK.
Thanks!
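
For readers unfamiliar with the new instructions, here is a minimal C model of
what the patterns quoted below compute. sign_extend, movs_model and
ldxsw_model are hypothetical helper names, not anything defined by GCC or the
BPF ISA. The model only assumes what the pseudo-C templates in the patch
suggest: movs with width 8, 16 or 32 sign-extends the low bits of the source
register, and ldxsw sign-extends a 32-bit value loaded from memory.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low BITS bits of VALUE to 64 bits.  The final cast
   relies on two's-complement conversion, which GCC guarantees.  */
int64_t
sign_extend (uint64_t value, unsigned bits)
{
  assert (bits >= 1 && bits <= 64);
  uint64_t mask = bits == 64 ? ~UINT64_C (0) : (UINT64_C (1) << bits) - 1;
  uint64_t sign = UINT64_C (1) << (bits - 1);
  uint64_t low = value & mask;
  /* Flip the sign bit, then subtract it again: this propagates it
     into all the upper bits.  */
  return (int64_t) ((low ^ sign) - sign);
}

/* Model of "movs %0,%1,WIDTH", i.e. "%0 = (sWIDTH) %1".  */
int64_t
movs_model (uint64_t src, unsigned width)
{
  return sign_extend (src, width);
}

/* Model of "ldxsw %0,%1", i.e. "%0 = *(s32 *) (%1)".  */
int64_t
ldxsw_model (const int32_t *addr)
{
  return (int64_t) *addr;
}
```

For example, movs_model (0xff, 8) yields -1, whereas a plain zero-extending
move of the same byte would yield 255.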

> ---
>  gcc/config/bpf/bpf.cc |  3 ++
>  gcc/config/bpf/bpf.md | 50 +++
>  gcc/config/bpf/bpf.opt|  4 ++
>  gcc/doc/invoke.texi   |  9 +++-
>  gcc/testsuite/gcc.target/bpf/sload-1.c| 16 ++
>  .../gcc.target/bpf/sload-pseudoc-1.c  | 16 ++
>  gcc/testsuite/gcc.target/bpf/smov-1.c | 18 +++
>  gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c | 18 +++
>  8 files changed, 133 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/sload-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/sload-pseudoc-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/smov-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c
>
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index 0e07b416add..b5b5674edbb 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -262,6 +262,9 @@ bpf_option_override (void)
>if (bpf_has_sdiv == -1)
>  bpf_has_sdiv = (bpf_isa >= ISA_V4);
>  
> +  if (bpf_has_smov == -1)
> +bpf_has_smov = (bpf_isa >= ISA_V4);
> +
>/* Disable -fstack-protector as it is not supported in BPF.  */
>if (flag_stack_protect)
>  {
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index 66436397bb7..a69a239b9d6 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -307,6 +307,56 @@ (define_expand "extendsidi2"
>DONE;
>  })
>  
> +;; ISA V4 introduces sign-extending move and load operations.
> +
> +(define_insn "*extendsidi2"
> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
> +(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,q")))]
> +  "bpf_has_smov"
> +  "@
> +   {movs\t%0,%1,32|%0 = (s32) %1}
> +   {ldxsw\t%0,%1|%0 = *(s32 *) (%1)}"
> +  [(set_attr "type" "alu,ldx")])
> +
> +(define_insn "extendhidi2"
> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
> +(sign_extend:DI (match_operand:HI 1 "nonimmediate_operand" "r,q")))]
> +  "bpf_has_smov"
> +  "@
> +   {movs\t%0,%1,16|%0 = (s16) %1}
> +   {ldxsh\t%0,%1|%0 = *(s16 *) (%1)}"
> +  [(set_attr "type" "alu,ldx")])
> +
> +(define_insn "extendqidi2"
> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
> +(sign_extend:DI (match_operand:QI 1 "nonimmediate_operand" "r,q")))]
> +  "bpf_has_smov"
> +  "@
> +   {movs\t%0,%1,8|%0 = (s8) %1}
> +   {ldxsb\t%0,%1|%0 = *(s8 *) (%1)}"
> +  [(set_attr "type" "alu,ldx")])
> +
> +(define_insn "extendsisi2"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +(sign_extend:SI (match_operand:SI 1 "register_operand" "r")))]
> +  "bpf_has_smov"
> +  "{movs32\t%0,%1,32|%w0 = (s32) %w1}"
> +  [(set_attr "type" "alu")])
> +
> +(define_insn "extendhisi2"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +(sign_extend:SI (match_operand:HI 1 "register_operand" "r")))]
> +  "bpf_has_smov"
> +  "{movs32\t%0,%1,16|%w0 = (s16) %w1}"
> +  [(set_attr "type" "alu")])
> +
> +(define_insn "extendqisi2"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +(sign_extend:SI (match_operand:QI 1 "register_operand" "r")))]
> +  "bpf_has_smov"
> +  "{movs32\t%0,%1,8|%w0 = (s8) %w1}"
> +  [(set_attr "type" "alu")])
> +
>   Data movement
>  
>  (define_mode_iterator MM [QI HI SI DI SF DF])
> diff --git a/gcc/config/bpf/bpf.opt b/gcc/config/bpf/bpf.opt
> index b21cfcab9ea..8e240d397e4 100644
> --- a/gcc/config/bpf/bpf.opt
> +++ b/gcc/config/bpf/bpf.opt
> @@ -71,6 +71,10 @@ msdiv
>  Target Var(bpf_has_sdiv) Init(-1)
>  Enable signed division and modulus instructions.
>  
> +msmov
> +Target V

Re: [PATCH] bpf: ISA V4 sign-extending move and load insns [PR110782,PR110784]

2023-07-27 Thread David Faust via Gcc-patches



On 7/27/23 15:27, Jose E. Marchesi wrote:
> 
> Hi David.
> Thanks for the patch.
> 
>> BPF ISA V4 introduces sign-extending move and load operations.  This
>> patch makes the BPF backend generate those instructions, when enabled
>> and useful.
>>
>> A new option, -m[no-]smov gates generation of these instructions, and is
>> enabled by default for -mcpu=v4 and above.  Tests for the new
>> instructions and documentation for the new options are included.
>>
>> Tested on bpf-unknown-none.
>> OK?
>>
>> gcc/
>>
>>  * config/bpf/bpf.opt (msmov): New option.
>>  * config/bpf/bpf.cc (bpf_option_override): Handle it here.
>>  * config/bpf/bpf.md (*extendsidi2): New.
>>  (extendhidi2): New.
>>  (extendqidi2): New.
>>  (extendsisi2): New.
>>  (extendhisi2): New.
>>  (extendqisi2): New.
>>  * doc/invoke.texi (Option Summary): Add -msmov eBPF option.
>>  (eBPF Options): Add -m[no-]smov.  Document that -mcpu=v4
>>  also enables -msmov.
>>
>> gcc/testsuite/
>>
>>  * gcc.target/bpf/sload-1.c: New test.
>>  * gcc.target/bpf/sload-pseudoc-1.c: New test.
>>  * gcc.target/bpf/smov-1.c: New test.
>>  * gcc.target/bpf/smov-pseudoc-1.c: New test.
> 
> Looks like you forgot to mention the bugzilla PR in the changelog
> entries.  Would be nice to have them there so automatic updates happen
> in the bugzillas.

Good catch, thanks!

> 
> Other than that, OK.
> Thanks!

Pushed, with PRs added in the changelog and a tiny reword to the doc below.

> 
>> ---
>>  gcc/config/bpf/bpf.cc |  3 ++
>>  gcc/config/bpf/bpf.md | 50 +++
>>  gcc/config/bpf/bpf.opt|  4 ++
>>  gcc/doc/invoke.texi   |  9 +++-
>>  gcc/testsuite/gcc.target/bpf/sload-1.c| 16 ++
>>  .../gcc.target/bpf/sload-pseudoc-1.c  | 16 ++
>>  gcc/testsuite/gcc.target/bpf/smov-1.c | 18 +++
>>  gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c | 18 +++
>>  8 files changed, 133 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/bpf/sload-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/bpf/sload-pseudoc-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/bpf/smov-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c
>>
>> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
>> index 0e07b416add..b5b5674edbb 100644
>> --- a/gcc/config/bpf/bpf.cc
>> +++ b/gcc/config/bpf/bpf.cc
>> @@ -262,6 +262,9 @@ bpf_option_override (void)
>>if (bpf_has_sdiv == -1)
>>  bpf_has_sdiv = (bpf_isa >= ISA_V4);
>>  
>> +  if (bpf_has_smov == -1)
>> +bpf_has_smov = (bpf_isa >= ISA_V4);
>> +
>>/* Disable -fstack-protector as it is not supported in BPF.  */
>>if (flag_stack_protect)
>>  {
>> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
>> index 66436397bb7..a69a239b9d6 100644
>> --- a/gcc/config/bpf/bpf.md
>> +++ b/gcc/config/bpf/bpf.md
>> @@ -307,6 +307,56 @@ (define_expand "extendsidi2"
>>DONE;
>>  })
>>  
>> +;; ISA V4 introduces sign-extending move and load operations.
>> +
>> +(define_insn "*extendsidi2"
>> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
>> +(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,q")))]
>> +  "bpf_has_smov"
>> +  "@
>> +   {movs\t%0,%1,32|%0 = (s32) %1}
>> +   {ldxsw\t%0,%1|%0 = *(s32 *) (%1)}"
>> +  [(set_attr "type" "alu,ldx")])
>> +
>> +(define_insn "extendhidi2"
>> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
>> +(sign_extend:DI (match_operand:HI 1 "nonimmediate_operand" "r,q")))]
>> +  "bpf_has_smov"
>> +  "@
>> +   {movs\t%0,%1,16|%0 = (s16) %1}
>> +   {ldxsh\t%0,%1|%0 = *(s16 *) (%1)}"
>> +  [(set_attr "type" "alu,ldx")])
>> +
>> +(define_insn "extendqidi2"
>> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
>> +(sign_extend:DI (match_operand:QI 1 "nonimmediate_operand" "r,q")))]
>> +  "bpf_has_smov"
>> +  "@
>> +   {movs\t%0,%1,8|%0 = (s8) %1}
>> +   {ldxsb\t%0,%1|%0 = *(s8 *) (%1)}"
>> +  [(set_attr "type" "alu,ldx")])
>> +
>> +(define_insn "extendsisi2"
>> +  [(set (match_operand:SI 0 "register_operand" "=r")
>> +(sign_extend:SI (match_operand:SI 1 "register_operand" "r")))]
>> +  "bpf_has_smov"
>> +  "{movs32\t%0,%1,32|%w0 = (s32) %w1}"
>> +  [(set_attr "type" "alu")])
>> +
>> +(define_insn "extendhisi2"
>> +  [(set (match_operand:SI 0 "register_operand" "=r")
>> +(sign_extend:SI (match_operand:HI 1 "register_operand" "r")))]
>> +  "bpf_has_smov"
>> +  "{movs32\t%0,%1,16|%w0 = (s16) %w1}"
>> +  [(set_attr "type" "alu")])
>> +
>> +(define_insn "extendqisi2"
>> +  [(set (match_operand:SI 0 "register_operand" "=r")
>> +(sign_extend:SI (match_operand:QI 1 "register_operand" "r")))]
>> +  "bpf_has_smov"
>> +  "{movs32\t%0,%1,8|%w0 = (s8) %w1}"
>> +  [(set_attr "type" "alu")])
>> +
>>   Data movement
>>  
>>  (define_mode_iterator MM [QI HI SI DI SF DF])
>> diff --git a/gcc/config/bpf/

[PATCH v2] c-family: Implement pragma_lex () for preprocess-only mode

2023-07-27 Thread Lewis Hyatt via Gcc-patches
In order to support processing #pragma in preprocess-only mode (-E or
-save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
libcpp. In full compilation modes, this is accomplished by calling
pragma_lex (), which is a symbol that must be exported by the frontend, and
which is currently implemented for C and C++. Neither of those frontends
initializes its parser machinery in preprocess-only mode, and consequently
pragma_lex () does not work in this case.

Address that by adding a new function c_init_preprocess () for the frontends
to implement, which arranges for pragma_lex () to work in preprocess-only
mode, and adjusting pragma_lex () accordingly.

In preprocess-only mode, the preprocessor is accustomed to controlling the
interaction with libcpp, and it only knows about tokens that it has called
into libcpp itself to obtain. Since it still needs to see the tokens
obtained by pragma_lex () so that they can be streamed to the output, also
adjust c_lex_with_flags () and related functions in c-family/c-lex.cc to
inform the preprocessor about any tokens it won't be aware of.

Currently, there is one place where we are already supporting #pragma in
preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
was done by directly interfacing with libcpp, rather than making use of
pragma_lex (). Now that pragma_lex () works, that code is no longer
necessary; remove it.
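
To make the preprocess-only case concrete, here is a small translation unit
that uses the one pragma mentioned above. It is only an illustration for
readers, not a test from the patch, and the file name in the comment is
hypothetical.

```c
/* Save as pragma-demo.c (hypothetical name) and compare:
     gcc -E pragma-demo.c         preprocess only
     gcc -Wall -c pragma-demo.c   full compilation  */

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-variable"
int
answer (void)
{
  int unused_local = 0;   /* would normally trigger -Wunused-variable */
  return 42;
}
#pragma GCC diagnostic pop
```

With -E the three #pragma lines are passed through to the preprocessed
output, so the compiler proper still sees them when that output is compiled
later (as happens with -save-temps). With -Wall -c the warning for
unused_local is suppressed between the push and the pop.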

gcc/c-family/ChangeLog:

* c-common.h (c_init_preprocess): Declare.
(c_lex_enable_token_streaming): Declare.
* c-opts.cc (c_common_init): Call c_init_preprocess ().
* c-lex.cc (stream_tokens_to_preprocessor): New static variable.
(c_lex_enable_token_streaming): New function.
(cb_def_pragma): Add a comment.
(get_token): New function wrapping cpp_get_token.
(c_lex_with_flags): Use the new wrapper function to support
obtaining tokens in preprocess_only mode.
(lex_string): Likewise.
* c-ppoutput.cc (preprocess_file): Call c_lex_enable_token_streaming
when needed.
* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
(pragma_diagnostic_lex): ...this.
(pragma_diagnostic_lex_pp): Remove.
(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
all modes.
(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
usage.
* c-pragma.h (pragma_lex_discard_to_eol): Declare.

gcc/c/ChangeLog:

* c-parser.cc (pragma_lex_discard_to_eol): New function.
(c_init_preprocess): New function.

gcc/cp/ChangeLog:

* parser.cc (c_init_preprocess): New function.
(maybe_read_tokens_for_pragma_lex): New function.
(pragma_lex): Support preprocess-only mode.
(pragma_lex_discard_to_eol): New function.
---

Notes:
Hello-

Here is version 2 of the patch, incorporating Jason's feedback from
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625591.html

Thanks again, please let me know if it's OK? Bootstrap + regtest all
languages on x86-64 Linux looks good.

-Lewis

 gcc/c-family/c-common.h|  4 +++
 gcc/c-family/c-lex.cc  | 49 +
 gcc/c-family/c-opts.cc |  1 +
 gcc/c-family/c-ppoutput.cc | 17 +---
 gcc/c-family/c-pragma.cc   | 56 ++
 gcc/c-family/c-pragma.h|  2 ++
 gcc/c/c-parser.cc  | 21 ++
 gcc/cp/parser.cc   | 45 ++
 8 files changed, 138 insertions(+), 57 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b5ef5ff6b2c..2fe2f194660 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -990,6 +990,9 @@ extern void c_parse_file (void);
 
 extern void c_parse_final_cleanups (void);
 
+/* This initializes for preprocess-only mode.  */
+extern void c_init_preprocess (void);
+
 /* These macros provide convenient access to the various _STMT nodes.  */
 
 /* Nonzero if a given STATEMENT_LIST represents the outermost binding
@@ -1214,6 +1217,7 @@ extern tree c_build_bind_expr (location_t, tree, tree);
 /* In c-lex.cc.  */
 extern enum cpp_ttype
 conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind);
+extern void c_lex_enable_token_streaming (bool enabled);
 
 /* In c-pch.cc  */
 extern void pch_init (void);
diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
index dcd061c7cb1..ac4c018d863 100644
--- a/gcc/c-family/c-lex.cc
+++ b/gcc/c-family/c-lex.cc
@@ -57,6 +57,17 @@ static void cb_ident (cpp_reader *, unsigned int, const 
cpp_string *);
 static void cb_def_pragma (cpp_reader *, unsigned int);
 static void cb_define (cpp_reader *, unsigned int, cpp_hashnode *);
 static void cb_undef (cpp_reader *, unsigned int, cpp_hashnode *);
+
+/* Flag to remember if we are in a mode (such as flag_preprocess_only) in which
+   tokens obtained here need to be streamed to the preprocessor.  */
+stat

Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-27 Thread Jason Merrill via Gcc-patches

On 7/23/23 20:26, Ben Boeckel wrote:

On Fri, Jul 21, 2023 at 16:23:07 -0400, Nathan Sidwell wrote:

It occurs to me that the model I am envisioning is similar to CMake's object
libraries.  Object libraries are a convenient name for a bunch of object files.
IIUC they're linked by naming the individual object files (or I think they could
be implemented as a static lib linked with --whole-archive path/to/libfoo.a
-no-whole-archive).  But for this conversation consider them a bunch of separate
object files with a convenient group name.


Yes, `--whole-archive` would work great if it had any kind of
portability across CMake's platform set.


Consider also that object libraries could themselves contain object libraries (I
don't know if they can, but it seems like a useful concept).  Then one could
create an object library from a collection of object files and object libraries
(recursively).  CMake would handle the transitive graph.


I think this detail is relevant, but you can use
`$` as an `INTERFACE` sources and it would act
like that, but it is an explicit thing. Instead, `OBJECT` libraries
*only* provide their objects to targets that *directly* link them. If
not, given this:

 A (OBJECT library)
 B (library of some kind; links PUBLIC to A)
 C (links to B)

If `A` has things like linker flags (or, more likely, libraries) as part
of its usage requirements, C will get them on its link line. However, if
OBJECT files are transitive in the same way, the linker (on most
platforms at least) chokes because it now has duplicates of all of A's
symbols: those from the B library and those from A's objects on the link
line.


Now, allow an object library to itself have some kind of tangible, on-disk
representation.  *BUT* not like a static library -- it doesn't include the
object files.


Now that immediately maps onto modules.

CMI: Object library
Direct imports: Direct object libraries of an object library

This is why I don't understand the need to explicitly indicate the indirect imports
of a CMI.  CMake knows them, because it knows the graph.


Sure, *CMake* knows them, but the *build tool* needs to be told
(typically `make` or `ninja`) because it is what is actually executing
the build graph. The way this is communicated is via `-MF` files and
that's what I'm providing in this patch. Note that `ninja` does not
allow rules to specify such dependencies for other rules than the one it
is reading the file for.


But since the direct imports need to be rebuilt themselves if the 
transitive imports change, the build graph should be the same whether or 
not the transitive imports are repeated?  Either way, if a transitive 
import changes you need to rebuild the direct import and then the importer.


I guess it shouldn't hurt to have the transitive imports in the -MF 
file, as long as they aren't also in the p1689 file, so I'm not 
particularly opposed to this change, but I don't see how it makes a 
practical difference.


Jason
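
The argument above can be checked on a toy build graph. This is a sketch for
illustration only: build_graph, mark_dirty and the other names are
hypothetical, and it assumes that each CMI is itself a build target that
depends on its own direct imports. It does not model the ninja restriction
Ben mentions, only the question of which targets get rebuilt.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 8

/* deps[a][b] is true when target a lists b as a prerequisite.  */
typedef struct
{
  int n;
  bool deps[MAX_NODES][MAX_NODES];
} build_graph;

/* Mark every target that must be rebuilt when CHANGED is modified,
   iterating to a fixed point.  */
void
mark_dirty (const build_graph *g, int changed, bool dirty[MAX_NODES])
{
  memset (dirty, 0, MAX_NODES * sizeof (bool));
  dirty[changed] = true;
  bool progress = true;
  while (progress)
    {
      progress = false;
      for (int a = 0; a < g->n; a++)
        for (int b = 0; b < g->n; b++)
          if (!dirty[a] && g->deps[a][b] && dirty[b])
            {
              dirty[a] = true;
              progress = true;
            }
    }
}

/* Add an edge a -> b wherever a path a -> k -> b exists
   (Warshall's transitive closure).  */
void
add_transitive_edges (build_graph *g)
{
  for (int k = 0; k < g->n; k++)
    for (int a = 0; a < g->n; a++)
      for (int b = 0; b < g->n; b++)
        if (g->deps[a][k] && g->deps[k][b])
          g->deps[a][b] = true;
}

/* True when both graphs rebuild the same targets for every change.  */
bool
same_rebuild_sets (const build_graph *x, const build_graph *y)
{
  if (x->n != y->n)
    return false;
  for (int changed = 0; changed < x->n; changed++)
    {
      bool dx[MAX_NODES], dy[MAX_NODES];
      mark_dirty (x, changed, dx);
      mark_dirty (y, changed, dy);
      if (memcmp (dx, dy, sizeof dx) != 0)
        return false;
    }
  return true;
}
```

With three targets (0 = a.cmi, 1 = b.cmi importing a, 2 = an object file
importing b), adding the transitive edge 2 -> 0 does not change which targets
are rebuilt for any single change, which is the sense in which the two build
graphs are "the same".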



[PATCH v8] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-27 Thread Pan Li via Gcc-patches
From: Pan Li 

Update in PATCH v8:

1. Emit non-abnormal backup insn to edge.
2. Fix _after return when call.
3. Refine some run tests.
4. Cleanup code.

Original commit logs:

In basic dynamic rounding mode, we simply ignore call instructions and
we would like to take care of call in this PATCH.

During the call, the frm may be updated or kept as is. Thus, we must
ensure at least two things.

1. The static frm before call should not pollute the frm value in call.
2. The updated frm value in call should be sticky after call completed.

We will perform the following steps to make the above happen.

1. Mark call instruction with new mode DYN_CALL.
2. Mark the instruction after CALL from NONE to DYN.
3. When emitting for a DYN_CALL, we will restore the frm value.
4. When emitting from a DYN_CALL, we will back up the frm value.

Let's take an example flow for this.

          +-------------+
          | Entry (DYN) | <- frrm a5
          +-------------+
         /               \
  +-------+           +-----------+
  | VFADD |           | VFADD RTZ |  <- fsrmi 1(RTZ)
  +-------+           +-----------+
      |                     |
  +-------+             +-------+
  | CALL  |             | CALL  |  <- fsrm a5
  +-------+             +-------+
      |                     |
  +-------+             +-------+
  | SHIFT | <- frrm a5  | VFADD |  <- frrm a5
  +-------+             +-------+
      |                    /
  +-----------+           /
  | VFADD RUP | <- fsrmi 3(RUP)
  +-----------+         /
         \             /
        +-----------------+
        | Exit (DYN_EXIT) | <- fsrm a5
        +-----------------+

When call is the last insn of one bb, we take care of it when needed
for each insn by inserting one frm backup (frrm) insn to the end of
the current bb.
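
The two guarantees and the four steps can be simulated with a plain variable
standing in for the frm CSR. This is only an illustrative model, not code
from the patch: frm_csr, mode_seen_by_callee and static_ops_around_call are
hypothetical names, and the function strings together a static operation, a
call and another static operation in a straight line rather than following
one exact branch of the flow.

```c
/* Rounding-mode encodings as used by fsrmi.  */
enum rounding_mode { RNE = 0, RTZ = 1, RDN = 2, RUP = 3 };

int frm_csr = RNE;              /* stands in for the hardware frm register */
int mode_seen_by_callee = -1;   /* recorded by the callee below */

/* A callee that observes the current mode and then changes it.  */
void
callee_switches_to_rdn (void)
{
  mode_seen_by_callee = frm_csr;
  frm_csr = RDN;
}

/* Returns the frm value visible to the caller's caller on exit.  */
int
static_ops_around_call (void (*callee) (void))
{
  int backup = frm_csr;   /* Entry (DYN):       frrm a5  */
  frm_csr = RTZ;          /* VFADD RTZ:         fsrmi 1  */
  frm_csr = backup;       /* before the CALL:   fsrm a5  (step 3) */
  callee ();
  backup = frm_csr;       /* after the CALL:    frrm a5  (step 4) */
  frm_csr = RUP;          /* VFADD RUP:         fsrmi 3  */
  frm_csr = backup;       /* Exit (DYN_EXIT):   fsrm a5  */
  return frm_csr;
}
```

Starting from RNE, the callee observes RNE rather than the static RTZ
(guarantee 1), and the function exits with RDN, the value the callee set
(guarantee 2).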

Signed-off-by: Pan Li 
Co-Authored-By: Juzhe-Zhong 

gcc/ChangeLog:

* config/riscv/riscv.cc (DYNAMIC_FRM_RTL): New macro.
(STATIC_FRM_P): Ditto.
(struct mode_switching_info): New struct for mode switching.
(struct machine_function): Add new field mode switching.
(riscv_emit_frm_mode_set): Add DYN_CALL emit.
(riscv_frm_adjust_mode_after_call): New function for call mode.
(riscv_frm_emit_after_call_in_bb_end): New function for emit
insn when call as the end of bb.
(riscv_frm_mode_needed): New function for frm mode needed.
(frm_unknown_dynamic_p): Remove call check.
(riscv_mode_needed): Extract function for frm.
(riscv_frm_mode_after): Add DYN_CALL after.
(riscv_mode_entry): Remove backup rtl initialization.
* config/riscv/vector.md (frm_mode): Add dyn_call.
(fsrmsi_restore_exit): Rename to _volatile.
(fsrmsi_restore_volatile): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-insert-7.c: Adjust
test cases.
* gcc.target/riscv/rvv/base/float-point-frm-run-1.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-frm-run-2.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-frm-run-3.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-33.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-34.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-35.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-36.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-37.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-38.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-39.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-40.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-41.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-42.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-43.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-44.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-45.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-46.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-47.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-48.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-49.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-50.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-51.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-52.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-53.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-54.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-55.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-56.c: New test.
* gcc.target/riscv/rvv/base

Re: [PATCH v2] c-family: Implement pragma_lex () for preprocess-only mode

2023-07-27 Thread Jason Merrill via Gcc-patches

On 7/27/23 18:59, Lewis Hyatt wrote:

In order to support processing #pragma in preprocess-only mode (-E or
-save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
libcpp. In full compilation modes, this is accomplished by calling
pragma_lex (), which is a symbol that must be exported by the frontend, and
which is currently implemented for C and C++. Neither of those frontends
initializes its parser machinery in preprocess-only mode, and consequently
pragma_lex () does not work in this case.

Address that by adding a new function c_init_preprocess () for the frontends
to implement, which arranges for pragma_lex () to work in preprocess-only
mode, and adjusting pragma_lex () accordingly.

In preprocess-only mode, the preprocessor is accustomed to controlling the
interaction with libcpp, and it only knows about tokens that it has called
into libcpp itself to obtain. Since it still needs to see the tokens
obtained by pragma_lex () so that they can be streamed to the output, also
adjust c_lex_with_flags () and related functions in c-family/c-lex.cc to
inform the preprocessor about any tokens it won't be aware of.

Currently, there is one place where we are already supporting #pragma in
preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
was done by directly interfacing with libcpp, rather than making use of
pragma_lex (). Now that pragma_lex () works, that code is no longer
necessary; remove it.

gcc/c-family/ChangeLog:

* c-common.h (c_init_preprocess): Declare.
(c_lex_enable_token_streaming): Declare.
* c-opts.cc (c_common_init): Call c_init_preprocess ().
* c-lex.cc (stream_tokens_to_preprocessor): New static variable.
(c_lex_enable_token_streaming): New function.
(cb_def_pragma): Add a comment.
(get_token): New function wrapping cpp_get_token.
(c_lex_with_flags): Use the new wrapper function to support
obtaining tokens in preprocess_only mode.
(lex_string): Likewise.
* c-ppoutput.cc (preprocess_file): Call c_lex_enable_token_streaming
when needed.
* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
(pragma_diagnostic_lex): ...this.
(pragma_diagnostic_lex_pp): Remove.
(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
all modes.
(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
usage.
* c-pragma.h (pragma_lex_discard_to_eol): Declare.

gcc/c/ChangeLog:

* c-parser.cc (pragma_lex_discard_to_eol): New function.
(c_init_preprocess): New function.

gcc/cp/ChangeLog:

* parser.cc (c_init_preprocess): New function.
(maybe_read_tokens_for_pragma_lex): New function.
(pragma_lex): Support preprocess-only mode.
(pragma_lex_discard_to_eol): New function.
---

Notes:
 Hello-
 
 Here is version 2 of the patch, incorporating Jason's feedback from

 https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625591.html
 
 Thanks again, please let me know if it's OK? Bootstrap + regtest all

 languages on x86-64 Linux looks good.
 
 -Lewis


  gcc/c-family/c-common.h|  4 +++
  gcc/c-family/c-lex.cc  | 49 +
  gcc/c-family/c-opts.cc |  1 +
  gcc/c-family/c-ppoutput.cc | 17 +---
  gcc/c-family/c-pragma.cc   | 56 ++
  gcc/c-family/c-pragma.h|  2 ++
  gcc/c/c-parser.cc  | 21 ++
  gcc/cp/parser.cc   | 45 ++
  8 files changed, 138 insertions(+), 57 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b5ef5ff6b2c..2fe2f194660 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -990,6 +990,9 @@ extern void c_parse_file (void);
  
  extern void c_parse_final_cleanups (void);
  
+/* This initializes for preprocess-only mode.  */

+extern void c_init_preprocess (void);
+
  /* These macros provide convenient access to the various _STMT nodes.  */
  
  /* Nonzero if a given STATEMENT_LIST represents the outermost binding

@@ -1214,6 +1217,7 @@ extern tree c_build_bind_expr (location_t, tree, tree);
  /* In c-lex.cc.  */
  extern enum cpp_ttype
  conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind);
+extern void c_lex_enable_token_streaming (bool enabled);
  
  /* In c-pch.cc  */

  extern void pch_init (void);
diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
index dcd061c7cb1..ac4c018d863 100644
--- a/gcc/c-family/c-lex.cc
+++ b/gcc/c-family/c-lex.cc
@@ -57,6 +57,17 @@ static void cb_ident (cpp_reader *, unsigned int, const 
cpp_string *);
  static void cb_def_pragma (cpp_reader *, unsigned int);
  static void cb_define (cpp_reader *, unsigned int, cpp_hashnode *);
  static void cb_undef (cpp_reader *, unsigned int, cpp_hashnode *);
+
+/* Flag to remember if we are in a mode (such as flag_preprocess_only) in whi

Re: [PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative

2023-07-27 Thread Demin Han
Sorry for not considering the rv32 config.
The fix is OK. If convenient, please commit it.

On 2023/7/28 4:46, Patrick O'Neill wrote:
> The newly added testcase fails on rv32 targets with this message:
> FAIL: gcc.target/riscv/rvv/autovec/madd-split2-1.c -O3 -ftree-vectorize (test 
> for excess errors)
> 
> verbose log:
> compiler exited with status 1
> output is:
> cc1: error: ABI requires '-march=rv32'
> 
> Something like this appears to fix the issue:
> 
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
> index 14a9802667e..e10a9e9d0f5 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-march=rv64gcv_zvl256b -O3 -fno-cprop-registers -fno-dce 
> --param riscv-autovec-preference=scalable" } */
> +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3
> -fno-cprop-registers -fno-dce --param riscv-autovec-preference=scalable"
>  } */
>  
>  long
>  foo (long *__restrict a, long *__restrict b, long n)
> 
> On 7/27/23 04:57, Kito Cheng via Gcc-patches wrote:
> 
>> My first impression is those emit_insn (gen_rtx_SET()) seems
>> necessary, but I got the point after I checked vector.md :P
>>
>> Committed to trunk, thanks :)
>>
>>
> >> On Thu, Jul 27, 2023 at 6:23 PM juzhe.zh...@rivai.ai wrote:
>>> Oh, YES.
>>>
>>> Thanks for fixing it. It makes sense since the ternary operations in 
>>> "vector.md"
>>> generate "vmv.v.v" according to RA.
>>>
>>> Thanks for fixing it.
>>>
>>> @kito: Could you confirm it? If it's ok to you, commit it for Han (I am 
>>> lazy to commit patches :).
>>>
>>>
>>>
>>> juzhe.zh...@rivai.ai
>>>
>>> From: demin.han
>>> Date: 2023-07-27 17:48
> >>> To: gcc-patches@gcc.gnu.org
> >>> CC: kito.ch...@gmail.com; juzhe.zh...@rivai.ai
>>> Subject: [PATCH] RISC-V: Fix uninitialized and redundant use of 
>>> which_alternative
> >>> When the split2 pass starts, which_alternative holds whatever value was
> >>> last set by an earlier pass, so its value there is effectively random.
> >>>
> >>> Even if it were initialized, the generated move would be redundant:
> >>> the move can be generated by the assembly output template.
>>>
>>> Signed-off-by: demin.han
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/riscv/autovec.md: Delete which_alternative use in split
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/riscv/rvv/autovec/madd-split2-1.c: New test.
>>>
>>> ---
>>> gcc/config/riscv/autovec.md | 12 
>>> .../gcc.target/riscv/rvv/autovec/madd-split2-1.c    | 13 +
>>> 2 files changed, 13 insertions(+), 12 deletions(-)
>>> create mode 100644 
>>> gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
>>>
>>> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
>>> index d899922586a..b7ea3101f5a 100644
>>> --- a/gcc/config/riscv/autovec.md
>>> +++ b/gcc/config/riscv/autovec.md
>>> @@ -1012,8 +1012,6 @@ (define_insn_and_split "*fma"
>>>     [(const_int 0)]
>>>     {
>>>   riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
>>> -    if (which_alternative == 2)
>>> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>>>   rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
>>> operands[0]};
>>>   riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus 
>>> (mode),
>>>     riscv_vector::RVV_TERNOP, ops, operands[4]);
>>> @@ -1058,8 +1056,6 @@ (define_insn_and_split "*fnma"
>>>     [(const_int 0)]
>>>     {
>>>   riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
>>> -    if (which_alternative == 2)
>>> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>>>   rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
>>> operands[0]};
>>>   riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul 
>>> (mode),
>>>  riscv_vector::RVV_TERNOP, ops, operands[4]);
>>> @@ -1102,8 +1098,6 @@ (define_insn_and_split "*fma"
>>>     [(const_int 0)]
>>>     {
>>>   riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
>>> -    if (which_alternative == 2)
>>> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>>>   rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
>>> operands[0]};
>>>   riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (PLUS, 
>>> mode),
>>>    riscv_vector::RVV_TERNOP, ops, operands[4]);
>>> @@ -1148,8 +1142,6 @@ (define_insn_and_split "*fnma"
>>>     [(const_int 0)]
>>>     {
>>>   riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
>>> -    if (which_alternative == 2)
>>> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>>>   rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
>>> operands[0]};
>>>   riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul_neg 
>>> (PLUS, mode),
>>>    riscv_vector::RVV_TERNOP, ops, operands[4]);
>>> @@ -1194,8 +1186,6 @@ (define_insn_and_split "*fms"
>>>     [(const_int 0)]
>>>   

[PATCH v3 1/2] libstdc++: Define _GLIBCXX_HAS_BUILTIN_TRAIT

2023-07-27 Thread Ken Matsui via Gcc-patches
This patch defines the _GLIBCXX_HAS_BUILTIN_TRAIT macro, which will be used
to toggle the use of built-in traits in the type_traits header through the
_GLIBCXX_NO_BUILTIN_TRAITS macro, without needing to modify the source code.
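
For readers who want to see the toggle in isolation, here is a minimal
sketch of the same idea outside libstdc++. The DEMO_* macro names and
demo_remove_cv are hypothetical stand-ins chosen for illustration; only
__has_builtin and the __remove_cv built-in come from the compiler, and the
fallback branch is used when the built-in is not available.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the libstdc++ macros (illustrative names only).
#ifdef __has_builtin
# define DEMO_HAS_BUILTIN(B) __has_builtin(B)
#else
# define DEMO_HAS_BUILTIN(B) 0
#endif

// Defining DEMO_NO_BUILTIN_TRAITS (e.g. with -DDEMO_NO_BUILTIN_TRAITS)
// forces every trait check to 0, so the library fallback is selected.
#ifndef DEMO_NO_BUILTIN_TRAITS
# define DEMO_HAS_BUILTIN_TRAIT(BT) DEMO_HAS_BUILTIN(BT)
#else
# define DEMO_HAS_BUILTIN_TRAIT(BT) 0
#endif

#if DEMO_HAS_BUILTIN_TRAIT(__remove_cv)
// Built-in path: the compiler computes the type directly.
template<typename T>
struct demo_remove_cv { using type = __remove_cv(T); };
constexpr bool demo_uses_builtin = true;
#else
// Fallback path: classic partial specializations.
template<typename T> struct demo_remove_cv { using type = T; };
template<typename T> struct demo_remove_cv<const T> { using type = T; };
template<typename T> struct demo_remove_cv<volatile T> { using type = T; };
template<typename T> struct demo_remove_cv<const volatile T> { using type = T; };
constexpr bool demo_uses_builtin = false;
#endif

// Both paths must agree on the result.
static_assert(std::is_same<demo_remove_cv<const volatile int>::type, int>::value,
              "cv-qualifiers are stripped on either path");
```

Compiling the same file once normally and once with the hypothetical
-DDEMO_NO_BUILTIN_TRAITS switch exercises both branches without editing
the source, which is the property the patch is after.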

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_HAS_BUILTIN_TRAIT): Define.
(_GLIBCXX_HAS_BUILTIN): Keep defined.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/bits/c++config | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index dd47f274d5f..984985d6fff 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -854,7 +854,15 @@ namespace __gnu_cxx
 # define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
 #endif
 
-#undef _GLIBCXX_HAS_BUILTIN
+// Returns 1 if _GLIBCXX_NO_BUILTIN_TRAITS is not defined and the compiler
+// has a corresponding built-in type trait, 0 otherwise.
+// _GLIBCXX_NO_BUILTIN_TRAITS can be defined to disable the use of built-in
+// traits.
+#ifndef _GLIBCXX_NO_BUILTIN_TRAITS
+# define _GLIBCXX_HAS_BUILTIN_TRAIT(BT) _GLIBCXX_HAS_BUILTIN(BT)
+#else
+# define _GLIBCXX_HAS_BUILTIN_TRAIT(BT) 0
+#endif
 
 // Mark code that should be ignored by the compiler, but seen by Doxygen.
 #define _GLIBCXX_DOXYGEN_ONLY(X)
-- 
2.41.0



[PATCH v3 2/2] libstdc++: Use _GLIBCXX_HAS_BUILTIN_TRAIT

2023-07-27 Thread Ken Matsui via Gcc-patches
This patch uses the _GLIBCXX_HAS_BUILTIN_TRAIT macro instead of
__has_builtin in the type_traits header. The macro makes it possible to
toggle the use of built-in traits in the type_traits header through the
_GLIBCXX_NO_BUILTIN_TRAITS macro, without needing to modify the
source code.

libstdc++-v3/ChangeLog:

* include/std/type_traits (__has_builtin): Replace with ...
(_GLIBCXX_HAS_BUILTIN_TRAIT): ... this.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 9f086992ebc..12423361b6e 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1411,7 +1411,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public __bool_constant<__is_base_of(_Base, _Derived)>
 { };
 
-#if __has_builtin(__is_convertible)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_convertible)
   template
 struct is_convertible
 : public __bool_constant<__is_convertible(_From, _To)>
@@ -1462,7 +1462,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if __cplusplus >= 202002L
 #define __cpp_lib_is_nothrow_convertible 201806L
 
-#if __has_builtin(__is_nothrow_convertible)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_nothrow_convertible)
   /// is_nothrow_convertible_v
   template
 inline constexpr bool is_nothrow_convertible_v
@@ -1537,7 +1537,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { using type = _Tp; };
 
   /// remove_cv
-#if __has_builtin(__remove_cv)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__remove_cv)
   template
 struct remove_cv
 { using type = __remove_cv(_Tp); };
@@ -1606,7 +1606,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Reference transformations.
 
   /// remove_reference
-#if __has_builtin(__remove_reference)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__remove_reference)
   template
 struct remove_reference
 { using type = __remove_reference(_Tp); };
@@ -2963,7 +2963,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template(_S_get())),
   typename = decltype(_S_conv<_Tp>(_S_get())),
-#if __has_builtin(__reference_converts_from_temporary)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__reference_converts_from_temporary)
   bool _Dangle = __reference_converts_from_temporary(_Tp, _Res_t)
 #else
   bool _Dangle = false
@@ -3420,7 +3420,7 @@ template
*/
 #define __cpp_lib_remove_cvref 201711L
 
-#if __has_builtin(__remove_cvref)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__remove_cvref)
   template
 struct remove_cvref
 { using type = __remove_cvref(_Tp); };
@@ -3515,7 +3515,7 @@ template
 : public bool_constant>
 { };
 
-#if __has_builtin(__is_layout_compatible)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_layout_compatible)
 
   /// @since C++20
   template
@@ -3529,7 +3529,7 @@ template
 constexpr bool is_layout_compatible_v
   = __is_layout_compatible(_Tp, _Up);
 
-#if __has_builtin(__builtin_is_corresponding_member)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__builtin_is_corresponding_member)
 #define __cpp_lib_is_layout_compatible 201907L
 
   /// @since C++20
@@ -3540,7 +3540,7 @@ template
 #endif
 #endif
 
-#if __has_builtin(__is_pointer_interconvertible_base_of)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_pointer_interconvertible_base_of)
   /// True if `_Derived` is standard-layout and has a base class of type `_Base`
   /// @since C++20
   template
@@ -3554,7 +3554,7 @@ template
 constexpr bool is_pointer_interconvertible_base_of_v
   = __is_pointer_interconvertible_base_of(_Base, _Derived);
 
-#if __has_builtin(__builtin_is_pointer_interconvertible_with_class)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__builtin_is_pointer_interconvertible_with_class)
 #define __cpp_lib_is_pointer_interconvertible 201907L
 
   /// True if `__mp` points to the first member of a standard-layout type
@@ -3590,8 +3590,8 @@ template
   template
 inline constexpr bool is_scoped_enum_v = is_scoped_enum<_Tp>::value;
 
-#if __has_builtin(__reference_constructs_from_temporary) \
-  && __has_builtin(__reference_converts_from_temporary)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__reference_constructs_from_temporary) \
+  && _GLIBCXX_HAS_BUILTIN_TRAIT(__reference_converts_from_temporary)
 
 #define __cpp_lib_reference_from_temporary 202202L
 
@@ -3632,7 +3632,7 @@ template
   template
 inline constexpr bool reference_converts_from_temporary_v
   = reference_converts_from_temporary<_Tp, _Up>::value;
-#endif // __has_builtin for reference_from_temporary
+#endif // _GLIBCXX_HAS_BUILTIN_TRAIT for reference_from_temporary
 #endif // C++23
 
 #if _GLIBCXX_HAVE_IS_CONSTANT_EVALUATED
-- 
2.41.0



[PATCH] RISC-V: Remove vxrm parameter for vsadd[u] and vssub[u]

2023-07-27 Thread Li Xu
From: xuli 

The computation of `vsadd`, `vsaddu`, `vssub`, and `vssubu` does not need the
rounding mode; therefore the intrinsics for these instructions do not take a
rounding-mode parameter.
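
As a rough illustration of why no rounding mode is involved, here is a
scalar sketch on 8-bit elements. It is a model written for this note, not
the intrinsic API: the function names and the two-value Vxrm enum are
hypothetical. A saturating add or subtract computes the exact result and
clamps it, so no low bits are ever discarded. An averaging add, shown for
contrast, shifts one bit out and therefore has to choose how to round.

```cpp
#include <cstdint>

// Clamp an exact intermediate result into [lo, hi].
constexpr int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

// Saturating add/sub: exact arithmetic in a wider type, then clamp.
// No bit is shifted out, so there is nothing to round.
constexpr std::int8_t sat_add_i8(std::int8_t a, std::int8_t b) {
  return static_cast<std::int8_t>(clamp_int(int{a} + int{b}, -128, 127));
}
constexpr std::int8_t sat_sub_i8(std::int8_t a, std::int8_t b) {
  return static_cast<std::int8_t>(clamp_int(int{a} - int{b}, -128, 127));
}
constexpr std::uint8_t sat_addu_u8(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(clamp_int(int{a} + int{b}, 0, 255));
}
constexpr std::uint8_t sat_subu_u8(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(clamp_int(int{a} - int{b}, 0, 255));
}

// For contrast: an averaging add drops the lowest bit of the sum, so the
// result depends on a rounding mode. Only two modes are modelled here:
// rnu (round to nearest, ties up) and rdn (round down, i.e. truncate).
enum class Vxrm { rnu, rdn };

constexpr std::uint8_t avg_addu_u8(std::uint8_t a, std::uint8_t b, Vxrm mode) {
  return static_cast<std::uint8_t>(
      (int{a} + int{b} + (mode == Vxrm::rnu ? 1 : 0)) >> 1);
}
```

In this model the saturating helpers simply have no place to put a
rounding-mode argument, while avg_addu_u8 gives different answers for
(1, 2) depending on the mode.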

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Remove rounding mode of
vsadd[u] and vssub[u].
* config/riscv/vector.md: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/base/bug-12.C: Adapt testcase.
* g++.target/riscv/rvv/base/bug-14.C: Ditto.
* g++.target/riscv/rvv/base/bug-18.C: Ditto.
* g++.target/riscv/rvv/base/bug-19.C: Ditto.
* g++.target/riscv/rvv/base/bug-20.C: Ditto.
* g++.target/riscv/rvv/base/bug-21.C: Ditto.
* g++.target/riscv/rvv/base/bug-22.C: Ditto.
* g++.target/riscv/rvv/base/bug-23.C: Ditto.
* g++.target/riscv/rvv/base/bug-3.C: Ditto.
* g++.target/riscv/rvv/base/bug-8.C: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-100.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-101.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-102.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-103.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-104.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-105.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-106.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-107.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-108.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-109.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-110.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-111.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-112.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-113.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-114.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-115.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-116.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-117.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-118.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-119.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-97.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-98.c: Ditto.
* gcc.target/riscv/rvv/base/merge_constraint-1.c: Ditto.
* gcc.target/riscv/rvv/base/fixed-point-vxrm-error.c: New test.
* gcc.target/riscv/rvv/base/fixed-point-vxrm.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  |  6 --
 gcc/config/riscv/vector.md| 42 +++---
 .../g++.target/riscv/rvv/base/bug-12.C|  2 +-
 .../g++.target/riscv/rvv/base/bug-14.C|  2 +-
 .../g++.target/riscv/rvv/base/bug-18.C|  2 +-
 .../g++.target/riscv/rvv/base/bug-19.C|  2 +-
 .../g++.target/riscv/rvv/base/bug-20.C|  2 +-
 .../g++.target/riscv/rvv/base/bug-21.C|  2 +-
 .../g++.target/riscv/rvv/base/bug-22.C|  2 +-
 .../g++.target/riscv/rvv/base/bug-23.C|  2 +-
 .../g++.target/riscv/rvv/base/bug-3.C |  2 +-
 .../g++.target/riscv/rvv/base/bug-8.C |  2 +-
 .../riscv/rvv/base/binop_vx_constraint-100.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-101.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-102.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-103.c  | 28 +++
 .../riscv/rvv/base/binop_vx_constraint-104.c  | 16 ++--
 .../riscv/rvv/base/binop_vx_constraint-105.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-106.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-107.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-108.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-109.c  | 28 +++
 .../riscv/rvv/base/binop_vx_constraint-110.c  | 16 ++--
 .../riscv/rvv/base/binop_vx_constraint-111.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-112.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-113.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-114.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-115.c  | 16 ++--
 .../riscv/rvv/base/binop_vx_constraint-116.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-117.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-118.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-119.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-97.c   | 28 +++
 .../riscv/rvv/base/binop_vx_constraint-98.c   | 16 ++--
 .../riscv/rvv/base/fixed-point-vxrm-error.c   | 24 ++
 .../riscv/rvv/base/fixed-point-vxrm.c | 81 +++
 .../riscv/rvv/base/merge_constraint-1.c   |  4 +-
 37 files changed, 233 insertions(+), 152 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/fixed-point-vxrm-error.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/fixed-point-vxrm.c

diff --git a/gcc/conf

Re: [PATCH] Add -fsarif-time-report [PR109361]

2023-07-27 Thread Richard Biener via Gcc-patches
On Fri, Jul 28, 2023 at 12:23 AM David Malcolm via Gcc-patches
 wrote:
>
> On Tue, 2023-04-11 at 08:43 +, Richard Biener wrote:
> > On Tue, 4 Apr 2023, David Malcolm wrote:
> >
> > > Richi, Jakub: I can probably self-approve this, but it's
> > > technically a
> > > new feature.  OK if I push this to trunk in stage 4?  I believe
> > > it's
> > > low risk, and is very useful for benchmarking -fanalyzer.
> >
> > Please wait for stage1 at this point.  One comment on the patch
> > below ...
> >
> > >
> > > This patch adds support for embedding profiling information about
> > > the
> > > compiler itself into the SARIF output.
> > >
> > > In an earlier version of this patch I extended -ftime-report so
> > > that
> > > as well as writing to stderr, it would embed the information in any
> > > SARIF output.  This turned out to be awkward to use, in that I
> > > found
> > > myself needing to get the data in JSON form without also having it
> > > emitted on stderr (which was affecting the output of the build).
> > >
> > > Hence this version of the patch adds a new -fsarif-time-report,
> > > similar to the existing -ftime-report, for requesting that GCC
> > > profile itself using the timevar machinery.
> > >
> > > Specifically, if -fsarif-time-report is specified, the timing
> > > information will be captured (as if -ftime-report were specified),
> > > and
> > > will be embedded in JSON form within any SARIF as a
> > > "gcc/timeReport"
> > > property within a property bag of the "invocation" object.
> > >
> > > Here's an example of the output:
> > >
> > >   "invocations": [
> > >   {
> > >   "executionSuccessful": true,
> > >   "toolExecutionNotifications": [],
> > >   "properties": {
> > >   "gcc/timeReport": {
> > >   "timevars": [
> > >   {
> > >   "name": "phase setup",
> > >   "elapsed": {
> > >   "user": 0.04,
> > >   "sys": 0,
> > >   "wall": 0.04,
> > >   "ggc_mem": 1863472
> > >   }
> > >   },
> > >
> > >   [...snip...]
> > >
> > >   {
> > >   "name": "analyzer: processing worklist",
> > >   "elapsed": {
> > >   "user": 0.06,
> > >   "sys": 0,
> > >   "wall": 0.06,
> > >   "ggc_mem": 48
> > >   }
> > >   },
> > >   {
> > >   "name": "analyzer: emitting diagnostics",
> > >   "elapsed": {
> > >   "user": 0.01,
> > >   "sys": 0,
> > >   "wall": 0.01,
> > >   "ggc_mem": 0
> > >   }
> > >   },
> > >   {
> > >   "name": "TOTAL",
> > >   "elapsed": {
> > >   "user": 0.21,
> > >   "sys": 0.03,
> > >   "wall": 0.24,
> > >   "ggc_mem": 3368736
> > >   }
> > >   }
> > >   ],
> > >   "CHECKING_P": true,
> > >   "flag_checking": true
> > >   }
> > >   }
> > >   }
> > >   ]
> > >
> > > I have successfully used this in my analyzer integration tests to
> > > get
> > > timing information about which source files get slowed down by the
> > > analyzer.  I've validated the generated .sarif files against the
> > > SARIF
> > > schema.
> > >
> > > The documentation notes that the precise output format is subject
> > > to change.
> > >
> > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > >
> > > gcc/ChangeLog:
> > > PR analyzer/109361
> > > * common.opt (fsarif-time-report): New option.
> >
> > 'sarif' is currently used only with -fdiagnostics-format= it seems.
> > We already have
> >
> > ftime-report
> > Common Var(time_report)
> > Report the time taken by each compiler pass.
> >
> > ftime-report-details
> > Common Var(time_report_details)
> > Record times taken by sub-phases separately.
> >
> > so -fsarif-time-report is not a) -ftime-report-sarif and b) it's
> > unclear if it applies to -ftime-report or to both -ftime-report
> > and -ftime-report-details?  (note -ftime-report-details needs
> > -ftime-report to be effective)
> >
> > I'd rather have a -ftime-report-format= (or -freport-format in
> > case we want to cover -fmem-report, -fmem-report-wpa,
> > -fpre-ipa-mem-report and -fpost-ipa-mem-report as well?)
> >
> > ISTR there's a summer of code project in thi

Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]

2023-07-27 Thread Richard Biener via Gcc-patches
On Wed, Jul 26, 2023 at 8:19 PM Drew Ross  wrote:
>
> Here is what I came up with for combining the two:
>
> /* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
>unsigned x OR truncate into the precision(type) - c lowest bits
>of signed x (if they have mode precision or a precision of 1)  */
> (simplify
>  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>  (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
>   (if (TYPE_UNSIGNED (type))
>(bit_and @0 (rshift { build_minus_one_cst (type); } @1))
>(if (INTEGRAL_TYPE_P (type))
> (with {
>   int width = element_precision (type) - tree_to_uhwi (@1);
>   tree stype = build_nonstandard_integer_type (width, 0);
>  }
>  (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
>   (convert (convert:stype @0
>
> Let me know what you think.

Looks good to me.

Thanks,
Richard.
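
For anyone following the thread, the two rewrites the combined pattern
performs can be checked on plain 32-bit integers. This is only a sketch of
the arithmetic, not of match.pd itself; all function names are made up for
the illustration. It assumes GCC-style behaviour for converting an
out-of-range unsigned value to int32_t and for right-shifting a negative
value (both are implementation-defined before C++20), and 0 <= c < 32.

```cpp
#include <cstdint>

// Unsigned case: (x << c) >> c ...
constexpr std::uint32_t shl_shr_u32(std::uint32_t x, unsigned c) {
  return (x << c) >> c;
}
// ... equals x & ((unsigned)-1 >> c).
constexpr std::uint32_t masked_u32(std::uint32_t x, unsigned c) {
  return x & (~std::uint32_t{0} >> c);
}

// Signed case: shift left as unsigned (no overflow UB), reinterpret as
// signed, then shift right arithmetically.
constexpr std::int32_t shl_sar_i32(std::uint32_t x, unsigned c) {
  return static_cast<std::int32_t>(x << c) >> c;
}

// Reference: truncate to the low `width` bits and sign-extend them,
// computed in 64-bit arithmetic without relying on signed shifts.
// This plays the role of (convert (convert:stype @0)) with
// width = precision - c.
constexpr std::int32_t sign_extend_low(std::uint32_t x, unsigned width) {
  return static_cast<std::int32_t>(
      ((static_cast<std::int64_t>(x) & ((std::int64_t{1} << width) - 1))
       ^ (std::int64_t{1} << (width - 1)))
      - (std::int64_t{1} << (width - 1)));
}

static_assert(shl_shr_u32(0xFFFFFFFFu, 4) == masked_u32(0xFFFFFFFFu, 4),
              "unsigned form is a mask");
static_assert(shl_sar_i32(0x80u, 24) == sign_extend_low(0x80u, 8),
              "signed form is a sign-extension of the low 8 bits");
```

With c = 31 the intermediate type is one bit wide and the result is
-(x & 1), which is the case from the PR title.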

> > Btw, I wonder whether we can handle
> > some cases of widening/truncating converts between the shifts?
>
> I will look into this.
>
> Drew
>
> On Wed, Jul 26, 2023 at 4:40 AM Richard Biener  
> wrote:
>>
>> On Tue, Jul 25, 2023 at 9:26 PM Drew Ross  wrote:
>> >
>> > > With that fixed I think for non-vector integrals the above is the most 
>> > > suitable
>> > > canonical form of a sign-extension.  Note it should also work for any 
>> > > other
>> > > constant shift amount - just use the appropriate intermediate precision 
>> > > for
>> > > the truncating type.
>> > > We _might_ want
>> > > to consider to only use the converts when the intermediate type has
>> > > mode precision (and as a special case allow one bit as in your above 
>> > > case)
>> > > so it can expand to (sign_extend: (subreg: reg)).
>> >
> >> > Here is a pattern that only matches truncations that result in
> >> > mode precision (or precision of 1):
>> >
>> > (simplify
>> >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>> >  (if (INTEGRAL_TYPE_P (type)
>> >   && !TYPE_UNSIGNED (type)
>> >   && wi::gt_p (element_precision (type), wi::to_wide (@1), TYPE_SIGN 
>> > (TREE_TYPE (@1
>> >   (with {
>> > int width = element_precision (type) - tree_to_uhwi (@1);
>> > tree stype = build_nonstandard_integer_type (width, 0);
>> >}
>> >(if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
>> > (convert (convert:stype @0))
>> >
>> > Look ok?
>>
>> I suppose so.  Can you see to amend the existing
>>
>> /* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
>>types.  */
>> (simplify
>>  (rshift (lshift @0 INTEGER_CST@1) @1)
>>  (if (TYPE_UNSIGNED (type)
>>   && (wi::ltu_p (wi::to_wide (@1), element_precision (type
>>   (bit_and @0 (rshift { build_minus_one_cst (type); } @1
>>
>> pattern?  You will get a duplicate pattern diagnostic otherwise.  It
>> also looks like this
>> one has the (nop_convert? ..) missing.  Btw, I wonder whether we can handle
>> some cases of widening/truncating converts between the shifts?
>>
>> Richard.
>>
>> > > You might also want to verify what RTL expansion
>> > > produces before/after - it at least shouldn't be worse.
>> >
>> > The RTL is slightly better for the mode precision cases and slightly worse 
>> > for the precision 1 case.
>> >
>> > > That said - do you have any testcase where the canonicalization is an 
>> > > enabler
>> > > for further transforms or was this requested stand-alone?
>> >
>> > No, I don't have any specific test cases. This patch is just in response 
>> > to pr101955.
>> >
>> > On Tue, Jul 25, 2023 at 2:55 AM Richard Biener 
>> >  wrote:
>> >>
>> >> On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek  wrote:
>> >> >
>> >> > On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches 
>> >> > wrote:
>> >> > > So would something like
>> >> > >
>> >> > > (simplify
>> >> > >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>> >> > >  (with { tree stype = build_nonstandard_integer_type (1, 0); }
>> >> > >  (if (INTEGRAL_TYPE_P (type)
>> >> > >   && !TYPE_UNSIGNED (type)
>> >> > >   && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
>> >> > >   (convert (convert:stype @0)
>> >> > >
>> >> > > work?
>> >> >
>> >> > Certainly swap the if and with and the (with then should be indented by 
>> >> > 1
>> >> > column to the right of (if and (convert one further (the reason for the
>> >> > swapping is not to call build_nonstandard_integer_type when it will not 
>> >> > be
> >> >> > needed, which will probably be far more often than an actual match).
>> >>
>> >> With that fixed I think for non-vector integrals the above is the most 
>> >> suitable
>> >> canonical form of a sign-extension.  Note it should also work for any 
>> >> other
>> >> constant shift amount - just use the appropriate intermediate precision 
>> >> for
>> >> the truncating type.  You might also want to verify what RTL expansion
>> >> produces before/after - it at least shouldn't be worse.  We _might_ want
>> >> to consider

Re: [PATCH] Fix 100864: `(a&!b) | b` is not optimized to `a | b` for comparisons

2023-07-27 Thread Andrew Pinski via Gcc-patches
On Sun, Jul 23, 2023 at 1:39 AM Richard Biener via Gcc-patches
 wrote:
>
>
>
> > Am 23.07.2023 um 01:27 schrieb Andrew Pinski via Gcc-patches 
> > :
> >
> > This adds a special case of the `(a&~b) | b` pattern where
> > `b` and `~b` are comparisons.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> Don’t we have an existing match for inversions we could amend?

We don't currently, but I might be able to model the function on
what was done for bitwise_equal_p.
I noticed that even the patch which added bitwise_equal_p could benefit
from something similar.

Thanks,
Andrew
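
As a quick sanity check of the identity under discussion, the sketch below
enumerates the boolean truth table and mirrors the shape of the testcase's
func_* helpers for one operator. The function names are hypothetical and
chosen for this note. The last helper is a reminder of why the patch
consults invert_tree_comparison with HONOR_NANS: for floating point, `>=`
is not always the inverse of `<`.

```cpp
// (e & !c) | c, as written in the testcase ...
constexpr bool with_not(bool e, bool c) { return (e & !c) | c; }
// ... and the simplified form e | c.
constexpr bool simplified(bool e, bool c) { return e | c; }

// The identity holds for all four input combinations.
static_assert(with_not(false, false) == simplified(false, false), "");
static_assert(with_not(false, true) == simplified(false, true), "");
static_assert(with_not(true, false) == simplified(true, false), "");
static_assert(with_not(true, true) == simplified(true, true), "");

// Same shape as the testcase for `<`: c is a comparison, and its inverse
// is spelled as a second comparison (a >= b) rather than as !c.
constexpr bool func_lt(int a, int b, bool e) {
  return (e & (a >= b)) | (a < b);
}
constexpr bool func_lt_simplified(int a, int b, bool e) {
  return e | (a < b);
}

// For integers, a >= b is exactly !(a < b). For doubles it is not when
// either operand is a NaN, because both comparisons are then false; this
// helper returns false in that case and true otherwise.
inline bool ge_is_inverse_of_lt(double a, double b) {
  return (a < b) != (a >= b);
}
```

The integer helpers agree for every input, which is the transformation the
new pattern performs once it has established that icmp really is the
inverse of cmp.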

>
> > gcc/ChangeLog:
> >
> >PR tree-optimization/100864
> >* match.pd ((~x & y) | x -> x | y): Add comparison variant.
> >
> > gcc/testsuite/ChangeLog:
> >
> >* gcc.dg/tree-ssa/bitops-3.c: New test.
> > ---
> > gcc/match.pd | 17 +-
> > gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c | 67 
> > 2 files changed, 83 insertions(+), 1 deletion(-)
> > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index bfd15d6cd4a..dd4a2df537d 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -1928,7 +1928,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  /* (~x & y) | x -> x | y */
> >  (simplify
> >   (bitop:c (rbitop:c (bit_not @0) @1) @0)
> > -  (bitop @0 @1)))
> > +  (bitop @0 @1))
> > + /* Similar but for comparisons which have been inverted already,
> > +Note it is hard to simulate the inverted tcc_comparison due to
> > +NaNs; that is, == and != are sometimes inversions and sometimes not.
> > +So a double for loop is needed, and the inverse code is then compared
> > +with the result of invert_tree_comparison.
> > +This works fine for vector compares as -1 and 0 are bitwise
> > +inverses.  */
> > + (for cmp (tcc_comparison)
> > +  (for icmp (tcc_comparison)
> > +   (simplify
> > +(bitop:c (rbitop:c (icmp @0 @1) @2) (cmp@3 @0 @1))
> > + (with { enum tree_code ic = invert_tree_comparison
> > + (cmp, HONOR_NANS (@0)); }
> > +  (if (ic == icmp)
> > +   (bitop @3 @2)))
> >
> > /* ((x | y) & z) | x -> (z & y) | x */
> > (simplify
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
> > new file mode 100644
> > index 000..68fff4edce9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
> > @@ -0,0 +1,67 @@
> > +/* PR tree-optimization/100864 */
> > +
> > +/* { dg-do run } */
> > +/* { dg-options "-O1 -fdump-tree-optimized-raw" } */
> > +
> > +#define op_ne !=
> > +#define op_eq ==
> > +#define op_lt <
> > +#define op_le <=
> > +#define op_gt >
> > +#define op_ge >=
> > +
> > +#define operators(t) \
> > +t(ne) \
> > +t(eq) \
> > +t(lt) \
> > +t(le) \
> > +t(gt) \
> > +t(ge)
> > +
> > +#define cmpfunc(v, op) \
> > +__attribute__((noipa)) \
> > +_Bool func_##op##_##v(v int a, v int b, v _Bool e) \
> > +{ \
> > +  v _Bool c = (a op_##op b); \
> > +  v _Bool d = !c; \
> > +  return (e & d) | c; \
> > +}
> > +
> > +#define cmp_funcs(op) \
> > +cmpfunc(, op) \
> > +cmpfunc(volatile , op)
> > +
> > +operators(cmp_funcs)
> > +
> > +#define test(op) \
> > +if (func_##op##_ (a, b, e) != func_##op##_volatile (a, b, e)) \
> > + __builtin_abort();
> > +
> > +int main()
> > +{
> > +  for(int a = -3; a <= 3; a++)
> > +for(int b = -3; b <= 3; b++)
> > +  {
> > +_Bool e = 0;
> > +operators(test)
> > +e = 1;
> > +operators(test)
> > +  }
> > +  return 0;
> > +}
> > +
> > +/* Check to make sure we optimize `(a&!b) | b` -> `a | b`. */
> > +/* There are 6 different comparison operators tested here. */
> > +/* bit_not_expr and bit_and_expr should show up for each one (volatile). */
> > +/* Each operator should show up twice
> > +   (except for `!=` which shows up 2*6 (each tester) + 2 (the 2 loops) 
> > extra = 16). */
> > +/* bit_ior_expr will show up for each operator twice (non-volatile and 
> > volatile). */
> > +/* { dg-final { scan-tree-dump-times "ne_expr,"  16 "optimized"} } */
> > +/* { dg-final { scan-tree-dump-times "eq_expr,"   2 "optimized"} } */
> > +/* { dg-final { scan-tree-dump-times "lt_expr,"   2 "optimized"} } */
> > +/* { dg-final { scan-tree-dump-times "le_expr,"   2 "optimized"} } */
> > +/* { dg-final { scan-tree-dump-times "gt_expr,"   2 "optimized"} } */
> > +/* { dg-final { scan-tree-dump-times "ge_expr,"   2 "optimized"} } */
> > +/* { dg-final { scan-tree-dump-times "bit_not_expr,"  6 "optimized"} } */
> > +/* { dg-final { scan-tree-dump-times "bit_and_expr,"  6 "optimized"} } */
> > +/* { dg-final { scan-tree-dump-times "bit_ior_expr," 12 "optimized"} } */
> > \ No newline at end of file
> > --
> > 2.31.1
> >


Re: Re: [PATCH 0/5] Recognize Zicond extension

2023-07-27 Thread Xiao Zeng
On Thu, Jul 27, 2023 at 10:43:00 PM  Jeff Law  wrote:
>
>
>
>On 7/27/23 02:43, Xiao Zeng wrote:
>
>>
>> 2. Following your comments, I have modified the code, but out of caution
>> toward upstream I ran a complete regression test on patch V2, which took
>> some time, so I was unable to reply to emails and upload patch V2 in a
>> timely manner.

It's okay.
I am very willing to accept opinions from the GCC community.

>-- zicond/xventanacondops has lingered
>for quite a while and I had a bit of free time yesterday.  I felt it was
>most useful to try and move this stuff forward.
>
>
>
>>
>> 3 After you and other maintainers made minor modifications to my patch[1/5]
>> and patch[2/5], it has been merged into the master, so I will no longer 
>> upload patch V2.
>Agreed.
>
>>
>> 4 patch[1/5] and patch[2/5], which have been merged into the master,
>> only provide basic support for Zicond, and further optimization work
>> still needs to be done. Those further optimizations are in my
>> patch[3/5], patch[4/5] and patch[5/5].
>Agreed.
>
>>
>> 5 As you mentioned in your previous email 
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625427.html
>> "eswincomputing and ventana can both reduce our divergence from the trunk
>> and work together on the rest of the bits...". I will reorganize patch[3/5] 
>> patch[4/5]
>> and patch[5/5], provide more detailed explanations, and submit them as an 
>> alternative
>> solution for further optimization of Zicond.
>>
>> Does that work for you?
>I'm going to look at 3/5 today pretty closely.  Exposing zicond to
>movcc is something we had implemented inside Ventana and I want to
>compare/contrast your work with ours. 

What a coincidence!

>
>What I like about yours is it keeps all the logic in riscv.cc rather
>than scattering it across riscv.cc and riscv.md.  

Yes. With enough test cases, I could not find a concise way to optimize
all of them. Only when I enumerated all possible cases in the movcc
function of the RISC-V backend did I find a method that satisfied me, which
is the method in patch [3/5].

>What I like about the
>internal Ventana bits is its ability to support arbitrary comparisons by
>utilizing sCC if the original is not an eq/ne comparison.
> 

If it's just for the Zicond instruction set, is it necessary to handle
comparisons other than eq/ne? After all, Zicond itself does not support
comparisons other than eq/ne. Of course, it is also possible to use a
special technique to apply Zicond to non-eq/ne comparisons.
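
To make that last point concrete, here is a small scalar model of the idea
Jeff describes: materialize the comparison result with a set-on-compare
instruction first, then feed it to the Zicond instructions, which only test
a register against zero. It is a sketch for discussion, not the code from
either implementation; the helper names other than czero_eqz, czero_nez and
slt are made up.

```cpp
#include <cstdint>

// czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1
constexpr std::int64_t czero_eqz(std::int64_t rs1, std::int64_t rs2) {
  return rs2 == 0 ? 0 : rs1;
}
// czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1
constexpr std::int64_t czero_nez(std::int64_t rs1, std::int64_t rs2) {
  return rs2 != 0 ? 0 : rs1;
}
// slt rd, rs1, rs2: rd = (rs1 < rs2) ? 1 : 0  (signed compare)
constexpr std::int64_t slt(std::int64_t rs1, std::int64_t rs2) {
  return rs1 < rs2 ? 1 : 0;
}

// cond != 0 ? t : f. Exactly one of the two czero results is zero,
// so OR-ing them yields the selected value.
constexpr std::int64_t select_nonzero(std::int64_t cond,
                                      std::int64_t t, std::int64_t f) {
  return czero_eqz(t, cond) | czero_nez(f, cond);
}

// (x < y) ? t : f for a comparison that is not eq/ne: compute the
// condition with slt first, then reuse the eq/ne-against-zero selection.
constexpr std::int64_t select_lt(std::int64_t x, std::int64_t y,
                                 std::int64_t t, std::int64_t f) {
  return select_nonzero(slt(x, y), t, f);
}
```

The cost is one extra instruction for the comparison; whether that beats a
branch depends on the target and is not something this model can answer.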

>Ideally we'll be able to get the best of both. 

Of course, it is best to unify all situations in one framework.

>
>Jeff

Now that the code on the master has preliminary support for
Zicond, I will still submit the optimization patches for Zicond to
the community, so that we can work out the ideal method together.

Thanks
Xiao Zeng

loop-split improvements, part 1

2023-07-27 Thread Jan Hubicka via Gcc-patches
Hi,
while looking at a profile misupdate on hmmer I noticed that the loop splitting
pass is not able to handle the loop it gives as an example of what it should
apply to:

   One transformation of loops like:

   for (i = 0; i < 100; i++)
     {
       if (i < 50)
         A;
       else
         B;
     }

   into:

   for (i = 0; i < 50; i++)
     {
       A;
     }
   for (; i < 100; i++)
     {
       B;
     }

The problem is that ivcanon turns the test into i != 100 and the pass
explicitly gives up on any loop ending with a != test.  It needs to know
the direction of the induction variable in order to derive the right conditions,
but that can also be determined from the step.
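
As a rough illustration of the idea (a standalone C sketch, not the GCC
code; all names here are hypothetical), a loop controlled by `i != bound`
can be treated as an ordered loop once the sign of the step is known,
provided the induction variable does not overflow:

```c
#include <assert.h>

enum cmp_kind { CMP_LT, CMP_GT };

/* Hypothetical helper: pick the ordered comparison that can stand in for
   `i != bound`, using only the sign of the step.  This is valid only when
   the induction variable cannot overflow or wrap.  */
enum cmp_kind
cmp_from_step (long step)
{
  return step < 0 ? CMP_GT : CMP_LT;
}

/* Iterations of: for (i = init; i != bound; i += step).
   The caller must make sure the loop actually reaches bound.  */
long
count_ne (long init, long bound, long step)
{
  long n = 0;
  for (long i = init; i != bound; i += step)
    n++;
  return n;
}

/* Same loop, controlled by the comparison derived from the step.  */
long
count_ordered (long init, long bound, long step)
{
  long n = 0;
  if (cmp_from_step (step) == CMP_LT)
    for (long i = init; i < bound; i += step)
      n++;
  else
    for (long i = init; i > bound; i += step)
      n++;
  return n;
}

/* Both forms agree for an increasing and a decreasing IV.  */
void
check_equivalence (void)
{
  assert (count_ne (0, 100, 1) == count_ordered (0, 100, 1));
  assert (count_ne (10, 0, -2) == count_ordered (10, 0, -2));
}
```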

It turns out that there are no testcases for basic loop splitting.  I will add
some with the profile update fix.

There are other issues: for example, VRP will turn i < 99 into i == 99 based
on value range, which also makes the pass give up.

Bootstrapped/regtested x86_64-linux, OK?

gcc/ChangeLog:

* tree-ssa-loop-split.cc (split_loop): Also support NE driven
loops when IV test is not overflowing.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ifc-12.c: Disable loop splitting.
* gcc.target/i386/avx2-gather-6.c: Likewise.
* gcc.target/i386/avx2-vect-aggressive.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
index 9468c070489..bedf29c7dbc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-ifcvt-stats-blocks-details" } */
+/* { dg-options "-Ofast -fdump-tree-ifcvt-stats-blocks-details -fno-split-loops" } */
 /* { dg-require-visibility "" } */
 
 struct st
diff --git a/gcc/testsuite/gcc.target/i386/avx2-gather-6.c b/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
index b9119581ae2..47a95dbe989 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -mavx2 -fno-common -fdump-tree-vect-details -mtune=skylake" } */
+/* { dg-options "-O3 -mavx2 -fno-common -fdump-tree-vect-details -mtune=skylake -fno-split-loops" } */
 
 #include "avx2-gather-5.c"
 
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vect-aggressive.c b/gcc/testsuite/gcc.target/i386/avx2-vect-aggressive.c
index 57192791857..fa336e70e84 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-vect-aggressive.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-vect-aggressive.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-require-effective-target avx2 } */
-/* { dg-options "-mavx2 -O3 -fopenmp-simd -fdump-tree-vect-details -fdisable-tree-thread1" } */
+/* { dg-options "-mavx2 -O3 -fopenmp-simd -fdump-tree-vect-details -fdisable-tree-thread1 -fno-split-loops" } */
 
 #include "avx2-check.h"
 #define N 64
diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
index b41b5e614c2..27780370d85 100644
--- a/gcc/tree-ssa-loop-split.cc
+++ b/gcc/tree-ssa-loop-split.cc
@@ -540,10 +545,17 @@ split_loop (class loop *loop1)
   || !empty_block_p (loop1->latch)
   || !easy_exit_values (loop1)
   || !number_of_iterations_exit (loop1, exit1, &niter, false, true)
-  || niter.cmp == ERROR_MARK
-  /* We can't yet handle loops controlled by a != predicate.  */
-  || niter.cmp == NE_EXPR)
+  || niter.cmp == ERROR_MARK)
 return false;
+  if (niter.cmp == NE_EXPR)
+{
+  if (!niter.control.no_overflow)
+   return false;
+  if (tree_int_cst_sign_bit (niter.control.step) > 0)
+   niter.cmp = GT_EXPR;
+  else
+   niter.cmp = LT_EXPR;
+}
 
   bbs = get_loop_body (loop1);
 


Re: [PATCH] Fix 100864: `(a&!b) | b` is not optimized to `a | b` for comparisons

2023-07-27 Thread Richard Biener via Gcc-patches
On Fri, Jul 28, 2023 at 8:34 AM Andrew Pinski  wrote:
>
> On Sun, Jul 23, 2023 at 1:39 AM Richard Biener via Gcc-patches
>  wrote:
> >
> >
> >
> > > On 23.07.2023 at 01:27, Andrew Pinski via Gcc-patches wrote:
> > >
> > > This adds a special case of the `(a&~b) | b` pattern where
> > > `b` and `~b` are comparisons.
> > >
> > > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > Don't we have an existing match for inversions we could amend?
>
> We don't currently, but I might be able to model such a function on
> what was done for bitwise_equal_p.
> I noticed that even the patch which added bitwise_equal_p could benefit
> from something similar.

OK, I thought of logical_inverted_value but that isn't a 1:1 match here.
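
For reference, the scalar identity behind the pattern can be checked
exhaustively for small integers. This is only a standalone C sketch with
made-up function names, covering the `<` case; it says nothing about how
the match.pd pattern itself should be written:

```c
#include <assert.h>
#include <stdbool.h>

/* The form before simplification: (e & !c) | c with c = (a < b).  */
bool
before_lt (int a, int b, bool e)
{
  bool c = a < b;
  bool d = !c;
  return (e & d) | c;
}

/* The form after simplification: e | c.  */
bool
after_lt (int a, int b, bool e)
{
  return e | (a < b);
}

/* Count inputs in a small range on which the two forms disagree.  */
int
count_mismatches (void)
{
  int bad = 0;
  for (int a = -3; a <= 3; a++)
    for (int b = -3; b <= 3; b++)
      for (int e = 0; e <= 1; e++)
        if (before_lt (a, b, e) != after_lt (a, b, e))
          bad++;
  return bad;
}

/* The identity holds on every input in the range.  */
void
check_identity (void)
{
  assert (count_mismatches () == 0);
}
```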

Richard.

> Thanks,
> Andrew
>
> >
> > > gcc/ChangeLog:
> > >
> > >PR tree-optimization/100864
> > >* match.pd ((~x & y) | x -> x | y): Add comparison variant.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >* gcc.dg/tree-ssa/bitops-3.c: New test.
> > > ---
> > > gcc/match.pd | 17 +-
> > > gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c | 67 
> > > 2 files changed, 83 insertions(+), 1 deletion(-)
> > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index bfd15d6cd4a..dd4a2df537d 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -1928,7 +1928,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >  /* (~x & y) | x -> x | y */
> > >  (simplify
> > >   (bitop:c (rbitop:c (bit_not @0) @1) @0)
> > > -  (bitop @0 @1)))
> > > +  (bitop @0 @1))
> > > + /* Similar but for comparisons which have been inverted already,
> > > +Note it is hard to simulate the inverted tcc_comparison due
> > > +NaNs; That is == and != are sometimes inversions and sometimes not.
> > > +So a double for loop is needed and then compare the inverse code
> > > +with the result of invert_tree_comparison is needed.
> > > +This works fine for vector compares as -1 and 0 are bitwise
> > > +inverses.  */
> > > + (for cmp (tcc_comparison)
> > > +  (for icmp (tcc_comparison)
> > > +   (simplify
> > > +(bitop:c (rbitop:c (icmp @0 @1) @2) (cmp@3 @0 @1))
> > > + (with { enum tree_code ic = invert_tree_comparison
> > > + (cmp, HONOR_NANS (@0)); }
> > > +  (if (ic == icmp)
> > > +   (bitop @3 @2)))
> > >
> > > /* ((x | y) & z) | x -> (z & y) | x */
> > > (simplify
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c 
> > > b/gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
> > > new file mode 100644
> > > index 000..68fff4edce9
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
> > > @@ -0,0 +1,67 @@
> > > +/* PR tree-optimization/100864 */
> > > +
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O1 -fdump-tree-optimized-raw" } */
> > > +
> > > +#define op_ne !=
> > > +#define op_eq ==
> > > +#define op_lt <
> > > +#define op_le <=
> > > +#define op_gt >
> > > +#define op_ge >=
> > > +
> > > +#define operators(t) \
> > > +t(ne) \
> > > +t(eq) \
> > > +t(lt) \
> > > +t(le) \
> > > +t(gt) \
> > > +t(ge)
> > > +
> > > +#define cmpfunc(v, op) \
> > > +__attribute__((noipa)) \
> > > +_Bool func_##op##_##v(v int a, v int b, v _Bool e) \
> > > +{ \
> > > +  v _Bool c = (a op_##op b); \
> > > +  v _Bool d = !c; \
> > > +  return (e & d) | c; \
> > > +}
> > > +
> > > +#define cmp_funcs(op) \
> > > +cmpfunc(, op) \
> > > +cmpfunc(volatile , op)
> > > +
> > > +operators(cmp_funcs)
> > > +
> > > +#define test(op) \
> > > +if (func_##op##_ (a, b, e) != func_##op##_volatile (a, b, e)) \
> > > + __builtin_abort();
> > > +
> > > +int main()
> > > +{
> > > +  for(int a = -3; a <= 3; a++)
> > > +for(int b = -3; b <= 3; b++)
> > > +  {
> > > +_Bool e = 0;
> > > +operators(test)
> > > +e = 1;
> > > +operators(test)
> > > +  }
> > > +  return 0;
> > > +}
> > > +
> > > +/* Check to make sure we optimize `(a&!b) | b` -> `a | b`. */
> > > +/* There are 6 different comparison operators testing here. */
> > > +/* bit_not_expr and bit_and_expr should show up for each one (volatile). 
> > > */
> > > +/* Each operator should show up twice
> > > +   (except for `!=` which shows up 2*6 (each tester) + 2 (the 2 loops) 
> > > extra = 16). */
> > > +/* bit_ior_expr will show up for each operator twice (non-volatile and 
> > > volatile). */
> > > +/* { dg-final { scan-tree-dump-times "ne_expr,"  16 "optimized"} } */
> > > +/* { dg-final { scan-tree-dump-times "eq_expr,"   2 "optimized"} } */
> > > +/* { dg-final { scan-tree-dump-times "lt_expr,"   2 "optimized"} } */
> > > +/* { dg-final { scan-tree-dump-times "le_expr,"   2 "optimized"} } */
> > > +/* { dg-final { scan-tree-dump-times "gt_expr,"   2 "optimized"} } */
> > > +/* { dg-final { scan-tree-dump-times "ge_expr,"   2 "optimized"} } */
> > > +/* { dg-final { scan-tree-dump-times "bit_not_expr,"  

Re: loop-split improvements, part 1

2023-07-27 Thread Richard Biener via Gcc-patches
On Fri, Jul 28, 2023 at 8:38 AM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> while looking at a profile misupdate on hmmer I noticed that the loop
> splitting pass is not able to handle the loop it gives as an example of
> what it should apply to:
>
>    One transformation of loops like:
>
>    for (i = 0; i < 100; i++)
>      {
>        if (i < 50)
>          A;
>        else
>          B;
>      }
>
>    into:
>
>    for (i = 0; i < 50; i++)
>      {
>        A;
>      }
>    for (; i < 100; i++)
>      {
>        B;
>      }
>
> The problem is that ivcanon turns the test into i != 100 and the pass
> explicitly gives up on any loop ending with a != test.  It needs to know
> the direction of the induction variable in order to derive the right conditions,
> but that can also be determined from the step.
>
> It turns out that there are no testcases for basic loop splitting.  I will add
> some with the profile update fix.
>
> There are other issues: for example, VRP will turn i < 99 into i == 99 based
> on value range, which also makes the pass give up.
>
> Bootstrapped/regtested x86_64-linux, OK?

OK.

> gcc/ChangeLog:
>
> * tree-ssa-loop-split.cc (split_loop): Also support NE driven
> loops when IV test is not overflowing.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/ifc-12.c: Disable loop splitting.
> * gcc.target/i386/avx2-gather-6.c: Likewise.
> * gcc.target/i386/avx2-vect-aggressive.c: Likewise.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
> index 9468c070489..bedf29c7dbc 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-Ofast -fdump-tree-ifcvt-stats-blocks-details" } */
> +/* { dg-options "-Ofast -fdump-tree-ifcvt-stats-blocks-details 
> -fno-split-loops" } */
>  /* { dg-require-visibility "" } */
>
>  struct st
> diff --git a/gcc/testsuite/gcc.target/i386/avx2-gather-6.c 
> b/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
> index b9119581ae2..47a95dbe989 100644
> --- a/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
> +++ b/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -mavx2 -fno-common -fdump-tree-vect-details 
> -mtune=skylake" } */
> +/* { dg-options "-O3 -mavx2 -fno-common -fdump-tree-vect-details 
> -mtune=skylake -fno-split-loops" } */
>
>  #include "avx2-gather-5.c"
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx2-vect-aggressive.c 
> b/gcc/testsuite/gcc.target/i386/avx2-vect-aggressive.c
> index 57192791857..fa336e70e84 100644
> --- a/gcc/testsuite/gcc.target/i386/avx2-vect-aggressive.c
> +++ b/gcc/testsuite/gcc.target/i386/avx2-vect-aggressive.c
> @@ -1,6 +1,6 @@
>  /* { dg-do run } */
>  /* { dg-require-effective-target avx2 } */
> -/* { dg-options "-mavx2 -O3 -fopenmp-simd -fdump-tree-vect-details 
> -fdisable-tree-thread1" } */
> +/* { dg-options "-mavx2 -O3 -fopenmp-simd -fdump-tree-vect-details 
> -fdisable-tree-thread1 -fno-split-loops" } */
>
>  #include "avx2-check.h"
>  #define N 64
> diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
> index b41b5e614c2..27780370d85 100644
> --- a/gcc/tree-ssa-loop-split.cc
> +++ b/gcc/tree-ssa-loop-split.cc
> @@ -540,10 +545,17 @@ split_loop (class loop *loop1)
>|| !empty_block_p (loop1->latch)
>|| !easy_exit_values (loop1)
>|| !number_of_iterations_exit (loop1, exit1, &niter, false, true)
> -  || niter.cmp == ERROR_MARK
> -  /* We can't yet handle loops controlled by a != predicate.  */
> -  || niter.cmp == NE_EXPR)
> +  || niter.cmp == ERROR_MARK)
>  return false;
> +  if (niter.cmp == NE_EXPR)
> +{
> +  if (!niter.control.no_overflow)
> +   return false;
> +  if (tree_int_cst_sign_bit (niter.control.step) > 0)
> +   niter.cmp = GT_EXPR;
> +  else
> +   niter.cmp = LT_EXPR;
> +}
>
>bbs = get_loop_body (loop1);
>


Re: [PATCH] RISC-V: Remove vxrm parameter for vsadd[u] and vssub[u]

2023-07-27 Thread juzhe.zh...@rivai.ai
Thanks for fixing it.
LGTM from my side.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-07-28 13:52
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; pan2.li; xuli
Subject: [PATCH] RISC-V: Remove vxrm parameter for vsadd[u] and vssub[u]
From: xuli 
 
The computations of `vsadd`, `vsaddu`, `vssub`, and `vssubu` do not need the
rounding mode; therefore the intrinsics for these instructions do not have
a parameter for rounding mode control.
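
As a scalar illustration only (this is not the RVV intrinsic API, and the
function names are hypothetical): a saturating add computes the exact sum
and clamps it, so no bits are discarded and nothing needs rounding. An
averaging add, by contrast, shifts out a bit, and its result depends on the
rounding choice:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a signed 8-bit saturating add: the exact sum is
   clamped to the type's range, so no rounding decision is involved.  */
int8_t
sat_add_i8 (int8_t a, int8_t b)
{
  int sum = (int) a + (int) b;  /* Exact in the wider type.  */
  if (sum > INT8_MAX)
    return INT8_MAX;
  if (sum < INT8_MIN)
    return INT8_MIN;
  return (int8_t) sum;
}

/* Contrast: an averaging add drops the low bit of the sum.  round_up is
   a hypothetical flag modeling two rounding choices: add the dropped bit
   back (round-to-nearest-up) or truncate (round-down).  Restricted to
   non-negative inputs to keep the shift well defined in C.  */
int
avg_add (int a, int b, int round_up)
{
  int sum = a + b;
  return (sum + (round_up ? 1 : 0)) >> 1;
}

/* Saturation has one answer; averaging has one per rounding choice.  */
void
check_fixed_point_models (void)
{
  assert (sat_add_i8 (100, 100) == INT8_MAX);
  assert (avg_add (1, 2, 1) != avg_add (1, 2, 0));
}
```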
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc: remove rounding mode of 
vsadd[u] and vssub[u].
* config/riscv/vector.md: Ditto.
 
gcc/testsuite/ChangeLog:
 
* g++.target/riscv/rvv/base/bug-12.C: Adapt testcase.
* g++.target/riscv/rvv/base/bug-14.C: Ditto.
* g++.target/riscv/rvv/base/bug-18.C: Ditto.
* g++.target/riscv/rvv/base/bug-19.C: Ditto.
* g++.target/riscv/rvv/base/bug-20.C: Ditto.
* g++.target/riscv/rvv/base/bug-21.C: Ditto.
* g++.target/riscv/rvv/base/bug-22.C: Ditto.
* g++.target/riscv/rvv/base/bug-23.C: Ditto.
* g++.target/riscv/rvv/base/bug-3.C: Ditto.
* g++.target/riscv/rvv/base/bug-8.C: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-100.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-101.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-102.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-103.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-104.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-105.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-106.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-107.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-108.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-109.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-110.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-111.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-112.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-113.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-114.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-115.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-116.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-117.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-118.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-119.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-97.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-98.c: Ditto.
* gcc.target/riscv/rvv/base/merge_constraint-1.c: Ditto.
* gcc.target/riscv/rvv/base/fixed-point-vxrm-error.c: New test.
* gcc.target/riscv/rvv/base/fixed-point-vxrm.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  6 --
gcc/config/riscv/vector.md| 42 +++---
.../g++.target/riscv/rvv/base/bug-12.C|  2 +-
.../g++.target/riscv/rvv/base/bug-14.C|  2 +-
.../g++.target/riscv/rvv/base/bug-18.C|  2 +-
.../g++.target/riscv/rvv/base/bug-19.C|  2 +-
.../g++.target/riscv/rvv/base/bug-20.C|  2 +-
.../g++.target/riscv/rvv/base/bug-21.C|  2 +-
.../g++.target/riscv/rvv/base/bug-22.C|  2 +-
.../g++.target/riscv/rvv/base/bug-23.C|  2 +-
.../g++.target/riscv/rvv/base/bug-3.C |  2 +-
.../g++.target/riscv/rvv/base/bug-8.C |  2 +-
.../riscv/rvv/base/binop_vx_constraint-100.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-101.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-102.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-103.c  | 28 +++
.../riscv/rvv/base/binop_vx_constraint-104.c  | 16 ++--
.../riscv/rvv/base/binop_vx_constraint-105.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-106.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-107.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-108.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-109.c  | 28 +++
.../riscv/rvv/base/binop_vx_constraint-110.c  | 16 ++--
.../riscv/rvv/base/binop_vx_constraint-111.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-112.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-113.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-114.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-115.c  | 16 ++--
.../riscv/rvv/base/binop_vx_constraint-116.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-117.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-118.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-119.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-97.c   | 28 +++
.../riscv/rvv/base/binop_vx_constraint-98.c   | 16 ++--
.../riscv/rvv/base/fixed-point-vxrm-error.c   | 24 ++
.../riscv/rvv/base/fixed-point-vxrm.c | 81 +++
.../riscv/rvv/base/merge_constraint-1.c   |  4 +-
37 files changed, 233 insertions

Re: [PATCH] RISC-V: Remove vxrm parameter for vsadd[u] and vssub[u]

2023-07-27 Thread Kito Cheng via Gcc-patches
I didn't check this against the RVV intrinsic spec, but I assume it was found
while testing with the API tests, so LGTM. Thanks for fixing this :)

On Fri, Jul 28, 2023 at 14:43, juzhe.zh...@rivai.ai wrote:

> Thanks for fixing it.
> LGTM from my side.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Li Xu
> Date: 2023-07-28 13:52
> To: gcc-patches
> CC: kito.cheng; palmer; juzhe.zhong; pan2.li; xuli
> Subject: [PATCH] RISC-V: Remove vxrm parameter for vsadd[u] and vssub[u]
> From: xuli 
>
> The computations of `vsadd`, `vsaddu`, `vssub`, and `vssubu` do not need the
> rounding mode; therefore the intrinsics for these instructions do not have
> a parameter for rounding mode control.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc: remove rounding
> mode of vsadd[u] and vssub[u].
> * config/riscv/vector.md: Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/riscv/rvv/base/bug-12.C: Adapt testcase.
> * g++.target/riscv/rvv/base/bug-14.C: Ditto.
> * g++.target/riscv/rvv/base/bug-18.C: Ditto.
> * g++.target/riscv/rvv/base/bug-19.C: Ditto.
> * g++.target/riscv/rvv/base/bug-20.C: Ditto.
> * g++.target/riscv/rvv/base/bug-21.C: Ditto.
> * g++.target/riscv/rvv/base/bug-22.C: Ditto.
> * g++.target/riscv/rvv/base/bug-23.C: Ditto.
> * g++.target/riscv/rvv/base/bug-3.C: Ditto.
> * g++.target/riscv/rvv/base/bug-8.C: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-100.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-101.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-102.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-103.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-104.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-105.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-106.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-107.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-108.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-109.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-110.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-111.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-112.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-113.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-114.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-115.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-116.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-117.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-118.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-119.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-97.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-98.c: Ditto.
> * gcc.target/riscv/rvv/base/merge_constraint-1.c: Ditto.
> * gcc.target/riscv/rvv/base/fixed-point-vxrm-error.c: New test.
> * gcc.target/riscv/rvv/base/fixed-point-vxrm.c: New test.
> ---
> .../riscv/riscv-vector-builtins-bases.cc  |  6 --
> gcc/config/riscv/vector.md| 42 +++---
> .../g++.target/riscv/rvv/base/bug-12.C|  2 +-
> .../g++.target/riscv/rvv/base/bug-14.C|  2 +-
> .../g++.target/riscv/rvv/base/bug-18.C|  2 +-
> .../g++.target/riscv/rvv/base/bug-19.C|  2 +-
> .../g++.target/riscv/rvv/base/bug-20.C|  2 +-
> .../g++.target/riscv/rvv/base/bug-21.C|  2 +-
> .../g++.target/riscv/rvv/base/bug-22.C|  2 +-
> .../g++.target/riscv/rvv/base/bug-23.C|  2 +-
> .../g++.target/riscv/rvv/base/bug-3.C |  2 +-
> .../g++.target/riscv/rvv/base/bug-8.C |  2 +-
> .../riscv/rvv/base/binop_vx_constraint-100.c  |  4 +-
> .../riscv/rvv/base/binop_vx_constraint-101.c  |  4 +-
> .../riscv/rvv/base/binop_vx_constraint-102.c  |  4 +-
> .../riscv/rvv/base/binop_vx_constraint-103.c  | 28 +++
> .../riscv/rvv/base/binop_vx_constraint-104.c  | 16 ++--
> .../riscv/rvv/base/binop_vx_constraint-105.c  |  4 +-
> .../riscv/rvv/base/binop_vx_constraint-106.c  |  4 +-
> .../riscv/rvv/base/binop_vx_constraint-107.c  |  4 +-
> .../riscv/rvv/base/binop_vx_constraint-108.c  |  4 +-
> .../riscv/rvv/base/binop_vx_constraint-109.c  | 28 +++
> .../riscv/rvv/base/binop_vx_constraint-110.c  | 16 ++--
> .../riscv/rvv/base/binop_vx_constraint-111.c  |  4 +-
> .../riscv/rvv/base/binop_vx_constraint-112.c  |  4 +-
> .../riscv/rvv/base/binop_vx_constraint-113.c  |  4 +-
> .../riscv/rvv/base/binop_vx_constraint-114.c  |  4 +-
> .../riscv/rvv/base/binop_vx_constraint-115.c  | 16 ++--
> .../riscv/rvv/base/binop_vx_constraint-116.c  |  4 +-
> .../riscv/rvv/base/binop_vx_constraint-117.c  |  4 +-
> .../riscv/rvv/base/binop_vx_constraint-118.c  |  4 +-
> .../riscv/rvv/base/binop_vx_cons

Re: loop-split improvements, part 1

2023-07-27 Thread Andrew Pinski via Gcc-patches
On Thu, Jul 27, 2023 at 11:38 PM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> while looking at a profile misupdate on hmmer I noticed that the loop
> splitting pass is not able to handle the loop it gives as an example of
> what it should apply to:
>
>    One transformation of loops like:
>
>    for (i = 0; i < 100; i++)
>      {
>        if (i < 50)
>          A;
>        else
>          B;
>      }
>
>    into:
>
>    for (i = 0; i < 50; i++)
>      {
>        A;
>      }
>    for (; i < 100; i++)
>      {
>        B;
>      }
>
> The problem is that ivcanon turns the test into i != 100 and the pass
> explicitly gives up on any loop ending with a != test.  It needs to know
> the direction of the induction variable in order to derive the right conditions,
> but that can also be determined from the step.
>
> It turns out that there are no testcases for basic loop splitting.  I will add
> some with the profile update fix.

Thanks for doing this.
Here are bug reports you should look at after all of your loop
splitting patches are done:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37239
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77689

Thanks,
Andrew

>
> There are other issues: for example, VRP will turn i < 99 into i == 99 based
> on value range, which also makes the pass give up.
>
> Bootstrapped/regtested x86_64-linux, OK?
>
> gcc/ChangeLog:
>
> * tree-ssa-loop-split.cc (split_loop): Also support NE driven
> loops when IV test is not overflowing.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/ifc-12.c: Disable loop splitting.
> * gcc.target/i386/avx2-gather-6.c: Likewise.
> * gcc.target/i386/avx2-vect-aggressive.c: Likewise.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
> index 9468c070489..bedf29c7dbc 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-Ofast -fdump-tree-ifcvt-stats-blocks-details" } */
> +/* { dg-options "-Ofast -fdump-tree-ifcvt-stats-blocks-details 
> -fno-split-loops" } */
>  /* { dg-require-visibility "" } */
>
>  struct st
> diff --git a/gcc/testsuite/gcc.target/i386/avx2-gather-6.c 
> b/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
> index b9119581ae2..47a95dbe989 100644
> --- a/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
> +++ b/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -mavx2 -fno-common -fdump-tree-vect-details 
> -mtune=skylake" } */
> +/* { dg-options "-O3 -mavx2 -fno-common -fdump-tree-vect-details 
> -mtune=skylake -fno-split-loops" } */
>
>  #include "avx2-gather-5.c"
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx2-vect-aggressive.c 
> b/gcc/testsuite/gcc.target/i386/avx2-vect-aggressive.c
> index 57192791857..fa336e70e84 100644
> --- a/gcc/testsuite/gcc.target/i386/avx2-vect-aggressive.c
> +++ b/gcc/testsuite/gcc.target/i386/avx2-vect-aggressive.c
> @@ -1,6 +1,6 @@
>  /* { dg-do run } */
>  /* { dg-require-effective-target avx2 } */
> -/* { dg-options "-mavx2 -O3 -fopenmp-simd -fdump-tree-vect-details 
> -fdisable-tree-thread1" } */
> +/* { dg-options "-mavx2 -O3 -fopenmp-simd -fdump-tree-vect-details 
> -fdisable-tree-thread1 -fno-split-loops" } */
>
>  #include "avx2-check.h"
>  #define N 64
> diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
> index b41b5e614c2..27780370d85 100644
> --- a/gcc/tree-ssa-loop-split.cc
> +++ b/gcc/tree-ssa-loop-split.cc
> @@ -540,10 +545,17 @@ split_loop (class loop *loop1)
>|| !empty_block_p (loop1->latch)
>|| !easy_exit_values (loop1)
>|| !number_of_iterations_exit (loop1, exit1, &niter, false, true)
> -  || niter.cmp == ERROR_MARK
> -  /* We can't yet handle loops controlled by a != predicate.  */
> -  || niter.cmp == NE_EXPR)
> +  || niter.cmp == ERROR_MARK)
>  return false;
> +  if (niter.cmp == NE_EXPR)
> +{
> +  if (!niter.control.no_overflow)
> +   return false;
> +  if (tree_int_cst_sign_bit (niter.control.step) > 0)
> +   niter.cmp = GT_EXPR;
> +  else
> +   niter.cmp = LT_EXPR;
> +}
>
>bbs = get_loop_body (loop1);
>