from:"Jeff Law via Gcc\-patches"

Re: RISC-V sign extension query

2023-09-18 Thread Jeff Law via Gcc-patches





On 9/18/23 13:45, Vineet Gupta wrote:



For the cases which do require sign extends, but not being eliminated 
due to "missing definition(s)" I'm working on adapting Ajit's REE ABI 
interfaces work [2] to work for RISC-V as well.
I wonder if we could walk the DECL_ARGUMENTS for current_function_decl 
and create sign extensions for integer arguments smaller than XLEN at 
the start of the function.  Record them in a list.


Then we just let the rest of REE do its thing.  When REE is done we go 
back and delete those sign extensions we created.


Assuming that works, then we just need to conditionalize the code on ABI 
properties.




Jeff

Re: RISC-V sign extension query

2023-09-18 Thread Jeff Law via Gcc-patches





On 9/18/23 13:45, Vineet Gupta wrote:

Hi Jeff, Andrew

I've been looking into redundant sign extension and while there are 
things to be improved in REE, there's something I wanted to confirm 
before heading off into the weeds.


Consider the test below:

int foo(int unused, int n, unsigned y, unsigned delta){
   int s = 0;
   unsigned int x = 0;    // if int, sext elided
   for (;xI believe the SEXT.W is not semantically needed as a1 is supposed to be 
sign extended already at call site as per psABI [1]. I quote


     "When passed in registers or on the stack, integer scalars narrower 
than XLEN bits are widened according to the sign of their type up to 32 
bits, then sign-extended to XLEN bits"
That's my understanding.  We can (and should) assume that a sub-word 
argument has been properly sign extended to XLEN.




However currently RISC-V backend thinks otherwise: changing @x to int, 
causes the the sign extend to go away. I think both the cases should 
behave the same (and not generate SEXT.w) given the ABI clause above. 
Note that this manifests in initial RTL expand itself 
generating/or-not-generating the sign_extend so if it is unnecessary we 
can avoid late fixups in REE.
So for parameters I think there are knobs that we can set in the target 
files to indicate they're extended/promoted.  TARGET_PROMOTE_PROTOTYPES 
would be a good search term I think.  I don't think it's a heavily used 
feature, so we may need to beat on it a little to get it to do what we want.




Jeff

Re: [PATCH v1] RISC-V: Bugfix for scalar move with merged operand

2023-09-18 Thread Jeff Law via Gcc-patches





On 9/18/23 04:00, Robin Dapp wrote:

I must be missing something.  Doesn't insn 10 broadcast the immediate
0x2 to both elements of r142?!?  What am I missing?

It is indeed a bit misleading.  The difference is in the mask which
is not displayed in the short form.  So we actually use a vec_dup
for a single-element move, essentially a masked vec_dup where only
one element is masked in.

Ah :-)



The problem was that the original doesn't use a merging "vec_set"
but a "destructive" one where the other elements get ignored.

The fix is OK IMHO.

Agreed.

jeff

Re: [PATCH] tree-optimization/111294 - backwards threader PHI costing

2023-09-17 Thread Jeff Law via Gcc-patches





On 9/14/23 07:23, Richard Biener via Gcc-patches wrote:

This revives an earlier patch since the problematic code applying
extra costs to PHIs in copied blocks we couldn't make any sense of
prevents a required threading in this case.  Instead of coming up
with an artificial other costing the following simply removes the
bits.

As with all threading changes this requires a plethora of testsuite
adjustments, but only the last three are unfortunate as is the
libgomp team.c adjustment which is required to avoid a bogus -Werror
diagnostic during bootstrap.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Any objections?

Thanks,
Richard.

PR tree-optimization/111294
gcc/
* tree-ssa-threadbackward.cc (back_threader_profitability::m_name):
Remove
(back_threader::find_paths_to_names): Adjust.
(back_threader::maybe_thread_block): Likewise.
(back_threader_profitability::possibly_profitable_path_p): Remove
code applying extra costs to copies PHIs.

libgomp/
* team.c (gomp_team_start): Guard gomp_alloca to avoid false
positive alloc-size diagnostic.

gcc/testsuite/
* gcc.dg/tree-ssa/pr111294.c: New test.
* gcc.dg/tree-ssa/phi_on_compare-4.c: Adjust.
* gcc.dg/tree-ssa/pr59597.c: Likewise.
* gcc.dg/tree-ssa/pr61839_2.c: Likewise.
* gcc.dg/tree-ssa/ssa-sink-18.c: Likewise.
* g++.dg/warn/Wstringop-overflow-4.C: XFAIL subtest on ilp32.
* gcc.dg/uninit-pred-9_b.c: XFAIL subtest everywhere.
* gcc.dg/vect/vect-117.c: Make scan for not Invalid sum
conditional on lp64.
No objections. IIRC this was all to deal with a code explosion problem 
seen on sparc.  As long as tree-ssa/pr69196-1.c hasn't gone crazy we're OK.


jeff

Re: [RFC PATCH 2/2] RISC-V: Update testsuite for type-changed builtins

2023-09-17 Thread Jeff Law via Gcc-patches





On 9/6/23 20:17, Tsukasa OI wrote:

From: Tsukasa OI 

This commit replaces the type of the builtin used in the testsuite.

Even without this commit, it won't cause any test failures but changed so
that no confusion occurs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbc32.c: Make signed type to unsigned.
* gcc.target/riscv/zbc64.c: Ditto.
* gcc.target/riscv/zbkb32.c: Ditto.
* gcc.target/riscv/zbkb64.c: Ditto.
* gcc.target/riscv/zbkc32.c: Ditto.
* gcc.target/riscv/zbkc64.c: Ditto.
* gcc.target/riscv/zbkx32.c: Ditto.
* gcc.target/riscv/zbkx64.c: Ditto.
* gcc.target/riscv/zknd32.c: Ditto.
* gcc.target/riscv/zknd64.c: Ditto.
* gcc.target/riscv/zkne32.c: Ditto.
* gcc.target/riscv/zkne64.c: Ditto.
* gcc.target/riscv/zknh-sha256.c: Ditto.
* gcc.target/riscv/zknh-sha512-32.c: Ditto.
* gcc.target/riscv/zknh-sha512-64.c: Ditto.
* gcc.target/riscv/zksed32.c: Ditto.
* gcc.target/riscv/zksed64.c: Ditto.
* gcc.target/riscv/zksh32.c: Ditto.
* gcc.target/riscv/zksh64.c: Ditto.

OK
jeff

Re: [PATCH 1/2] RISC-V: Make bit manipulation value / round number and shift amount types for builtins unsigned

2023-09-17 Thread Jeff Law via Gcc-patches





On 9/11/23 19:28, Tsukasa OI wrote:

From: Tsukasa OI 

For bit manipulation operations, input(s) and the manipulated output are
better to be unsigned like other target-independent builtins like
__builtin_bswap32 and __builtin_popcount.

Although this is not completely compatible as before (as the type changes),
most code will run normally, even without warnings (with -Wall -Wextra).

To make consistent to the LLVM commit 599421ae36c3 ("[RISCV] Use unsigned
instead of signed types for Zk* and Zb* builtins."), round numbers and
shift amount on the scalar crypto instructions are also changed
to unsigned.

gcc/ChangeLog:

* config/riscv/riscv-builtins.cc (RISCV_ATYPE_UQI): New for
uint8_t.  (RISCV_ATYPE_UHI): New for uint16_t.
(RISCV_ATYPE_QI, RISCV_ATYPE_HI, RISCV_ATYPE_SI, RISCV_ATYPE_DI):
Removed as no longer used.
(RISCV_ATYPE_UDI): New for uint64_t.
* config/riscv/riscv-cmo.def: Make types unsigned for not working
"zicbop_cbo_prefetchi" and working bit manipulation clmul builtin
argument/return types.
* config/riscv/riscv-ftypes.def: Make bit manipulation, round
number and shift amount types unsigned.
* config/riscv/riscv-scalar-crypto.def: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbc32.c: Make signed type to unsigned.
* gcc.target/riscv/zbc64.c: Ditto.
* gcc.target/riscv/zbkb32.c: Ditto.
* gcc.target/riscv/zbkb64.c: Ditto.
* gcc.target/riscv/zbkc32.c: Ditto.
* gcc.target/riscv/zbkc64.c: Ditto.
* gcc.target/riscv/zbkx32.c: Ditto.
* gcc.target/riscv/zbkx64.c: Ditto.
* gcc.target/riscv/zknd32.c: Ditto.
* gcc.target/riscv/zknd64.c: Ditto.
* gcc.target/riscv/zkne32.c: Ditto.
* gcc.target/riscv/zkne64.c: Ditto.
* gcc.target/riscv/zknh-sha256.c: Ditto.
* gcc.target/riscv/zknh-sha512-32.c: Ditto.
* gcc.target/riscv/zknh-sha512-64.c: Ditto.
* gcc.target/riscv/zksed32.c: Ditto.
* gcc.target/riscv/zksed64.c: Ditto.
* gcc.target/riscv/zksh32.c: Ditto.
* gcc.target/riscv/zksh64.c: Ditto.

OK
Jeff

---

Re: [PATCH v1] RISC-V: Bugfix for scalar move with merged operand

2023-09-17 Thread Jeff Law via Gcc-patches






On 9/17/23 01:42, Pan Li via Gcc-patches wrote:

From: Pan Li 

Given below example for VLS mode

void
test (vl_t *u)
{
   vl_t t;
   long long *p = (long long *)&t;

   p[0] = p[1] = 2;

   *u = t;
}

The vec_set will simplify the insn to vmv.s.x when index is 0, without
merged operand. That will result in some problems in DCE, aka:

1:  137[DI] = a0
2:  138[V2DI] = 134[V2DI]  // deleted by DCE
3:  139[DI] = #2   // deleted by DCE
4:  140[DI] = #2   // deleted by DCE
5:  141[V2DI] = vec_dup:V2DI (139[DI]) // deleted by DCE
6:  138[V2DI] = vslideup_imm (138[V2DI], 141[V2DI], 1) // deleted by DCE
7:  135[V2DI] = 138[V2DI]  // deleted by DCE
8:  142[V2DI] = 135[V2DI]  // deleted by DCE
9:  143[DI] = #2
10: 142[V2DI] = vec_dup:V2DI (143[DI])
11: (137[DI]) = 142[V2DI]

The higher 64 bits of 142[V2DI] is unknown here and it generated
incorrect code when store back to memory. This patch would like to
fix this issue by adding a new SCALAR_MOVE_MERGED_OP for vec_set.
I must be missing something.  Doesn't insn 10 broadcast the immediate 
0x2 to both elements of r142?!?  What am I missing?


JEff

Re: [pushed] [RA]: Improve cost calculation of pseudos with equivalences

2023-09-17 Thread Jeff Law via Gcc-patches





On 9/14/23 09:28, Vladimir Makarov via Gcc-patches wrote:
I've committed the following patch.  The reason for this patch is 
explained in its commit message.


The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.



ra-equiv-cost.patch_ZN7cObject4dropEP12cOwnedObject-stores

commit 3c834d85f2ec42c60995c2b678196a06cb744959
Author: Vladimir N. Makarov
Date:   Thu Sep 14 10:26:48 2023 -0400

 [RA]: Improve cost calculation of pseudos with equivalences
 
 RISCV target developers reported that RA can spill pseudo used in a

 loop although there are enough registers to assign.  It happens when
 the pseudo has an equivalence outside the loop and the equivalence is
 not merged into insns using the pseudo.  IRA sets up that memory cost
 to zero when the pseudo has an equivalence and it means that the
 pseudo will be probably spilled.  This approach worked well for i686
 (different approaches were benchmarked long time ago on spec2k).
 Although common sense says that the code is wrong and this was
 confirmed by RISCV developers.
 
 I've tried the following patch on I7-9700k and it improved spec17 fp

 by 1.5% (21.1 vs 20.8) although spec17 int is a bit worse by 0.45%
 (8.54 vs 8.58).  The average generated code size is practically the
 same (0.001% difference).
 
 In the future we probably need to try more sophisticated cost

 calculation which should take into account that the equiv can not be
 combined in usage insns and the costs of reloads because of this.
 
 gcc/ChangeLog:
 
 * ira-costs.cc (find_costs_and_classes): Decrease memory cost

 by equiv savings.

Thanks for diving into this!

What's rather strange is when I do an A/B test with this patch on RISC-V 
it appears to be a pretty consistent loss for integer code.  This would 
seem to match your findings on x86 as well.


I still need to dig into it more deeply, but I see higher ALU as well as 
higher load/store traffic.  The load/store traffic in the one case I've 
looked at so far (omnetpp) appears to be prologue/epilogue related. 
Essentially we're using an additional callee saved register on paths 
that don't trigger at runtime.


Jeff

Re: [PATCH] RISC-V: Finish Typing Un-Typed Instructions and Turn on Assert

2023-09-17 Thread Jeff Law via Gcc-patches





On 9/12/23 00:18, Lehua Ding wrote:

Hi Jeff,

On 2023/9/12 11:47, Jeff Law wrote:
But that condition is _not_ generally sufficient to prevent these 
insns from existing during sched1.  ie, a pass between split1 and 
sched1 could create these patterns and successfully match them.  That 
in turn would trigger the assertion.


can_create_pseudo_p is true up through the register allocator.  As a 
result a condition like TARGET_VECTOR && can_create_pseudo_p() is 
_not_ sufficient to ensure the pattern does not exist during sched1.  
While no pass likely creates these kinds of insns right now in that 
window between split1 and sched1, there's no guarantee that will 
always be true.


But if a pass between split1 and sched1 creates these patterns, then an 
unrecognized error will throw after reload. Is that right? That is to 
say, this insn patterns is designed to exist only before split1, but now 
the conditions are a little looser, a little tighter is better if we 
can. If this is the case, I feel it makes no difference whether the 
error is thrwoed by sched pass or a pass after reload.
If someone was to create one of these patterns without an associated 
insn type, then the assert would trigger during sched1, and that is 
good.  The earlier we can catch an inconsistency, the better.






The simple rule is easy to follow.  Every insn should have a type.  
That also gives us a degree of future-proof, even if someone adds 
additional passes/capabilities between split1 and sched1.


However, adding content that you don't need feels even more difficult to 
understand, and this is just my feeling. It would be clearer if we could 
set the type according to the purpose of the insn pattern.
I understand your position, but respectfully disagree with the 
conclusion in this case.


jeff

Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-15 Thread Jeff Law via Gcc-patches





On 9/14/23 16:26, 钟居哲 wrote:

I don't think it can fix the case when it is -march=rv64gc_zve32x


juzhe.zh...@rivai.ai

*From:* Kito Cheng 
*Date:* 2023-09-15 00:17
*To:* Juzhe-Zhong 
*CC:* gcc-patches ; kito.cheng
; jeffreyalaw
; rdapp.gcc 
*Subject:* Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode
move[PR111391]
I am thinking what we are doing is something like we are allowing
scalar mode within the vector register, so...not sure should we try to
implement that within the mov pattern?
I guess we need some inputs from Jeff.
It's advantageous if we can avoid it.  It often gets quite ugly when you 
start allowing something like scalar modes in vector registers -- 
particularly if you support something other than simple moves.


But we may end up needing to do that anyway to do something like 
supporting the sqrt & recip estimators in scalar FP modes, which can be 
very advantageous for benchmarks like nab.


So my suggestion would be go ahead if it looks like it can really solve 
this problem -- knowing there will likely be a long tail of fallout.  If 
it doesn't help pr111391, then let's defer until we really dive into 
544.nab/644.nab and try to improve that sqrt (x) and 1/sqrt(x) sequence 
that shows up in there.


jeff

Re: [PATCH v1] RISC-V: Remove unused structure in cost model

2023-09-12 Thread Jeff Law via Gcc-patches





On 9/12/23 07:02, Pan Li via Gcc-patches wrote:

From: Pan Li 

The struct range is unused, remove it.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.h (struct range): Removed.

OK
jeff

Re: [PATCH] RISC-V: Finish Typing Un-Typed Instructions and Turn on Assert

2023-09-11 Thread Jeff Law via Gcc-patches





On 9/11/23 21:17, Lehua Ding wrote:

Hi Jeff,

On 2023/9/12 11:00, Jeff Law wrote:

I'd rather be consistent and make it policy that every insn has a type.


Since the type set here will not be used by sched pass (these insn 
pattern will not exit at shced pass since use define_insn_and_split with 
condition `TARGET_VECTOR && can_create_pseudo_p ()`), I think it is easy 
to cause misunderstanding and some problems are not easy to find (e.g. 
accidentally went through the sched pass should be assert error).
But that condition is _not_ generally sufficient to prevent these insns 
from existing during sched1.  ie, a pass between split1 and sched1 could 
create these patterns and successfully match them.  That in turn would 
trigger the assertion.


can_create_pseudo_p is true up through the register allocator.  As a 
result a condition like TARGET_VECTOR && can_create_pseudo_p() is _not_ 
sufficient to ensure the pattern does not exist during sched1.  While no 
pass likely creates these kinds of insns right now in that window 
between split1 and sched1, there's no guarantee that will always be true.


The simple rule is easy to follow.  Every insn should have a type.  That 
also gives us a degree of future-proof, even if someone adds additional 
passes/capabilities between split1 and sched1.



jeff

Re: [PATCH v2 2/2] riscv: Add support for str(n)cmp inline expansion

2023-09-11 Thread Jeff Law via Gcc-patches





On 9/6/23 10:07, Christoph Muellner wrote:

From: Christoph Müllner 

This patch implements expansions for the cmpstrsi and cmpstrnsi
builtins for RV32/RV64 for xlen-aligned strings if Zbb or XTheadBb
instructions are available.  The expansion basically emits a comparison
sequence which compares XLEN bits per step if possible.

This allows to inline calls to strcmp() and strncmp() if both strings
are xlen-aligned.  For strncmp() the length parameter needs to be known.
The benefits over calls to libc are:
* no call/ret instructions
* no stack frame allocation
* no register saving/restoring
* no alignment tests

The inlining mechanism is gated by a new switches ('-minline-strcmp' and
'-minline-strncmp') and by the variable 'optimize_size'.
The amount of emitted unrolled loop iterations can be controlled by the
parameter '--param=riscv-strcmp-inline-limit=N', which defaults to 64.

The comparision sequence is inspired by the strcmp example
in the appendix of the Bitmanip specification (incl. the fast
result calculation in case the first word does not contain
a NULL byte).  Additional inspiration comes from rs6000-string.c.

The emitted sequence is not triggering any readahead pagefault issues,
because only aligned strings are accessed by aligned xlen-loads.

This patch has been tested using the glibc string tests on QEMU:
* rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=64
* rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=8
* rv32gc_zbb/rv32gc_xtheadbb with riscv-strcmp-inline-limit=64

Signed-off-by: Christoph Müllner 

gcc/ChangeLog:

* config/riscv/bitmanip.md (*_not): Export INSN name.
(_not3): Likewise.
* config/riscv/riscv-protos.h (riscv_expand_strcmp): New
prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
macros.
(GEN_EMIT_HELPER2): Likewise.
(emit_strcmp_scalar_compare_byte): New function.
(emit_strcmp_scalar_compare_subword): Likewise.
(emit_strcmp_scalar_compare_word): Likewise.
(emit_strcmp_scalar_load_and_compare): Likewise.
(emit_strcmp_scalar_call_to_libc): Likewise.
(emit_strcmp_scalar_result_calculation_nonul): Likewise.
(emit_strcmp_scalar_result_calculation): Likewise.
(riscv_expand_strcmp_scalar): Likewise.
(riscv_expand_strcmp): Likewise.
* config/riscv/riscv.md (*slt_): Export
INSN name.
(@slt_3): Likewise.
(cmpstrnsi): Invoke expansion function for str(n)cmp.
(cmpstrsi): Likewise.
* config/riscv/riscv.opt: Add new parameter
'-mstring-compare-inline-limit'.
* doc/invoke.texi: Document new parameter
'-mstring-compare-inline-limit'.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-strcmp-unaligned.c: New test.
* gcc.target/riscv/xtheadbb-strcmp.c: New test.
* gcc.target/riscv/zbb-strcmp-disabled-2.c: New test.
* gcc.target/riscv/zbb-strcmp-disabled.c: New test.
* gcc.target/riscv/zbb-strcmp-unaligned.c: New test.
* gcc.target/riscv/zbb-strcmp.c: New test.

OK for the trunk.  THanks for pushing this along.

jeff

Re: [PATCH v2 1/2] riscv: Add support for strlen inline expansion

2023-09-11 Thread Jeff Law via Gcc-patches





On 9/6/23 10:07, Christoph Muellner wrote:

From: Christoph Müllner 

This patch implements the expansion of the strlen builtin for RV32/RV64
for xlen-aligned aligned strings if Zbb or XTheadBb instructions are available.
The inserted sequences are:

rv32gc_zbb (RV64 is similar):
   add a3,a0,4
   li  a4,-1
.L1:  lw  a5,0(a0)
   add a0,a0,4
   orc.b   a5,a5
   beq a5,a4,.L1
   not a5,a5
   ctz a5,a5
   srl a5,a5,0x3
   add a0,a0,a5
   sub a0,a0,a3

rv64gc_xtheadbb (RV32 is similar):
   add   a4,a0,8
.L2:  lda5,0(a0)
   add   a0,a0,8
   th.tstnbz a5,a5
   beqz  a5,.L2
   th.reva5,a5
   th.ff1a5,a5
   srl   a5,a5,0x3
   add   a0,a0,a5
   sub   a0,a0,a4

This allows to inline calls to strlen(), with optimized code for
xlen-aligned strings, resulting in the following benefits over
a call to libc:
* no call/ret instructions
* no stack frame allocation
* no register saving/restoring
* no alignment test

The inlining mechanism is gated by a new switch ('-minline-strlen')
and by the variable 'optimize_size'.

Tested using the glibc string tests.

Signed-off-by: Christoph Müllner 

gcc/ChangeLog:

* config.gcc: Add new object riscv-string.o.
riscv-string.cc.
* config/riscv/riscv-protos.h (riscv_expand_strlen):
New function.
* config/riscv/riscv.md (strlen): New expand INSN.
* config/riscv/riscv.opt: New flag 'minline-strlen'.
* config/riscv/t-riscv: Add new object riscv-string.o.
* config/riscv/thead.md (th_rev2): Export INSN name.
(th_rev2): Likewise.
(th_tstnbz2): New INSN.
* doc/invoke.texi: Document '-minline-strlen'.
* emit-rtl.cc (emit_likely_jump_insn): New helper function.
(emit_unlikely_jump_insn): Likewise.
* rtl.h (emit_likely_jump_insn): New prototype.
(emit_unlikely_jump_insn): Likewise.
* config/riscv/riscv-string.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-strlen-unaligned.c: New test.
* gcc.target/riscv/xtheadbb-strlen.c: New test.
* gcc.target/riscv/zbb-strlen-disabled-2.c: New test.
* gcc.target/riscv/zbb-strlen-disabled.c: New test.
* gcc.target/riscv/zbb-strlen-unaligned.c: New test.
* gcc.target/riscv/zbb-strlen.c: New test.
Note that I don't think we need the new UNSPEC_STRLEN since its only 
used in the expander and doesn't survive into RTL.  Your call on whether 
or not to remove it now or as a separate patch (or keep it if I'm wrong 
about it not being needed.


OK for the trunk.  Sorry this got lost in the shuffle last year.

jeff
.

Re: [PATCH 3/3] [V2] [RISC-V] support cm.mva01s cm.mvsa01 in zcmp

2023-09-11 Thread Jeff Law via Gcc-patches





On 9/7/23 14:33, Palmer Dabbelt wrote:

On Thu, 07 Sep 2023 13:16:36 PDT (-0700), dimi...@dinux.eu wrote:

Hi,

This patch appears to have caused PR 111259.


Thanks.  Looks like wer'e not running our tests with RTL checking, 
Patrick is going to try and see if we've got compute time left for some 
builds -- even just having builds with checking would be a good one, we 
get bit by these bugs from time to time.


I'm spinning up a --enable-checking=yes build.  Maybe we just need 
something like


    diff --git a/gcc/config/riscv/predicates.md 
b/gcc/config/riscv/predicates.md

    index 53e7c1d03aa..aa4f02c67d5 100644
    --- a/gcc/config/riscv/predicates.md
    +++ b/gcc/config/riscv/predicates.md
    @@ -172,11 +172,11 @@ (define_predicate "stack_pop_up_to_s11_operand"
     ;; ZCMP predicates
     (define_predicate "a0a1_reg_operand"
    -  (and (match_operand 0 "register_operand")
    +  (and (match_code "reg")
    (match_test "IN_RANGE (REGNO (op), A0_REGNUM, A1_REGNUM)")))
     (define_predicate "zcmp_mv_sreg_operand"
    -  (and (match_operand 0 "register_operand")
    +  (and (match_code "reg")
    (match_test "TARGET_RVE ? IN_RANGE (REGNO (op), S0_REGNUM, 
S1_REGNUM)

     : IN_RANGE (REGNO (op), S0_REGNUM, S1_REGNUM)
     || IN_RANGE (REGNO (op), S2_REGNUM, S7_REGNUM)")))

Presumably you got a SUBREG if that patch solves the problem.

Note that due to RTL checking's compile-time cost, I think it's a 
separate option to enable-checking.  It's probably not feasible to turn 
it on for anything but cross testing right now.


IIRC Jakub does RTL checking builds & regression testing just once a 
year in the spring due to its cost.  And I think on just one target 
(x86).  We could probably do something similar to weed out these kinds 
of problems.  Spin it in November and wait however long it takes :-)


jeff

Re: [PATCH v5] RISC-V:Optimize the MASK opt generation

2023-09-11 Thread Jeff Law via Gcc-patches





On 9/7/23 19:26, Feng Wang wrote:


  Supported ISA specs (for use with the -misa-spec= option):
diff --git a/gcc/opt-functions.awk b/gcc/opt-functions.awk
index 36de4639318..cbfcf7dabcf 100644
--- a/gcc/opt-functions.awk
+++ b/gcc/opt-functions.awk
@@ -387,3 +387,14 @@ function integer_range_info(range_option, init, option, 
uinteger_used)
  else
  return "-1, -1"
  }
+
+# Find the index of target variable from extra_target_vars
+function find_index(var, var_arry, n_var_arry)
+{
+for (var_index = 0; var_index < n_var_arry; var_index++)
+{
+if (var_arry[var_index] == var)
+break
+}
+return var_index
+}
So if the mapping isn't found, it looks like this will return 
n_var_arry.  It looks like the caller expects that and adds a new entry 
to var_arry.  I would suggest a better comment.  Perhaps something like


# Find the index of VAR in VAR_ARRY which as length N_VAR_ARRY.  If
# VAR is not found, return N_VAR_ARRY.

I believe there is extensive documentation of the options handling in 
doc/options.texi.  The new functionality in this patch should be 
documented there.  Give it your best shot and I can help polish up the 
language.


Thanks,
Jeff

Re: [PATCH] RISC-V: Finish Typing Un-Typed Instructions and Turn on Assert

2023-09-11 Thread Jeff Law via Gcc-patches





On 9/11/23 20:33, Lehua Ding wrote:

Hi Edwin,

Sorry to bother you. I have a small question for you.

On 2023/9/12 6:52, Edwin Lu wrote:
--- a/gcc/config/riscv/autovec-opt.md +++ 
b/gcc/config/riscv/autovec-opt.md @@ -649,7 +649,8 @@ 
(define_insn_and_split "*cond_" gen_int_mode 
(GET_MODE_NUNITS (mode), Pmode)}; 
riscv_vector::expand_cond_len_unop (icode, ops); DONE; -}) +} 
+[(set_attr "type" "vector")])


Is it necessary to add type attribute to these INSNs that exist only 
before split1 pass? These instructions are expanded into the pattern in 
vector.md after split1 pass and do not reach the sched1 pass at all. If 
so, I feel that these patterns cannot be added, and it is reasonable to 
report an error if these INSNs reach sched1 pass. Thanks.

I'd rather be consistent and make it policy that every insn has a type.

The next thing I'd like to do (and it's probably much more work) is to 
validate that every insn type maps to a functional unit for the sched2 
pass.  The scheduler will do some horrible things to the generated code 
when lots of insns either don't have types or the types aren't matched 
to a functional unit.


Jeff

Re: RISC-V: Replace not + bitwise_imm with li + bitwise_not

2023-09-11 Thread Jeff Law via Gcc-patches





On 9/11/23 13:16, Andrew Waterman via Gcc-patches wrote:

Note this is a size-speed tradeoff, as the Zcb extension has a
16-bit-wide C.NOT instruction.  Might want to suppress this
optimization when Zcb is present and the function is being optimize > for size.

Yea, let's gate this on !optimize_function_for_size_p (cfun).

While we don't have Zcb support yet, hopefully we will soon.

jeff

Re: [PATCH] RISC-V: Finish Typing Un-Typed Instructions and Turn on Assert

2023-09-11 Thread Jeff Law via Gcc-patches





On 9/11/23 16:52, Edwin Lu wrote:

Updates autovec instruction that was added after last patch and turns on the
assert statement to ensure all new instructions have a type.

* config/riscv/autovec-opt.md: Update type
* config/riscv/riscv.cc (riscv_sched_variable_issue): Remove assert
"Remove assert" -> "Enable assert."  We're removing the #if 0 guard, not 
the assert itself :-)


OK with the ChangeLog fixed.
jeff

Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.

2023-09-11 Thread Jeff Law via Gcc-patches





On 9/9/23 02:46, Manolis Tsamis wrote:

This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

   addi t4,sp,16
   add  t2,a6,t4
   shl  t3,t2,1
   ld   a2,0(t3)
   addi a2,1
   sd   a2,8(t2)

into the following (one instruction less):

   add  t2,a6,sp
   shl  t3,t2,1
   ld   a2,32(t3)
   addi a2,1
   sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.

gcc/ChangeLog:

* Makefile.in: Add fold-mem-offsets.o.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_fold_mem_offsets): Declare.
* common.opt: New options.
* doc/invoke.texi: Document new option.
* fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.

Signed-off-by: Manolis Tsamis 
---

Changes in v5:
 - Introduce new helper function fold_offsets_1.
 - Fix bug because constants could be partially propagated
   through instructions that weren't understood.
 - Introduce helper class fold_mem_info that stores f-m-o
   info for an instruction.
 - Calculate fold_offsets only once with do_fold_info_calculation.
 - Fix correctness issue by introducing compute_validity_closure.
 - Propagate in more cases for PLUS/MINUS with constant.

Changes in v4:
 - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
 - Remove fold_mem_offsets_driver and enum fold_mem_phase.
 - Call recog when patching offsets in do_commit_offset.
 - Restore INSN_CODE after modifying insn in do_check_validity.

Changes in v3:
 - Added propagation for more codes:
   sub, neg, mul.
 - Added folding / elimination for sub and
   const int moves.
 - For the validity check of the generated addresses
   also test memory_address_addr_space_p.
 - Replaced GEN_INT with gen_int_mode.
 - Replaced some bitmap_head with auto_bitmap.
 - Refactor each phase into own function for readability.
 - Add dump details.
 - Replace rtx iteration with reg_mentioned_p.
 - Return early for codes that we can't propagate through.

Changes in v2:
 - Made the pass target-independant instead of RISCV specific.
 - Fixed a number of bugs.
 - Add code to handle more ADD patterns as found
   in other targets (x86, aarch64).
 - Improved naming and comments.
 - Fixed bitmap memory leak.





+
+/* Get the single reaching definition of an instruction inside a BB.
+   The definition is desired for REG used in INSN.
+   Return the definition insn or NULL if there's no definition with
+   the desired criteria.  */
+static rtx_insn*
+get_single_def_in_bb (rtx_insn *insn, rtx reg)
+{
+  df_ref use;
+  struct df_link *ref_chain, *ref_link;
+
+  FOR_EACH_INSN_USE (use, insn)
+{
+  if (GET_CODE (DF_REF_REG (use)) == SUBREG)
+   return NULL;
+  if (REGNO (DF_REF_REG (use)) == REGNO (reg))
+   break;
+}
+
+  if (!use)
+return NULL;
+
+  ref_chain = DF_REF_CHAIN (use);
So what if there's two uses of REG in INSN?  I don't think it's be 
common at all, but probably better safe and reject than sorry, right? Or 
is that case filtered out earlier?






+
+  rtx_insn* def = DF_REF_INSN (ref_chain->ref);

Formatting nit.  The '*' should be next to the variable, not the type.


+



+
+static HOST_WIDE_INT
+fold_offsets (rtx_insn* insn, rtx reg, bool analyze, bitmap foldable_insns);
+
+/*  Helper function for fold_offsets.
+
+If DO_RECURSION is false and ANALYZE is true this function returns true iff
+it understands the structure of INSN and knows how to propagate constants
+through it.  In this case OFFSET_OUT and FOLDABLE_INSNS are unused.
+
+If DO_RECURSION is true then it also calls fold_offsets for each recognised
+part of INSN with the appropriate arguments.
+
+If DO_RECURSION is true and ANALYZE is false then offset that would result
+from folding is computed and is returned through the pointer OFFSET_OUT.
+The instructions that can be folded are recorded in FOLDABLE_INSNS.
+*/
+static bool fold_offsets_1 (rtx_insn* insn, bool analyze, bool do_recursion,
+   HOST_WIDE_INT *offset_out, bitmap foldable_insns)
Nit.  Linkage and return type on separate line.  That makes the function 
name start at the beginning of a line.





+
+/* Test if INSN is a memory load / store that can have an offset folded to it.
+   Return true iff INSN is such an instruction

Re: [PATCH] RISC-V: Enable RVV scalable vectorization by default[PR111311]

2023-09-11 Thread Jeff Law via Gcc-patches





On 9/10/23 21:42, juzhe.zh...@rivai.ai wrote:

Ping this patch.

I think it's time to enable scalable vectorization by default and do the 
whole regression every time (except vect.exp that we didn't enable yet)


Update current FAILs status:

Real FAILS (ICE and execution FAIL):

FAIL: gcc.dg/pr70252.c (internal compiler error: in 
gimple_expand_vec_cond_expr, at gimple-isel.cc:284)

FAIL: gcc.dg/pr70252.c (test for excess errors)
FAIL: gcc.dg/pr92301.c execution test

Robin is working on these 3 issues and will be solved soon.

FAIL: g++.dg/torture/vshuf-v4df.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error: in as_a, at machmode.h:381)
FAIL: g++.dg/torture/vshuf-v4df.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: g++.dg/torture/vshuf-v4df.C   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (internal compiler error: in as_a, at machmode.h:381)
FAIL: g++.dg/torture/vshuf-v4df.C   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)

This is a long time known issue I have mentioned many times, we need 
help for LTO since it's caused by mode bits extension.


The rest bogus FAILs:
FAIL: gcc.dg/unroll-8.c scan-rtl-dump loop2_unroll "Not unrolling loop, 
doesn't roll"

FAIL: gcc.dg/unroll-8.c scan-rtl-dump loop2_unroll "likely upper bound: 6"
FAIL: gcc.dg/unroll-8.c scan-rtl-dump loop2_unroll "realistic bound: -1"
FAIL: gcc.dg/var-expand1.c scan-rtl-dump loop2_unroll "Expanding 
Accumulator"
FAIL: gcc.dg/tree-ssa/cunroll-16.c scan-tree-dump cunroll "optimized: 
loop with [0-9]+ iterations completely unrolled"

FAIL: gcc.dg/tree-ssa/cunroll-16.c scan-tree-dump-not optimized "foo"
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized 
"BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized 
"BIT_INSERT_EXPR" 0
FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized 
"BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized 
"BIT_INSERT_EXPR" 1
FAIL: gcc.dg/tree-ssa/gen-vect-11b.c scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/tree-ssa/gen-vect-11c.c scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect "Alignment 
of access forced using peeling" 1
FAIL: gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect "Alignment 
of access forced using peeling" 1

FAIL: gcc.dg/tree-ssa/loop-bound-1.c scan-tree-dump ivopts "bounded by 254"
FAIL: gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump ivopts "bounded by 254"
FAIL: gcc.dg/tree-ssa/predcom-2.c scan-tree-dump-times pcom "Unrolling 2 
times." 2

FAIL: gcc.dg/tree-ssa/predcom-4.c scan-tree-dump-times pcom "Combination" 1
FAIL: gcc.dg/tree-ssa/predcom-4.c scan-tree-dump-times pcom "Unrolling 3 
times." 1

FAIL: gcc.dg/tree-ssa/predcom-5.c scan-tree-dump-times pcom "Combination" 2
FAIL: gcc.dg/tree-ssa/predcom-5.c scan-tree-dump-times pcom "Unrolling 3 
times." 1
FAIL: gcc.dg/tree-ssa/predcom-9.c scan-tree-dump pcom "Executing 
predictive commoning without unrolling"
FAIL: gcc.dg/tree-ssa/reassoc-46.c scan-tree-dump-times optimized 
"(?:vect_)?sum_[\\d._]+ = (?:(?:vect_)?_[\\d._]+ \\+ 
(?:vect_)?sum_[\\d._]+|(?:v   ect_)?sum_[\\d._]+ \\+ (?:vect_)?_[\\d._]+)" 1
FAIL: gcc.dg/tree-ssa/scev-10.c scan-tree-dump-times ivopts " 
  Type:\\tREFERENCE ADDRESS\n" 1
FAIL: gcc.dg/tree-ssa/scev-11.c scan-tree-dump-times ivopts " 
  Type:\\tREFERENCE ADDRESS\n" 2
FAIL: gcc.dg/tree-ssa/scev-14.c scan-tree-dump ivopts "Overflowness wrto 
loop niter:\tNo-overflow"
FAIL: gcc.dg/tree-ssa/scev-9.c scan-tree-dump-times ivopts " 
  Type:\\tREFERENCE ADDRESS\n" 1
FAIL: gcc.dg/tree-ssa/split-path-11.c scan-tree-dump-times split-paths 
"join point for if-convertable half-diamond" 1


These are bogus dump FAILs and I have 100% confirm each of them, we are 
having same behavior as SVE.


So is this patch ok for trunk ?

Yes, this is OK for the trunk.

Jeff

Re: [PATCH] MATCH: [PR111346] `X CMP MINMAX` pattern missing :c on CMP

2023-09-10 Thread Jeff Law via Gcc-patches





On 9/10/23 20:18, Andrew Pinski via Gcc-patches wrote:

I noticed this while working on other MINMAX optimizations. It was
hard to find a simplified testcase though because it was dependent on
the ssa name versions. Adding the `:c` to cmp allows the pattern to
be match for the case where minmax as the first operand of the comparison
rather than the second.

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

PR tree-optimization/111346

gcc/ChangeLog:

* match.pd (`X CMP MINMAX`): Add `:c` on the cmp part
of the pattern

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/minmaxcmp-1.c: New test.

OK
jeff

Re: [PATCH] Fix PR 111331: wrong code for `a > 28 ? MIN : 29`

2023-09-10 Thread Jeff Law via Gcc-patches





On 9/8/23 06:39, Andrew Pinski via Gcc-patches wrote:

The problem here is after r6-7425-ga9fee7cdc3c62d0e51730,
the comparison to see if the transformation could be done was using the
wrong value. Instead of see if the inner was LE (for MIN and GE for MAX)
the outer value, it was comparing the inner to the value used in the comparison
which was wrong.
The match pattern copied the same logic mistake when they were added in
r14-1411-g17cca3c43e2f49 .

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/111331
* match.pd (`(a CMP CST1) ? max : a`):
Fix the LE/GE comparison to the correct value.
* tree-ssa-phiopt.cc (minmax_replacement):
Fix the LE/GE comparison for the
`(a CMP CST1) ? max : a` optimization.

gcc/testsuite/ChangeLog:

PR tree-optimization/111331
* gcc.c-torture/execute/pr111331-1.c: New test.
* gcc.c-torture/execute/pr111331-2.c: New test.
* gcc.c-torture/execute/pr111331-3.c: New test.

OK
jeff

Re: [PATCH] RISC-V Add Types to Un-Typed Thead Instructions:

2023-09-10 Thread Jeff Law via Gcc-patches





On 8/31/23 11:36, Edwin Lu wrote:

Related Discussion:
https://inbox.sourceware.org/gcc-patches/12fb5088-3f28-0a69-de1e-f387371a5...@gmail.com/

This patch updates the THEAD instructions to ensure that no insn is left
without a type attribute.

Tested for regressions using rv32/64 multilib for linux/newlib.

gcc/Changelog:

* config/riscv/thead.md: Update types
OK.  THe first could arguably be "multi", but both instructions it 
generates appear to be move/conversions, so "fmove" is reasonable as well.


Ok for the trunk.  And I think that's should allow us to turn on the 
assertion, right?


jeff

Re: [PATCH V2] RISC-V: Avoid unnecessary slideup in compress pattern of vec_perm

2023-09-10 Thread Jeff Law via Gcc-patches





On 9/10/23 08:07, Juzhe-Zhong wrote:

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_compress_patterns): Avoid 
unnecessary slideup.

OK
jeff

Re: [PATCH] RISC-V: Expand fixed-vlmax/vls vector permutation in targethook

2023-09-10 Thread Jeff Law via Gcc-patches





On 9/9/23 20:33, Juzhe-Zhong wrote:

When debugging FAIL: gcc.dg/pr92301.c execution test.
Realize a vls vector permutation situation failed to vectorize since early 
return false:

-  /* For constant size indices, we dont't need to handle it here.
- Just leave it to vec_perm.  */
-  if (d->perm.length ().is_constant ())
-return false;

To avoid more potential failed vectorization case. Now expand it in targethook.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_generic_patterns): Expand 
fixed-vlmax/vls vector permutation.

OK.
jeff

Re: [PATCH] RISC-V: Avoid unnecessary slideup in compress pattern of vec_perm

2023-09-10 Thread Jeff Law via Gcc-patches





On 9/9/23 21:55, Juzhe-Zhong wrote:

If a const vector all elements are same, the slide up is unnecessary.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_compress_patterns): Avoid 
unnecessary slideup.

---
  gcc/config/riscv/riscv-v.cc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index bee60de1d26..7ef884907b8 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2697,7 +2697,7 @@ shuffle_compress_patterns (struct expand_vec_perm_d *d)
rtx mask = force_reg (mask_mode, builder.build ());
  
rtx merge = d->op1;

-  if (need_slideup_p)
+  if (need_slideup_p && !const_vec_duplicate_p (d->op1))
  {
int slideup_cnt = vlen - (d->perm[vlen - 1].to_constant () % vlen) - 1;
rtx ops[] = {d->target, d->op1, gen_int_mode (slideup_cnt, Pmode)};
Would it be better to adjust how we compute need_slidup_p to check 
!const_vec_duplicate_p (d->op1) instead of doing it here?


That way the name "need_slideup_p" stays consistent with the intent of 
the code.  It would also mean we wouldn't need to duplicate the 
additional check if we wanted to model the use of slideup in the cost 
calculations.


Jeff

Re: [PATCH v2 2/5] RISC-V: Add Types for Un-Typed zc Instructions

2023-09-08 Thread Jeff Law via Gcc-patches





On 9/8/23 12:16, Edwin Lu wrote:

This patch adds types to the untyped zc instructions. Creates a new
types "pushpop" and "mvpair" for now

gcc/ChangeLog:

* config/riscv/riscv.md: Add "csr" type
* config/riscv/zc.md: Update types

OK.

Note that once we finish this exercise, making sure any new types that 
have been created are handled by the existing scheduling descriptions 
would be good follow-up.


jeff

Re: [PATCH v2 1/5] RISC-V: Update Types for Vector Instructions

2023-09-08 Thread Jeff Law via Gcc-patches





On 9/8/23 12:16, Edwin Lu wrote:

This patch adds types to vector instructions that were added after or were
missed by the original patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628594.html

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Update types
* config/riscv/autovec.md: likewise
I think these were all define_insn_and_splits, so just about anything 
will do.  OK.


jeff

Re: [PATCH 3/3] [V2] [RISC-V] support cm.mva01s cm.mvsa01 in zcmp

2023-09-08 Thread Jeff Law via Gcc-patches





On 9/7/23 20:06, Fei Gao wrote:

On 2023-09-08 04:33  Palmer Dabbelt  wrote:


On Thu, 07 Sep 2023 13:16:36 PDT (-0700), dimi...@dinux.eu wrote:

Hi,

This patch appears to have caused PR 111259.

Hi Patrick

We're reproducing the issue also.

One thing that puzzles me is why a zcmp predicate casused
a regression in rv64gc single lib build. The new define_insn
and define_peephole2 are all gated by TARGET_ZCMP, which
is false when building libgcc.
Operand predicates are checked for validity before the insn condition is 
checked.


Jeff

Re: [PATCH 5/5] RISC-V: Remove Assert Protecting Types

2023-09-07 Thread Jeff Law via Gcc-patches





On 9/6/23 11:50, Edwin Lu wrote:

This patch turns on the assert which ensures every instruction has type
that is not TYPE_UNKNOWN.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_variable_issue): Remove assert
And this is fine.  But hold off committing until all the dependencies 
are also committed.  I think there's still one earlier patch that I 
wanted to look at again.


jeff

Re: [PATCH 3/5] RISC-V: Add Types to Un-Typed Zicond Instructions

2023-09-07 Thread Jeff Law via Gcc-patches





On 9/6/23 18:42, Tsukasa OI via Gcc-patches wrote:



Looks okay to me but will need to resolve merge conflicts after commit
af88776caa20 ("RISC-V: Add support for 'XVentanaCondOps' reusing
'Zicond' support").
Sure.  We allow trival updates to resolve merge conflicts without 
needing another round of review.


Jeff

Re: [PATCH v2 1/2] riscv: Add support for strlen inline expansion

2023-09-06 Thread Jeff Law via Gcc-patches





On 9/6/23 10:22, Palmer Dabbelt wrote:
On Wed, 06 Sep 2023 09:07:33 PDT (-0700), christoph.muell...@vrull.eu 
wrote:

From: Christoph Müllner 

This patch implements the expansion of the strlen builtin for RV32/RV64
for xlen-aligned aligned strings if Zbb or XTheadBb instructions are 
available.

The inserted sequences are:

rv32gc_zbb (RV64 is similar):
  add a3,a0,4
  li  a4,-1
.L1:  lw  a5,0(a0)
  add a0,a0,4
  orc.b   a5,a5
  beq a5,a4,.L1
  not a5,a5
  ctz a5,a5
  srl a5,a5,0x3
  add a0,a0,a5
  sub a0,a0,a3

rv64gc_xtheadbb (RV32 is similar):
  add   a4,a0,8
.L2:  ld    a5,0(a0)
  add   a0,a0,8
  th.tstnbz a5,a5
  beqz  a5,.L2
  th.rev    a5,a5
  th.ff1    a5,a5
  srl   a5,a5,0x3
  add   a0,a0,a5
  sub   a0,a0,a4

This allows to inline calls to strlen(), with optimized code for
xlen-aligned strings, resulting in the following benefits over
a call to libc:
* no call/ret instructions
* no stack frame allocation
* no register saving/restoring
* no alignment test

The inlining mechanism is gated by a new switch ('-minline-strlen')
and by the variable 'optimize_size'.


Maybe this is more of a Jeff question, but this looks to me like 
something that should be target-agnostic -- maybe we need some backend 
work to actually emit the special instruction, but IIRC this is a 
somewhat common flavor of instruction and is in other ISAs as well.  It 
looks like there's already a strlen insn, so I guess the core issue is 
why we need that unspec?


Sorry if I'm just missing something, though...


The generic strlen expansion in GCC doesn't really expand a strlen loop. 
 It really just calls into the target code and forces the target to 
handle everything.



We could have generic strlen expansion code that kicks in if the target 
expander fails.  And we could probably create the necessary opcodes to 
express the optimized end-of-string comparison instructions that exist 
on various architectures.  I'm not not sure it's worth that much effort 
given targets are already doing their own strlen expansions.


jeff

Re: [PATCH v4 1/1] RISC-V: Add support for 'XVentanaCondOps' reusing 'Zicond' support

2023-09-06 Thread Jeff Law via Gcc-patches





On 9/5/23 23:47, Tsukasa OI wrote:

From: Tsukasa OI 

'XVentanaCondOps' is a vendor extension from Ventana Micro Systems
containing two instructions for conditional move and will be supported on
their Veyron V1 CPU.

And most notably (for historical reasons), 'XVentanaCondOps' and the
standard 'Zicond' extension are functionally equivalent (only encodings and
instruction names are different).

*   czero.eqz == vt.maskc
*   czero.nez == vt.maskcn

This commit adds support for the 'XVentanaCondOps' extension by extending
'Zicond' extension support.  With this, we can now reuse the optimization
using the 'Zicond' extension for the 'XVentanaCondOps' extension.

The specification for the 'XVentanaCondOps' extension is based on:


gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_flag_table):
Parse 'XVentanaCondOps' extension.
* config/riscv/riscv-opts.h (MASK_XVENTANACONDOPS): New.
(TARGET_XVENTANACONDOPS): Ditto.
(TARGET_ZICOND_LIKE): New to represent targets with conditional
moves like 'Zicond'.  It includes RV64 + 'XVentanaCondOps'.
* config/riscv/riscv.cc (riscv_rtx_costs): Replace TARGET_ZICOND
with TARGET_ZICOND_LIKE.
(riscv_expand_conditional_move): Ditto.
* config/riscv/riscv.md (movcc): Replace TARGET_ZICOND with
TARGET_ZICOND_LIKE.
* config/riscv/riscv.opt: Add new riscv_xventana_subext.
* config/riscv/zicond.md: Modify description.
(eqz_ventana): New to match corresponding czero instructions.
(nez_ventana): Ditto.
(*czero..): Emit a 'XVentanaCondOps' instruction if
'Zicond' is not available but 'XVentanaCondOps' + RV64 is.
(*czero..): Ditto.
(*czero.eqz..opt1): Ditto.
(*czero.nez..opt2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xventanacondops-primitiveSemantics.c: New test,
* gcc.target/riscv/xventanacondops-primitiveSemantics-rv32.c: New
test to make sure that XVentanaCondOps instructions are disabled
on RV32.
* gcc.target/riscv/xventanacondops-xor-01.c: New test,
OK.  Thanks for taking care of this.  I guess Raphael and I should get 
more active on pushing the rest of the veyron-v1 bits upstream :-)


Jeff

Re: [PATCH] RISC-V: Fix incorrect mode tieable which cause ICE in RA[PR111296]

2023-09-06 Thread Jeff Law via Gcc-patches





On 9/6/23 03:47, Juzhe-Zhong wrote:

This patch fix incorrect mode tieable between DI and V2SI which cause ICE
in RA.

PR target/111296

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_modes_tieable_p): Fix bug.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/base/pr111296.C: New test.




---
  gcc/config/riscv/riscv.cc  |  7 +++
  .../g++.target/riscv/rvv/base/pr111296.C   | 18 ++
  2 files changed, 25 insertions(+)
  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr111296.C

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 228515acc1f..2c0c4c2f3ae 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7648,6 +7648,13 @@ riscv_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)
  static bool
  riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
  {
+  /* We don't allow different REG_CLASS modes tieable since it
+ will cause ICE in register allocation (RA).
+ E.g. V2SI and DI are not tieable.  */
+  if (riscv_v_ext_mode_p (mode1) && !riscv_v_ext_mode_p (mode2))
+return false;
+  else if (riscv_v_ext_mode_p (mode2) && !riscv_v_ext_mode_p (mode1))
+return false;

Isn't this just
if (riscv_v_ext_mode_p (mode1) != riscv_v_ext_mode_p (mode2))

OK with that change.

jeff

Re: [PATCH v3 1/1] RISC-V: Add support for 'XVentanaCondOps' reusing 'Zicond' support

2023-09-05 Thread Jeff Law via Gcc-patches





On 9/5/23 20:33, Tsukasa OI wrote:


Internally we have this as:

(TARGET_ZICOND || TARGET_XVENTANACONDOPS)

I don't really care, so I'm happy to go with yours.


Because XVentanaCondOps instructions are only available on 64-bit target
(I wanted to prevent misuses because we don't reject XVentanaCondOps on
RV32), the target expression would be:

(a) (TARGET_ZICOND || (TARGET_XVENTANACONDOPS && TARGET_64BIT))

and I had three options to deal with it.

1.  Use the plain expression (a)
2.  Name the expression (a)
3.  Enable TARGET_XVENTANACONDOPS only on 64-bit target

I think option 2 is the simplest yet understandable.
Sure.  It may also give us the option to roll in some of the thead code 
at some point.  Their conditional move support seems to line up pretty 
well with zicond/xventanacondops too, though I haven't looked at it very 
deeply yet.






I'm happy to hear that because I had no confidence so whether we can use
#include to share common parts.  I haven't tried yet but I believe we
have to #include only common parts (not including dg instructions
containing -march=..._zicond) so I will likely required to modify zicond
tests as well.
You actually don't even have to break out the common parts.  The dg- 
directives in an included file aren't parsed by the dg framework.





I'll submit PATCH v4 (not committing directly) as changes will be a bit
larger (and Jeff's words seem "near approval" even after fixing the
tests, not complete approval).
Sounds perfect.  Given the bulk of the review work is already done, the 
final review ack will be easy.


jeff

Re: [PATCH v2] RISC-V: Fix Zicond ICE on large constants

2023-09-05 Thread Jeff Law via Gcc-patches





On 9/5/23 06:08, Tsukasa OI wrote:

From: Tsukasa OI 

Large constant cons and/or alt will trigger ICEs building GCC target
libraries (libgomp and libatomic) when the 'Zicond' extension is enabled.

For instance, zicond-ice-2.c (new test case in this commit) will cause
an ICE when SOME_NUMBER is 0x1000 or larger.  While opposite numbers
corresponding cons/alt (two temp2 variables) are checked, cons/alt
themselves are not checked and causing 2 ICEs building
GCC target libraries as of this writing:

1.  gcc/libatomic/config/posix/lock.c
2.  gcc/libgomp/fortran.c

Coercing a large value into a register will fix the issue.

It also coerce a large cons into a register on "imm, imm" case (the author
could not reproduce but possible to cause an ICE).

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_conditional_move): Force
large constant cons/alt into a register.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-ice-2.c: New test.  This is based on
an ICE at libat_lock_n func on gcc/libatomic/config/posix/lock.c
but heavily minimized.

"New test." is sufficient.  No need to change it, just a note going forward.

OK.  Thanks for taking care of this.

jeff

Re: [PATCH] RISC-V: Add conditional sqrt autovec pattern

2023-09-05 Thread Jeff Law via Gcc-patches





On 9/3/23 22:49, Lehua Ding wrote:

This patch adds a combined pattern for combining vfsqrt.v and vcond_mask.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_):
Add sqrt + vcond_mask combine pattern.
* config/riscv/autovec.md (2):
Change define_expand to define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c: New test.

OK.  Thanks.

FWIW, I thought we only had the reciprocal sqrt estimator, but in fact 
rvv does define a real vector sqrt.   So the concerns we kicked around 
in the meeting this morning turned out not be warranted.


This raises one of the very interesting questions in this space, 
specifically whether or not we should be using the rsqrt estimator with 
correction steps.   Unless the vfsqrt latency is really bad, it's going 
to be hard to make a vfrsqrt7 based sequence faster -- but the vfrsqrt7 
sequences will be pipelinable while vfsqrt almost certainly isn't.


Sadly we don't have a scalar FP rsqrt estimator.  Though I certainly 
ponder using the vector one -- there's a neat trick you can do with the 
nab benchmark from spec and produce sqrt and rsqrt at the same time with 
a Goldschmidt sequence.  It requires a bit of hackery to make new tree 
nodes, but it was definitely worth it on other targets I've worked on.



Jeff

Re: [PATCH v3 1/1] RISC-V: Add support for 'XVentanaCondOps' reusing 'Zicond' support

2023-09-05 Thread Jeff Law via Gcc-patches





On 9/5/23 06:10, Tsukasa OI wrote:

From: Tsukasa OI 

'XVentanaCondOps' is a vendor extension from Ventana Micro Systems
containing two instructions for conditional move and will be supported on
their Veyron V1 CPU.

And most notably (for historical reasons), 'XVentanaCondOps' and the
standard 'Zicond' extension are functionally equivalent (only encodings and
instruction names are different).

*   czero.eqz == vt.maskc
*   czero.nez == vt.maskcn

This commit adds support for the 'XVentanaCondOps' extension by extending
'Zicond' extension support.  With this, we can now reuse the optimization
using the 'Zicond' extension for the 'XVentanaCondOps' extension.

The specification for the 'XVentanaCondOps' extension is based on:


gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_flag_table):
Parse 'XVentanaCondOps' extension.
* config/riscv/riscv-opts.h (MASK_XVENTANACONDOPS): New.
(TARGET_XVENTANACONDOPS): Ditto.
(TARGET_ZICOND_LIKE): New to represent targets with conditional
moves like 'Zicond'.  It includes RV64 + 'XVentanaCondOps'.
* config/riscv/riscv.cc (riscv_rtx_costs): Replace TARGET_ZICOND
with TARGET_ZICOND_LIKE.
(riscv_expand_conditional_move): Ditto.
* config/riscv/riscv.md (movcc): Replace TARGET_ZICOND with
TARGET_ZICOND_LIKE.
* config/riscv/riscv.opt: Add new riscv_xventana_subext.
* config/riscv/zicond.md: Modify description.
(eqz_ventana): New to match corresponding czero instructions.
(nez_ventana): Ditto.
(*czero..): Emit a 'XVentanaCondOps' instruction if
'Zicond' is not available but 'XVentanaCondOps' + RV64 is.
(*czero..): Ditto.
(*czero.eqz..opt1): Ditto.
(*czero.nez..opt2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xventanacondops-primitiveSemantics.c: New test,
modified from zicond-primitiveSemantics.c.
* gcc.target/riscv/xventanacondops-primitiveSemantics-rv32.c: New
test to make sure that XVentanaCondOps instructions are disabled
on RV32.
* gcc.target/riscv/xventanacondops-xor-01.c: New test, modified
from zicond-xor-01.c.
---
  gcc/common/config/riscv/riscv-common.cc   |  2 +

  \

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8d8f7b4f16ed..eb10f4a3323f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2745,7 +2745,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
  *total = COSTS_N_INSNS (1);
  return true;
}
-  else if (TARGET_ZICOND
+  else if (TARGET_ZICOND_LIKE

Internally we have this as:

(TARGET_ZICOND || TARGET_XVENTANACONDOPS)

I don't really care, so I'm happy to go with yours.



+(define_code_attr eqz_ventana [(eq "maskcn") (ne "maskc")])
+(define_code_attr nez_ventana [(eq "maskc") (ne "maskcn")])

We did these as N/n which output n or nothing:

(define_code_attr n [(eq "n") (ne "")])
(define_code_attr N [(eq "") (ne "n")])


  
  ;; Zicond

  (define_insn "*czero.."
@@ -28,8 +31,15 @@
  (const_int 0))
(match_operand:GPR 2 "register_operand""r")
(const_int 0)))]
-  "TARGET_ZICOND"
-  "czero.\t%0,%2,%1"
+  "TARGET_ZICOND_LIKE"
+  {
+if (TARGET_ZICOND)
+  return "czero.\t%0,%2,%1";
+else if (TARGET_XVENTANACONDOPS && TARGET_64BIT)
+  return "vt.\t%0,%2,%1";
+else
+  gcc_unreachable ();
+  }
  )

And so the output template ends up like this:


  "* return TARGET_ZICOND ? \"czero.\t%0,%2,%1\" : \"vt.maskc\t%0,%2,%1\"; 
"


But again, I don't care enough about this to make it a big deal and I'm 
happy to go with your approach.





diff --git 
a/gcc/testsuite/gcc.target/riscv/xventanacondops-primitiveSemantics-rv32.c 
b/gcc/testsuite/gcc.target/riscv/xventanacondops-primitiveSemantics-rv32.c
new file mode 100644
index ..992f1425c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xventanacondops-primitiveSemantics-rv32.c
So we're never going to have an rv32 variant.  So I don't think we need 
rv32 xventanacondops tests.


For the tests we keep, the right way to do them is with #includes.

ie start with this:




+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_xventanacondops -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Og"} } */


Then #include the zicond variant of the test



+
+/* { dg-final { scan-assembler-times "vt\\.maskc\t" 6 } } */
+/* { dg-final { scan-assembler-times "vt\\.maskcn\t" 6 } } */
+/* { dg-final { scan-assembler-not "beq" } } */
+/* { dg-final { scan-assembler-not "bne" } } */

Then you have the assembly scan strings.

That way we don't duplicate the actual test code.

If you could fixup the tests, then I think

Re: [PATCH] riscv: Synthesize all 11-bit-rotate constants with rori

2023-09-05 Thread Jeff Law via Gcc-patches





On 9/5/23 15:15, Christoph Muellner wrote:

From: Christoph Müllner 

Some constants can be built up using LI+RORI instructions.
The current implementation requires one of the upper 32-bits
to be a zero bit, which is not neccesary.
Let's drop this requirement in order to be able to synthesize
a constant like 0x00ffL.

The tests for LI+RORI are made more strict to detect regression
in the calculation of the LI constant and the rotation amount.

Signed-off-by: Christoph Müllner 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_build_integer_1): Don't
require one zero bit in the upper 32 bits for LI+RORI synthesis.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-li-rotr.c: New tests.
* gcc.target/riscv/zbb-li-rotr.c: Likewise.

OK
jeff

Re: [PATCH V2] RISC-V: Support Dynamic LMUL Cost model

2023-09-05 Thread Jeff Law via Gcc-patches





On 9/5/23 15:39, 钟居哲 wrote:

- Why don't we use the normal reverse postorder (or postorder) approach of
    computing live ranges?  Is that because we don't really need full global
    live ranges?

Yes. We don't need global live ranges.

- Why can't we use existing code i.e. tree-ssa-live?  I suspect I already
    know the answer but an explanation (in a comment) would still be useful.

The existing code can't help I have tried many times.
I would expect it to be fairly hard to use for this purpose.  I've tried 
to use it in other contexts as well without success.


Jeff

[committed] RISC-V: Expose bswapsi for TARGET_64BIT

2023-09-05 Thread Jeff Law via Gcc-patches

Various bswapsi tests are failing for rv64.  More importantly, we're 
generating crappy code.


Let's take the first test from bswapsi-1.c as an example.


typedef unsigned int uint32_t;

#define __const_swab32(x) ((uint32_t)(\
(((uint32_t)(x) & (uint32_t)0x00ffUL) << 24) |\
(((uint32_t)(x) & (uint32_t)0xff00UL) <<  8) |\
(((uint32_t)(x) & (uint32_t)0x00ffUL) >>  8) |\
(((uint32_t)(x) & (uint32_t)0xff00UL) >> 24)))

/* This byte swap implementation is used by the Linux kernel and the
   GNU C library.  */

uint32_t
swap32_a (uint32_t in)
{
  return __const_swab32 (in);
}






We generate this for rv64gc_zba_zbb_zbs:


srliw   a1,a0,24
slliw   a5,a0,24
slliw   a3,a0,8
li  a2,16711680
li  a4,65536
or  a5,a5,a1
and a3,a3,a2
addia4,a4,-256
srliw   a0,a0,8
or  a5,a5,a3
and a0,a0,a4
or  a0,a5,a0
ret

Urgh!

After this patch we generate:


rev8a0,a0
sraia0,a0,32
ret


Clearly better.


The stated rationale behind not exposing bswapsi2 for TARGET_64BIT is 
that the RTL expanders already know how to widen a bswap, which is 
definitely true.  But it's the case that failure to expose a bswapsi 
will cause the 32bit bswap optimizations in gimple store merging to not 
trigger.  Thus we get crappy code.


To fix this we expose bswapsi on TARGET_64BIT.  gimple-store-merging 
then detects the 32bit bswap idioms and generates suitable __builtin 
calls.  The expander will "FAIL" expansion for TARGET_64BIT which forces 
the generic expander code to synthesize the operation (we could 
synthesize in here, but that'd result in duplicate code).


Tested on rv64gc_zba_zbb_zbs, fixes all the bswapsi failures in the 
testsuite without any regressions.


Pushed to the trunk,

Jeff


commit fbc01748ba46eb26074388a8fb7b44d25a414a72
Author: Jeff Law 
Date:   Tue Sep 5 15:39:16 2023 -0600

RISC-V: Expose bswapsi for TARGET_64BIT

Various bswapsi tests are failing for rv64.  More importantly, we're 
generating
crappy code.

Let's take the first test from bswapsi-1.c as an example.

> typedef unsigned int uint32_t;
>
> #define __const_swab32(x) ((uint32_t)(\
> (((uint32_t)(x) & (uint32_t)0x00ffUL) << 24) |\
> (((uint32_t)(x) & (uint32_t)0xff00UL) <<  8) |\
> (((uint32_t)(x) & (uint32_t)0x00ffUL) >>  8) |\
> (((uint32_t)(x) & (uint32_t)0xff00UL) >> 24)))
>
> /* This byte swap implementation is used by the Linux kernel and the
>GNU C library.  */
>
> uint32_t
> swap32_a (uint32_t in)
> {
>   return __const_swab32 (in);
> }
>
>
>

We generate this for rv64gc_zba_zbb_zbs:

> srliw   a1,a0,24
> slliw   a5,a0,24
> slliw   a3,a0,8
> li  a2,16711680
> li  a4,65536
> or  a5,a5,a1
> and a3,a3,a2
> addia4,a4,-256
> srliw   a0,a0,8
> or  a5,a5,a3
> and a0,a0,a4
> or  a0,a5,a0
> retUrgh!

After this patch we generate:

> rev8a0,a0
> sraia0,a0,32
> ret
Clearly better.

The stated rationale behind not exposing bswapsi2 for TARGET_64BIT is that 
the
RTL expanders already know how to widen a bswap, which is definitely true.  
But
it's the case that failure to expose a bswapsi will cause the 32bit bswap
optimizations in gimple store merging to not trigger.  Thus we get crappy 
code.

To fix this we expose bswapsi on TARGET_64BIT.  gimple-store-merging then
detects the 32bit bswap idioms and generates suitable __builtin calls.  The
expander will "FAIL" expansion for TARGET_64BIT which forces the generic
expander code to synthesize the operation (we could synthesize in here, but
that'd result in duplicate code).

Tested on rv64gc_zba_zbb_zbs, fixes all the bswapsi failures in the 
testsuite
without any regressions.

gcc/
* config/riscv/bitmanip.md (bswapsi2): Expose for TARGET_64BIT.

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 7b55528ee49..1544ef4e125 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -454,7 +454,16 @@ (define_expand "bswapdi2"
 (define_expand "bswapsi2"
   [(set (match_operand:SI 0 "register_operand")
(bswap:SI (match_operand:SI 1 "register_operand")))]
-  "(!TARGET_64BIT && (TARGET_ZBB || TARGET_ZBKB)) || TARGET_XTHEADBB")
+  "TARGET_ZBB || TARGET_ZBKB || TARGET_XTHEADBB"
+{
+  /* Expose bswapsi2 on TARGET_64BIT so that the gimple store
+

Re: [PATCH] riscv: xtheadbb: Enable constant synthesis with th.srri

2023-09-05 Thread Jeff Law via Gcc-patches





On 9/5/23 09:42, Christoph Muellner wrote:

From: Christoph Müllner 

Some constants can be built up using rotate-right instructions.
The code that enables this can be found in riscv_build_integer_1().
However, this functionality is only available for Zbb, which
includes the rori instruction.  This patch enables this also for
XTheadBb, which includes the th.srri instruction.

Signed-off-by: Christoph Müllner 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_build_integer_1): Enable constant
synthesis with rotate-right for XTheadBb.

OK
Jeff

Re: [PATCH] MATCH: `(nop_convert)-(convert)a` into -(convert)a if we are converting from something smaller

2023-09-05 Thread Jeff Law via Gcc-patches





On 9/2/23 01:00, Andrew Pinski via Gcc-patches wrote:

This allows removal of one conversion and in the case of booleans, might be 
able to remove
the negate and the other conversion later on.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/107137

gcc/ChangeLog:

* match.pd (`(nop_convert)-(convert)a`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/neg-cast-2.c: New test.
* gcc.dg/tree-ssa/neg-cast-3.c: New test.

OK.
jeff

Re: [PATCH] MATCH: Add `(x | c) & ~(y | c)` and `x & ~(y | x)` patterns [PR98710]

2023-09-05 Thread Jeff Law via Gcc-patches





On 9/3/23 19:25, Andrew Pinski via Gcc-patches wrote:

Adding some more simple bit_and/bit_ior patterns.
How often these show up, I have no idea.

This was tested on top of
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629174.html .

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/98710
* match.pd (`(x | c) & ~(y | c)`, `(x & c) | ~(y & c)`): New pattern.
(`x & ~(y | x)`, `x | ~(y & x)`): New patterns.

gcc/testsuite/ChangeLog:

PR tree-optimization/98710
* gcc.dg/tree-ssa/andor-7.c: New test.
* gcc.dg/tree-ssa/andor-8.c: New test.

OK
jeff

Re: [PATCH] MATCH: Add pattern for `(x | y) & (x & z)`

2023-09-05 Thread Jeff Law via Gcc-patches





On 9/3/23 14:49, Andrew Pinski via Gcc-patches wrote:

Like the pattern already there for `(x | y) & x`,
this adds a simple pattern to optimize `(x | y) & (x & z)`
to just `x & z`.

OK? Bootstrapped and tested on x86-64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/103536
* match.pd (`(x | y) & (x & z)`,
`(x & y) | (x | z)`): New patterns.

gcc/testsuite/ChangeLog:

PR tree-optimization/103536
* gcc.dg/tree-ssa/andor-6.c: New test.
* gcc.dg/tree-ssa/andor-bool-1.c: New test.

OK.
jeff

Re: [PATCH] RISC-V: Emit .note.GNU-stack for non-linux target as well

2023-09-05 Thread Jeff Law via Gcc-patches





On 8/31/23 03:05, Kito Cheng wrote:

We only emit that on linux target before, that not problem before,
however Qemu has fix a bug to make qemu user mode honor PT_GNU_STACK[1],
that will cause problem when we test baremetal with qemu.

So the straightforward is enable that as well for non-linux toolchian,
the price is that will increase few bytes for each binary.

[1] https://github.com/qemu/qemu/commit/872f3d046f2381e3f416519e82df96bd60818311

gcc/ChangeLog:

* config/riscv/linux.h (TARGET_ASM_FILE_END): Move ...
* config/riscv/riscv.cc (TARGET_ASM_FILE_END): to here.

OK.
jeff

Re: [PATCH] MATCH: Add `~MAX(~X, Y)` pattern: [PR96694]

2023-09-05 Thread Jeff Law via Gcc-patches





On 9/3/23 18:21, Andrew Pinski via Gcc-patches wrote:

This adds `~MAX(~X, Y)` and `~MIN(~X, Y)` patterns
that are like the `~(~a & b)` and `~(~a | b)` patterns
and allows to reduce the number of ~ by 1.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/96694

gcc/ChangeLog:

* match.pd (`~MAX(~X, Y)`, `~MIN(~X, Y)`): New patterns.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/minmax-24.c: New test.

OK.
jeff

Re: [PATCH] MATCH: Transform `(1 >> X) !=/== 0` into `X ==/!= 0`

2023-09-05 Thread Jeff Law via Gcc-patches





On 9/3/23 10:25, Andrew Pinski via Gcc-patches wrote:

We currently have a pattern for handling `(C >> X) & D == 0`
but if C is 1 and D is 1, the `& 1` might have been removed.

gcc/ChangeLog:

PR tree-optimization/105832
* match.pd (`(1 >> X) != 0`): New pattern

OK
jeff

Re: [PATCH] ssa_name_has_boolean_range vs signed-boolean:31 types

2023-09-05 Thread Jeff Law via Gcc-patches





On 9/1/23 20:32, Andrew Pinski via Gcc-patches wrote:

This turns out to be a latent bug in ssa_name_has_boolean_range
where it would return true for all boolean types but all of the
uses of ssa_name_has_boolean_range was expecting 0/1 as the range
rather than [-1,0].
So when I fixed vector lower to do all comparisons in boolean_type
rather than still in the signed-boolean:31 type (to fix a different issue),
the pattern in match for `-(type)!A -> (type)A - 1.` would assume A (which
was signed-boolean:31) had a range of [0,1] which broke down and sometimes
gave us -1/-2 as values rather than what we were expecting of -1/0.

This was the simpliest patch I found while testing.

We have another way of matching [0,1] range which we could use instead
of ssa_name_has_boolean_range except that uses only the global ranges
rather than the local range (during VRP).
I tried to clean this up slightly by using gimple_match_zero_one_valuedp
inside ssa_name_has_boolean_range but that failed because due to using
only the global ranges. I then tried to change get_nonzero_bits to use
the local ranges at the optimization time but that failed also because
we would remove branches to __builtin_unreachable during evrp and lose
information as we don't set the global ranges during evrp.

OK? Bootstrapped and tested on x86_64-linux-gnu.

PR 110817

gcc/ChangeLog:

* tree-ssanames.cc (ssa_name_has_boolean_range): Remove the
check for boolean type as they don't have "[0,1]" range.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr110817-1.c: New test.
* gcc.c-torture/execute/pr110817-2.c: New test.
* gcc.c-torture/execute/pr110817-3.c: New test.
I'm a bit surprised this didn't trigger any regressions.  Though maybe 
all the existing testcases were capturing cases where non-boolean types 
were known to have a 0/1 value.



OK.
jeff

Re: [PATCH 3/3] MATCH: Replace all uses of ssa_name_has_boolean_range with zero_one_valued_p

2023-09-05 Thread Jeff Law via Gcc-patches





On 9/2/23 09:09, Andrew Pinski via Gcc-patches wrote:

This replaces all uses of ssa_name_has_boolean_range with zero_one_valued_p
except for the one in the definition of zero_one_valued_p. This simplifies
the code in general and makes only one way of saying we have a range of [0,1].

Note this depends on the patch that adds ssa_name_has_boolean_range usage
to zero_one_valued_p.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd: Move zero_one_valued_p and truth_valued_p
towards the begnining of the file.
(X / bool_range_Y): Use zero_one_valued_p instead
of ssa_name_has_boolean_range. Move after all other
`X / Y` patterns. Add check to make sure bool_range_Y
is not the literal 0.
(1 - a): Use zero_one_valued_p instead
of ssa_name_has_boolean_range
(-(type)!A): Likewise.

OK.
jeff

Re: [PATCH 2/3] MATCH: Improve zero_one_valued_p by using ssa_name_has_boolean_range

2023-09-04 Thread Jeff Law via Gcc-patches





On 9/2/23 09:09, Andrew Pinski via Gcc-patches wrote:

Currently zero_one_valued_p uses tree_nonzero_bits which uses
the global ranges of the SSA Names. We can improve this via using
ssa_name_has_boolean_range which uses the local ranges
which are used while handling folding during VRP and other passes.

OK? Bootstrapped and tested on x86_64 with no regressions.

gcc/ChangeLog:

* match.pd (zero_one_valued_p): Match SSA_NAMES where
ssa_name_has_boolean_range returns true.
OK.  Presumably we're confident using a context sensitive range is 
always OK in here.


Jeff

Re: [PATCH 1/3] Improve ssa_name_has_boolean_range slightly

2023-09-04 Thread Jeff Law via Gcc-patches





On 9/2/23 09:09, Andrew Pinski via Gcc-patches wrote:

Right now ssa_name_has_boolean_range compares the range to
range_true_and_false but instead we would get the nonzero bits and
compare that to 1 instead (<=u 1).
The nonzerobits comparison can be done in similar fashion.
Note I think get_nonzero_bits is redundant as the range queury will
return a more accurate version or the same value.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssanames.cc (ssa_name_has_boolean_range): Improve
using range's get_nonzero_bits and use `<=u 1`.

OK
jeff

Re: [PATCH] riscv: xtheadcondmov: Don't run tests with -Oz

2023-09-04 Thread Jeff Law via Gcc-patches





On 9/1/23 04:20, Christoph Muellner wrote:

From: Christoph Müllner 

Recently, these xtheadcondmov tests regressed with -Oz:
* FAIL: gcc.target/riscv/xtheadcondmov-mveqz-imm-eqz.c
* FAIL: gcc.target/riscv/xtheadcondmov-mveqz-imm-not.c
* FAIL: gcc.target/riscv/xtheadcondmov-mvnez-imm-cond.c
* FAIL: gcc.target/riscv/xtheadcondmov-mvnez-imm-nez.c

As -Oz stands for "Optimize aggressively for size rather than speed.",
we need to inspect the generated code, which looks like this:

   -Oz
    :
  0:   e199bneza1,6 <.L2>
  2:   40100513li  a0,1025
   0006 <.L2>:
  6:   8082ret

   -O2:
    :
  0:   40100793li  a5,1025
  4:   40b7950bth.mveqza0,a5,a1
  8:   8082ret

As the generated code with -Oz consumes less size, there is nothing
wrong in the code generation. Instead, let's not run the xtheadcondmov
tests with -Oz.

Signed-off-by: Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-mveqz-imm-eqz.c: Disable for -Oz.
* gcc.target/riscv/xtheadcondmov-mveqz-imm-not.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mveqz-reg-eqz.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mveqz-reg-not.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mvnez-imm-cond.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mvnez-imm-nez.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mvnez-reg-cond.c: Likewise.
* gcc.target/riscv/xtheadcondmov-mvnez-reg-nez.c: Likewise.

OK
jeff

Re: [PATCH 2/2] VR-VALUES: Rewrite test_for_singularity using range_op_handler

2023-09-04 Thread Jeff Law via Gcc-patches





On 9/1/23 11:30, Andrew Pinski via Gcc-patches wrote:

So it turns out there was a simplier way of starting to
improve VRP to start to fix PR 110131, PR 108360, and PR 108397.
That was rewrite test_for_singularity to use range_op_handler
and Value_Range.

This patch implements that and

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* vr-values.cc (test_for_singularity): Add edge argument
and rewrite using range_op_handler.
(simplify_compare_using_range_pairs): Use Value_Range
instead of value_range and update test_for_singularity call.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/vrp124.c: New test.
* gcc.dg/tree-ssa/vrp125.c: New test.
---



diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index 52ab4fe6109..2474e57ee90 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -904,69 +904,33 @@ simplify_using_ranges::simplify_bit_ops_using_ranges
  }
  
  /* We are comparing trees OP1 and OP2 using COND_CODE.  OP1 has

-   a known value range VR.
+   a known value range OP1_RANGE.
  
 If there is one and only one value which will satisfy the

-   conditional, then return that value.  Else return NULL.
-
-   If signed overflow must be undefined for the value to satisfy
-   the conditional, then set *STRICT_OVERFLOW_P to true.  */
+   conditional on the EDGE, then return that value.
+   Else return NULL.  */
  
  static tree

  test_for_singularity (enum tree_code cond_code, tree op1,
- tree op2, const value_range *vr)
+ tree op2, const int_range_max &op1_range, bool edge)
  {
-  tree min = NULL;
-  tree max = NULL;
-
-  /* Extract minimum/maximum values which satisfy the conditional as it was
- written.  */
-  if (cond_code == LE_EXPR || cond_code == LT_EXPR)
+  /* This is already a singularity.  */
+  if (cond_code == NE_EXPR || cond_code == EQ_EXPR)
+return NULL;

I don't think this is necessarily the right thing to do for NE.

Consider if op1 has the range [0,1] and op2 has the value 1.  If the 
code is NE, then we should be able to return a singularity of 0 since 
that's the only value for x where x ne 1 is true given the range for x.




I like what you're trying to do, it just needs a bit of refinement I think.

jeff

Re: [PATCH 1/2] VR-VALUES: Rename op0/op1 to op1/op2 for test_for_singularity

2023-09-04 Thread Jeff Law via Gcc-patches





On 9/1/23 11:30, Andrew Pinski via Gcc-patches wrote:

As requested and make easier to understand with the new ranger
code, rename the arguments op0/op1 to op1/op2.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions

gcc/ChangeLog:

* vr-values.cc (test_for_singularity): Rename
arguments op0/op1 to op1/op2.

OK
jeff

Re: [PATCH v2] RISC-V: zicond: Fix opt2 pattern

2023-09-04 Thread Jeff Law via Gcc-patches





On 9/4/23 20:19, Tsukasa OI wrote:



-FAIL: 30_threads/async/async.cc execution test
+FAIL: gcc.c-torture/execute/pr60003.c   -O1  execution test
+FAIL: gcc.dg/setjmp-3.c execution test
+FAIL: gcc.dg/torture/stackalign/setjmp-3.c   -O1  execution test
+FAIL: gcc.dg/torture/stackalign/setjmp-3.c   -O1 -fpic execution test
FWIW, I've got async.cc marked here as flakey when run under QEMU. 
That's also consistent with what I found at a prior employer when 
working on a private GCC port.




Jeff

Re: [PATCH] RISC-V: Fix Zicond ICE on large constants

2023-09-04 Thread Jeff Law via Gcc-patches





On 9/4/23 00:45, Kito Cheng wrote:

Maybe move the check logic a bit forward? My thought is the logic is
already specialized into a few catalogs, (imm, imm), (imm, reg), (reg,
reg)... and the logic you put is already in (imm, reg), but it should
really move into (reg, reg) case IMO? and move that forward we could
prevent add too much logic to redirect the case.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 2db9c81ac8b..c84509c393b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3892,6 +3892,12 @@ riscv_expand_conditional_move (rtx dest, rtx
op, rtx cons, rtx alt)
  op1 = XEXP (op, 1);
}

+  /* CONS might not fit into a signed 12 bit immediate suitable
+for an addi instruction.  If that's the case, force it into
+a register.  */
+  if (CONST_INT_P (cons) && !SMALL_OPERAND (INTVAL (cons)))
+   cons = force_reg (mode, cons);
+
   /* 0, reg or 0, imm */
   if (cons == CONST0_RTX (mode)
  && (REG_P (alt)
But for the imm, imm case if we force things into regs too early, then 
we'll lose if alt - cons and cons fit in a 12 bit immediate but alt does 
not.


I think Tsukasa is on the right path here.  I should have checked 
riscv_emit_binary -- I though it handled the out-of-range constant case, 
but looking at it now, it clearly does not.


I think this implies we need a similar blob of code for the imm, imm 
case for cons.


Jeff

Re: [PATCH v2 1/2] strlen: fold strstr() even if the length isn't previously known [PR96601]

2023-09-04 Thread Jeff Law via Gcc-patches





On 9/4/23 14:58, Hamza Mahfooz wrote:

Currently, we give up in fold_strstr_to_strncmp() if the length of the
the second argument to strstr() isn't known to us by the time we hit
that function. However, we can instead insert a strlen() in ourselves
and continue trying to fold strstr() into strlen()+strncmp().

PR tree-optimization/96601

gcc/ChangeLog:

* tree-ssa-strlen.cc (fold_strstr_to_strncmp): If arg1_len == NULL,
insert a strlen() for strstr()'s arg1 and use it as arg1_len.

gcc/testsuite/ChangeLog:

* gcc.dg/strlenopt-30.c: Modify test.
I'm not sure it's necessarily a win to convert to strncmp as 
aggressively as this patch would do.  Essentially when you have large 
needles that are partially matched repeatedly, performance can 
significantly suffer.


If we're going to seriously consider this path, then I'd like to see how 
it performs in general.  The glibc testsuite I think has some 
performance coverage for strstr.


jeff

Re: [PATCH] lra: Avoid unfolded plus-0

2023-09-04 Thread Jeff Law via Gcc-patches





On 8/31/23 09:24, Richard Sandiford via Gcc-patches wrote:

While backporting another patch to an earlier release, I hit a
situation in which lra_eliminate_regs_1 would eliminate an address to:

 (plus (reg:P R) (const_int 0))

This address compared not-equal to plain:

 (reg:P R)

which caused an ICE in a later peephole2.  (The ICE showed up in
gfortran.fortran-torture/compile/pr80464.f90 on the branch but seems
to be latent on trunk.)

These unfolded PLUSes shouldn't occur in the insn stream, and later code
in the same function tried to avoid them.

Tested on aarch64-linux-gnu so far, but I'll test on x86_64-linux-gnu too.
Does this look OK?

There are probably other instances of the same thing elsewhere,
but it seemed safer to stick to the one that caused the issue.

Thanks,
Richard


gcc/
* lra-eliminations.cc (lra_eliminate_regs_1): Use simplify_gen_binary
rather than gen_rtx_PLUS.

OK
jeff

Re: [PATCH] RISC-V: Document some -march special cases

2023-09-04 Thread Jeff Law via Gcc-patches





On 8/29/23 23:52, Kito Cheng wrote:

I would prefer NOT to expose those --param on user manual since
generally those options are used for internal only, we should add -m
option and enable `--param=riscv-autovec-preference=scalable` by
default once we think it's stable enough.

I tend to agree.

The params controlling this stuff are in my mind a stopgap as we work 
through those issues.  Ideally we get to a point where they're no longer 
needed and we just remove them.  THat gets harder if they get exposed in 
a release, even more so if they get documented in the release.


Jeff

Re: [PATCH v2] RISC-V: zicond: Fix opt2 pattern

2023-09-04 Thread Jeff Law via Gcc-patches





On 9/1/23 13:53, Vineet Gupta wrote:

This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since in
failing case, pattern's asm czero.nez gets both rs2 and rs1 as non zero.

We start with the following src code snippet:

   if (a == 0)
return 0;
   else
return x;
 }

which is equivalent to:  "x = (a != 0) ? x : a" where x is NOT 0.
 

and matches define_insn "*czero.nez..opt2"

| (insn 41 20 38 3 (set (reg/v:DI 136 [ x ])
|(if_then_else:DI (ne (reg/v:DI 134 [ a ])
|(const_int 0 [0]))
|(reg/v:DI 136 [ x ])
|(reg/v:DI 134 [ a ]))) {*czero.nez.didi.opt2}

The corresponding asm pattern generates
 czero.nez x, x, a   ; %0, %2, %1

which implies
 "x = (a != 0) ? 0 : a"

clearly not what the pattern wants to do.

Essentially "(a != 0) ? x : a" cannot be expressed with CZERO.nez if X
is not guaranteed to be 0.

However this can be fixed with a small tweak

"x = (a != 0) ? x : a"

is same as

"x = (a == 0) ? a : x" since middle operand is 0 when a == 0.

which can be expressed with CZERO.eqz

before fix  after fix
-   -
lia5,1  lia5,1
lda4,8(sp)  lda4,8(sp)   # a4 is runtime non zero
czero.nez a0,a4,a5 # a0=0 NOK   czero.eqz a0,a4,a5   # a0=a4!=0 OK

The issue only happens at -O1 as at higher optimization levels, the
whole conditional move gets optimized away.

This fixes 4 testsuite failues in a zicond build:

FAIL: gcc.c-torture/execute/pr60003.c   -O1  execution test
FAIL: gcc.dg/setjmp-3.c execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c   -O1  execution test
FAIL: gcc.dg/torture/stackalign/setjmp-3.c   -O1 -fpic execution test

gcc/ChangeLog:
* config/riscv/zicond.md: Fix op2 pattern.

Fixes: 1d5bc3285e8a ("[committed][RISC-V] Fix 20010221-1.c with zicond")
Signed-off-by: Vineet Gupta 

OK.

Still not sure why I didn't trip over it in my own testing (execute.exp 
runs pr60003.c), but regardless, glad to have it fixed.


jeff

Re: [PATCH] RISC-V: Add Types to Un-Typed Risc-v Instructions:

2023-09-01 Thread Jeff Law via Gcc-patches





On 8/31/23 11:32, Edwin Lu wrote:

Related Discussion:
https://inbox.sourceware.org/gcc-patches/12fb5088-3f28-0a69-de1e-f387371a5...@gmail.com/

This patch updates the riscv instructions to ensure that no insn is left
without a type attribute. Added new types: "trap" (self explanatory) and "cbo"
(for cache related instructions)

Tested for regressions using rv32/64 multilib for linux/newlib. Also tested
rv32/64 gcv for linux.

gcc/Changelog:

* config/riscv/riscv.md: Update/Add types

OK.

jeff

Re: [PATCH] Add Types to Un-Typed Pic Instructions:

2023-09-01 Thread Jeff Law via Gcc-patches





On 8/31/23 17:01, Edwin Lu wrote:

Related Discussion:
https://inbox.sourceware.org/gcc-patches/12fb5088-3f28-0a69-de1e-f387371a5...@gmail.com/

This patch updates the pic instructions to ensure that no insn is left
without a type attribute.

Tested for regressions using rv32/64 multilib with newlib/linux.

gcc/Changelog:

* config/riscv/pic.md: Update types

OK.  THanks.
jeff

Re: [PATCH] RISC-V: zicond: remove bogus opt2 pattern

2023-09-01 Thread Jeff Law via Gcc-patches





On 8/31/23 11:57, Vineet Gupta wrote:



On 8/31/23 06:51, Jeff Law wrote:



On 8/30/23 15:57, Vineet Gupta wrote:

This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since the
pattern semantics can't be expressed by zicond instructions.

This involves test code snippet:

   if (a == 0)
return 0;
   else
return x;
 }

which is equivalent to:  "x = (a != 0) ? x : a"

Isn't it

x = (a == 0) ? 0 : x

Which seems like it ought to fit zicond just fine.


Logically they are equivalent, but 



If we take yours;

x = (a != 0) ? x : a

And simplify with the known value of a on the false arm we get:

x = (a != 0 ) ? x : 0;

Which is equivalent to

x = (a == 0) ? 0 : x;

So ISTM this does fit zicond just fine.


I could very well be mistaken, but define_insn is a pattern match and 
opt2 has *ne* so the expression has to be in != form and thus needs to 
work with that condition. No ?

My point was  that

x = (a != 0) ? x : 0

is equivalent to

x = (a == 0) ? 0 : x

You can invert the condition and swap the arms and get the same 
semantics.  Thus if one can be supported, so can the other as they're 
functionally equivalent.  It may be the at we've goof'd something in 
handling the inverted case, but conceptually we ought to be able to 
handle both.


I don't doubt you've got a failure, but it's also the case that I'm not 
seeing the same failure when I turn on zicond and run the execute.exp 
tests.  So clearly there's a difference somewhere in what we're doing.


So perhaps we should start with comparing assembly output for the test 
in question.  Can you pass yours along, I'll diff them this afternoon 
and see what we find.


jeff

Re: [PATCH] RISC-V: zicond: remove bogus opt2 pattern

2023-08-31 Thread Jeff Law via Gcc-patches





On 8/30/23 15:57, Vineet Gupta wrote:

This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since the
pattern semantics can't be expressed by zicond instructions.

This involves test code snippet:

   if (a == 0)
return 0;
   else
return x;
 }

which is equivalent to:  "x = (a != 0) ? x : a"

Isn't it

x = (a == 0) ? 0 : x

Which seems like it ought to fit zicond just fine.

If we take yours;

x = (a != 0) ? x : a

And simplify with the known value of a on the false arm we get:

x = (a != 0 ) ? x : 0;

Which is equivalent to

x = (a == 0) ? 0 : x;

So ISTM this does fit zicond just fine.








and matches define_insn "*czero.nez..opt2"

| (insn 41 20 38 3 (set (reg/v:DI 136 [ x ])
|(if_then_else:DI (ne (reg/v:DI 134 [ a ])
|(const_int 0 [0]))
|(reg/v:DI 136 [ x ])
|(reg/v:DI 134 [ a ]))) {*czero.nez.didi.opt2}

The corresponding asm pattern generates
 czero.nez x, x, a   ; %0, %2, %1
implying
 "x = (a != 0) ? 0 : a"

I get this from the RTL pattern:

x = (a != 0) ? x : a
x = (a != 0) ? x : 0

I think you got the arms reversed.






which is not what the pattern semantics are.

Essentially "(a != 0) ? x : a" cannot be expressed with CZERO.nez

Agreed, but I think you goof'd earlier :-)


Jeff

Re: [PATCH] RISC-V: Make arch-24.c to test "success" case

2023-08-29 Thread Jeff Law via Gcc-patches





On 8/28/23 21:31, Tsukasa OI wrote:

From: Tsukasa OI 

arch-24.c and arch-25.c are exactly the same and redundant.  The author
suspects that the original author intended to test two base ISAs (RV32I and
RV64I) so this commit changes arch-24.c to test that RV32I+Zcf does not
cause any errors.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-24.c: Test RV32I+Zcf instead.

OK
jeff

Re: [PATCH] RISC-V: Added zvfh support for zfa extensions.

2023-08-29 Thread Jeff Law via Gcc-patches





On 8/29/23 01:51, Jin Ma wrote:

This is a follow-up for the zfa extension, added according to the 
recommendations
for zvfh and patch of Tsukasa OI. At the same 
time,
zfa-fli-5.c of which is also based on the patch.

Ref:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627284.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628492.html

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli):
zvfh can generate zfa extended instruction fli.h, just like zfh.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fli-7.c: Change fa0 to fa\[0-9\] to avoid
assigning register numbers that are non-zero.
* gcc.target/riscv/zfa-fli-8.c: Ditto.
* gcc.target/riscv/zfa-fli-5.c: New test.

Thanks.  I pushed this to the trunk.
jeff

Re: [PATCH] RISC-V: Enable movmisalign for VLS modes

2023-08-29 Thread Jeff Law via Gcc-patches





On 8/29/23 07:54, Kito Cheng via Gcc-patches wrote:

+/* To support misalign data movement, we should use
+   minimum element alignment load/store.  */
+unsigned int size = GET_MODE_SIZE (GET_MODE_INNER (mode));
+poly_int64 nunits = GET_MODE_NUNITS (mode) * size;
+machine_mode mode = riscv_vector::get_vector_mode (QImode, nunits).require 
();
+operands[0] = gen_lowpart (mode, operands[0]);
+operands[1] = gen_lowpart (mode, operands[1]);
+if (MEM_P (operands[0]) && !register_operand (operands[1], mode))
+  operands[1] = force_reg (mode, operands[1]);


Does force_reg safe for movmisalign?
It should be.  It's a pretty common idiom.  Essentially it's going to 
result in generating this for the MEM->MEM case:


MEM->REG
REG->MEM


Both of which are likely to go through the misalign expander.

I was about to ACK when I had to leave for a few minutes.

OK for the trunk.

jeff

Re: [PATCH] RISC-V: Remove movmisalign pattern for VLA modes

2023-08-29 Thread Jeff Law via Gcc-patches





On 8/29/23 03:39, Juzhe-Zhong wrote:

This patch fixed this bunch of failures in "vect" testsuite:
FAIL: gcc.dg/vect/pr63341-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-1.c execution test
FAIL: gcc.dg/vect/pr63341-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-2.c execution test
FAIL: gcc.dg/vect/pr94994.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr94994.c execution test
FAIL: gcc.dg/vect/vect-align-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-align-1.c execution test
FAIL: gcc.dg/vect/vect-align-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-align-2.c execution test

Spike report:
z   ra 000100f4 sp 003ffb30 gp 00012cc8
tp  t0 000102d4 t1 000f t2 
s0  s1  a0 000101a6 a1 0008
a2 0010 a3 00012401 a4 00012480 a5 0020
a6 001f a7 00d6 s2  s3 
s4  s5  s6  s7 
s8  s9  sA  sB 
t3  t4  t5  t6 
pc 000101ec va/inst 0206dc07 sr 80026620
Load access fault!

(spike)
core   0: 0x00010204 (0x02065087) vle16.v v1, (a2)
core   0: exception trap_load_address_misaligned, epc 0x00010204
core   0:   tval 0x00012c81
(spike) reg 0 a2
0x00012c81

According to RVV ISA, we couldn't use "vle16.v" if the address is byte align.

Such issue is caused by this GIMPLE IR:

vect__1.15_17 = .MASK_LEN_LOAD (vectp_t.13_15, 8B, { -1, ... }, _24, 0);

For partial vectorization, the alignment is "8B" byte align here is incorrect 
here.

After this patch, the vectorization failed:

sll a5,a4,0x1
add a5,a5,a1
lhu a3,64(a5)
lbu a5,66(a5)
addwa4,a4,1
srl a3,a3,0x8
sll a5,a5,0x8
or  a5,a5,a3
sh  a5,0(a2)
add a2,a2,2
bne a4,a0,101f8 

I will enable auto-vectorization in another approach in the next following 
patch.

gcc/ChangeLog:

* config/riscv/autovec.md (movmisalign): Delete.

OK.
jeff

Re: [PATCH 1/1] RISC-V: Make "prefetch.i" built-in usable

2023-08-29 Thread Jeff Law via Gcc-patches





On 8/28/23 20:09, Tsukasa OI wrote:

On 2023/08/29 6:20, Jeff Law wrote:



On 8/9/23 21:10, Tsukasa OI via Gcc-patches wrote:

From: Tsukasa OI 

The "__builtin_riscv_zicbop_cbo_prefetchi" built-in function was terribly
broken so that practically unusable.  It emitted "prefetch.i" but with no
meaningful arguments.

Though incompatible, this commit completely changes the function
prototype
of this built-in and makes it usable.  To minimize the functionality
issues,
it renames the built-in to "__builtin_riscv_zicbop_prefetch_i".

gcc/ChangeLog:

 * config/riscv/riscv-cmo.def: Fix function prototype.
 * config/riscv/riscv.md (riscv_prefetchi_): Fix instruction
 prototype.  Remove possible prefectch type argument
 * doc/extend.texi: Document __builtin_riscv_zicbop_prefetch_i.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/cmo-zicbop-1.c: Reflect new built-in prototype.
 * gcc.target/riscv/cmo-zicbop-2.c: Likewise.
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 688fd697255b..5658c7b7e113 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3273,9 +3273,8 @@
   })
     (define_insn "riscv_prefetchi_"
-  [(unspec_volatile:X [(match_operand:X 0 "address_operand" "r")
-  (match_operand:X 1 "imm5_operand" "i")]
-  UNSPECV_PREI)]
+  [(unspec_volatile:X [(match_operand:X 0 "register_operand" "r")]
+    UNSPECV_PREI)]
     "TARGET_ZICBOP"
     "prefetch.i\t%a0"
   )

What I would suggest is making a new predicate that accepts either a
register or a register+offset where the offset fits in a signed 12 bit
immediate.  Use that for operand 0's predicate and I think this will
"just work" and cover all the cases supported by the prefetch.i
instruction.

Jeff



Seems reasonable.

If we have to break the compatibility anyway, adding an offset argument
is not a bad idea (though I think they will use inline assembly if a
non-zero offset is required).

I will try to add *optional* offset argument (with default value 0) like
  __builtin_speculation_safe_value built-in function in the next version.
The reason I suggested creating a predicate which allowed either a reg 
or reg+offset was to give that level of flexibility.


The inline assembly would specify an address.  If the address was 
already in a register, great.  If the address is expressable as 
reg+offset, also great.  If the address was a symbolic operand, the 
right predicate+constraint should force the symbolic into a register.


Jeff

Re: [PATCH v2] RISC-V: Make PR 102957 tests more comprehensive

2023-08-29 Thread Jeff Law via Gcc-patches





On 8/28/23 21:28, Tsukasa OI wrote:

From: Tsukasa OI 

Commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how do we handle unknown extensions and
commit 6f709f79c915a ("[committed] [RISC-V] Fix expected diagnostic messages
in testsuite") "fixed" test failures caused by that change (on pr102957.c,
by testing the error message after the first change).

However, the latter change will partially break the original intent of PR
102957 test case because we wanted to make sure that we can parse a valid
two-letter extension name.

Fortunately, there is a valid two-letter extension name, 'Zk' (standard
scalar cryptography extension superset with NIST algorithm suite).

This commit adds pr102957-2.c to make sure that there will be no errors if
we parse a valid two-letter extension name.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr102957-2.c: New test case using the 'Zk'
extension to continue testing whether we can use valid two-letter
extensions.

OK
jeff

Re: [PATCH v3 3/3] RISC-V: Add stub support for existing extensions (unprivileged)

2023-08-29 Thread Jeff Law via Gcc-patches





On 8/28/23 21:39, Tsukasa OI wrote:

From: Tsukasa OI 

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how do we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils support, even if the GCC itself does not touch a thing.

This commit adds stub supported standard unprivileged extensions to
riscv_ext_version_table and its implications to riscv_implied_info
(all information is copied from Binutils' bfd/elfxx-riscv.c except not yet
merged 'Zce', 'Zcmp' and 'Zcmt' support).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implications from unprivileged extensions.
(riscv_ext_version_table): Add stub support for all unprivileged
extensions supported by Binutils as well as 'Zce', 'Zcmp', 'Zcmt'.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-31.c: New test for a stub unprivileged
extension 'Zcb' with some implications.

OK.
jeff

Re: [PATCH v3 2/3] RISC-V: Add stub support for existing extensions (vendor)

2023-08-29 Thread Jeff Law via Gcc-patches





On 8/28/23 21:39, Tsukasa OI wrote:

From: Tsukasa OI 

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how do we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils support, even if the GCC itself does not touch a thing.

This commit adds stub supported vendor extensions to
riscv_ext_version_table (no riscv_implied_info entries to add; all
information is copied from Binutils' bfd/elfxx-riscv.c).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Add stub support for all vendor extensions supported by Binutils.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-30.c: New test for a stub
vendor extension 'XVentanaCondOps'.

OK.
jeff

Re: [PATCH v3 1/3] RISC-V: Add stub support for existing extensions (privileged)

2023-08-29 Thread Jeff Law via Gcc-patches





On 8/28/23 21:39, Tsukasa OI wrote:

From: Tsukasa OI 

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how do we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils support, even if the GCC itself does not touch a thing.

As a start, this commit adds stub supported *privileged* extensions to
riscv_ext_version_table and its implications to riscv_implied_info
(all information is copied from Binutils' bfd/elfxx-riscv.c).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implications from privileged extensions.
(riscv_ext_version_table): Add stub support for all privileged
extensions supported by Binutils.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-29.c: New test for a stub privileged
extension 'Smstateen' with some implications.

OK
jeff

Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-29 Thread Jeff Law via Gcc-patches





On 8/29/23 01:41, Richard Biener wrote:


 _1 = a * b;
 _2 = .FMA (c, d, _1);
 acc_1 = acc_0 + _2;


How can we execute the multiply and the FMA in parallel?  They
depend on each other.  Or is it the uarch can handle dependence
on the add operand but only when it is with a multiplication and
not a FMA in some better ways?  (I'd doubt so much complexity)
I've worked on an architecture that could almost do that.The ops 
didn't run in parallel, but instead serially as "chained" FP ops.


Essentially in cases where you could chain them they become a single 
instruction.  These were fully piped, thus issuing every cycle.  Latency 
was 1c faster than if you'd issued the ops as distinct instructions. 
More importantly, by combining the two FP ops into a single instruction 
you could issue more FP ops/cycle which significantly helps many FP 
codes.  It's safe to assume this required notable additional FP 
hardware, but it's something we already had in the design for other 
purposes.


I keep hoping that architecture becomes public.  There were some other 
really interesting features in the design that could be incorporated 
into other designs with minimal hardware cost.






Can you explain in more detail how the uarch executes one vs. the
other case?

Probably can't say more than I already have.

Anyway, given the architecture in question is still private and no 
longer something I have to champion, if you want to move forward with 
with the patch, I won't object.


jeff

Re: [PATCH 1/2] allow targets to check shrink-wrap-separate enabled or not

2023-08-28 Thread Jeff Law via Gcc-patches





On 8/28/23 19:28, Fei Gao wrote:

On 2023-08-29 06:54  Jeff Law  wrote:




On 8/28/23 01:47, Fei Gao wrote:

no functional changes but allow targets to check shrink-wrap-separate enabled 
or not.

     gcc/ChangeLog:

   * shrink-wrap.cc (try_shrink_wrapping_separate):call
     use_shrink_wrapping_separate.
   (use_shrink_wrapping_separate): wrap the condition
     check in use_shrink_wrapping_separate.
   * shrink-wrap.h (use_shrink_wrapping_separate): add to extern

So as I mentioned earlier today in the older thread, can we use
override_options to do this?

If we look at aarch64_override_options we have this:

    /* The pass to insert speculation tracking runs before
   shrink-wrapping and the latter does not know how to update the
   tracking status.  So disable it in this case.  */
    if (aarch64_track_speculation)
  flag_shrink_wrap = 0;

We kind of want this instead

    if (flag_shrink_wrap)
  {
    turn off whatever target bits enable the cm.push/cm.pop insns
  }


This does imply that we have a distinct target flag to enable/disable
those instructions.  But that seems like a good thing to have anyway.

I'm afraid we cannot simply resolve the confilict based on
flag_shrink_wrap/flag_shrink_wrap_separate only, as they're set true from -O1 
onwards,
which means zcmp is disabled almostly unless 
-fno-shrink-warp/-fno-shrink-warp-separate
are explictly given.
Yea, but I would generally expect that if someone is really concerned 
about code size, they're probably using -Os which (hopefully) would not 
have shrink-wrapping enabled.




So after discussion with Kito, we would like to turn on zcmp for -Os and 
shrink-warp-separate
for the speed perfered optimization. use_shrink_wrapping_separate in this patch 
provide the
chance for this check. No new hook is needed.

Seems reasonable to me if Kito is OK with it.

jeff

Re: [PATCH v2 3/3] RISC-V: Add stub support for existing extensions (unprivileged)

2023-08-28 Thread Jeff Law via Gcc-patches





On 8/14/23 00:09, Tsukasa OI wrote:

From: Tsukasa OI 

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how do we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects, GCC should
support almost all extensions that Binutils support, even if the GCC does
not touch a thing.

This commit adds stub supported standard unprivileged extensions to
riscv_ext_version_table and its implications to riscv_implied_info
(all information is copied from Binutils' bfd/elfxx-riscv.c except not yet
merged 'Zce', 'Zcmp' and 'Zcmt' support).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implications from unprivileged extensions.
(riscv_ext_version_table): Add stub support for all unprivileged
extensions supported by Binutils as well as 'Zce', 'Zcmp', 'Zcmt'.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-31.c: New test for a stub unprivileged
extension 'Zcb' with some implications.

This series (most likely patch 3/3) seems to break arch-24.c and arch-25.c.

Please fix and post a V3.

Jeff

Re: [PATCH] mklog: fix bugs of --append option

2023-08-28 Thread Jeff Law via Gcc-patches





On 7/19/23 02:21, Lehua Ding wrote:

Hi,

This little patch fix two bugs of mklog.py with --append option.
The first bug is that the regexp used is not accurate enough to
determine the top of diff area. The second bug is that if `---`
is not a true start, it needs to be added back to the patch file.

contrib/ChangeLog:

* mklog.py: Fix regexp and add missed `---`

OK.  Sorry for the delay.
jeff

Re: [PATCH] fwprop: Allow UNARY_P and check register pressure.

2023-08-28 Thread Jeff Law via Gcc-patches





On 8/24/23 08:06, Robin Dapp via Gcc-patches wrote:

Ping.  I refined the code and some comments a bit and added a test
case.

My question in general would still be:  Is this something we want
given that we potentially move some of combine's work a bit towards
the front of the RTL pipeline?

Regards
  Robin

Subject: [PATCH] fwprop: Allow UNARY_P and check register pressure.

This patch enables the forwarding of UNARY_P sources.  As this
involves potentially replacing a vector register with a scalar register
the ira_hoist_pressure machinery is used to calculate the change in
register pressure.  If the propagation would increase the pressure
beyond the number of hard regs, we don't perform it.

gcc/ChangeLog:

* fwprop.cc (fwprop_propagation::profitable_p): Add unary
handling.
(fwprop_propagation::update_register_pressure): New function.
(fwprop_propagation::register_pressure_high_p): New function
(reg_single_def_for_src_p): Look through unary expressions.
(try_fwprop_subst_pattern): Check register pressure.
(forward_propagate_into): Call new function.
(fwprop_init): Init register pressure.
(fwprop_done): Clean up register pressure.
(fwprop_insn): Add comment.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-vx-fwprop.c: New test.
So I was hoping that Richard S. would chime in here as he knows this 
code better than anyone.


This looks like a much better implementation of something I've done 
before :-)  Basically imagine a target where a sign/zero extension can 
be folded into arithmetic for free.  We put in various hacks to this 
code to encourage more propagations of extensions.


I still think this is valuable.  As we lower from gimple->RTL we're 
going to still have artifacts in the RTL that we're going to want to 
optimize away.  fwprop has certain advantages over combine, including 
the fact that it runs earlier, pre-loop.



It looks generally sensible to me.  But give Richard S. another week to 
chime in.  He seems to be around, but may be slammed with stuff right now.


jeff

Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-28 Thread Jeff Law via Gcc-patches





On 8/28/23 02:17, Di Zhao OS via Gcc-patches wrote:

This patch tries to fix the 2% regression in 510.parest_r on
ampere1 in the tracker. (Previous discussion is here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624893.html)

1. Add testcases for the problem. For an op list in the form of
"acc = a * b + c * d + acc", currently reassociation doesn't
Swap the operands so that more FMAs can be generated.
After widening_mul the result looks like:

_1 = .FMA(a, b, acc_0);
acc_1 = .FMA(c, d, _1);

While previously (before the "Handle FMA friendly..." patch),
widening_mul's result was like:

_1 = a * b;
_2 = .FMA (c, d, _1);
acc_1 = acc_0 + _2;

If the code fragment is in a loop, some architecture can execute
the latter in parallel, so the performance can be much faster than
the former. For the small testcase, the performance gap is over
10% on both ampere1 and neoverse-n1. So the point here is to avoid
turning the last statement into FMA, and keep it a PLUS_EXPR as
much as possible. (If we are rewriting the op list into parallel,
no special treatment is needed, since the last statement after
rewrite_expr_tree_parallel will be PLUS_EXPR anyway.)

2. Function result_feeds_back_from_phi_p is to check for cross
backedge dependency. Added new enum fma_state to describe the
state of FMA candidates.

With this patch, there's a 3% improvement in 510.parest_r 1-copy
run on ampere1. The compile options are:
"-Ofast -mcpu=ampere1 -flto --param avoid-fma-max-bits=512".

Best regards,
Di Zhao



 PR tree-optimization/110279

gcc/ChangeLog:

 * tree-ssa-reassoc.cc (enum fma_state): New enum to
 describe the state of FMA candidates for an op list.
 (rewrite_expr_tree_parallel): Changed boolean
 parameter to enum type.
 (result_feeds_back_from_phi_p): New function to check
 for cross backedge dependency.
 (rank_ops_for_fma): Return enum fma_state. Added new
 parameter.
 (reassociate_bb): If there's backedge dependency in an
 op list, swap the operands before rewrite_expr_tree.

gcc/testsuite/ChangeLog:

 * gcc.dg/pr110279.c: New test.
Not a review, but more of a question -- isn't this transformation's 
profitability uarch sensitive.  ie, just because it's bad for a set of 
aarch64 uarches, doesn't mean it's bad everywhere.


And in general we shy away from trying to adjust gimple code based on 
uarch preferences.


It seems the right place to do this is gimple->rtl expansion.

Jeff

Re: [RFC PATCH 2/2] RISC-V: Fix documentation of __builtin_riscv_pause

2023-08-28 Thread Jeff Law via Gcc-patches





On 8/28/23 17:09, Hans-Peter Nilsson wrote:

On Mon, 28 Aug 2023, Jeff Law via Gcc-patches wrote:



On 8/9/23 00:11, Tsukasa OI via Gcc-patches wrote:

From: Tsukasa OI 

This built-in does not imply the 'Xgnuzihintpausestate' extension.
It does not change architectural state (because all HINTs are prohibited
from doing that).

gcc/ChangeLog:

* doc/extend.texi: Fix the description of __builtin_riscv_pause.

I've pushed this to the trunk.


I randomly noticed a typo: "hart", perhaps for "part"?
Not sure though.

Not a typo.  "hart" has a well defined meaning in the risc-v world.

Thanks,
jeff

Re: [PATCH 1/2] allow targets to check shrink-wrap-separate enabled or not

2023-08-28 Thread Jeff Law via Gcc-patches





On 8/28/23 01:47, Fei Gao wrote:

no functional changes but allow targets to check shrink-wrap-separate enabled 
or not.

   gcc/ChangeLog:

 * shrink-wrap.cc (try_shrink_wrapping_separate):call
   use_shrink_wrapping_separate.
 (use_shrink_wrapping_separate): wrap the condition
   check in use_shrink_wrapping_separate.
 * shrink-wrap.h (use_shrink_wrapping_separate): add to extern
So as I mentioned earlier today in the older thread, can we use 
override_options to do this?


If we look at aarch64_override_options we have this:

  /* The pass to insert speculation tracking runs before
 shrink-wrapping and the latter does not know how to update the
 tracking status.  So disable it in this case.  */
  if (aarch64_track_speculation)
flag_shrink_wrap = 0;

We kind of want this instead

  if (flag_shrink_wrap)
{
  turn off whatever target bits enable the cm.push/cm.pop insns
}


This does imply that we have a distinct target flag to enable/disable 
those instructions.  But that seems like a good thing to have anyway.


jeff

Re: [PATCH V3] riscv: generate builtin macro for compilation with strict alignment:

2023-08-28 Thread Jeff Law via Gcc-patches





On 8/15/23 12:29, Edwin Lu wrote:

This patch is a modification of
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610115.html
following the discussion on
https://github.com/riscv-non-isa/riscv-c-api-doc/issues/32

Distinguish between explicit -mstrict-align and cpu tune param
for slow_unaligned_access=true/false.

Tested for regressions using rv32/64 multilib with newlib/linux

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
  Generate __riscv_unaligned_avoid with value 1 or
  __riscv_unaligned_slow with value 1 or
  __riscv_unaligned_fast with value 1
* config/riscv/riscv.cc (riscv_option_override):
 Define riscv_user_wants_strict_align. Set
 riscv_user_wants_strict_align to TARGET_STRICT_ALIGN
* config/riscv/riscv.h: Declare riscv_user_wants_strict_align

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-1.c: Check for
 __riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/attribute-4.c: Check for
 __riscv_unaligned_avoid
* gcc.target/riscv/attribute-5.c: Check for
 __riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/predef-align-1.c: New test.
* gcc.target/riscv/predef-align-2.c: New test.
* gcc.target/riscv/predef-align-3.c: New test.
* gcc.target/riscv/predef-align-4.c: New test.
* gcc.target/riscv/predef-align-5.c: New test.
* gcc.target/riscv/predef-align-6.c: New test.
OK.  Though I'm pretty sure the commit hooks are going to complain about 
your ChangeLog :-)


jeff

Re: [PATCH 1/2] allow target to check shrink-wrap-separate enabled or not

2023-08-28 Thread Jeff Law via Gcc-patches





On 6/25/23 20:29, Fei Gao wrote:

hi Jeff

Please see my earlier reply here.
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg310656.html

Maybe you scrolled past it in so many emails:)
Oh, so the issue isn't really the set of components being wrapped, but 
the way in which we save them.  Yea, that's going to need some tinkering.


It does make me wonder if we can handle this in riscv_override_options. 
That's a pretty standard place to deal with option conflicts.  We ought 
to be able to check if both options are enabled, then disable zcmp 
push/pop at that poing without introducing any new hooks.



jeff

Re: [PATCH 1/2] allow target to check shrink-wrap-separate enabled or not

2023-08-28 Thread Jeff Law via Gcc-patches





On 6/25/23 20:29, Fei Gao wrote:

hi Jeff

Please see my earlier reply here.
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg310656.html

Maybe you scrolled past it in so many emails:)

It definitely got lost in my mountain of mail.

jeff

Re: [PATCH] RISC-V: Revive test case PR 102957

2023-08-28 Thread Jeff Law via Gcc-patches





On 8/11/23 08:29, Tsukasa OI wrote:

On 2023/08/11 23:15, Jeff Law wrote:






Originally, it tested that a two letter extension ('Zb') is accepted by
GCC (because the background of PR 102957 was GCC assumed multi-letter
'Z' extensions are three letters or more).

After rejecting unrecognized extensions, "dg-error" is added **just to
avoid the test failure** and that doesn't look right.  Yes, we now don't
have an ICE (like in the original report) but after the PR 102957 fix,
we just accepted it, not rejecting it.

Instead, we have a valid (recognized) two-letter 'Z' extension: 'Zk'.  I
think replacing "zb" with "zk" is more correct considering the original
bug report (PR 102957) and its assumption.

cf. 

Thanks.  It still seems to me we want to  have two tests here.

I would suggest leaving pr102957.c alone since that tests that we give a 
proper error for "zb".  Then create a new test that verifies "zk" is 
accepted without error.


jeff

Re: [PATCH] RISC-V: Add Types to Un-Typed Vector Instructions:

2023-08-28 Thread Jeff Law via Gcc-patches





On 8/28/23 13:03, Edwin Lu wrote:

Related Discussion:
https://inbox.sourceware.org/gcc-patches/12fb5088-3f28-0a69-de1e-f387371a5...@gmail.com/

This patch updates vector instructions to ensure that no insn is left
without a type attribute. Creates a placeholder type "vector" for insns
where a type isn't clear

Tested for regressions using rv32/rv64 gc/gcv multilib with newlib/linux.

gcc/Changelog:

* config/riscv/autovec-vls.md: Update types
* config/riscv/riscv.md: Add vector placeholder type
* config/riscv/vector.md: Update types

OK
jeff

Re: [PATCH V2] RISC-V: Fix error combine of pred_mov pattern

2023-08-28 Thread Jeff Law via Gcc-patches





On 8/10/23 06:21, Lehua Ding wrote:


+;; vle.v/vse.v,vmv.v.v
+(define_insn_and_split "*pred_mov"
+  [(set (match_operand:V_VLS 0 "nonimmediate_operand""=vr,vr,
vd, m,vr,vr")
+(if_then_else:V_VLS
+  (unspec:
+[(match_operand: 1 "vector_mask_operand"   "vmWc1,   Wc1,
vm, vmWc1,   Wc1,   Wc1")
+ (match_operand 4 "vector_length_operand"  "   rK,rK,
rK,rK,rK,rK")
+ (match_operand 5 "const_int_operand"  "i, i, 
i, i, i, i")
+ (match_operand 6 "const_int_operand"  "i, i, 
i, i, i, i")
+ (match_operand 7 "const_int_operand"  "i, i, 
i, i, i, i")
+ (reg:SI VL_REGNUM)
+ (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+  (match_operand:V_VLS 3 "reg_or_mem_operand"  "m, m, 
m,vr,vr,vr")
+  (match_operand:V_VLS 2 "vector_merge_operand""0,vu,
vu,vu,vu, 0")))]
+  "TARGET_VECTOR && (register_operand (operands[0], mode)
+ || register_operand (operands[3], mode))"

Just a formatting nit in the pattern's condition.

"(TARGET_VECTOR
  && (register_operand (operands[0], mode)
  || register_operand (operands[3], mode)))"

OK with that change.  No need to wait for another approval.  Just update 
the patch, commit and post the committed patch to the list for archival 
purposes.


Thanks, and sorry for the long wait.  I just get busy sometimes.

jeff

Re: [PATCH] RISC-V: Fix error combine of pred_mov pattern

2023-08-28 Thread Jeff Law via Gcc-patches

On 8/11/23 10:30, Lehua Ding wrote:

 > But combine doesn't run at -O0.  So something is inconsistent.  I
 > certainly believe we need to avoid the mem->mem case, but that's
 > independent of combine and affects all optimization levels.

This is an new bug when running all tests after fixing the combine bug.

OK.  I must have misunderstood.   Thanks for clarifying.

 > I think we can simplify to just
 > !(MEM_P (operands[0]) && MEM_P (operands[1])

 > I would have expected those to be handled by the constraints rather than
 > the pattern's condition.

Yeh, the condition of the V2 becomes much simpler after split.
That was the hope.  It is worth noting that for simple moves eg movsi, 
movdi, movsf, movdf, etc there is a requirement that a single insn 
support all the valid combinations.  But I don't think we've ever had 
that requirement for vector modes and the situations where it's 
important are much less likely to trigger for vector moves.  Even more 
so given how the cond_mov patterns are implemented for RISC-V.

Jeff

Re: [PATCH 1/1] RISC-V: Make "prefetch.i" built-in usable

2023-08-28 Thread Jeff Law via Gcc-patches





On 8/9/23 21:10, Tsukasa OI via Gcc-patches wrote:

From: Tsukasa OI 

The "__builtin_riscv_zicbop_cbo_prefetchi" built-in function was terribly
broken so that practically unusable.  It emitted "prefetch.i" but with no
meaningful arguments.

Though incompatible, this commit completely changes the function prototype
of this built-in and makes it usable.  To minimize the functionality issues,
it renames the built-in to "__builtin_riscv_zicbop_prefetch_i".

gcc/ChangeLog:

* config/riscv/riscv-cmo.def: Fix function prototype.
* config/riscv/riscv.md (riscv_prefetchi_): Fix instruction
prototype.  Remove possible prefectch type argument
* doc/extend.texi: Document __builtin_riscv_zicbop_prefetch_i.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicbop-1.c: Reflect new built-in prototype.
* gcc.target/riscv/cmo-zicbop-2.c: Likewise.
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 688fd697255b..5658c7b7e113 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3273,9 +3273,8 @@
  })
  
  (define_insn "riscv_prefetchi_"

-  [(unspec_volatile:X [(match_operand:X 0 "address_operand" "r")
-  (match_operand:X 1 "imm5_operand" "i")]
-  UNSPECV_PREI)]
+  [(unspec_volatile:X [(match_operand:X 0 "register_operand" "r")]
+UNSPECV_PREI)]
"TARGET_ZICBOP"
"prefetch.i\t%a0"
  )
What I would suggest is making a new predicate that accepts either a 
register or a register+offset where the offset fits in a signed 12 bit 
immediate.  Use that for operand 0's predicate and I think this will 
"just work" and cover all the cases supported by the prefetch.i instruction.


Jeff

Re: [RFC PATCH 2/2] RISC-V: Fix documentation of __builtin_riscv_pause

2023-08-28 Thread Jeff Law via Gcc-patches





On 8/9/23 00:11, Tsukasa OI via Gcc-patches wrote:

From: Tsukasa OI 

This built-in does not imply the 'Xgnuzihintpausestate' extension.
It does not change architectural state (because all HINTs are prohibited
from doing that).

gcc/ChangeLog:

* doc/extend.texi: Fix the description of __builtin_riscv_pause.

I've pushed this to the trunk.
jeff

Re: [RFC PATCH 1/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-28 Thread Jeff Law via Gcc-patches




On 8/9/23 00:11, Tsukasa OI via Gcc-patches wrote:

From: Tsukasa OI 

The "pause" RISC-V hint instruction requires the 'Zihintpause' extension
(in the assembler).  However, GCC emits "pause" unconditionally, making
an assembler error while compiling code with __builtin_riscv_pause while
the 'Zihintpause' extension disabled.

However, the "pause" instruction code (0x010f) is a HINT and emitting
its instruction code is safe in any environment.

This commit implements handling for the 'Zihintpause' extension and emits
".insn 0x010f" instead of "pause" only if the extension is disabled
(making the diagnostics better).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_ext_version_table): Implement the 'Zihintpause' extension,
version 2.0.  (riscv_ext_flag_table) Add 'Zihintpause' handling.
* config/riscv/riscv-builtins.cc: Remove availability predicate
"always" and add "hint_pause" and "hint_pause_pseudo", corresponding
the existence of the 'Zihintpause' extension.
(riscv_builtins) Split builtin implementation depending on the
existence of the 'Zihintpause' extension.
* config/riscv/riscv-opts.h
(MASK_ZIHINTPAUSE, TARGET_ZIHINTPAUSE): New.
* config/riscv/riscv.md (riscv_pause): Make it only available when
the 'Zihintpause' extension is enabled.  (riscv_pause_insn) New
"pause" implementation when the 'Zihintpause' extension is disabled.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/builtin_pause.c: Removed.
* gcc.target/riscv/zihintpause.c:
New test when the 'Zihintpause' extension is enabled.
* gcc.target/riscv/zihintpause-noarch.c:
New test when the 'Zihintpause' extension is disabled.
So I cleaned this up a bit per the list discussion and pushed the final 
result.  Hopefully I didn't goof anything too badly ;-)  The net is we 
emit "pause" or a suitable .insn based on TARGET_ZIHINTPAUSE.


Jeffcommit c2d04dd659c499d8df19f68d0602ad4c7d7065c2
Author: Tsukasa OI 
Date:   Mon Aug 28 15:04:13 2023 -0600

RISC-V: __builtin_riscv_pause for all environment

The "pause" RISC-V hint instruction requires the 'Zihintpause' extension (in
the assembler).  However, GCC emits "pause" unconditionally, making an
assembler error while compiling code with __builtin_riscv_pause while the
'Zihintpause' extension disabled.

However, the "pause" instruction code (0x010f) is a HINT and emitting 
its
instruction code is safe in any environment.

This commit implements handling for the 'Zihintpause' extension and emits
".insn 0x010f" instead of "pause" only if the extension is disabled 
(making
the diagnostics better).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Implement the 'Zihintpause' extension, version 2.0.
(riscv_ext_flag_table) Add 'Zihintpause' handling.
* config/riscv/riscv-builtins.cc: Remove availability predicate
"always" and add "hint_pause".
(riscv_builtins) : Add "pause" extension.
* config/riscv/riscv-opts.h (MASK_ZIHINTPAUSE, TARGET_ZIHINTPAUSE): 
New.
* config/riscv/riscv.md (riscv_pause): Adjust output based on
TARGET_ZIHINTPAUSE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/builtin_pause.c: Removed.
* gcc.target/riscv/zihintpause-1.c: New test when the 'Zihintpause'
extension is enabled.
* gcc.target/riscv/zihintpause-2.c: Likewise.
* gcc.target/riscv/zihintpause-noarch.c: New test when the 
'Zihintpause'
extension is disabled.

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 128a7020172..a5b62cda3a0 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -224,6 +224,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"zkt",   ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"zihintntl", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zihintpause", ISA_SPEC_CLASS_NONE, 2, 0},
 
   {"zicboz",ISA_SPEC_CLASS_NONE, 1, 0},
   {"zicbom",ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1381,6 +1382,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"zkt",&gcc_options::x_riscv_zk_subext, MASK_ZKT},
 
   {"zihintntl", &gcc_options::x_riscv_zi_subext, MASK_ZIHINTNTL},
+  {"zihintpause", &gcc_options::x_riscv_zi_subext, MASK_ZIHINTPAUSE},
 
   {"zicboz", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOZ},
   {"zicbom", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOM},
diff --git a/gcc/config/riscv/riscv-builtins.cc 
b/gcc/config/riscv/riscv-builtins.cc
index 79681d75962..8afe7b7e97d 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -122,7 +122,7 @@ AVAIL (clmul_zbkc32_or_zbc32, (TARGET_ZBKC || TARGET_ZBC) 
&& !TARGET_64BIT)

Re: [PATCH V4] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-28 Thread Jeff Law via Gcc-patches





On 8/28/23 08:27, Robin Dapp via Gcc-patches wrote:

LGTM from my side, but I would like to wait Robin is ok too


In principle I'm OK with it as well, realizing we will still need to fine-tune
a lot here anyway.  For now, IMHO it's good to have some additional test 
coverage
in the vector space but we should not expect every test to be correct/a good 
match
for everything we do yet.  Juzhe mentioned he doesn't want to commit this before
all/most bugs are addresses anyway, right?
No strong opinions on my side.  We could enable now knowing there's 
failures and that list of failures becomes a TODO list.  Or we could 
wait for more to be working before committing to keep our test results 
reasonably clean.


I could make an argument for either direction, but since Juzhe & Robin 
are doing most of the autovec work, happy to go with whatever they prefer.


jeff

Re: [PATCH v10] RISC-V: Add support for the Zfa extension

2023-08-25 Thread Jeff Law via Gcc-patches





On 8/13/23 23:50, Jin Ma wrote:

This patch adds the 'Zfa' extension for riscv, which is based on:
https://github.com/riscv/riscv-isa-manual/commits/zfb

The binutils-gdb for 'Zfa' extension:
https://sourceware.org/pipermail/binutils/2023-April/127060.html

What needs special explanation is:
1, According to riscv-spec, "The FCVTMO D.W.D instruction was added principally 
to
   accelerate the processing of JavaScript Numbers.", so it seems that no 
implementation
   is required.

2, The instructions FMINM and FMAXM correspond to C23 library function fminimum 
and fmaximum.
   Therefore, this patch has simply implemented the pattern of fminm3 
and
   fmaxm3 to prepare for later.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add zfa extension version, which 
depends on
the F extension.
* config/riscv/constraints.md (zfli): Constrain the floating point 
number that the
instructions FLI.H/S/D can load.
* config/riscv/iterators.md (ceil): New.
* config/riscv/riscv-opts.h (MASK_ZFA): New.
(TARGET_ZFA): New.
* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): 
New.
* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli): New.
(riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used, 
memory is
not applicable.
(riscv_const_insns): Likewise.
(riscv_legitimize_const_move): Likewise.
(riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no 
split is
required.
(riscv_split_doubleword_move): Likewise.
(riscv_output_move): Output the mov instructions in zfa extension.
(riscv_print_operand): Output the floating-point value of the FLI.H/S/D 
immediate
in assembly.
(riscv_secondary_memory_needed): Likewise.
* config/riscv/riscv.md (fminm3): New.
(fmaxm3): New.
(movsidf2_low_rv32): New.
(movsidf2_high_rv32): New.
(movdfsisi3_rv32): New.
(f_quiet4_zfa): New.
* config/riscv/riscv.opt: New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fleq-fltq.c: New test.
* gcc.target/riscv/zfa-fli-zfh.c: New test.
* gcc.target/riscv/zfa-fli.c: New test.
* gcc.target/riscv/zfa-fmovh-fmovp.c: New test.
* gcc.target/riscv/zfa-fround.c: New test.
Thanks.  I added Tsukasa's testcases, except for zfa-fli-5.c which 
depends on sorting out the question around zvfh to your patch.


If you and Tsukasa could sort out the zvfh situation and submit a 
follow-up patch (if needed) it would be greatly appreciated.


I've pushed this to the trunk.

Thanks for your patience,
Jeff

Re: [2/2] RISC-V: Constant FP Optimization with 'Zfa'

2023-08-25 Thread Jeff Law via Gcc-patches





On 8/14/23 06:51, Jin Ma wrote:



This code is great and completely different from the way I implemented it.
I'm not sure which one is better, but my idea is that the fli instruction
corresponds to three tables (HF, SF and DF), all of which represent
specific values. the library in gcc's real.h can very well convert
the corresponding values into the values in the table, so it is only
necessary to perform a simple binary search to look up the tables.
Yea, I was kindof amazed at how Tsukasa implemented that code.  But I 
think the tables are easier to understand, so I'd tend to prefer them.


I'm still evaluating, but in general it looks like your implementation 
is (functionally) a superset of what Tsukasa has done.  I've still got 
some testing to do with Tsukasa's tests to verify, but my inclination is 
to go with your v10 patch right now.


Jeff

Re: [PATCH 1/2] RISC-V: Add support for the 'Zfa' extension

2023-08-25 Thread Jeff Law via Gcc-patches





On 8/13/23 23:32, Tsukasa OI via Gcc-patches wrote:

From: Tsukasa OI 

This commit adds support for the 'Zfa' extension containing additional
floating point instructions, version 0.1 (stable and approved).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implication 'Zfa' -> 'F'.
(riscv_ext_version_table) Add support for the 'Zfa' extension.
(riscv_ext_flag_table) Set MASK_ZFA if 'Zfa' is available.
* config/riscv/riscv-opts.h (MASK_ZFA, TARGET_ZFA): New.
So I think this and Jin Ma's most recently posted Zfa bits are almost 
functionally equivalent.  The only notable difference in this patch is 
Jin's work puts Zfa into its own subextension rather than in the 
existing zf subextension.


I think that was done in the v10 patch from Jin in response to the 
implies/depends comment from Kito.  So I'm inclined to say that's the 
preferred approach.


Given that's the only notable difference between this patch and Jin's 
patch, I'm going to consider 1/2 in this series superseded by Jin's work.


jeff

Re: [PATCH V2] RISC-V: Add Types to Un-Typed Sync Instructions:

2023-08-25 Thread Jeff Law via Gcc-patches





On 8/24/23 15:19, Edwin Lu wrote:

Related Discussion:
https://inbox.sourceware.org/gcc-patches/12fb5088-3f28-0a69-de1e-f387371a5...@gmail.com/

This patch updates the sync instructions to ensure that no insn is left
without a type attribute. Updates a total of 6 insns to have type "atomic"

Tested for regressions using rv32/64 multilib with newlib/linux.

gcc/Changelog:

* config/riscv/sync-rvwmo.md: updated types to "multi" or
 "atomic" based on number of assembly lines generated
* config/riscv/sync-ztso.md: likewise
* config/riscv/sync.md: likewise

OK.

You should have your write access set up already.  So go ahead and 
follow the directions in the email you received to get yourself into the 
MAINTAINERS file.  Then you can go ahead and push this change.


THanks,
Jeff

Re: [PATCH v2] RISC-V: Enable Hoist to GCSE simple constants

2023-08-25 Thread Jeff Law via Gcc-patches





On 8/24/23 23:16, Vineet Gupta wrote:

Hoist want_to_gcse_p () calls rtx_cost () to compute max distance for
hoist candidates. For a simple const (say 6 which needs seperate insn "LI 6")
backend currently returns 0, causing Hoist to bail and elide GCSE.

Note that constants requiring more than 1 insns to setup were working
fine since riscv_rtx_costs () was returning non-zero (although that
itself might need refining: see bugzilla 39).

To keep testsuite parity, some V tests need updating which started failing
in the new costing regime.

gcc/ChangeLog:
* gcc/config/riscv.cc (riscv_rtx_costs): Adjust const_int cost.
Add some comments about different constants handling.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/gcse-const.c: New Test
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Remove test
  for Jump.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c: Ditto.

OK.

jeff

Re: RISC-V: Fix stack_save_restore_1/2 test cases

2023-08-25 Thread Jeff Law via Gcc-patches

On 8/24/23 09:45, Jivan Hakobyan via Gcc-patches wrote:

Subject:
RISC-V: Fix stack_save_restore_1/2 test cases
From:
Jivan Hakobyan via Gcc-patches 
Date:
8/24/23, 09:45

To:
GCC Patches , Jeff Law 

This patch fixes failing stack_save_restore_1/2 test cases.
After 6619b3d4c15c commit size of the frame was changed.

gcc/testsuite/ChangeLog:
 * gcc.target/riscv/stack_save_restore_1.c: Update frame size
 * gcc.target/riscv/stack_save_restore_2.c: Likewise.
Rather than use specific values for the size of the stack in this test, 
can we match something a little more general so that we're not 
constantly having to come back and adjust the stack offset?

I'm not real familiar with the check-function-bodies capabilities, but I 
suspect we can probably use a regexp like [0-9]+ rather than 2016, 2032, 
etc.

Jeff

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1111 matches

Mail list logo