> +/* We can't enable FP16 NEG/PLUS/MINUS/MULT/DIV auto-vectorization when
> -march="*zvfhmin*". */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0
> "vect" } } */
Thanks. OK from my side.
Regards
Robin
Hi Juzhe,
no complaints here. Just please make sure you add the commit
message or something related as top comment to the test when
committing.
Somebody who reads the test is not going to want to look up
the commit message to know what's going on.
Regards
Robin
> I think it shouldn't be with vec_set patch.
> Instead, it obviously should be the separate patch.
Yes, I didn't mean in the actual same patch.
Regards
Robin
vec_set patch. I think the alignment helps
a bit with readability.
>From 147a459dfbf1fe9d5dd93148f475f42dee3bd94b Mon Sep 17 00:00:00 2001
From: Robin Dapp
Date: Tue, 6 Jun 2023 17:29:26 +0200
Subject: [PATCH] RISC-V: Change V_WHOLE iterator to properly match
instruction.
Currently we emit e.g. a
> +rtx ops[] = {operands[0], operands[1], operands[2], operands[3]};
> +riscv_vector::emit_vlmax_ternary_insn (code_for_pred_widen_mul_plus
> (, mode),
> +riscv_vector::RVV_WIDEN_TERNOP, ops);
ops is still there ;) No need for another revision
> These enhance patterns are generated in complicate combining situations.
Yes, that's clear. One strategy is to look through combine's output and
see which combination results make sense for a particular backend.
I was wondering where the unspec-less patterns originate (when we
expand
Hi Juzhe,
just one/two really minor nits.
> +rtx ops[] = {operands[0], operands[1], operands[2], operands[3]};
> +riscv_vector::emit_vlmax_ternary_insn (code_for_pred_widen_mul_plus
> (, mode),
> +riscv_vector::RVV_WIDEN_TERNOP, ops);
Here and in
>>> I like the code examples in general but find them hard to read
>>> at lengths > 5-10 or so. Could we condense this a bit?
> Ok, Do I need to send V2 ? Or condense the commit log when merged the patch?
Sure, just condense a bit. No need for V2.
Regards
Robin
Hi Juzhe,
> ...
> vsetvli zero,t1,e8,m1,ta,ma
> vle8.v v1,0(a4)
> vsetvli t3,zero,e16,m2,ta,ma
> vsext.vf2 v6,v1
> vsetvli zero,t1,e8,m1,ta,ma
> vle8.v v1,0(a5)
> vsetvli t3,zero,e16,m2,ta,ma
> add t0,a0,t4
>
Hi Juzhe,
thanks looks pretty comprehensive already.
> +(define_expand "vec_perm"
> + [(match_operand:V 0 "register_operand")
> + (match_operand:V 1 "register_operand")
> + (match_operand:V 2 "register_operand")
> + (match_operand: 3 "vector_perm_operand")]
> + "TARGET_VECTOR &&
Hi Juzhe,
> The approach is quite simple and obvious, changing extension pattern
> into define_insn_and_split will make combine PASS combine into widen
> operations naturally.
looks good to me. Tiny nit: I would add a comment above the patterns
to clarify why insn_and_split instead of expand.
Hi,
I figured I'd send this patch that I quickly hacked together some
days back. It's likely going to be controversial because we don't
have vector costs in place at all yet and even with costs it's
probably debatable as the emitted sequence is longer :)
I'm willing to defer or ditch it
>>> but ideally the user would be able to specify -mrvv-size=32 for an
>>> implementation with 32 byte vectors and then vector lowering would make use
>>> of vectors up to 32 bytes?
>
> Actually, we don't want to specify -mrvv-size = 32 to enable vectorization on
> GNU vectors.
> You can take a
Hi Kito,
> GNU vector extensions are widely used around the world, and this patch
> enable that with RISC-V vector extensions, this can help people
> leverage existing code base with RVV, and also can write vector programs in a
> familiar way.
>
> The idea of VLS code gen support is emulate VLS
Hi,
as we can always broadcast an integer constant to a vector register
allow them in riscv_const_insns. We need as many instructions as
it takes to generate the constant and one vmv.vx.
Regards
Robin
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_const_insns): Allow
Hi Juzhe,
>>> Can you explain these two points (3 and 4, maybe 2) a bit in the comments?
>>> I.e. what makes fma different from a normal insn?
> You can take a look at vector.md. The ternary instruction pattern has
> operands[0] operands[1] operands[2] operands[3] operands[4] operands[5] :
>
>
Hi Juzhe,
> +;; We can't expand FMA for the following reasons:
But we do :) We just haven't selected the proper alternative yet.
> +;; 1. Before RA, we don't know which multiply-add instruction is the ideal
> one.
> +;;The vmacc is the ideal instruction when operands[3] overlaps
>
> I realize that both TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES and
> TARGET_VECTORIZE_RELATED_MODE will partially enable some
> auto-vectorization even preferred_simd_mode does not enable
> auto-vectorization when we don't specify
> --param=riscv-autovec-preference.
>
> So plz add
Hi,
> This patch would like to remove the magic number in the riscv-v.cc, and
> align the same value to one macro.
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 458020ce0a1..20b589bf51b 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++
> Beside, V2 patch should change this:
> emit_vlmax_masked_insn (unsigned icode, int op_num, rtx *ops)
>
> change it into emit_vlmax_masked_mu_insn .
V3 is inline with these changes.
This patch implements abs2, vneg2 and vnot2 expanders
for integer vector registers and adds tests for them.
> I think it's logically incorrect. For ABS, you want:
>
> operands[0] = operands[1] > 0 ? operands[1] : (-operands[1])
> So you should do this following sequence:
>
> vmslt v0,v1,0
> vneg v1,v1,v0.t (should use Mask undisturbed)
Yes, this is the emitted sequence, but the vsetvli mask is indeed
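For readers following the quoted two-step sequence, here is a minimal scalar sketch of what it computes. It is not the RVV implementation from the patch: the function names and the MODEL_MAX_ELEMS limit are hypothetical, and the only thing modelled is the mask-undisturbed behaviour, where elements whose mask bit is clear keep their old value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound on the element count, for this sketch only.  */
enum { MODEL_MAX_ELEMS = 64 };

/* Step 1 (models the quoted vmslt): mask[i] is set where src[i] < 0.  */
static void
model_mask_lt_zero (const int32_t *src, uint8_t *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    mask[i] = src[i] < 0;
}

/* Step 2 (models the masked vneg with mask-undisturbed policy): active
   elements are negated in place, inactive elements are left untouched.  */
static void
model_neg_mask_undisturbed (int32_t *v, const uint8_t *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (mask[i])
      /* Negate via unsigned arithmetic to avoid signed-overflow
	 undefined behaviour.  */
      v[i] = (int32_t) (0u - (uint32_t) v[i]);
}

/* Element-wise abs built from the two steps above.  */
static void
model_abs (int32_t *v, size_t n)
{
  uint8_t mask[MODEL_MAX_ELEMS];
  assert (n <= MODEL_MAX_ELEMS);
  model_mask_lt_zero (v, mask, n);
  model_neg_mask_undisturbed (v, mask, n);
}
```

Because source and destination are the same register in the quoted sequence, the non-negative elements only survive if the inactive lanes are really left undisturbed; with a mask-agnostic policy the hardware would be free to overwrite them.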
Hi,
this patch implements abs2, vneg2 and vnot2 expanders
for integer vector registers and adds tests for them.
v2 is rebased against Juzhe's latest refactoring.
Regards
Robin
gcc/ChangeLog:
* config/riscv/autovec.md (2): Add vneg/vnot.
(abs2): Add.
*
Hi Juzhe,
> use riscv_v_ext_vector_mode_p instead since riscv_v_ext_mode_p includes
> tuple modes.
> You should not use tuple modes in related_mode. Tuple modes will be used in
> array mode target hook and
> used by vec_load_lanes/vec_store_lanes.
Ah, thanks for catching this. Yes,
Hi,
this patch implements the autovec expanders for sign and zero extension
patterns as well as the accompanying truncations. In order to use them
additional mode_attr iterators as well as vectorizer hooks are required.
Using these hooks we can e.g. vectorize with VNx4QImode as base mode
and
>>> Don't you want to use your shiny new operand passing style here as
>>> with the other expanders?
> H, I do this just following ARM code style.
> You can see I do pass rtx[] for expand_vcond and pass rtx,rtx,rtx for
> expand_vec_cmp.
> Well, I just follow ARM SVE implementation (You can
> +(define_expand "vec_cmp"
> + [(set (match_operand: 0 "register_operand")
> + (match_operator: 1 "comparison_operator"
> + [(match_operand:VI 2 "register_operand")
> +(match_operand:VI 3 "register_operand")]))]
> + "TARGET_VECTOR"
> + {
> +riscv_vector::expand_vec_cmp
Hi Juzhe,
thanks, IMHO it's clearer with the changes now. There are still
things that could be improved but it is surely an improvement over
what we currently have. Therefore I'd vote to go ahead so we can
continue with more expanders and changes.
Still, we should be prepared for more
Hi Juzhe,
in general I find the revised structure quite logical and it is definitely
an improvement. Some abstractions are still a bit leaky but we can always
refactor "on the fly". Some comments on the general parts, skipping
over the later details.
> bool m_has_dest_p;
Why does a store not
> I do refactoring since we are going to have many different
> auto-vectorization patterns, for example: cond_add etc.
>
> I should make the current framework suitable for all of them to
> simplify the future work.
That's good in general but can't it wait until the respective
changes go in?
> Thanks Robin. Address comment.
Did you intend to send an update here already or are you working
on it? Just wondering because you just sent another refactoring
patch.
Regards
Robin
> So I expect you will also apply those refactor on Juzhe's new changes?
> If so I would like to have a separated NFC refactor patch if possible.
What's NFC? :) Do you mean to just have the refactor part as a separate
patch? If yes, I agree.
> e.g.
> Juzhe's vec_cmp/vcond -> NFC refactor patch
As discussed with Juzhe off-list, I will rebase this patch against
Juzhe's vec_cmp/vcond patch once that hits the trunk.
Regards
Robin
Hi Juzhe,
thanks. Some remarks inline.
> +;; Integer (signed) vcond. Don't enforce an immediate range here, since it
> +;; depends on the comparison; leave it to riscv_vector::expand_vcond instead.
> +(define_expand "vcond"
> + [(set (match_operand:V 0 "register_operand")
> +
>>> + TAIL_UNDEFINED = -1,
>>> + MASK_UNDEFINED = -1,
> Why you add this ?
>
>>> + void add_policy_operands (enum tail_policy vta = TAIL_UNDEFINED,
>>> + enum mask_policy vma = MASK_UNDEFINED)
> No, you should just specify this as TAIL_ANY or MASK_ANY as default value.
That's the value I
Hi,
this patch implements autovec expanders of abs2, vneg2 and
vnot2 for integers. I also tried to refactor the helper code
in riscv-v.cc a bit. Guess it's not enough to warrant a separate patch
though.
Regards
Robin
gcc/ChangeLog:
* config/riscv/autovec.md (2): Fix typo.
Hi,
this obvious patch removes empty run template files and one redundant
stdio.h include.
Regards
Robin
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/shift-run.c: Do not include
.
* gcc.target/riscv/rvv/autovec/binop/shift-run-template.h: Removed.
Hi,
this fixes a rebase oversight regarding the loading
of vector constants. Added another test to properly
catch that in the future.
Regards
Robin
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_const_insns): Remove else.
gcc/testsuite/ChangeLog:
*
> Huh, including stdint-gcc.h looks completely wrong. What's the issue you are
> trying to solve?
The way I understood it is that that's a temporary workaround until
all multilib et al. (+testsuite) configurations are in place but I
haven't checked the details myself. Eventually this should be
> This patch would like to align the stdint.h to the stdint-gcc.h for all
> the RVV test files. Aka:
>
> stdint.h => stdint-gcc.h
Looks good. Jeff already pre-approved so you can go ahead and install
this on the trunk.
Regards
Robin
> After this patch, RVV GCC by default support alignment of RVV modes
> according to riscv-modes.def. In riscv-modes.def, we define each RVV
> modes are element align which is aligned to RVV ISA spec.
>
> If you want to support other alignment, you should add tuning info
> for this in the
Hi,
we need to discern what we want to achieve here. The goal might
be to prevent the vectorizer from performing peeling or versioning
for alignment. I realize the peeling code looks ugly but it's
actually for a good cause when the target does not support
misaligned vector access or only with
> emit_merge_op can not be wrapped into binop since mask position is
> different in pattern.
>
> I prefer merge op in different wrapper.
Yes, I didn't mean literally the same but that things already
become a bit confusing with all the different variants and bool
arguments or code duplication
Hi,
in general LGTM, just minor nits and comments.
> - void set_len_and_policy (rtx len, bool force_vlmax = false)
> -{
> - bool vlmax_p = force_vlmax;
> - gcc_assert (has_dest);
> + void set_len_and_policy (rtx len, bool force_vlmax = false, bool ta_p =
> true,
> +
>> After update local codebase to the trunk. I realize there is one more fail
>> in RV32.
>> After this patch, all fails of RVV are cleaned up.
>> Thanks.
But only because we build vmv-imm with autovec-preference=scalable. With
fixed-vlmax it still does not work because I messed up the rebase
> ok, thanks :)
This has likely been discussed at length before, but why need to
specify the additional -mabi with -march (instead of -march implying
a matching abi)?
> The vector shift immediates happen to have the same constraints as some
> of the CSR-related operands, but it's a different usage. This adds a
> name for them, so I don't get confused again next time.
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (shifts): Use
>
> OK, you can go ahead commit patch. I am gonna send another patch to
> fix this.
I agree that we should handle more constants but I'd still rather go
ahead now and fix things later. The patch is more about the test
rather than the actual change anyway.
Jeff already ack'ed v1, maybe waiting for
ChangeLog:
* MAINTAINERS: Sort.
---
MAINTAINERS | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 1c380bef5c5..e4dee76e2df 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -521,7 +521,6 @@ James Lemke
Ilya
> LGTM. You should commit it now. Then I can rebase vec_init patch.
Would need an ACK/OK from Kito at least :)
> "csr_operand" does seem wrong, though, as that just accepts constants.
> Maybe "arith_operand" is the way to go? I haven't looked at the
> V immediates though.
I was pondering changing the shift-count operand to QImode everywhere
but that indeed does not help code generation across the board.
Changes from v1:
- Change subject to RISC-V ;)
- Minor comment updates and rebasing.
This patch tries to improve the wrappers that emit either vlmax or
non-vlmax operations. Now, emit_len_op can be used to
emit a regular operation. Depending on whether a length != NULL
is passed either no
Changes from v1:
- Split into run tests (guarded by riscv_vector) and compile tests
which will be executed unconditionally. Doing dg-do run and -save-temps
on a non-supported target will not do anything at all.
This patch adds scan as well as execution tests for vectorized
binary
Changes from v1:
- Rebase against Juzhe's vec_series patch.
- Get rid of redundant scalar mode setting.
This patch adds basic binary integer operations support. It is based
on Michael Collison's work and makes use of the existing helpers in
riscv-c.cc. It introduces emit_nonvlmax_binop
Hi,
this patch allows mklog.py to be called with a commit hash directly.
So, instead of
git show | git gcc-mklog
git gcc-mklog --commit
can be used.
When no commit is given but --commit is specified, HEAD is used
instead. The behavior without --commit is the same as before.
Is that useful/OK?
It's somewhat common for mail clients to treat "--" as a signature
delimiter, it's "---" that git uses as a comment delimiter.
It's in my muscle memory somehow. Always did it that way because I
didn't want the same delimiter as in the git part of the message. Time
to change that habit I
> + machine_mode op2mode = Pmode;
> + if (inner == E_QImode || inner == E_HImode || inner == E_SImode)
> + op2mode = inner;
This I added in order to match the scalar variants like
[(set (match_operand:VI_QHS 0 "register_operand" "=vd,vd, vr, vr")
(if_then_else:VI_QHS
Hi,
this patch tries to improve the wrappers that emit either vlmax or
non-vlmax operations. Now, emit_len_op can be used to
emit a regular operation. Depending on whether a length != NULL
is passed either no VLMAX flags are set or we emit a vsetvli and
set VLMAX flags. The patch also adds
Hi,
this patch adds scan as well as execution tests for vectorized
binary integer operations. It is based on Michael Collison's work
and also includes scalar variants. The tests are not fully comprehensive
as the vector type promotions (vec_unpack, extend etc.) are not
implemented yet. Also,
Hi,
this patch splits off the shift patterns of the binop patterns.
This is necessary as the scalar shifts require a Pmode operand
as shift count. To this end, a new iterator any_int_binop_no_shift
is introduced. At a later point when the binops are split up
further in commutative and
Hi,
this patch adds basic binary integer operations support. It is based
on Michael Collison's work and makes use of the existing helpers in
riscv-c.cc. It introduces emit_nonvlmax_binop which, in turn, uses
emit_pred_binop. Setting the destination as well as the mask and the
length is
Hi Juzhe,
I wasn't yet able to check this locally so just some minor comment nits:
> +/* Return the vectorization machine mode for RVV according to LMUL. */
> +machine_mode
> +preferred_simd_mode (scalar_mode mode)
> +{
> + /* We only enable auto-vectorization when TARGET_MIN_VLEN < 128 &&
> +
Hi,
I figured I'm going to start sending some patches that build on top
of the upcoming RISC-V autovectorization. This one is obviously
not supposed to be installed before the basic support lands but
it's small enough that it shouldn't hurt to send it now.
This patch allows vector constants in
Dapp
+Robin Dapp
+Robin Dapp
Simon Dardis
Sudakshina Das
Bud Davis
@@ -731,6
Hi Michael,
I have the diff below for the binops in my tree locally.
Maybe something like this works for you? Untested but compiles and
the expander helpers would need to be fortified obviously.
Regards
Robin
--
gcc/ChangeLog:
* config/riscv/autovec.md (3): New binops expander.
> ../../gcc/config/riscv/generic.md:28:1: unknown value `smin' for attribute
> `type'
> make[3]: *** [Makefile:2528: s-attrtab] Error 1
>
>From 582c428258ce17ffac8ef1b96b4072f3d510480f Mon Sep 17 00:00:00 2001
From: Robin Dapp
Date: Fri, 21 Apr 2023 09:38:06 +0200
Subject: [PA
> Can you give more comments about Robin's opinion that he want to change into
> "fixed" vs "varying" or "fixed vector size" vs "dynamic vector size" ?
It's not necessary to decide on this now as --params are not supposed
to be stable and can be changed quickly. I was just curious if this had
> $ riscv64-unknown-linux-gnu-gcc
> --param=riscv-autovec-preference=fixed-vlmax
> gcc/testsuite/gcc.target/riscv/rvv/base/spill-10.c -O2 -march=rv64gcv
> -S
> ../riscv-gnu-toolchain-trunk/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/base/spill-10.c:
> In function 'stach_check_alloca_1':
>
>> I think we can CC IBM folks to see whether we can make WHILE_LEN works
>> for both IBM and RVV ?
>
> I've CCed them. Adding WHILE_LEN support to rs6000/s390x would be
> mainly the "easy" way to get len-masked (epilog) loop support. I've
> figured actually implementing WHILE_ULT for AVX512
Hi,
> This broke the tests, I'm seeing syntax errors:
> ERROR: gcc.dg/vect/slp-3.c -flto -ffat-lto-objects: error executing dg-final:
> syntax error in target selector "target ! vect_partial_vectors || vect32 ||
> s390_vx"
> ERROR: gcc.dg/vect/slp-3.c: error executing dg-final: syntax error
Hi,
this patch changes SLP test expectations. As we only vectorize when no
more than one rgroup is present, no vectorization is performed.
I was also considering using a separate target selector (something like
vect_partial_vectors_bias_m1) but as the number of testcases is limited
that would
Hi,
When compiling on a system where binutils do not yet support the 'z16'
name assembling fails with -march=native which we currently interpret
as -march=z16 (on a z16 machine). This patch uses -march=arch14
instead.
Is it OK?
Regards
Robin
--
gcc/ChangeLog:
*
Hi,
we seem to flip flop between the "high" and "not low" variants of load on
condition. Accept both in the affected test cases.
Going to commit this as obvious.
Regards
Robin
--
gcc/testsuite/ChangeLog:
* gcc.target/s390/ifcvt-two-insns-bool.c: Allow "high" and
"not low or
ng used as part of the length then? Do we need a zero-extend
> here?
v2 attached with these problems addressed.
Testsuite and bootstrap as before.
Regards
Robin
From 27cc2fa49a0f3fbc2c629028b51e862346392636 Mon Sep 17 00:00:00 2001
From: Robin Dapp
Date: Mon, 22 Aug 2022 11:05:39 +0200
Subject: [P
Hi,
this patch adds LEN_LOAD/LEN_STORE support for z14 and newer.
It defines a bias value of -1 and implements the LEN_LOAD and LEN_STORE
optabs.
It also includes various vll/vstl testcases adapted from Kewen Lin's patch
for Power.
Bootstrapped and regtested on z13-z16.
Is it OK?
Regards
Hi,
> On optimizing for speed, default_noce_conversion_profitable_p() allows
> plenty of headroom, so this patch has little impact.
>
> Also, if the target-specific cost estimate is accurate or allows for
> margins, the impact should be similarly small.
I believe this part of ifcvt does/did not
Should we go ahead with this, i.e. push the change and wait for fallout?
I guess we're still early enough in the cycle for that. There are no
regressions anymore on s390, Power9, x86 and aarch64 (at least on the
farm machines I checked).
Regards
Robin
> IIRC, I was trying to "fix" modeless operand by giving it a mode, but
> since it made no difference for x86, I later dropped the patch.
> However, operand with a known mode is preferred, so if it works for
> you, just include my patch in your submission. My patch is somehow
> trivial if we want
Hi,
> With the patch my local changes to make better use of vec_set work
> nicely even though I haven't done a full bootstrap yet. Were there
> other issues with the patch or can it still be applied?
I performed a bootstrap as well as a regtest with -march=z16 on s390.
There is no new fallout.
Hi,
I'm looking into vec_set with variable index on s390. Uros posted a
patch [1] that did not make it upstream in Nov 2020. It changed the
mode of the index operand to whatever the target supports in
can_vec_set_var_idx_p. I missed it back then but we indeed do not make
proper use of vec_set
> Do we have evidence that targets properly cost XOR vs SUB RTXen?
>
> It might actually be a reload optimization - when the constant is
> available in a register use 'sub', when it needs to be reloaded
> use 'xor'?
>
> That said, I wonder if the fallout of changing some SUB to XOR
> is bigger
Hi,
since r13-2746 we hit an ICE when bootstrapping with -m31 and
--enable-checking=all.
../../../../libgfortran/ieee/ieee_helper.c: In function
'ieee_class_helper_16':
../../../../libgfortran/ieee/ieee_helper.c:77:3: internal compiler
error: RTL check: expected code 'reg', have 'subreg' in
> I opened:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107061
The online docs for encodekey256 also say
XMM4 through XMM6 are reserved for future usages and software should not
rely upon them being zeroed.
I believe we also zero there.
> This sounds like an issue.
So with your patch
> I did bootstrapping and ran the testsuite on x86(-64), aarch64, Power9
> and s390. Everything looks good except two additional fails on x86
> where code actually looks worse.
>
> gcc.target/i386/keylocker-encodekey128.c
>
> 17c17,18
> < movaps %xmm4, k2(%rip)
> ---
>> pxor
> Yes, because the native_interpret always starts at offset zero
> (we can't easily feed in a "shifted" RHS). So what I assumed is
> that IFN_LEN_STORE always stores elements [0, len + adj].
Hmm, but this assumption is not violated here or am I missing something?
It's not like we're storing
> The error is probably in vn_reference_lookup_3 which assumes that
> 'len' applies to the vector elements in element order. See the part
> of the code where it checks for internal_store_fn_p. If 'len' is with
> respect to the memory and thus endianess has to be taken into
> account then for the
Hi,
I'm locally testing a branch that enables vll/vstl for partial vector
usage i.e. len_load and len_store on s390. I see a FAIL in
testsuite/gfortran.dg/power_3.f90.
Since r13-1777-gbd9837bc3ca134 we also perform VN for masked/len stores
and things go wrong there. The problem seems to be
> Yeah, rtx_costs (or preferably insn_cost, if that works) seem like the
> best way of addressing this. If the target says that register moves are
> cheaper than constant moves then it's a feature that CSE & co remove
> duplicate constants. The REG_EQUIV note is still useful in those cases
>
Small addition to clarify: (insn 8) from the example is of course
matched to a vzero. The "problem" begins when (reg 64) is later moved
into another register and the (const_vector) has been optimized to a
single definition e.g. by CSE, i.e. we have several
(insn yy (set (reg:V2DI xx) (reg:V2DI
Hi,
I have been working on making better use of s390's vzero instruction.
Currently we rather zero a vector register once and load it into other
registers via vlr instead of emitting multiple vzeros.
At IRA/reload point we e.g. have
(insn 8 5 19 2 (set (reg/v:V2DI 64 [ zero ])
> Which is this from the mail archives:
>
> https://gcc.gnu.org/pipermail/gcc-patches/1998-June/000308.html
>
> I would tend to agree that for equal cost that the constant would be
> preferred since that should be better from a scheduling/dependency
> standpoint. So it seems to me we can
> Did you do any archaeology into this code to see if there was any
> history that might shed light on why it doesn't just use the costing
> models?
This one was buried under some dust :)
commit 0254c56158b0533600ba9036258c11d377d46adf
Author: John Carr
Date: Wed Jun 10 06:00:50 1998 +
Hi,
I recently looked into a sequence like
vzero %v0
vlr %v2, %v0
vlr %v3, %v0.
Ideally we would like to use vzero for all of these sets in order to not
create dependencies.
For some instances of this problem I found the offending snippet to be
the postreload cse pass. If there is a non
> The question is really whether xor or sub is "better" statically. I can't
> think of any reasons. On s390, why does xor end up "better"?
There is an xor with immediate (as opposed to no "subtract from
immediate") which saves an instruction, usually. On x86, I think the
usual argument for xor
> cost might also depend on the context in case flag setting
> behavior differs for xor vs sub (on x86 sub looks strictly more
> powerful here). The same is probably true when looking for
> a combination with another bitwise operation.
>
> Btw, why not perform the optimization in expand_binop?
Hi,
posting this separately from PR91213 now. I wrote an s390 test and most
likely it could also be done for x86 which will give it broader coverage.
Depending on the backend it might be better to convert
cst - x
into
cst xor x
if cst + 1 is a power of two and 0 <= x <= cst. This patch
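The condition above can be checked with a small standalone sketch. This is not code from the patch; the helper names are hypothetical. The idea is that a constant of the form 2^n - 1 has all low bits set, so subtracting any x in [0, cst] never borrows and only clears the bits that are set in x, which is exactly what xor does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if cst + 1 is a power of two, i.e. cst has the form 2^n - 1.
   cst == 0 counts as the trivial 2^0 - 1 case, and UINT32_MAX wraps
   to 0 in the addition and is accepted as well.  */
static bool
low_bits_mask_p (uint32_t cst)
{
  return (cst & (cst + 1)) == 0;
}

/* Compute cst - x, using the xor form when the condition holds.  */
static uint32_t
sub_from_cst (uint32_t cst, uint32_t x)
{
  if (low_bits_mask_p (cst) && x <= cst)
    return cst ^ x;
  return cst - x;
}

/* Exhaustively compare both forms for every x in [0, cst].  */
static bool
check_all (uint32_t cst)
{
  for (uint32_t x = 0;; x++)
    {
      if ((cst ^ x) != cst - x)
	return false;
      if (x == cst)
	return true;
    }
}
```

For a constant such as 10, which is not of the form 2^n - 1, the two forms differ for some x (for example 10 - 1 is 9 while 10 ^ 1 is 11), so the sketch falls back to the plain subtraction.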
Hi,
this adds a missing -mzarch to some ifcvt test cases.
Going to commit this as obvious in some days barring objections.
Regards
Robin
gcc/testsuite/ChangeLog:
* gcc.target/s390/ifcvt-one-insn-bool.c: Add -mzarch.
* gcc.target/s390/ifcvt-one-insn-char.c: Ditto.
*
Hi,
adding -save-temps as well as a '\t' in order for the tests to do what
they are supposed to do.
Going to push this as obvious in some days.
Regards
Robin
--
gcc/testsuite/ChangeLog:
* gcc.target/s390/vector/vperm-rev-z14.c: Add -save-temps.
*
From 1f11a6b89c9b0ad64b480229cd4db06e887a Mon Sep 17 00:00:00 2001
From: Robin Dapp
Date: Fri, 24 Jun 2022 15:17:08 +0200
Subject: [PATCH v2] s390: Recognize reverse/element swap permute patterns.
This adds functions to recognize reverse/element swap permute patterns
for vler, vster as well as vpdi and rotate.
gcc/Change
Hi,
similar to other backends this patch implements vec_set via
vec_merge and vec_duplicate instead of an unspec. This opens up
more possibilities to combine instructions.
Bootstrapped and regtested. No regressions.
Is it OK?
Regards
Robin
gcc/ChangeLog:
* config/s390/s390.md:
Hi,
vec_select can handle dynamic/runtime masks nowadays. Therefore we can
get rid of the UNSPEC_VEC_EXTRACT that was preventing further
optimizations like combining instructions with vec_extract patterns.
Bootstrapped and regtested. No regressions.
Is it OK?
Regards
Robin
gcc/ChangeLog: