> > + if (is_gimple_min_invariant (op))
> > + return true;
> > + if (SSA_NAME_IS_DEFAULT_DEF (op)
> > + || !flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (op))))
> > + return true;
> > + return gimple_uid (SSA_NAME_DEF_STMT (op)) & 1;
> > +}
> +/* Return true if it is a whole register-register move.  */
> +bool
> +whole_reg_to_reg_move_p (rtx *ops, machine_mode mode)
> +{
> + if (register_operand (ops[0], mode)
> + && register_operand (ops[3], mode)
> + && satisfies_constraint_vu (ops[2])
> + && satisfies_constraint_Wc1
1). We not only have vashl_optab, vashr_optab and vlshr_optab, which
vectorize a shift with a vector shift amount,
that is, vectorization of 'a[i] >> x[i]' where the shift amount is loop-variant.
2). We also have ashl_optab, ashr_optab and lshr_optab, which can vectorize a
shift with a scalar shift
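As a plain-C illustration (not code from the patch itself), the two loop shapes the optabs above distinguish look like this:

```c
#include <stdint.h>

#define N 128

/* Shift amount is loop-variant: each element uses its own count.
   This form maps to the vashl/vashr/vlshr optabs (vector shift
   amount).  */
void
shift_variant (uint32_t *restrict a, uint32_t *restrict x)
{
  for (int i = 0; i < N; i++)
    a[i] >>= x[i];
}

/* Shift amount is loop-invariant: one scalar count for all elements.
   This form maps to the ashl/ashr/lshr optabs (scalar shift
   amount).  */
void
shift_invariant (uint32_t *restrict a, uint32_t x)
{
  for (int i = 0; i < N; i++)
    a[i] >>= x;
}
```

Both loops compute the same results when every `x[i]` is equal; the vectorizer only needs the scalar-amount optab for the second form.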
Hi,
found in PR112971, this patch adds folding support for bitwise operations
on const duplicate zero vectors and stepped vectors.
On riscv we have the situation that folding would continue perpetually
without simplifying because e.g. {0, 0, 0, ...} & {7, 6, 5, ...} would
not fold to {0, 0, 0,
OK.
Regards
Robin
LGTM.
Regards
Robin
LGTM.
Regards
Robin
Hi Juzhe,
> -} elseif [istarget riscv64-*-*] {
> +} elseif [istarget riscv*-*-*] {
> if [check_effective_target_riscv_v] {
> lappend DEFAULT_VECTCFLAGS "--param" "riscv-vector-abi"
> set dg-do-what-default run
Yes, that's reasonable. A bit further down we have
LGTM.
Regards
Robin
On 12/15/23 13:52, juzhe.zh...@rivai.ai wrote:
> Do you mean :
>
> /* We need to use precomputed mask for such situation and such mask
> can only be computed in compile-time known size modes. */
> bool indices_fit_selector_p
> = GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) > 8 ||
> Oh. I think it should be renamed into not_fit.
>
> Is this following make sense to you ?
>
> /* We need to use precomputed mask for such situation and such mask
> can only be computed in compile-time known size modes. */
> bool indices_not_fit_selector_p
> = maybe_ge (vec_len, 2
On 12/15/23 13:16, juzhe.zh...@rivai.ai wrote:
>
>>> bool indices_fit_selector = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE
>>> (GET_MODE_INNER (vmode)));
> No, I think it will make us miss some optimization.
>
> For example, for poly value [16,16] maybe_ge ([16,16], 65536) which makes us
>
Hi Juzhe,
in general looks OK.
> + /* We need to use precomputed mask for such situation and such mask
> + can only be computed in compile-time known size modes. */
> + if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) == 8
> + && maybe_ge (vec_len, 256)
> + && !vec_len.is_constant ())
>
> It looks like:
>
> FOR_EACH_MODE_FROM (new_mode, new_mode)
> if (known_eq (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (GET_MODE (op0)))
> && known_eq (GET_MODE_UNIT_SIZE (new_mode), GET_MODE_SIZE (tmode))
> && targetm.vector_mode_supported_p (new_mode)
>
Hi,
this is a bit of a follow up of the latest expmed change.
In extract_bit_field_1 we try to get a better vector mode before
extracting from it. Better refers to the case when the requested target
mode does not equal the inner mode of the vector to extract from and we
have an equivalent
Thanks, LGTM but please add a comment like:
These test cases used to cause out-of-bounds writes to the stack
and therefore showed unreliable behavior. Depending on the
execution environment they can either pass or fail. As of now,
with the latest QEMU version, they will pass even without the
> Do you mean add some comments in tests?
I meant add it as a run test as well and comment that the test
has caused out-of-bounds writes before and passed by the time of
adding it (or so) and is kept regardless.
Regards
Robin
Thanks. The attached v2 goes with your suggestion and adds a
vec_extractbi expander. Apart from that it keeps the
MODE_PRECISION changes from before and uses
insn_data[icode].operand[0]'s mode.
Apart from that no changes on the riscv side.
Bootstrapped and regtested on x86 and aarch64. On
> I didn't choose to run it since I didn't have the issue on my local
> simulator, no matter qemu or spike.
Yes it was flaky. That's kind of expected with the out-of-bounds
writes we did. They can depend on runtime environment and other
factors. Of course it's a bit counterintuitive to add a
Hi Juzhe,
in general looks OK to me.
Just a question for understanding:
> - if (header_info.valid_p ()
> - && (anticipated_exp_p (header_info) || block_info.full_available))
Why is full_available true if we cannot use it?
> +/* { dg-do compile } */
It would be nice if we could
Given that it's almost verbatim aarch64's implementation and the
general approach appears sensible, LGTM.
Regards
Robin
> - Change the second mode to vec_extract_optab. This is only a name
> lookup, and it seems more natural to continue using the real element mode.
Am I understanding correctly that this implies we should provide
a vec_extractbi expander? (with the innermode being BImode
here).
Regards
Robin
Yes, no harm in doing that. LGTM.
Regards
Robin
Hi Richard,
I have tested the new pass on riscv64 and while it did exhibit some
regressions, none of them are critical. Mostly, test expectations
will need to be adjusted - no new execution failures.
As mentioned in the initial discussion it does help us get the
behavior we want but, as of now,
> Yes, I test the patch with all below configurations and there is no failure
> now. That would be great!
Thank you! I posted it as a patch now:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640182.html
Regards
Robin
Hi,
this fixes expectations in the strcmp-run test which would sometimes
fail with newlib. The test expects libc strcmp return values and
asserts the vectorized result is similar to those. Therefore hard-code
the expected results instead of relying on a strcmp call.
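The underlying portability issue can be sketched in plain C (hypothetical helper names, not the test's actual code): strcmp's return value is only specified up to its sign, so different libcs return different magnitudes and raw-value comparisons are unreliable.

```c
#include <string.h>

/* Normalize a strcmp-style result to -1, 0 or 1.  Only the sign of
   strcmp's return value is specified by the C standard; newlib and
   glibc may return different magnitudes for the same inputs.  */
static int
sgn (int x)
{
  return (x > 0) - (x < 0);
}

/* Two strcmp results agree iff their signs agree; comparing raw
   values would spuriously fail across libc implementations.  */
static int
strcmp_results_agree (int expected, int actual)
{
  return sgn (expected) == sgn (actual);
}
```

Hard-coding the expected sign (as the patch does) sidesteps the libc dependency entirely.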
Pan has already tested in a
Hi Pan,
> I reduced the SZ size from 10 to 1, and the below case with SZ = 2
> will fail. The failed location is "foo is 50, foo2 is 12800, i,j is
> 1, 0".
>
> #define SZ 2
>
> const char *s[SZ] = {"1",
> "12345678901234567889012345678901234567890"};
Thanks. I still cannot reproduce but I
> Robustify the gather index to fix those FAILs.
OK.
They must have somehow slipped through because I pruned vlmax tests
for comparison of different vlens.
Regards
Robin
What also works is something like:
scalar_mode extract_mode = innermode;
if (GET_MODE_CLASS (outermode) == MODE_VECTOR_BOOL)
extract_mode = smallest_int_mode_for_size
(GET_MODE_PRECISION (innermode));
however
> So yes, I guess we need to answer
Ah, please also make sure to include (and follow) the stringop_strategy
checks (LIBCALL, VECTOR).
The naming is a bit unfortunate still but that need not be fixed
in this patch.
Regards
Robin
Hi Sergei,
thanks for contributing this!
Small general remarks/nits upfront:
The code looks like it hasn't been run through clang-format or
similar. Please make sure that it adheres to the GNU coding
conventions. The same applies to comments. Some of them start
in lowercase.
As you rely on
> Yes. This is the last chance to work around it here but we will end up with
> more patterns,
> since the reduction dest operand always has an LMUL = 1 mode.
>
> So, when -march=rv64gcv, the dest mode should be V4SI, if
> -march=rv64gcv_zvl256b, the dest mode should be V8SI.
> ...etc. Different
> In record_use:
>
> if (HARD_REGISTER_NUM_P (regno)
> && partial_subreg_p (use->mode (), mode))
>
> Assertion failed on partial_subreg_p which is:
>
> inline bool
> partial_subreg_p (machine_mode outermode, machine_mode innermode)
> {
> /* Modes involved in a subreg must be
> FYI. I have the same failures as juzhe mentioned, with the emulator
> qemu version qemu-riscv64 version 8.1.93 (v8.2.0-rc3). The entire log
> may look like below:
>
> Executing on host:
> /home/box/panli/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/xgcc
>
On 12/11/23 03:09, juzhe.zh...@rivai.ai wrote:
> + if (shuffle_series (d))
> + return true;
>
>
> Could you rename it into shuffle_series_patterns ?
>
> Just to make naming consistent.
Done, going to push with that change in a while.
Regards
Robin
LGTM, thanks.
Regards
Robin
> rv64gcv
With -minline-strcmp I assume?
Regards
Robin
> FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
> FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test
>
Hi,
we currently try to recognize various forms of stepped (const_vector)
sequence variants in expand_const_vector. Because of complications with
canonicalization and encoding it is easier to identify such patterns
in expand_vec_perm_const_1 already where perm.series_p () is available.
This
Ah, I forgot to attach the current v2 that also enables strncmp.
It was additionally tested with -minline-strncmp on rv64gcv.
Regards
Robin
Subject: [PATCH v2] RISC-V: Add vectorized strcmp and strncmp.
This patch adds vectorized strcmp and strncmp implementations and
tests. Similar to
Similar to strlen, this now seems safe to push. Will do so
later.
I tested on rv64gcv_zvl128b with -minline-strlen and didn't see
regressions.
Regards
Robin
After Juzhe's vsetvl fix earlier this week this seems safe to push.
Going to do so later.
I tested on rv64gcv_zvl128b with -minline-strlen and didn't see
regressions apart from zbb-strlen-disabled-2.c which will always
fail with -minline-strlen because it expects -mno-inline-strlen.
Regards
Sorry for the delay, just a tiny naming/comment nit.
Rest LGTM, no need for a v2.
> +/* Return true each pattern has different 2 steps.
> + TODO: We currently only support NPATTERNS = 2. */
Return true if the permutation consists of two interleaved
patterns with a constant step each.
>
LGTM.
Btw your vsetvl patch from yesterday fixes the vectorized
strlen/strcmp problems. Those use vleff as first instruction.
Regards
Robin
LGTM.
+ /* Don't perform earliest fusion on unrelated edge. */
+ if (bitmap_count_bits (e) != 1)
+ continue;
This could still use a comment why e is "unrelated" in that case
(no v2 needed).
Regards
Robin
Hi,
PR112854 shows a problem on rv32 with zvl1024b. During the course of
expand_constructor we try to overlay/subreg a 64-element mask by a
scalar (Pmode) register. This works for zvle512b and its maximum of
32 elements but fails for rv32 and 64 elements.
To circumvent this this patch adds a
> I think the VLS modes are excluded exactly as we expected.
> For example, when zvl128b, LMUL = 1.
> We allow VLS modes <= 128bit and exclude VLS modes > 128bits.
> We have the same behavior as ARM SVE.
I just found the ordered_p a bit unintuitive here at first sight.
But when thinking
Yes, makes sense. Just one clarifying question.
> +{
> + if (GET_MODE_CLASS (vls_mode) != MODE_VECTOR_BOOL
> + && !ordered_p (TARGET_MAX_LMUL * BITS_PER_RISCV_VECTOR,
> + GET_MODE_PRECISION (vls_mode)))
> + /* We enable VLS modes which are aligned with
> But how do we know BImode fits in QImode?
I was kind of hoping that a "bit" always fits in a "byte"/unit
but yeah, I guess we don't always know :/
> I think the issue is more that we try to extract an element from
> the mask vector? How is element extraction defined for VLA vectors
> anyway?
OK.
Regards
Robin
I'd suggest the same thing as in the other patch, i.e. not having
the large number of identical lines in the iterator. That's just
my opinion, though. Rest LGTM.
Regards
Robin
LGTM.
Regards
Robin
Hi,
recent -std changes caused testsuite failures. Fix those by adding
-std=gnu99 and -Wno-incompatible-pointer-types.
Going to commit as obvious.
Regards
Robin
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112552.c: Add
-Wno-incompatible-pointer-types.
*
> +(define_mode_attr widen_ternop_dest_constraint [
> + (RVVM8QI "=vd, vr, vd, vr, vd, vr, ?")
> + (RVVM4QI "=vd, vr, vd, vr, vd, vr, ?")
> + (RVVM2QI "=vd, vr, vd, vr, vd, vr, ?")
> + (RVVM1QI "=vd, vr, vd, vr, vd, vr, ?")
> + (RVVMF2QI "=vd, vr, vd, vr, vd, vr, ?")
> + (RVVMF4QI "=vd, vr,
Hi,
this changes the vec_extract path of extract_bit_field to use QImode
instead of BImode when extracting from mask vectors and changes
GET_MODE_BITSIZE to GET_MODE_PRECISION. This fixes an ICE on riscv
where we did not find a vec_extract optab and continued with the generic
code that requires
LGTM.
Regards
Robin
Split it into four separate patches now.
Regards
Robin
Hi,
this patch adds a vectorized strcmp implementation and tests. Similar
to strlen, expansion is still guarded by -minline-strcmp. I just
realized I forgot to make it a series but this one is actually
dependent on the NFC patch and the rawmemchr fix before.
Regards
Robin
gcc/ChangeLog:
Hi,
this patch implements a vectorized strlen by re-using and slightly
adjusting the rawmemchr implementation. Rawmemchr returns the address
of the needle while strlen returns the difference between needle address
and start address.
As before, strlen expansion is guarded by -minline-strlen.
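The relationship described above can be sketched in scalar C (hypothetical helper names, not the actual expander code): rawmemchr returns the address of the needle, and strlen is that address minus the start address.

```c
#include <stddef.h>

/* Scalar stand-in for the vectorized rawmemchr: return the address
   of the first occurrence of C, which is assumed to occur.  */
static const char *
my_rawmemchr (const char *s, char c)
{
  while (*s != c)
    s++;
  return s;
}

/* strlen expressed via rawmemchr: the distance between the address
   of the terminating NUL (the needle) and the start address.  */
static size_t
my_strlen (const char *s)
{
  return (size_t) (my_rawmemchr (s, '\0') - s);
}
```

The vectorized version follows the same shape, with the inner scan replaced by fault-only-first vector loads.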
Hi,
now split into multiple patches.
In preparation for the vectorized strlen and strcmp support this NFC
patch unifies the stringop strategy handling a bit. The "auto"
strategy now is a combination of scalar and vector and an expander
should try the strategies in their preferred order.
For
Hi,
this fixes a bug in the rawmemchr implementation by incrementing the
source address by vl * element_size instead of just vl.
This is normally harmless as we will just scan the same region more than
once but, in combination with an older qemu version, would lead to
an execution failure in
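A scalar sketch of the addressing issue (an illustration only, not the expander's code): when scanning element-sized chunks, the byte address must advance by the number of elements processed times the element size, not by the element count alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Blocked scan over 16-bit elements, modelling a vector loop that
   processes a fixed "vl" elements per iteration.  Returns the address
   of the first element equal to NEEDLE, which is assumed to occur.  */
static const uint16_t *
rawmemchr16 (const uint16_t *start, uint16_t needle)
{
  const size_t vl = 4;               /* elements per "vector" iteration */
  const char *addr = (const char *) start;
  for (;;)
    {
      const uint16_t *v = (const uint16_t *) addr;
      for (size_t i = 0; i < vl; i++)
	if (v[i] == needle)
	  return &v[i];
      /* The fix: advance by vl * element_size bytes, not by vl.
	 Advancing by vl bytes alone merely rescans overlapping data
	 for 16-bit elements, which is why the bug was usually
	 harmless yet still wrong.  */
      addr += vl * sizeof (uint16_t);
    }
}
```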
LGTM
Regards
Robin
Yes, OK, thanks for that. CC'ing Juzhe as this is his pass.
Regards
Robin
Hi,
this adds vectorized implementations of strcmp and strncmp as well as
strlen. strlen falls back to the previously implemented rawmemchr.
Also, it fixes a rawmemchr bug causing a SPEC2017 execution failure:
We would only ever increment the source address by 1 regardless of
the input type.
OK.
Regards
Robin
> While working on overlap for widening instructions, I realize that we set
> vwadd.wx/vfwadd.wf as earlyclobber which is incorrect.
>
> Since according to RVV ISA:
> "The destination EEW equals the source EEW."
>
> For both vwadd.wx/vfwadd.wf source vector and dest vector operand are same
>
LGTM (in context of the last message) but please consider adding
the comments/naming I suggested.
Regards
Robin
>>> I can't really match spec and code. For the lmul = 2 case sure,
>>> but W84 e.g. allows v4 and not v6? What actually is "highest-numbered
>>> part"?
> Yes.
>
> For vwcvt, LMUL 4 -> LMUL 8.
> We allow overlap vwcvt v0 (occupy v0 - v7), v4 (occupy v4 - v7)
> This patch support the overlap
Looks like this already went in while I was looking at it...
In general it looks ok to me but I would have really hoped
for some more comments.
> +;; These following constraints are used by RVV instructions with dest EEW >
> src EEW.
> +;; RISC-V 'V' Spec 5.2. Vector Operands:
> +;; The
LGTM. That one is easy to revert (as opposed to changing all
modes).
Regards
Robin
> The easiest way to avoid running into the alias analysis problem is
> to scrap the MEM_EXPR when we expand the internal functions for
> partial loads/stores. That avoids the disambiguation we run into
> which is realizing that we store to an object of less size as
> the size of the mode we
On 11/25/23 09:24, Juzhe-Zhong wrote:
> Come back to review the codes of gather/scatter, notice
> gather_scatter_valid_offset_mode_p looks odd.
> gather_scatter_valid_offset_mode_p is supposed to block vluxei64/vsuxei64 in
> RV32 system.
> However, it failed to do that since it is passing
LGTM.
Regards
Robin
LGTM (and harmless enough) but I'd rather wait for a second look or a
maintainer's OK as we're past stage 1 and it's not a real bugfix.
(On top, it's Thanksgiving so not many people will even notice).
On a related note, this should probably be a middle-end optimization
but before a variable-index
> Oh. You mean this patch also fixes FLTO failed case ?
Yes, it's the same issue. There we have a fixed vl (known via LTO)
that is being propagated "into" gathers and we end up missing
gather elements.
Regards
Robin
I was just about to post a similar-ish patch that fixes pr65518.c
but you were faster ;)
Therefore LGTM. You can add PR/target 112670.
Regards
Robin
LGTM.
Regards
Robin
> I don't get it. Why do we need to remove them?
It's just replaced by riscv_zvfh. I should probably edit the
patch description and changelog entries to make it clearer.
Regards
Robin
>> Bootstrapped and regtested on aarch64 and regtested on riscv. x86 is
>> still running.
Just to confirm: x86 bootstrap and regtest unchanged. Going to commit
it soon.
Regards
Robin
Hi,
this removes the now-redundant vector_hw and zvfh_hw checks in the
testsuite.
Regards
Robin
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/copysign-zvfh-run.c:
Remove zvfh_hw.
* gcc.target/riscv/rvv/autovec/binop/vadd-zvfh-run.c: Ditto.
*
Hi,
in PR112406 Tamar found another problem with COND_OP reductions.
I wrongly assumed that the reduction variable will always remain in
operand 1, just as we create the COND_OP in ifcvt. But of course,
addition being commutative, we are free to swap operand 1 and 2 and
can end up with e.g.
> /* { dg-do run { target { { {riscv_v} && {rv64} } } } } */
>
> Seems you should remove rv64 here, since I think it is redundant here.
Going to commit with that removed.
Regards
Robin
> Mhm, not so obvious after all. We vectorize 250 instances with
> rv32gcv, 229 with rv64gcv and 250 with rv64gcv_zbb. Will have
> another look tomorrow.
The problem is that tree-vect-patterns is more restrictive than
necessary and does not vectorize everything it could. Therefore
I'm going to
Hi Juzhe,
> This bug is exposed when testing on zvl512b RV32 system.
>
> The rootcause is RA reload DI CONST_VECTOR into vmv.v.x then it ICE.
>
> So disallow DI CONST_VECTOR on RV32.
OK.
Regards
Robin
Mhm, not so obvious after all. We vectorize 250 instances with
rv32gcv, 229 with rv64gcv and 250 with rv64gcv_zbb. Will have
another look tomorrow.
Regards
Robin
Hi,
since Jakub's recent middle-end changes we vectorize more
popcount instances. This patch just adjusts the dump check.
Going to commit as obvious once I have figured out why there
is a dump difference between my local tester and on the server.
Regards
Robin
gcc/testsuite/ChangeLog:
Hi,
this adds an effective target requirement to compile the tests. Since
we disabled 64-bit indices on rv32 targets those tests should be
unsupported on rv32.
Regards
Robin
gcc/testsuite/ChangeLog:
* g++.target/riscv/rvv/base/bug-14.C: Add
dg-require-effective-target rv64.
Hi,
as per recent discussion and in order to fix inconsistencies
between spike and qemu this patch removes gcc_march and gcc_mabi
arguments from the default CFLAGS in the testsuite invocation for
some sub directories.
Juzhe reported that this helps for him.
Regards
Robin
LGTM. I prefer that over the iterator.
Regards
Robin
LGTM.
Regards
Robin
> No, you shouldn't place _7 != 0 inside the .COND_ADD but instead
> have an extra pattern stmt producing that so
>
> patt_8 = _7 != 0;
> patt_9 = .COND_ADD (patt_8, ...);
>
> that's probably still not enough, but I always quickly forget how
> bool patterns work ... basically a comparison like
> It must be correct. We already have test (intrinsic codes) for it.
Yeah, just noticed that myself. Anyway will do some more tests,
maybe my initial VLS analysis was somehow flawed.
> Condition should be put into iterators (Add a new iterator for
> indexed load store).
Ah, that's what you
> OK. Makes sense.
I'm wondering whether the VLA modes in the iterator are correct.
Looks dubious to me but unsure, will need to create some tests
before continuing.
> LGTM as long as you remove all
> GET_MODE_BITSIZE (GET_MODE_INNER (mode)) <= GET_MODE_BITSIZE (Pmode)
What's the problem with
> Yes, your version is also OK.
The attached was bootstrapped and regtested on aarch64, x86 and
regtested on riscv. Going to commit it later unless somebody objects.
Regards
Robin
Subject: [PATCH] vect: Pass truth type to vect_get_vec_defs.
For conditional operations the mask is loop
> So, going back to our testcases that reported errors with this, I
> don't think we should explicitly specify -march and -mabi when
> compiling a runnable program, but use the defaults (--with-arch).
> Most of our current runnable testcases adhere to this convention,
> except for the ones we
> But note you can explicitly specify a vector type as well, there's an
> overload for it, so we can fix the "invariant" case with the following
> (OK if you can test this on relevant targets)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 3f59139cb01..936a3de9534 100644
>
Hi Juzhe,
> csrra4,vlenb
> csrra5,vlenb
Totally unrelated to this patch but this looks odd. I don't
remember if we had a patch for this already at some point.
In general the idea for the patch is to use the largest vector
element mode for the indices and compress several of
> For the fortran testcase we don't even run into this but hit an
> internal def and assert on
>
> gcc_assert (STMT_VINFO_VEC_STMTS (def_stmt_info).length () == ncopies);
>
> I think this shows missing handling of .COND_* in the bool pattern recognition
> as we get the 'bool' condition as
On 11/15/23 15:29, 钟居哲 wrote:
> Could you show me the example ?
>
> It's used by handling SEW = 64 on RV32. I don't know why this patch touch
> this code.
Use gather_load_run-1.c with the 64-bit index patterns disabled
on rv32. We insert (mem:DI (reg:SI)) into a vector so use the
SEW = 64
> Looks wrong. Recover back.
When we demote we use two elements where there was one before.
Therefore the vector needs to be able to hold twice as many
elements. We adjust vl correctly, but the mode is not adjusted here.
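The doubling can be illustrated in plain C (an illustration only, not the pass's code): reinterpreting n 64-bit elements as 32-bit elements yields 2n elements, so the destination must provide room for twice the element count.

```c
#include <stdint.h>
#include <string.h>

#define N 4

/* "Demote" N 64-bit elements into 2*N 32-bit elements by
   reinterpreting the same bytes; the destination mode must hold
   twice as many elements as the source.  */
static void
demote_di_to_si (const uint64_t src[N], uint32_t dst[2 * N])
{
  memcpy (dst, src, N * sizeof (uint64_t));
}
```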
Regards
Robin