"juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes: > Thanks for your reply. Your suggestion "-1 - (int) i" is better. > > Can I send another patch that changes "builder.quick_push (gen_int_mode > (-(int) i, int_mode));" > into “builder.quick_push (gen_int_mode (-1 - (int) i, int_mode));" then you > merge it for me?
Yeah, please send a patch, and I'll merge it like you say. Thanks, Richard > > Or you change it yourself? > > Thank you so much. > > > > > juzhe.zh...@rivai.ai > > From: Richard Sandiford > Date: 2022-08-22 16:31 > To: 钟居哲 > CC: rguenther; gcc-patches; kito.cheng > Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) > and allow the machine_mode definition with poly_uint16 (1, 1) > 钟居哲 <juzhe.zh...@rivai.ai> writes: >> After deep analysis and several tries. I have made a analysis and conclusion >> as follows. >> I really appreciate if you could spend some time take a look at the >> following analysis that I made: >> >> According to the codes in test_vector_subregs_fore_back: >> >> ~~~ >> poly_uint64 nunits = GET_MODE_NUNITS (inner_mode); >> unsigned int min_nunits = constant_lower_bound (nunits); >> scalar_mode int_mode = GET_MODE_INNER (inner_mode); >> unsigned int count = gcd (min_nunits, 4); >> >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (i, int_mode)); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (-(int) i, int_mode)); >> rtx x = builder.build (); >> ~~~~ >> >> Analyze the codes and tried several patterns: >> >> For poly_uint16 (2,2): >> x = (const_vector:VNx1DI [ >> (const_int 0 [0]) >> (const_int 1 [0x1]) >> repeat [ >> (const_int 0 [0]) >> (const_int -1 [0xffffffffffffffff]) >> ] >> ]) >> >> For poly_uint16 (4,4): >> x = (const_vector:VNx1DI [ >> (const_int 0 [0]) >> (const_int 1 [0x1]) >> (const_int 2 [0x2]) >> (const_int 3 [0x3]) >> repeat [ >> (const_int 0 [0]) >> (const_int -1 [0xffffffffffffffff]) >> (const_int -2 [0xfffffffffffffffe]) >> (const_int -3 [0xfffffffffffffffd]) >> ] >> ]) >> >> So, I think I can conclude the pattern rule as follows: >> >> poly_uint16 (n,n): >> x = (const_vector:VNx1DI [ >> (const_int 0 [0]) >> (const_int 1 [0x1]) >> (const_int 2 [0x2]) >> (const_int 3 [0x3]) >> .................. >> (const_int n [hex(n)]) >> repeat [ >> (const_int 0 [0]) >> (const_int -1 [0xffffffffffffffff]) >> (const_int -2 [0xfffffffffffffffe]) >> (const_int -3 [0xfffffffffffffffd]) >> ......... >> (const_int -n [hex(-n)]) >> ] >> ]) >> Am I understanding right? >> >> Let's first take a look at the codes you write: >> >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (i, int_mode)); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (-(int) i, int_mode)); >> rtx x = builder.build (); >> >> So for poly_uint (1,1): >> Ideally according to the codes, you want to generate the pattern like this: >> x = (const_vector:VNx1DI [ >> (const_int 0 [0]) >> repeat [ >> (const_int 0 [0]) >> ] >> ]) > > Ah, right, the two subsequences are supposed to be different. > >> In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration >> make i always is 0. >> In this case, i = -i = 0. >> >> So we will end up with the first value = 0, and repeat value is also 0, in >> "rtx x = builder.build ();". >> Because GCC thinks all the value are 0, so the pattern is changed as follows >> which is not a stepped vector: >> >> x = (const_vector:VNx1DI repeat [ >> (const_int 0 [0]) >> ]) >> >> This is a const_vector duplicate vector. >> This is not the correct pattern you want so it fails in the following check. >> >> I think the test pattern generator goes wrong in poly_uint16 (1,1) which is >> the only corner case will go wrong. >> >> To fix this, we should adjust test pattern generator to specifically >> poly_uint16 (1,1). We should make first value >> and repeat value different. I tried 2 solution and they both pass for >> poly_uint16 (1,1): >> >> ================ >> Original codes: >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (i, int_mode)); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (-(int) i, int_mode)); >> ================ >> >> ================ >> 1st solution: >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode)); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (-(int) i, int_mode)); >> >> x = (const_vector:VNx1DI [ >> (const_int 0 [0]) >> repeat [ >> (const_int 1 [0x1]) >> ] >> ]) >> ================ >> >> ================ >> 2nd solution: >> rtx_vector_builder builder (inner_mode, count, 2); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (i, int_mode)); >> for (unsigned int i = 0; i < count; ++i) >> builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode)); > > I don't think we need to single out count == 1. It should be OK to use > -1 - (int) i unconditionally. > > Thanks, > Richard > >> x= >> (const_vector:VNx1DI [ >> (const_int 0 [0]) >> repeat [ >> (const_int -1 [0xffffffffffffffff]) >> ] >> ]) >> ================ >> >> Do you agree with these solution? Or you could offer a better solution to >> fix this? Thank you so much. >> >> >> juzhe.zh...@rivai.ai >> >> From: Richard Sandiford >> Date: 2022-08-19 20:52 >> To: juzhe.zhong\@rivai.ai >> CC: gcc-patches; rguenther; kito.cheng >> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, >> 1) and allow the machine_mode definition with poly_uint16 (1, 1) >> "juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes: >>>> Ah, right, sorry for the bogus suggestion. >>>> In that case, what specifically goes wrong? Which test in >>>> test_vector_subregs_modes fails, and what does the ICE look like? >>> >>> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute >>> test_vector_subregs_modes. >>> The fail ICE: >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: >>> FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, >>> byte)) >>> expected: (nil) >>> >>> actual: (const_int 0 [0]) >>> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57 >>> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char >>> const*, rtx_def*, rtx_def*) >>> ../../../riscv-gcc/gcc/selftest-rtl.cc:57 >>> 0x1332504 test_vector_subregs_modes >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395 >>> 0x1332988 test_vector_subregs_fore_back >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442 >>> 0x1332ae7 test_vector_subregs >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467 >>> 0x1332c57 test_vector_ops >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487 >>> 0x1332c7b selftest::simplify_rtx_cc_tests() >>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547 >>> 0x21318fa selftest::run_tests() >>> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115 >>> 0x1362a76 toplev::run_self_tests() >>> ../../../riscv-gcc/gcc/toplev.cc:2205 >>> >>> I analyzed the codes: >>> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = >>> NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0. >>> So the assertion fails. >> >> Hmm, ok, so the subreg operation is unexpected succeeding. >> >>> This is the test for stepped vector using 2 element per pattern. For >>> poly_uint16 (1,1), it's true it is possible only has 1 element. >> >> The stepped case is 3 elements per pattern rather than 2. In a stepped >> pattern: a, b, b+n are represented explicitly, then the rest are >> implicitly b+n*2, b+n*3, etc. >> >> The case being handled by this code is instead the 2-element case: >> a, b are represented explicitly, then the rest are implicitly all b. >> >> Why is (1,1) different though? The test is constructing: >> >> nunits: 1 + 1x >> shape: nelts_per_pattern == 2, npatterns == 1 >> elements: a, b[, b, b, b, b, ...] >> >> It then tests subregs starting at 0 + 1x (so starting at the first b). >> But for (2,2) we should have: >> >> nunits: 2 + 2x >> shape: nelts_per_pattern == 2, npatterns == 2 >> elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...] >> >> and we should test subregs starting at 0 + 2x (so starting at the >> first b1). The two cases should be very similar, it's just that the >> (2,2) case doubles the number of patterns. >> >>> I think it makes sense to fail the test. However for poly (1,1) machine >>> mode, can we have the chance that some target define this >>> kind of machine mode only used for intrinsics? I already developed full >>> RVV support in GCC (including intrinsc and auto-vectorization). >>> I only enable auto-vectorization with mode larger than (2,2) and test it >>> fully. >>> From my experience, it seems the stepped vector only created during VLA >>> auto-vectorization. And I think only allow poly (1,1)mode used in >>> intrinsics will >>> not create issues. Am I understanding wrong ?Feel free to correct me. >>> Thanks ~ >> >> Following on from what I said above, it doesn't look like this particular >> case is related to stepped vectors. >> >> (1,1) shouldn't (need to) be a special case though. Any potentital >> problems that would occur for (1,1) with npatterns==1 would also occur >> for (n,n) with npatterns==n. E.g. if stepped vectors are problematic >> for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2) >> would be problematic for (2,2). >> >> So yeah, preventing a mode being used for autovectorisation would allow >> the target to have a bit more control over which constants are actually >> generated. But it shouldn't be necessary to do that for correctness. >> >> Thanks, >> Richard >> >>> juzhe.zh...@rivai.ai >>> >>> From: Richard Sandiford >>> Date: 2022-08-19 17:35 >>> To: juzhe.zhong\@rivai.ai >>> CC: rguenther; gcc-patches; kito.cheng >>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, >>> 1) and allow the machine_mode definition with poly_uint16 (1, 1) >>> "juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes: >>>> Hi, Richard. I tried the codes: >>>> if (!nunits.is_constant () && maybe_gt (nunits, 1)) >>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>>> >>>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true >>>> value. >>> >>> Ah, right, sorry for the bogus suggestion. >>> >>> In that case, what specifically goes wrong? Which test in >>> test_vector_subregs_modes fails, and what does the ICE look like? >>> >>> Thanks, >>> Richard >>> >>>> But I tried: >>>> if (!nunits.is_constant () && known_gt (nunits, 1)) >>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>>> It pass. But it report a warning: "warning: comparison between signed and >>>> unsigned integer expressions [-Wsign-compare]" during the compilation. >>>> >>>> Finally, I tried: >>>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) >>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>>> It passed with no warning. >>>> >>>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this? >>>> Thanks! >>>> >>>> >>>> juzhe.zh...@rivai.ai >>>> >>>> From: Richard Sandiford >>>> Date: 2022-08-19 16:03 >>>> To: juzhe.zhong >>>> CC: gcc-patches; rguenther; kito.cheng >>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, >>>> 1) and allow the machine_mode definition with poly_uint16 (1, 1) >>>> juzhe.zh...@rivai.ai writes: >>>>> From: zhongjuzhe <juzhe.zh...@rivai.ai> >>>>> >>>>> Hello. This patch is preparing for following RVV support. >>>>> >>>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic >>>>> vector. >>>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant >>>>> of ARM SVE is always 128-bit blocks. >>>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' >>>>> sub-extension and 64bit in 'Zve*' sub-extension. >>>>> >>>>> So I define the machine_mode as follows: >>>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0); >>>>> ADJUST_NUNITS (MODE, riscv_vector_chunks); >>>>> The riscv_vector_chunks = poly_uint16 (1, 1) >>>>> >>>>> The compilation is failed for the stepped vector test: >>>>> (const_vector:VNx1DI repeat [ >>>>> (const_int 8 [0x8]) >>>>> (const_int 7 [0x7]) >>>>> ]) >>>>> >>>>> I understand for stepped vector should always have aleast 2 elements and >>>>> stepped vector initialization is common >>>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that >>>>> report fail for stepped vector of poly_uint16 (1, 1). >>>>> >>>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in >>>>> intrinsics. And I would like to enable RVV auto-vectorization >>>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V >>>>> backend. I think it will not create issue if we define >>>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or >>>>> offer me some other better solutions. Thanks! >>>>> >>>>> >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for >>>>> poly_uint16 (1, 1). >>>>> >>>>> --- >>>>> gcc/simplify-rtx.cc | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >>>>> index 7d09bf7103d..61e0dfa00d0 100644 >>>>> --- a/gcc/simplify-rtx.cc >>>>> +++ b/gcc/simplify-rtx.cc >>>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode >>>>> inner_mode) >>>>> rtx x = builder.build (); >>>>> >>>>> test_vector_subregs_modes (x); >>>>> - if (!nunits.is_constant ()) >>>>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1))) >>>>> test_vector_subregs_modes (x, nunits - min_nunits, count); >>>> >>>> I think instead we should use maybe_gt (nunits, 1), on the basis that >>>> the fore_back tests require vectors that have a minimum of 2 elements. >>>> Something like poly_uint16 (1, 2) would have the same problem as >>>> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in >>>> principle.) >>>> >>>> This corresponds to the minimum of 3 elements for the stepped tests: >>>> >>>> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT >>>> && maybe_gt (GET_MODE_NUNITS (mode), 2)) >>>> { >>>> test_vector_ops_series (mode, scalar_reg); >>>> test_vector_subregs (mode); >>>> } >>>> >>>> Thanks, >>>> Richard >>>> >>> >> >