Juzhe and Kito,

Thank you for the clarification.

On 4/18/23 18:48, juzhe.zh...@rivai.ai wrote:
Yes, like kito said.
We won't enable VNx1DImode in auto-vectorization so it's meaningless to fix it here. We dynamic adjust the minimum vector-length for different '-march' according to RVV ISA specification.
So we strongly suggest that we should drop this fix.

Thanks.
------------------------------------------------------------------------
juzhe.zh...@rivai.ai

    *From:* Kito Cheng <mailto:kito.ch...@gmail.com>
    *Date:* 2023-04-19 02:21
    *To:* Richard Biener <mailto:richard.guent...@gmail.com>; Jeff Law
    <mailto:jeffreya...@gmail.com>; Palmer Dabbelt
    <mailto:pal...@dabbelt.com>
    *CC:* Michael Collison <mailto:colli...@rivosinc.com>; gcc-patches
    <mailto:gcc-patches@gcc.gnu.org>; 钟居哲 <mailto:juzhe.zh...@rivai.ai>
    *Subject:* Re: [PATCH v4 07/10] vect: Verify that GET_MODE_NUNITS
    is a multiple of 2.
    Few more background about RVV:
    RISC-V has provide different VLEN configuration by different ISA
    extension like `zve32x`, `zve64x` and `v`
    zve32x just guarantee the minimal VLEN is 32 bits,
    zve64x guarantee the minimal VLEN is 64 bits,
    and v guarantee the minimal VLEN is 128 bits,
    Current status (without that patch):
    Zve32x: Mode for one vector register mode is VNx1SImode and VNx1DImode
    is invalid mode
    - one vector register could hold 1 + 1x SImode where x is 0~n, so it
    might hold just one SI
    Zve64x: Mode for one vector register mode is VNx1DImode or VNx2SImode
    - one vector register could hold 1 + 1x DImode where x is 0~n, so it
    might hold just one DI
    - one vector register could hold 2 + 2x SImode where x is 0~n, so it
    might hold just two SI
    So what I want to say here is VNx1DImode is really NOT safe to assume
    to have more than two DI in theory.
    However `v` extension guarantees the minimal VLEN is 128 bits.
    We are trying to introduce another type/mode mapping for this
    configure:
    v: Mode for one vector register mode is VNx2DImode or VNx4SImode
    - one vector register could hold 2 + 2x DImode where x is 0~n, so it
    will hold at least two DI
    - one vector register could hold 4 + 4x SImode where x is 0~n, so it
    will hold at least four DI
    So GET_MODE_NUNITS for a single vector register with DI mode will
    become 2 (VNx2DImode) if it is really possible, which is a more
    precise way to model the vector extension for RISC-V .
    On Tue, Apr 18, 2023 at 10:28 PM Kito Cheng <kito.ch...@gmail.com>
    wrote:
    >
    > Wait, VNx1DImode can be really evaluate to just one element if
    > -march=rv64g_zve64x,
    >
    > I thinks this should be just fixed on backend by this patch:
    >
    >
    
https://patchwork.ozlabs.org/project/gcc/patch/20230414014518.15458-1-juzhe.zh...@rivai.ai/
    >
    > On Tue, Apr 18, 2023 at 2:12 PM Richard Biener via Gcc-patches
    > <gcc-patches@gcc.gnu.org> wrote:
    > >
    > > On Mon, Apr 17, 2023 at 8:42 PM Michael Collison
    <colli...@rivosinc.com> wrote:
    > > >
    > > > While working on autovectorizing for the RISCV port I
    encountered an issue
    > > > where can_duplicate_and_interleave_p assumes that
    GET_MODE_NUNITS is a
    > > > evenly divisible by two. The RISC-V target has vector modes
    (e.g. VNx1DImode),
    > > > where GET_MODE_NUNITS is equal to one.
    > > >
    > > > Tested on RISCV and x86_64-linux-gnu. Okay?
    > >
    > > OK.
    > >
    > > > 2023-03-09  Michael Collison <colli...@rivosinc.com>
    > > >
    > > >         * tree-vect-slp.cc (can_duplicate_and_interleave_p):
    > > >         Check that GET_MODE_NUNITS is a multiple of 2.
    > > > ---
    > > >  gcc/tree-vect-slp.cc | 7 +++++--
    > > >  1 file changed, 5 insertions(+), 2 deletions(-)
    > > >
    > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
    > > > index d73deaecce0..a64fe454e19 100644
    > > > --- a/gcc/tree-vect-slp.cc
    > > > +++ b/gcc/tree-vect-slp.cc
    > > > @@ -423,10 +423,13 @@ can_duplicate_and_interleave_p
    (vec_info *vinfo, unsigned int count,
    > > >             (GET_MODE_BITSIZE (int_mode), 1);
    > > >           tree vector_type
    > > >             = get_vectype_for_scalar_type (vinfo, int_type,
    count);
    > > > +         poly_int64 half_nelts;
    > > >           if (vector_type
    > > >               && VECTOR_MODE_P (TYPE_MODE (vector_type))
    > > >               && known_eq (GET_MODE_SIZE (TYPE_MODE
    (vector_type)),
    > > > -                          GET_MODE_SIZE (base_vector_mode)))
    > > > +                          GET_MODE_SIZE (base_vector_mode))
    > > > +             && multiple_p (GET_MODE_NUNITS (TYPE_MODE
    (vector_type)),
    > > > +                            2, &half_nelts))
    > > >             {
    > > >               /* Try fusing consecutive sequences of COUNT /
    NVECTORS elements
    > > >                  together into elements of type INT_TYPE and
    using the result
    > > > @@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info
    *vinfo, unsigned int count,
    > > >               poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE
    (vector_type));
    > > >               vec_perm_builder sel1 (nelts, 2, 3);
    > > >               vec_perm_builder sel2 (nelts, 2, 3);
    > > > -             poly_int64 half_nelts = exact_div (nelts, 2);
    > > > +
    > > >               for (unsigned int i = 0; i < 3; ++i)
    > > >                 {
    > > >                   sel1.quick_push (i);
    > > > --
    > > > 2.34.1
    > > >

Reply via email to