I have added tests for memset and memcpy.
Additionally, I have added a test to ensure that -mstrict-align is
still respected (a sketch of what such tests could look like is below).
I've run the complete GCC test suite with and without the resulting
patch and observed no regressions
(rv32imac/ilp32/medlow, rv32imafdc/ilp32d/medlow,
rv64imac/lp64/medlow, rv64imafdc/lp64d/medlow).
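
For illustration, here is a minimal sketch of what such a test could look
like; the file name and scan patterns are my assumptions, not necessarily
the exact tests in the patch (the options follow the testcase suggested
below in the thread):

/* Hypothetical gcc.target/riscv/memset-overlap.c.  */
/* { dg-do compile } */
/* { dg-options "-O2 -mtune=thead-c906 -march=rv64gc -mabi=lp64" } */

void foo (char *dst)
{
  __builtin_memset (dst, 0, 15);
}

/* Expect two overlapping sd stores and no byte stores.  */
/* { dg-final { scan-assembler-times "sd\tzero" 2 } } */
/* { dg-final { scan-assembler-not "sb\tzero" } } */

The -mstrict-align counterpart would add -mstrict-align to dg-options
and scan for sb stores instead of sd stores.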

I have not changed the patch itself, because I think it is reasonable
to assume that overlapping accesses
(i.e. fewer store instructions) will always be cheaper on targets
that don't have penalties for unaligned accesses.
If maintainers disagree, I can change the patch accordingly.
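
To make the cost argument concrete (using the 15-byte memset-zero from
the patch description): without overlapping, the by-pieces expansion
emits one store per power-of-two component of the size (15 = 8 + 4 + 2
+ 1, i.e. four stores), while with overlapping it needs only
ceil(15/8) = 2 stores. So overlapping should never emit more stores;
whether the resulting (possibly unaligned) accesses are themselves
cheap is exactly what slow_unaligned_access encodes.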

The new patch can be found here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575864.html


On Thu, Jul 22, 2021 at 8:23 PM Christoph Müllner <cmuell...@gcc.gnu.org> wrote:
>
> On Thu, Jul 22, 2021 at 7:27 PM Palmer Dabbelt <pal...@dabbelt.com> wrote:
> >
> > On Thu, 22 Jul 2021 06:29:46 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> > > Could you add a testcase? Otherwise LGTM.
> > >
> > > Option: -O2 -mtune=thead-c906 -march=rv64gc -mabi=lp64
> > > void foo(char *dst){
> > >    __builtin_memset(dst, 0, 15);
> > > }
> >
> > I'd like to see:
> >
> > * Test results.  This is only on for one target right now, so relying on
> >   it to just work on others isn't a good idea.
>
> So you prefer the previous version of the patch, where each
> tune struct can decide if that feature should be enabled or not?
>
> > * Something to demonstrate this doesn't break -mstrict-align.
>
> -mstrict-align sets riscv_slow_unaligned_access_p to true.
> This will be returned by TARGET_SLOW_UNALIGNED_ACCESS.
> So, this cannot happen. I will add an additional test for that.
>
> > * Some sort of performance analysis.  Most machines that support
> >   unaligned access do so with some performance degradation; it's unclear
> >   that this conversion will actually manifest a performance improvement.
> >   I don't have a C906 and don't know what workloads people care about
> >   running on one, but I'm certainly not convinced this is a win --
> >   what's listed here looks to be the best case, and that's only saving
> >   two instructions to generate a pretty pathological sequence
> >   (misaligned access that conflicts with a prior store).
> >
> > Jojo: do you have any description of the C906 pipeline?  Specifically in
> > this case it'd be good to know how it handles unaligned accesses.
>
> The tune struct already includes the field slow_unaligned_access.
> For c906 this is set to false. So the answer is: c906 handles unaligned
> access reasonably well (assuming the contents of its tune struct are correct).
>
> Note that the overlapping access only happens if unaligned accesses
> are allowed.
> If slow_unaligned_access is set, then you will get a sequence of
> store-byte instructions.
>
> >
> > >
> > > On Thu, Jul 22, 2021 at 8:53 PM Christoph Muellner via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > >>
> > >> This patch enables the overlap-by-pieces feature of the by-pieces
> > >> infrastructure for inlining builtins in case the target has set
> > >> riscv_slow_unaligned_access_p to false.
> > >>
> > >> To demonstrate the effect for targets with fast unaligned access,
> > >> the following code sequences are generated for a 15-byte memset-zero.
> > >>
> > >> Without overlap_op_by_pieces we get:
> > >>   8e:   00053023                sd      zero,0(a0)
> > >>   92:   00052423                sw      zero,8(a0)
> > >>   96:   00051623                sh      zero,12(a0)
> > >>   9a:   00050723                sb      zero,14(a0)
> > >>
> > >> With overlap_op_by_pieces we get:
> > >>   7e:   00053023                sd      zero,0(a0)
> > >>   82:   000533a3                sd      zero,7(a0)
> > >>
> > >> gcc/ChangeLog:
> > >>
> > >>         * config/riscv/riscv.c (riscv_overlap_op_by_pieces): New function.
> > >>         (TARGET_OVERLAP_OP_BY_PIECES_P): Connect to
> > >>         riscv_overlap_op_by_pieces.
> > >>
> > >> Signed-off-by: Christoph Muellner <cmuell...@gcc.gnu.org>
> > >> ---
> > >>  gcc/config/riscv/riscv.c | 11 +++++++++++
> > >>  1 file changed, 11 insertions(+)
> > >>
> > >> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> > >> index 576960bb37c..98c76ba657a 100644
> > >> --- a/gcc/config/riscv/riscv.c
> > >> +++ b/gcc/config/riscv/riscv.c
> > >> @@ -5201,6 +5201,14 @@ riscv_slow_unaligned_access (machine_mode, unsigned int)
> > >>    return riscv_slow_unaligned_access_p;
> > >>  }
> > >>
> > >> +/* Implement TARGET_OVERLAP_OP_BY_PIECES_P.  */
> > >> +
> > >> +static bool
> > >> +riscv_overlap_op_by_pieces (void)
> > >> +{
> > >> +  return !riscv_slow_unaligned_access_p;
> > >> +}
> > >> +
> > >>  /* Implement TARGET_CAN_CHANGE_MODE_CLASS.  */
> > >>
> > >>  static bool
> > >> @@ -5525,6 +5533,9 @@ riscv_asan_shadow_offset (void)
> > >>  #undef TARGET_SLOW_UNALIGNED_ACCESS
> > >>  #define TARGET_SLOW_UNALIGNED_ACCESS riscv_slow_unaligned_access
> > >>
> > >> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
> > >> +#define TARGET_OVERLAP_OP_BY_PIECES_P riscv_overlap_op_by_pieces
> > >> +
> > >>  #undef TARGET_SECONDARY_MEMORY_NEEDED
> > >>  #define TARGET_SECONDARY_MEMORY_NEEDED riscv_secondary_memory_needed
> > >>
> > >> --
> > >> 2.31.1
> > >>
