Re: [PATCH] IBM Z: Provide rawmemchr{qi,hi,si} expander

2021-10-08 Thread Andreas Krebbel via Gcc-patches
On 10/8/21 16:23, Stefan Schulze Frielinghaus wrote:
> On Thu, Oct 07, 2021 at 11:16:24AM +0200, Andreas Krebbel wrote:
>> On 9/20/21 11:24, Stefan Schulze Frielinghaus wrote:
>>> This patch implements the rawmemchr expander as introduced in
>>> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579649.html
>>>
>>> Bootstrapped and regtested in conjunction with the patch from above on
>>> IBM Z.  Ok for mainline?
>>>
>>
>>> From 551362cda54048dc1a51588112f11c070ed52020 Mon Sep 17 00:00:00 2001
>>> From: Stefan Schulze Frielinghaus 
>>> Date: Mon, 8 Feb 2021 10:35:39 +0100
>>> Subject: [PATCH 2/2] IBM Z: Provide rawmemchr{qi,hi,si} expander
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/s390/s390-protos.h (s390_rawmemchrqi): Add prototype.
>>> (s390_rawmemchrhi): Add prototype.
>>> (s390_rawmemchrsi): Add prototype.
>>> * config/s390/s390.c (s390_rawmemchr): New function.
>>> (s390_rawmemchrqi): New function.
>>> (s390_rawmemchrhi): New function.
>>> (s390_rawmemchrsi): New function.
>>> * config/s390/s390.md (rawmemchr): New expander.
>>> (rawmemchr): New expander.
>>> * config/s390/vector.md (vec_vfees): Basically a copy of
>>> the pattern vfees from vx-builtins.md.
>>> * config/s390/vx-builtins.md (*vfees): Remove.
>>
>> Thanks! Would it make sense to also extend the strlen and movstr expanders
>> we have to support the additional character modes?
> 
> For strlen-like loops over non-character arrays the current
> implementation in the loop distribution pass uses rawmemchr and
> computes pointer difference in order to compute the length.  Thus we get
> strlen for free and don't need to reimplement it.

Good to know. Thanks!

...
> Please find a new version attached.  I did another bootstrap+regtest on
> IBM Z.  Ok for mainline?
> 
> Thanks for your detailed review!

Ok for mainline. Thanks!

Andreas


Re: [PATCH] IBM Z: Provide rawmemchr{qi,hi,si} expander

2021-10-08 Thread Stefan Schulze Frielinghaus via Gcc-patches
On Thu, Oct 07, 2021 at 11:16:24AM +0200, Andreas Krebbel wrote:
> On 9/20/21 11:24, Stefan Schulze Frielinghaus wrote:
> > This patch implements the rawmemchr expander as introduced in
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579649.html
> > 
> > Bootstrapped and regtested in conjunction with the patch from above on
> > IBM Z.  Ok for mainline?
> > 
> 
> > From 551362cda54048dc1a51588112f11c070ed52020 Mon Sep 17 00:00:00 2001
> > From: Stefan Schulze Frielinghaus 
> > Date: Mon, 8 Feb 2021 10:35:39 +0100
> > Subject: [PATCH 2/2] IBM Z: Provide rawmemchr{qi,hi,si} expander
> >
> > gcc/ChangeLog:
> >
> > * config/s390/s390-protos.h (s390_rawmemchrqi): Add prototype.
> > (s390_rawmemchrhi): Add prototype.
> > (s390_rawmemchrsi): Add prototype.
> > * config/s390/s390.c (s390_rawmemchr): New function.
> > (s390_rawmemchrqi): New function.
> > (s390_rawmemchrhi): New function.
> > (s390_rawmemchrsi): New function.
> > * config/s390/s390.md (rawmemchr): New expander.
> > (rawmemchr): New expander.
> > * config/s390/vector.md (vec_vfees): Basically a copy of
> > the pattern vfees from vx-builtins.md.
> > * config/s390/vx-builtins.md (*vfees): Remove.
> 
> Thanks! Would it make sense to also extend the strlen and movstr expanders
> we have to support the additional character modes?

For strlen-like loops over non-character arrays the current
implementation in the loop distribution pass uses rawmemchr and
computes pointer difference in order to compute the length.  Thus we get
strlen for free and don't need to reimplement it.

> 
> A few style comments below.
> 
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/s390/rawmemchr-1.c: New test.
> > ---
> >  gcc/config/s390/s390-protos.h   |  4 +
> >  gcc/config/s390/s390.c  | 89 ++
> >  gcc/config/s390/s390.md | 20 +
> >  gcc/config/s390/vector.md   | 26 ++
> >  gcc/config/s390/vx-builtins.md  | 26 --
> >  gcc/testsuite/gcc.target/s390/rawmemchr-1.c | 99 +
> >  6 files changed, 238 insertions(+), 26 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/s390/rawmemchr-1.c
> >
> > diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
> > index 4b03c6e99f5..0d9619e8254 100644
> > --- a/gcc/config/s390/s390-protos.h
> > +++ b/gcc/config/s390/s390-protos.h
> > @@ -66,6 +66,10 @@ s390_asm_declare_function_size (FILE *asm_out_file,
> > const char *fnname ATTRIBUTE_UNUSED, tree decl);
> >  #endif
> >
> > +extern void s390_rawmemchrqi(rtx dst, rtx src, rtx pat);
> > +extern void s390_rawmemchrhi(rtx dst, rtx src, rtx pat);
> > +extern void s390_rawmemchrsi(rtx dst, rtx src, rtx pat);
> > +
> >  #ifdef RTX_CODE
> >  extern int s390_extra_constraint_str (rtx, int, const char *);
> >  extern int s390_const_ok_for_constraint_p (HOST_WIDE_INT, int, const char 
> > *);
> > diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> > index 54dd6332c3a..1435ce156e2 100644
> > --- a/gcc/config/s390/s390.c
> > +++ b/gcc/config/s390/s390.c
> > @@ -16559,6 +16559,95 @@ s390_excess_precision (enum excess_precision_type 
> > type)
> >  }
> >  #endif
> >
> > +template  > + machine_mode elt_mode,
> > + rtx (*gen_vec_vfees) (rtx, rtx, rtx, rtx)>
> > +static void
> > +s390_rawmemchr(rtx dst, rtx src, rtx pat) {
> 
> I think it would be a bit easier to turn the vec_vfees expander into a
> 'parameterized name' and add the mode as parameter.  I'll attach a patch
> to illustrate how this might look like.

Right, didn't know about parameterized names which looks more clean to
me.  Thanks for the hint!

> 
> > +  rtx lens = gen_reg_rtx (V16QImode);
> > +  rtx pattern = gen_reg_rtx (vec_mode);
> > +  rtx loop_start = gen_label_rtx ();
> > +  rtx loop_end = gen_label_rtx ();
> > +  rtx addr = gen_reg_rtx (Pmode);
> > +  rtx offset = gen_reg_rtx (Pmode);
> > +  rtx tmp = gen_reg_rtx (Pmode);
> > +  rtx loadlen = gen_reg_rtx (SImode);
> > +  rtx matchlen = gen_reg_rtx (SImode);
> > +  rtx mem;
> > +
> > +  pat = GEN_INT (trunc_int_for_mode (INTVAL (pat), elt_mode));
> > +  emit_insn (gen_rtx_SET (pattern, gen_rtx_VEC_DUPLICATE (vec_mode, pat)));
> > +
> > +  emit_move_insn (addr, XEXP (src, 0));
> > +
> > +  // alignment
> > +  emit_insn (gen_vlbb (lens, gen_rtx_MEM (BLKmode, addr), GEN_INT (6)));
> > +  emit_insn (gen_lcbb (loadlen, addr, GEN_INT (6)));
> > +  lens = convert_to_mode (vec_mode, lens, 1);
> > +  emit_insn (gen_vec_vfees (lens, lens, pattern, GEN_INT (0)));
> > +  lens = convert_to_mode (V4SImode, lens, 1);
> > +  emit_insn (gen_vec_extractv4sisi (matchlen, lens, GEN_INT (1)));
> > +  lens = convert_to_mode (vec_mode, lens, 1);
> 
> That back and forth NOP conversion stuff is ugly but I couldn't find a
> more elegant way to write this without generating worse code.  Of
> course we want to benefit here from the fact that th

Re: [PATCH] IBM Z: Provide rawmemchr{qi,hi,si} expander

2021-10-07 Thread Andreas Krebbel via Gcc-patches
On 9/20/21 11:24, Stefan Schulze Frielinghaus wrote:
> This patch implements the rawmemchr expander as introduced in
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579649.html
> 
> Bootstrapped and regtested in conjunction with the patch from above on
> IBM Z.  Ok for mainline?
> 

> From 551362cda54048dc1a51588112f11c070ed52020 Mon Sep 17 00:00:00 2001
> From: Stefan Schulze Frielinghaus 
> Date: Mon, 8 Feb 2021 10:35:39 +0100
> Subject: [PATCH 2/2] IBM Z: Provide rawmemchr{qi,hi,si} expander
>
> gcc/ChangeLog:
>
>   * config/s390/s390-protos.h (s390_rawmemchrqi): Add prototype.
>   (s390_rawmemchrhi): Add prototype.
>   (s390_rawmemchrsi): Add prototype.
>   * config/s390/s390.c (s390_rawmemchr): New function.
>   (s390_rawmemchrqi): New function.
>   (s390_rawmemchrhi): New function.
>   (s390_rawmemchrsi): New function.
>   * config/s390/s390.md (rawmemchr): New expander.
>   (rawmemchr): New expander.
>   * config/s390/vector.md (vec_vfees): Basically a copy of
>   the pattern vfees from vx-builtins.md.
>   * config/s390/vx-builtins.md (*vfees): Remove.

Thanks! Would it make sense to also extend the strlen and movstr expanders
we have to support the additional character modes?

A few style comments below.

>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/s390/rawmemchr-1.c: New test.
> ---
>  gcc/config/s390/s390-protos.h   |  4 +
>  gcc/config/s390/s390.c  | 89 ++
>  gcc/config/s390/s390.md | 20 +
>  gcc/config/s390/vector.md   | 26 ++
>  gcc/config/s390/vx-builtins.md  | 26 --
>  gcc/testsuite/gcc.target/s390/rawmemchr-1.c | 99 +
>  6 files changed, 238 insertions(+), 26 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/rawmemchr-1.c
>
> diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
> index 4b03c6e99f5..0d9619e8254 100644
> --- a/gcc/config/s390/s390-protos.h
> +++ b/gcc/config/s390/s390-protos.h
> @@ -66,6 +66,10 @@ s390_asm_declare_function_size (FILE *asm_out_file,
>   const char *fnname ATTRIBUTE_UNUSED, tree decl);
>  #endif
>
> +extern void s390_rawmemchrqi(rtx dst, rtx src, rtx pat);
> +extern void s390_rawmemchrhi(rtx dst, rtx src, rtx pat);
> +extern void s390_rawmemchrsi(rtx dst, rtx src, rtx pat);
> +
>  #ifdef RTX_CODE
>  extern int s390_extra_constraint_str (rtx, int, const char *);
>  extern int s390_const_ok_for_constraint_p (HOST_WIDE_INT, int, const char *);
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index 54dd6332c3a..1435ce156e2 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -16559,6 +16559,95 @@ s390_excess_precision (enum excess_precision_type 
> type)
>  }
>  #endif
>
> +template  +   machine_mode elt_mode,
> +   rtx (*gen_vec_vfees) (rtx, rtx, rtx, rtx)>
> +static void
> +s390_rawmemchr(rtx dst, rtx src, rtx pat) {

I think it would be a bit easier to turn the vec_vfees expander into a
'parameterized name' and add the mode as parameter.  I'll attach a patch
to illustrate how this might look like.

> +  rtx lens = gen_reg_rtx (V16QImode);
> +  rtx pattern = gen_reg_rtx (vec_mode);
> +  rtx loop_start = gen_label_rtx ();
> +  rtx loop_end = gen_label_rtx ();
> +  rtx addr = gen_reg_rtx (Pmode);
> +  rtx offset = gen_reg_rtx (Pmode);
> +  rtx tmp = gen_reg_rtx (Pmode);
> +  rtx loadlen = gen_reg_rtx (SImode);
> +  rtx matchlen = gen_reg_rtx (SImode);
> +  rtx mem;
> +
> +  pat = GEN_INT (trunc_int_for_mode (INTVAL (pat), elt_mode));
> +  emit_insn (gen_rtx_SET (pattern, gen_rtx_VEC_DUPLICATE (vec_mode, pat)));
> +
> +  emit_move_insn (addr, XEXP (src, 0));
> +
> +  // alignment
> +  emit_insn (gen_vlbb (lens, gen_rtx_MEM (BLKmode, addr), GEN_INT (6)));
> +  emit_insn (gen_lcbb (loadlen, addr, GEN_INT (6)));
> +  lens = convert_to_mode (vec_mode, lens, 1);
> +  emit_insn (gen_vec_vfees (lens, lens, pattern, GEN_INT (0)));
> +  lens = convert_to_mode (V4SImode, lens, 1);
> +  emit_insn (gen_vec_extractv4sisi (matchlen, lens, GEN_INT (1)));
> +  lens = convert_to_mode (vec_mode, lens, 1);

That back and forth NOP conversion stuff is ugly but I couldn't find a
more elegant way to write this without generating worse code.  Of
course we want to benefit here from the fact that the result operand
of vfees is already zero-extended.  Perhaps factor this out into a
utility function or an extra expander because we appear to need this
frequently?! Not a requirement for this patch though.

> +  emit_cmp_and_jump_insns (matchlen, loadlen, LT, NULL_RTX, SImode, 1, 
> loop_end);
> +  force_expand_binop (Pmode, and_optab, addr, GEN_INT (15), tmp, 1, 
> OPTAB_DIRECT);
> +  force_expand_binop (Pmode, sub_optab, GEN_INT (16), tmp, tmp, 1, 
> OPTAB_DIRECT);
> +  force_expand_binop (Pmode, add_optab, addr, tmp, addr, 1, OPTAB_DIRECT);

Couldn't we just do this as '(addr + 16) & ~0xf' her