On Sat, Jun 8, 2024 at 3:41 PM Richard Sandiford <richard.sandif...@arm.com>
wrote:

> Mariam Arutunian <mariamarutun...@gmail.com> writes:
> > This patch introduces two new expanders for the aarch64 backend,
> > dedicated to generating optimized code for CRC computations.
> > The new expanders are designed to leverage specific hardware capabilities
> > to achieve faster CRC calculations,
> > particularly using the pmul or crc32 instructions when supported by the
> > target architecture.
>
> Thanks for porting this to aarch64!
>
> > Expander 1: Bit-Forward CRC (crc<ALLI:mode><ALLX:mode>4)
> > For targets that support the pmul instruction (TARGET_AES),
> > the expander will generate code that uses the pmul (crypto_pmulldi)
> > instruction for CRC computation.
> >
> > Expander 2: Bit-Reversed CRC (crc_rev<ALLI:mode><ALLX:mode>4)
> > The expander first checks if the target supports the CRC32 instruction
> > set (TARGET_CRC32)
> > and the polynomial in use is 0x1EDC6F41 (iSCSI). If the conditions are
> > met, it emits calls to the corresponding crc32c instruction (crc32cb,
> > crc32ch, crc32cw, or crc32cx depending on the data size).
> > If the target does not support crc32 but supports pmul, it then uses the
> > pmul (crypto_pmulldi) instruction for bit-reversed CRC computation.
> >
> > Otherwise table-based CRC is generated.
> >
> >   gcc/config/aarch64/
> >
> >     * aarch64-protos.h (aarch64_expand_crc_using_clmul): New extern
> > function declaration.
> >     (aarch64_expand_reversed_crc_using_clmul):  Likewise.
> >     * aarch64.cc (aarch64_expand_crc_using_clmul): New function.
> >     (aarch64_expand_reversed_crc_using_clmul):  Likewise.
> >     * aarch64.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.
> >     (crc_rev<ALLI:mode><ALLX:mode>4): New expander for reversed CRC.
> >     (crc<ALLI:mode><ALLX:mode>4): New expander for bit-forward CRC.
> >     * iterators.md (crc_data_type): New mode attribute.
> >
> >   gcc/testsuite/gcc.target/aarch64/
> >
> >     * crc-1-pmul.c: New test.
> >     * crc-10-pmul.c: Likewise.
> >     * crc-12-pmul.c: Likewise.
> >     * crc-13-pmul.c: Likewise.
> >     * crc-14-pmul.c: Likewise.
> >     * crc-17-pmul.c: Likewise.
> >     * crc-18-pmul.c: Likewise.
> >     * crc-21-pmul.c: Likewise.
> >     * crc-22-pmul.c: Likewise.
> >     * crc-23-pmul.c: Likewise.
> >     * crc-4-pmul.c: Likewise.
> >     * crc-5-pmul.c: Likewise.
> >     * crc-6-pmul.c: Likewise.
> >     * crc-7-pmul.c: Likewise.
> >     * crc-8-pmul.c: Likewise.
> >     * crc-9-pmul.c: Likewise.
> >     * crc-CCIT-data16-pmul.c: Likewise.
> >     * crc-CCIT-data8-pmul.c: Likewise.
> >     * crc-coremark-16bitdata-pmul.c: Likewise.
> >     * crc-crc32-data16.c: New test.
> >     * crc-crc32-data32.c: Likewise.
> >     * crc-crc32-data8.c: Likewise.
> >
> > Signed-off-by: Mariam Arutunian <mariamarutun...@gmail.com>
> > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> > index 1d3f94c813e..167e1140f0d 100644
> > --- a/gcc/config/aarch64/aarch64-protos.h
> > +++ b/gcc/config/aarch64/aarch64-protos.h
> > @@ -1117,5 +1117,8 @@ extern void mingw_pe_encode_section_info (tree, rtx, int);
> >
> >  bool aarch64_optimize_mode_switching (aarch64_mode_entity);
> >  void aarch64_restore_za (rtx);
> > +void aarch64_expand_crc_using_clmul (rtx *);
> > +void aarch64_expand_reversed_crc_using_clmul (rtx *);
> > +
> >
> >  #endif /* GCC_AARCH64_PROTOS_H */
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index ee12d8897a8..05cd0296d38 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -30265,6 +30265,135 @@ aarch64_retrieve_sysreg (const char *regname, bool write_p, bool is128op)
> >    return sysreg->encoding;
> >  }
> >
> > +/* Generate assembly to calculate CRC
> > +   using carry-less multiplication instruction.
> > +   OPERANDS[1] is input CRC,
> > +   OPERANDS[2] is data (message),
> > +   OPERANDS[3] is the polynomial without the leading 1.  */
> > +
> > +void
> > +aarch64_expand_crc_using_clmul (rtx *operands)
>
> This should probably be pmul rather than clmul.
>
> > +{
> > +  /* Check and keep arguments.  */
> > +  gcc_assert (!CONST_INT_P (operands[0]));
> > +  gcc_assert (CONST_INT_P (operands[3]));
> > +  rtx crc = operands[1];
> > +  rtx data = operands[2];
> > +  rtx polynomial = operands[3];
> > +
> > +  unsigned HOST_WIDE_INT
> > +      crc_size = GET_MODE_BITSIZE (GET_MODE (operands[0])).to_constant ();
> > +  gcc_assert (crc_size <= 32);
> > +  unsigned HOST_WIDE_INT
> > +      data_size = GET_MODE_BITSIZE (GET_MODE (data)).to_constant ();
>
> We could instead make the interface:
>
> void
> aarch64_expand_crc_using_pmul (scalar_mode crc_mode, scalar_mode data_mode,
>                                rtx *operands)
>
> so that the lines above don't need the to_constant.  This should "just
> work" on the .md file side, since the modes being passed are naturally
> scalar_mode.
>
> I think it'd be worth asserting also that data_size <= crc_size.
> (Although we could handle any MAX (data_size, crc_size) <= 32
> with some adjustment.)
>
> > +
> > +  /* Calculate the quotient.  */
> > +  unsigned HOST_WIDE_INT
> > +      q = gf2n_poly_long_div_quotient (UINTVAL (polynomial), crc_size + 1);
> > +
> > +  /* CRC calculation's main part.  */
> > +  if (crc_size > data_size)
> > +    crc = expand_shift (RSHIFT_EXPR, DImode, crc, crc_size - data_size,
> > +                     NULL_RTX, 1);
> > +
> > +  rtx t0 = gen_reg_rtx (DImode);
> > +  aarch64_emit_move (t0, gen_int_mode (q, DImode));
>
> It's only a minor simplification, but this could instead be:
>
>   rtx t0 = force_reg (DImode, gen_int_mode (q, DImode));
>
> > +  rtx t1 = gen_reg_rtx (DImode);
> > +  aarch64_emit_move (t1, polynomial);
>
> If polynomial is a constant operand of mode crc_mode, GCC's standard
> CONST_INT representation is to sign-extend the constant to 64 bits.
> E.g. a QImode value of 0b1000_0000 would be represented as -128.
>
> I think here we want the zero-extended form, so it might be safer to do:
>
>   polynomial = simplify_gen_unary (ZERO_EXTEND, DImode, polynomial, crc_mode);
>   rtx t1 = force_reg (DImode, polynomial);
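
To make the sign-extension point concrete, here is a small standalone C
model. It is only an illustration: sign_extend and zero_extend are
hypothetical helper names (not GCC functions), and they mimic what the
CONST_INT representation and a ZERO_EXTEND to a 64-bit mode would give
for a value of a given bit width.

```c
#include <stdint.h>

/* Hypothetical helper (not a GCC API): build a mask of the low BITS bits.  */
static uint64_t
low_mask (unsigned bits)
{
  return bits >= 64 ? ~UINT64_C (0) : (UINT64_C (1) << bits) - 1;
}

/* Interpret the low BITS bits of VALUE as a signed quantity, which is how
   a CONST_INT of a BITS-wide mode is stored in a 64-bit HOST_WIDE_INT.  */
static int64_t
sign_extend (uint64_t value, unsigned bits)
{
  uint64_t sign = UINT64_C (1) << (bits - 1);
  value &= low_mask (bits);
  /* Flip the sign bit, then subtract it, which propagates it upwards.  */
  return (int64_t) ((value ^ sign) - sign);
}

/* Keep only the low BITS bits of VALUE, clearing everything above.  */
static uint64_t
zero_extend (uint64_t value, unsigned bits)
{
  return value & low_mask (bits);
}
```

For an 8-bit polynomial of 0x80, sign_extend gives -128, whose 64-bit
pattern has bits 8 to 63 set. Feeding that pattern to a 64x64 carry-less
multiply would mix those bits into the product, which is why the
zero-extended form is the safer input.
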
>
> > +
> > +  rtx a0 = expand_binop (DImode, xor_optab, crc, data, NULL_RTX, 1,
> > +                      OPTAB_WIDEN);
> > +
> > +  rtx clmul_res = gen_reg_rtx (TImode);
> > +  emit_insn (gen_aarch64_crypto_pmulldi (clmul_res, a0, t0));
> > +  a0 = gen_lowpart (DImode, clmul_res);
> > +
> > +  a0 = expand_shift (RSHIFT_EXPR, DImode, a0, crc_size, NULL_RTX, 1);
> > +
> > +  emit_insn (gen_aarch64_crypto_pmulldi (clmul_res, a0, t1));
> > +  a0 = gen_lowpart (DImode, clmul_res);
> > +
> > +  if (crc_size > data_size)
> > +    {
> > +      rtx crc_part = expand_shift (LSHIFT_EXPR, DImode, operands[1], data_size,
> > +                                NULL_RTX, 0);
> > +      a0 =  expand_binop (DImode, xor_optab, a0, crc_part, NULL_RTX, 1,
> > +                       OPTAB_DIRECT);
>
> Formatting nit: extra space after "a0 = "
>
> > +    }
> > +  /* Zero upper bits beyond crc_size.  */
> > +  rtx num_shift = gen_int_mode (64 - crc_size, DImode);
> > +  a0 = expand_shift (LSHIFT_EXPR, DImode, a0, 64 - crc_size,  NULL_RTX, 0);
> > +  a0 = expand_shift (RSHIFT_EXPR, DImode, a0, 64 - crc_size,  NULL_RTX, 1);
>
> Rather than shift left and then right, I think we should just AND:
>
>   rtx mask = gen_int_mode (GET_MODE_MASK (crc_mode), DImode);
>   a0 = expand_binop (DImode, and_optab, a0, mask, NULL_RTX, 1, OPTAB_DIRECT);
>
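
As a quick illustration of why a single AND is equivalent to the
shift pair, the following C sketch models both ways of clearing the bits
above crc_size in a 64-bit value. The function names are made up for
this example, and it assumes 0 < crc_size < 64.

```c
#include <stdint.h>

/* Clear the bits above CRC_SIZE with a left shift followed by a logical
   right shift, as the quoted sequence does.  Assumes 0 < CRC_SIZE < 64.  */
static uint64_t
clear_upper_by_shifts (uint64_t x, unsigned crc_size)
{
  return (x << (64 - crc_size)) >> (64 - crc_size);
}

/* Clear the same bits with one AND against a mask of CRC_SIZE ones.  */
static uint64_t
clear_upper_by_mask (uint64_t x, unsigned crc_size)
{
  uint64_t mask = (UINT64_C (1) << crc_size) - 1;
  return x & mask;
}
```

Both functions return the same value for every input, so the AND form
saves one operation without changing the result.
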
> That said, it looks like operands[0] has crc_mode.  The register bits
> above crc_size therefore shouldn't matter, since they're undefined on read.
> E.g. even though (reg:SI R) is stored in an X register, only the low 32
> bits are defined; the upper 32 bits can be any value.
>
> So I'd expect we could replace this and...
>
> > +
> > +  rtx tgt = simplify_gen_subreg (DImode, operands[0],
> > +                              GET_MODE (operands[0]), 0);
> > +  aarch64_emit_move (tgt, a0);
>
> ...this with just:
>
>   aarch64_emit_move (operands[0], gen_lowpart (crc_mode, a0));
>
> Perhaps that would break down if operands[0] is a subreg with
> SUBREG_PROMOTED_VAR_P set, but I think it's up to target-independent
> code to handle that case.
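
The pmull-based sequence discussed above can be modelled in plain C to
check the arithmetic. This is a rough sketch under stated assumptions,
not the code GCC emits: clmul_lo, barrett_quotient, crc_step_clmul and
crc_step_bitwise are hypothetical names, the carry-less multiply is done
in software, and the quotient is computed by a simple long division
rather than by gf2n_poly_long_div_quotient. It assumes
data_size <= crc_size <= 32, with crc and data already reduced to those
widths.

```c
#include <stdint.h>

/* Software carry-less multiply.  Only the low 64 bits of the product are
   kept, which is enough when both operands have at most 33 and 32 bits.  */
static uint64_t
clmul_lo (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (unsigned i = 0; i < 64; i++)
    if ((b >> i) & 1)
      r ^= a << i;
  return r;
}

/* Quotient of x^(2*n) divided by the full polynomial x^n + POLY over
   GF(2), for 1 <= n <= 32.  */
static uint64_t
barrett_quotient (uint64_t poly, unsigned n)
{
  uint64_t full = (UINT64_C (1) << n) | poly;
  uint64_t q = UINT64_C (1) << n;	/* Leading quotient bit.  */
  uint64_t rem = poly;			/* x^n mod full.  */
  for (int i = (int) n - 1; i >= 0; i--)
    {
      rem <<= 1;
      if (rem & (UINT64_C (1) << n))
	{
	  q |= UINT64_C (1) << i;
	  rem ^= full;
	}
    }
  return q;
}

/* One bit-forward CRC update of an n-bit CRC with m bits of data, using
   two carry-less multiplies.  Requires m <= n <= 32.  */
static uint64_t
crc_step_clmul (uint64_t crc, uint64_t data, uint64_t poly,
		unsigned n, unsigned m)
{
  uint64_t mask = (UINT64_C (1) << n) - 1;
  uint64_t q = barrett_quotient (poly, n);
  uint64_t a = (crc >> (n - m)) ^ data;	/* Top m CRC bits xor data.  */
  uint64_t t = clmul_lo (a, q) >> n;	/* Quotient of a * x^n.  */
  uint64_t r = clmul_lo (t, poly);	/* Low n bits hold the remainder.  */
  r ^= crc << m;			/* Fold in the untouched low CRC bits.  */
  return r & mask;
}

/* Bit-at-a-time reference for the same update, MSB of data first.  */
static uint64_t
crc_step_bitwise (uint64_t crc, uint64_t data, uint64_t poly,
		  unsigned n, unsigned m)
{
  uint64_t mask = (UINT64_C (1) << n) - 1;
  for (int i = (int) m - 1; i >= 0; i--)
    {
      uint64_t top = ((crc >> (n - 1)) ^ (data >> i)) & 1;
      crc = (crc << 1) & mask;
      if (top)
	crc ^= poly;
    }
  return crc;
}
```

In this model the quotient estimate is exact only while the value
multiplied by q has fewer than n significant bits, which is one way to
see why asserting data_size <= crc_size is worthwhile.
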
>
> > @@ -4543,6 +4545,63 @@
> >    [(set_attr "type" "crc")]
> >  )
> >
> > +;; Reversed CRC
> > +(define_expand "crc_rev<ALLI:mode><ALLX:mode>4"
> > +      ;; return value (calculated CRC)
> > +  [(set (match_operand:ALLX 0 "register_operand" "=r")
> > +                   ;; initial CRC
> > +     (unspec:ALLX [(match_operand:ALLX 1 "register_operand" "r")
> > +                   ;; data
> > +                   (match_operand:ALLI 2 "register_operand" "r")
> > +                   ;; polynomial without leading 1
> > +                   (match_operand:ALLX 3)]
> > +     UNSPEC_CRC_REV))]
>
> Since we (rightly) never generate the RTL above, I think this can just be:
>
> (define_expand "crc_rev<ALLI:mode><ALLX:mode>4"
>   [;; return value (calculated CRC)
>    (match_operand:ALLX 0 "register_operand")
>    ;; initial CRC
>    (match_operand:ALLX 1 "register_operand")
>    ;; data
>    (match_operand:ALLI 2 "register_operand")
>    ;; polynomial without leading 1
>    (match_operand:ALLX 3)]
>
> without the unspec and constraints.
>
> > +  ""
> > +  {
> > +    /* If the polynomial is the same as the polynomial of crc32 instruction,
> > +       put that instruction.  crc32 uses iSCSI polynomial (0x1EDC6F41).  */
> > +    if (TARGET_CRC32 && INTVAL (operands[3]) == 517762881)
>
> The hex constant feels a little easier to read.  I think it'd also
> be worth checking <ALLX:MODE>mode == SImode, even though it's currently
> redundant (given that no other choice would allow that polynomial).
>
> > +      {
> > +     rtx crc_result = gen_reg_rtx (SImode);
> > +     rtx crc = operands[1];
> > +     rtx data = operands[2];
> > +     emit_insn (gen_aarch64_crc32c<ALLI:crc_data_type> (crc_result, crc,
> > +                                                        data));
> > +     emit_move_insn (operands[0],
> > +                     gen_lowpart (GET_MODE (operands[0]), crc_result));
>
> If operands[0] has ALLX mode (== SImode), it looks like we should be
> able to use operands[0] directly as the result of the CRC32C.
>
> FWIW, there's also CRC32 for the HDLC etc. polynomial 0x04C11DB7.
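
For anyone cross-checking the constants in this thread, here is a small
standalone C sketch. reflect32 is a hypothetical helper written for this
note. It reverses the bit order of a 32-bit polynomial, which relates
the bit-forward form 0x1EDC6F41 to the 0x82F63B78 constant used by the
bit-reversed loops in the new crc-crc32-data*.c tests.

```c
#include <stdint.h>

/* The decimal literal in the quoted condition and the hex polynomial from
   the patch description are the same number.  */
_Static_assert (UINT32_C (0x1EDC6F41) == UINT32_C (517762881),
		"iSCSI polynomial written in hex and in decimal");

/* Hypothetical helper: reverse the order of the 32 bits in V.  */
static uint32_t
reflect32 (uint32_t v)
{
  uint32_t r = 0;
  for (unsigned i = 0; i < 32; i++)
    {
      r = (r << 1) | (v & 1);
      v >>= 1;
    }
  return r;
}
```

Applying reflect32 to 0x04C11DB7 likewise gives 0xEDB88320, the reflected
form usually seen in bit-reversed CRC-32 loops.
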
>
> > +      }
> > +    else if (TARGET_AES)
>
> I think we also need to check <ALLI:sizen> <= <ALLX:sizen> for this.
> Similarly for the unreversed CRC pattern.
>
> Thanks again for doing this.  I realise RISC-V is the lead target for
> this work, so you've gone above and beyond by doing a full AArch64
> port too.  It'd be perfectly valid to ask Arm developers to deal
> with the comments above, so please let me know if you'd prefer that.
> The patch looks close to ready to me though.
>

Thanks for your suggestions and explanations, and thank you for recognizing
my work. I'll resolve all the comments.


Best regards,
Mariam


> Richard
>
> > +      aarch64_expand_reversed_crc_using_clmul (operands);
> > +    else
> > +      {
> > +     /* Otherwise, generate table-based CRC.  */
> > +     expand_reversed_crc_table_based (operands[0], operands[1], operands[2],
> > +                                      operands[3], GET_MODE (operands[2]),
> > +                                      generate_reflecting_code_standard);
> > +      }
> > +    DONE;
> > +  }
> > +)
> > +
> > +;; Bit-forward CRC
> > +(define_expand "crc<ALLI:mode><ALLX:mode>4"
> > +      ;; return value (calculated CRC)
> > +  [(set (match_operand:ALLX 0 "register_operand" "=r")
> > +                   ;; initial CRC
> > +     (unspec:ALLX [(match_operand:ALLX 1 "register_operand" "r")
> > +                   ; data
> > +                   (match_operand:ALLI 2 "register_operand" "r")
> > +                   ;; polynomial without leading 1
> > +                   (match_operand:ALLX 3)]
> > +     UNSPEC_CRC))]
> > +  "TARGET_AES"
> > +  {
> > +    aarch64_expand_crc_using_clmul (operands);
> > +    DONE;
> > +  }
> > +)
> > +
> > +
> >  (define_insn "*csinc2<mode>_insn"
> >    [(set (match_operand:GPI 0 "register_operand" "=r")
> >          (plus:GPI (match_operand 2 "aarch64_comparison_operation" "")
> > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> > index 99cde46f1ba..86e4863d684 100644
> > --- a/gcc/config/aarch64/iterators.md
> > +++ b/gcc/config/aarch64/iterators.md
> > @@ -1276,6 +1276,10 @@
> >  ;; Map a mode to a specific constraint character.
> >  (define_mode_attr cmode [(QI "q") (HI "h") (SI "s") (DI "d")])
> >
> > +;; Map a mode to a specific constraint character for calling
> > +;; appropriate version of crc.
> > +(define_mode_attr crc_data_type [(QI "b") (HI "h") (SI "w") (DI "x")])
> > +
> >  ;; Map modes to Usg and Usj constraints for SISD right shifts
> >  (define_mode_attr cmode_simd [(SI "g") (DI "j")])
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-1-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-1-pmul.c
> > new file mode 100644
> > index 00000000000..2bea6280762
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-1-pmul.c
> > @@ -0,0 +1,8 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details -fdisable-tree-phiopt2 -fdisable-tree-phiopt3" } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-1.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-10-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-10-pmul.c
> > new file mode 100644
> > index 00000000000..846eecbaa85
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-10-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-10.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-12-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-12-pmul.c
> > new file mode 100644
> > index 00000000000..0eea6aa6741
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-12-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details -fdisable-tree-phiopt2 -fdisable-tree-phiopt3" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-12.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-13-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-13-pmul.c
> > new file mode 100644
> > index 00000000000..7ff8fbcb665
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-13-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-13.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-14-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-14-pmul.c
> > new file mode 100644
> > index 00000000000..80766daf487
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-14-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-14.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-17-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-17-pmul.c
> > new file mode 100644
> > index 00000000000..0e32fffa0b6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-17-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-17.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-18-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-18-pmul.c
> > new file mode 100644
> > index 00000000000..87f4c63b5ea
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-18-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-18.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-21-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-21-pmul.c
> > new file mode 100644
> > index 00000000000..6eeac8cf97f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-21-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-21.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-22-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-22-pmul.c
> > new file mode 100644
> > index 00000000000..76e3c00ce9f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-22-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-22.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-23-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-23-pmul.c
> > new file mode 100644
> > index 00000000000..e3a5e99ffba
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-23-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-23.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-4-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-4-pmul.c
> > new file mode 100644
> > index 00000000000..528006c0099
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-4-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-4.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-5-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-5-pmul.c
> > new file mode 100644
> > index 00000000000..41e1f8202bc
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-5-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -w -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-5.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-6-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-6-pmul.c
> > new file mode 100644
> > index 00000000000..83db99ccb8b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-6-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-6.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-7-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-7-pmul.c
> > new file mode 100644
> > index 00000000000..7ad777aac8c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-7-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-7.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-8-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-8-pmul.c
> > new file mode 100644
> > index 00000000000..da1b619c418
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-8-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-8.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-9-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-9-pmul.c
> > new file mode 100644
> > index 00000000000..33bbe0bfb26
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-9-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-9.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data16-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data16-pmul.c
> > new file mode 100644
> > index 00000000000..0c452c1c0f4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data16-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-w -march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-CCIT-data16.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data8-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data8-pmul.c
> > new file mode 100644
> > index 00000000000..87a0b4489a2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-CCIT-data8-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-w -march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-CCIT-data8.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git
> a/gcc/testsuite/gcc.target/aarch64/crc-coremark-16bitdata-pmul.c
> b/gcc/testsuite/gcc.target/aarch64/crc-coremark-16bitdata-pmul.c
> > new file mode 100644
> > index 00000000000..75ed5aff80b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-coremark-16bitdata-pmul.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-w -march=armv8-a+crypto -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include "../../gcc.c-torture/execute/crc-coremark16-data16.c"
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "pmull" "dfinish"} } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-crc32-data16.c
> b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data16.c
> > new file mode 100644
> > index 00000000000..d5aeee7c0c4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data16.c
> > @@ -0,0 +1,53 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crc -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include <stdint.h>
> > +#include <stdlib.h>
> > +
> > +__attribute__ ((noinline,optimize(0)))
> > +uint32_t _crc32_O0 (uint32_t crc, uint16_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 8; i++) {
> > +      if (crc & 1)
> > +     crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +     crc = (crc >> 1);
> > +    }
> > +
> > +  return crc;
> > +}
> > +
> > +uint32_t _crc32 (uint32_t crc, uint16_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 8; i++) {
> > +      if (crc & 1)
> > +     crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +     crc = (crc >> 1);
> > +    }
> > +
> > +  return crc;
> > +}
> > +
> > +int main ()
> > +{
> > +  uint32_t crc = 0x0D800D80;
> > +  for (uint16_t i = 0; i < 0xffff; i++)
> > +    {
> > +      uint32_t res1 = _crc32_O0 (crc, i);
> > +      uint32_t res2 = _crc32 (crc, i);
> > +      if (res1 != res2)
> > +     abort ();
> > +      crc = res1;
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "UNSPEC_CRC32" "dfinish"} } */
> > +/* { dg-final { scan-rtl-dump-times "pmull" 0 "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-crc32-data32.c
> b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data32.c
> > new file mode 100644
> > index 00000000000..f0e319b3ab8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data32.c
> > @@ -0,0 +1,52 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crc -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include <stdint.h>
> > +#include <stdlib.h>
> > +__attribute__ ((noinline,optimize(0)))
> > +uint32_t _crc32_O0 (uint32_t crc, uint32_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 32; i++) {
> > +      if (crc & 1)
> > +     crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +     crc = (crc >> 1);
> > +    }
> > +
> > +  return crc;
> > +}
> > +
> > +uint32_t _crc32 (uint32_t crc, uint32_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 32; i++) {
> > +      if (crc & 1)
> > +     crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +     crc = (crc >> 1);
> > +    }
> > +
> > +  return crc;
> > +}
> > +
> > +int main ()
> > +{
> > +  uint32_t crc = 0x0D800D80;
> > +  for (uint8_t i = 0; i < 0xff; i++)
> > +    {
> > +      uint32_t res1 = _crc32_O0 (crc, i);
> > +      uint32_t res2 = _crc32 (crc, i);
> > +      if (res1 != res2)
> > +     abort ();
> > +      crc = res1;
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "UNSPEC_CRC32" "dfinish"} } */
> > +/* { dg-final { scan-rtl-dump-times "pmull" 0 "dfinish"} } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/crc-crc32-data8.c
> b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data8.c
> > new file mode 100644
> > index 00000000000..95ffde6a9d2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/crc-crc32-data8.c
> > @@ -0,0 +1,53 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-march=armv8-a+crc -O2 -fdump-rtl-dfinish
> -fdump-tree-crc-details" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto"} } */
> > +
> > +#include <stdint.h>
> > +#include <stdlib.h>
> > +
> > +__attribute__ ((noinline,optimize(0)))
> > +uint32_t _crc32_O0 (uint32_t crc, uint8_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 8; i++) {
> > +      if (crc & 1)
> > +     crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +     crc = (crc >> 1);
> > +    }
> > +
> > +  return crc;
> > +}
> > +
> > +uint32_t _crc32 (uint32_t crc, uint8_t data) {
> > +  int i;
> > +  crc = crc ^ data;
> > +
> > +  for (i = 0; i < 8; i++) {
> > +      if (crc & 1)
> > +     crc = (crc >> 1) ^ 0x82F63B78;
> > +      else
> > +     crc = (crc >> 1);
> > +    }
> > +
> > +  return crc;
> > +}
> > +
> > +int main ()
> > +{
> > +  uint32_t crc = 0x0D800D80;
> > +  for (uint8_t i = 0; i < 0xff; i++)
> > +    {
> > +      uint32_t res1 = _crc32_O0 (crc, i);
> > +      uint32_t res2 = _crc32 (crc, i);
> > +      if (res1 != res2)
> > +     abort ();
> > +      crc = res1;
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "calculates CRC!" "crc"} } */
> > +/* { dg-final { scan-tree-dump-times "Couldn't generate faster CRC
> code." 0 "crc"} } */
> > +/* { dg-final { scan-rtl-dump "UNSPEC_CRC32" "dfinish"} } */
> > +/* { dg-final { scan-rtl-dump-times "pmull" 0 "dfinish"} } */
>
