On Thu, Jun 27, 2024 at 9:01 AM Li, Pan2 <[email protected]> wrote:
>
> I bet it only requires the backend to implement the standard name for the
> vector mode.
There are several standard names present for x86:
{ss,us}{add,sub}{v8qi,v16qi,v32qi,v64qi,v4hi,v8hi,v16hi,v32hi},
defined in sse.md:
(define_expand "<insn><mode>3<mask_name>"
  [(set (match_operand:VI12_AVX2_AVX512BW 0 "register_operand")
        (sat_plusminus:VI12_AVX2_AVX512BW
          (match_operand:VI12_AVX2_AVX512BW 1 "vector_operand")
          (match_operand:VI12_AVX2_AVX512BW 2 "vector_operand")))]
  "TARGET_SSE2 && <mask_mode512bit_condition> && <mask_avx512bw_condition>"
  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
but all of these handle only 8-bit and 16-bit elements.
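
To make the element-width point concrete, here is a small scalar sketch
in C of the per-element operation at 16 bits, next to a 32-bit
saturating subtract followed by truncation to 16 bits. The helper names
are hypothetical and only illustrate the semantics; they are not GCC
internals or the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned saturating subtract on one 16-bit
   element, i.e. the element width the expanders above cover.  */
static inline uint16_t
us_sub_u16 (uint16_t a, uint16_t b)
{
  return a >= b ? (uint16_t) (a - b) : 0;
}

/* Hypothetical helper: saturating subtract performed at 32 bits and
   then truncated to 16 bits.  The subtraction itself needs 32-bit
   elements, which is wider than the 8-bit and 16-bit cases above.  */
static inline uint16_t
us_sub_u32_trunc_u16 (uint32_t a, uint32_t b)
{
  return (uint16_t) (a >= b ? a - b : 0);
}

/* A few spot checks of the semantics.  */
void
us_sub_self_check (void)
{
  assert (us_sub_u16 (5, 7) == 0);       /* saturates to zero */
  assert (us_sub_u16 (700, 7) == 693);   /* plain difference */
  /* 70000 - 1 = 69999, which wraps to 69999 - 65536 = 4463 in 16 bits.  */
  assert (us_sub_u32_trunc_u16 (70000u, 1u) == 4463);
}
```

Note that truncating after the wide subtract is not the same as
saturating at the narrow width: the result wraps modulo 2^16 once the
difference no longer fits.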
> How about a simpler one like below.
>
> #define DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(OUT_T, IN_T) \
> void __attribute__((noinline)) \
> vec_sat_u_sub_trunc_##OUT_T##_fmt_1 (OUT_T *out, IN_T *op_1, IN_T y, \
>                                      unsigned limit) \
> { \
>   unsigned i; \
>   for (i = 0; i < limit; i++) \
>     { \
>       IN_T x = op_1[i]; \
>       out[i] = (OUT_T)(x >= y ? x - y : 0); \
>     } \
> }
>
> DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint32_t, uint64_t);
I tried with:
DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint8_t, uint16_t);
And the compiler was able to detect several .SAT_SUB patterns:
$ grep SAT_SUB pr51492-1.c.266t.optimized
vect_patt_37.14_85 = .SAT_SUB (vect_x_13.12_81, vect_cst__84);
vect_patt_37.14_86 = .SAT_SUB (vect_x_13.13_83, vect_cst__84);
vect_patt_42.26_126 = .SAT_SUB (vect_x_62.24_122, vect_cst__125);
vect_patt_42.26_127 = .SAT_SUB (vect_x_62.25_124, vect_cst__125);
iftmp.0_24 = .SAT_SUB (x_3, y_14(D));
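
For anyone who wants to reproduce this, a minimal self-contained sketch
of such a test file follows. The macro is the one quoted above; the
checker function, its name, the array size and the input values are
assumptions added purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(OUT_T, IN_T) \
void __attribute__((noinline)) \
vec_sat_u_sub_trunc_##OUT_T##_fmt_1 (OUT_T *out, IN_T *op_1, IN_T y, \
                                     unsigned limit) \
{ \
  unsigned i; \
  for (i = 0; i < limit; i++) \
    { \
      IN_T x = op_1[i]; \
      out[i] = (OUT_T)(x >= y ? x - y : 0); \
    } \
}

/* 16-bit input elements, 8-bit output elements.  */
DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint8_t, uint16_t)

enum { N = 64 };  /* hypothetical test size */

/* Hypothetical checker: compares the loop's output with the same
   expression evaluated one element at a time.  */
void
check_u8_from_u16 (void)
{
  uint16_t in[N];
  uint8_t out[N];
  const uint16_t y = 300;
  unsigned i;

  /* Inputs both below and above y, so the saturating and the
     truncating cases are both exercised.  */
  for (i = 0; i < N; i++)
    in[i] = (uint16_t) (i * 37u);

  vec_sat_u_sub_trunc_uint8_t_fmt_1 (out, in, y, N);

  for (i = 0; i < N; i++)
    assert (out[i] == (uint8_t) (in[i] >= y ? in[i] - y : 0));
}
```

Compiling with something like -O3 -fdump-tree-optimized and grepping
the resulting dump for SAT_SUB should show whether the pattern was
detected; the exact flags used for the dump above are not stated here.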
Uros.
>
> The riscv backend is able to detect the pattern, similar to the dump below.
> I can help to check the x86 side after running the test suites.
>
> ;; basic block 2, loop depth 0
> ;; pred: ENTRY
> if (limit_11(D) != 0)
> goto <bb 3>; [89.00%]
> else
> goto <bb 5>; [11.00%]
> ;; succ: 3
> ;; 5
> ;; basic block 3, loop depth 0
> ;; pred: 2
> vect_cst__71 = [vec_duplicate_expr] y_14(D);
> _78 = (unsigned long) limit_11(D);
> ;; succ: 4
>
> ;; basic block 4, loop depth 1
> ;; pred: 4
> ;; 3
> # vectp_op_1.7_68 = PHI <vectp_op_1.7_69(4), op_1_12(D)(3)>
> # vectp_out.12_75 = PHI <vectp_out.12_76(4), out_16(D)(3)>
> # ivtmp_79 = PHI <ivtmp_80(4), _78(3)>
> _81 = .SELECT_VL (ivtmp_79, POLY_INT_CST [2, 2]);
> ivtmp_67 = _81 * 8;
> vect_x_13.9_70 = .MASK_LEN_LOAD (vectp_op_1.7_68, 64B, { -1, ... }, _81, 0);
> vect_patt_48.10_72 = .SAT_SUB (vect_x_13.9_70, vect_cst__71);
> // .SAT_SUB pattern
> vect_patt_49.11_73 = (vector([2,2]) unsigned int) vect_patt_48.10_72;
> ivtmp_74 = _81 * 4;
> .MASK_LEN_STORE (vectp_out.12_75, 32B, { -1, ... }, _81, 0,
> vect_patt_49.11_73);
> vectp_op_1.7_69 = vectp_op_1.7_68 + ivtmp_67;
> vectp_out.12_76 = vectp_out.12_75 + ivtmp_74;
> ivtmp_80 = ivtmp_79 - _81;
>
> riscv64-unknown-elf-gcc (GCC) 15.0.0 20240627 (experimental)
> Copyright (C) 2024 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> Pan
>
> -----Original Message-----
> From: Uros Bizjak <[email protected]>
> Sent: Thursday, June 27, 2024 2:48 PM
> To: Li, Pan2 <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v2] Vect: Support truncate after .SAT_SUB pattern in zip
>
> On Mon, Jun 24, 2024 at 3:55 PM <[email protected]> wrote:
> >
> > From: Pan Li <[email protected]>
> >
> > The zip benchmark of coremark-pro has one SAT_SUB-like pattern, but
> > truncated, as below:
> >
> > void test (uint16_t *x, unsigned b, unsigned n)
> > {
> >   unsigned a = 0;
> >   register uint16_t *p = x;
> >
> >   do {
> >     a = *--p;
> >     *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
> >   } while (--n);
> > }
> >
> No, the current compiler does not recognize .SAT_SUB for x86 with the
> above code, although many vector sat sub instructions involving 16-bit
> elements are present.
>
> Uros.