On 24 Oct 19:14, Uros Bizjak wrote: > On Tue, Oct 24, 2017 at 4:46 PM, Jakub Jelinek <ja...@redhat.com> wrote: > > On Tue, Oct 24, 2017 at 05:44:44AM -0700, H.J. Lu wrote: > >> > What I can see from config/atom.md: > >> > ;; if palignr or psrldq > >> > (define_insn_reservation "atom_sseishft_2" 1 > >> > (and (eq_attr "cpu" "atom") > >> > (and (eq_attr "type" "sseishft") > >> > (and (eq_attr "atom_unit" "sishuf") > >> > (match_operand 2 "immediate_operand")))) > >> > "atom-simple-0") > >> > > >> > This leads back to initial commit of atom.md. > >> > So, discrimination of psrldq and pslldq looks intentional. > >> > > >> > On the over hand, I see in Software Optimization Guide, Table 14-2 that > >> > PSRLDQ and PSLLDQ occupy same line which directs both insns to port-0 (p > >> > 14-18). > >> > So, looking from that point, definition for PSLLDQ which allow either of > >> > port-0 > >> > and port-1 looks wrong (atom-simple-either reservation). > >> > > >> > In absence of other information, I'd play on safe side and leave things > >> > as they > >> > occur right now. > >> > > >> > >> I prefer to leave atom.md ASIS. As for (set_attr "atom_unit" > >> "sishuf"), it was added > >> for > >> > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44615 > >> > >> You can drop (set_attr "atom_unit" "sishuf") if > >> gcc.target/i386/sse2-vec-2a.c > >> still compiles. > > > > No, it was added earlier than that, that PR was about insns with psrldq with > > implicit immediate (which don't have a CONST_INT operands[2]). This insn > > does have it, the testcase passes regardless of whether sishuf or other is > > used, it is purely a tuning thing. > > > > In any case, here is an updated patch that just preserves the status quo > > (psrldq having the sishuf unit, pslldq not) using a simple code attribute. > > Agner Fog's tables confirm Jakub's observation: > > PSLL/RL/RAW/D/Q (x)mm,(x)mm 2 FP0 5 5 > PSLL/RL/RAW/D/Q (x)xmm,i 1 FP0 1 1 > PSLL/RLDQ xmm,i 1 FP0 1 1 > > I fail to see how could left and right shifts use different units. > Since the test passes, let's change pslldq to use sishuf unit. There > is no better alternative from the list of units. Then I bet your patch is OK for main trunk.
-- Thanks, K > > > 2017-10-24 Jakub Jelinek <ja...@redhat.com> > > > > PR target/82370 > > * config/i386/sse.md (VIMAX_AVX2): Remove V4TImode. > > (VIMAX_AVX2_AVX512BW, VIMAX_AVX512VL): New mode iterators. > > (vec_shl_<mode>): Remove unused expander. > > (avx512bw_<shift_insn><mode>3): New define_insn. > > (atom_shift_unit): New code iterator. > > (<sse2_avx2>_ashl<mode>3, <sse2_avx2>_lshr<mode>3): Replaced by ... > > (<sse2_avx2>_<shift_insn><mode>3): ... this. New define_insn. > > > > * gcc.target/i386/pr82370.c: New test. > > OK with the change od pslldq's unit to sishuf. > > Thanks, > Uros. > > > --- gcc/config/i386/sse.md.jj 2017-10-20 16:30:35.286208652 +0200 > > +++ gcc/config/i386/sse.md 2017-10-24 16:29:54.848934888 +0200 > > @@ -371,10 +371,17 @@ (define_mode_iterator V16FI > > [V16SF V16SI]) > > > > ;; ??? We should probably use TImode instead. > > -(define_mode_iterator VIMAX_AVX2 > > +(define_mode_iterator VIMAX_AVX2_AVX512BW > > [(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") V1TI]) > > > > -;; ??? This should probably be dropped in favor of VIMAX_AVX2. > > +;; Suppose TARGET_AVX512BW as baseline > > +(define_mode_iterator VIMAX_AVX512VL > > + [V4TI (V2TI "TARGET_AVX512VL") (V1TI "TARGET_AVX512VL")]) > > + > > +(define_mode_iterator VIMAX_AVX2 > > + [(V2TI "TARGET_AVX2") V1TI]) > > + > > +;; ??? This should probably be dropped in favor of VIMAX_AVX2_AVX512BW. > > (define_mode_iterator SSESCALARMODE > > [(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") TI]) > > > > @@ -10778,9 +10785,9 @@ (define_insn "<shift_insn><mode>3<mask_n > > (set_attr "mode" "<sseinsnmode>")]) > > > > > > -(define_expand "vec_shl_<mode>" > > +(define_expand "vec_shr_<mode>" > > [(set (match_dup 3) > > - (ashift:V1TI > > + (lshiftrt:V1TI > > (match_operand:VI_128 1 "register_operand") > > (match_operand:SI 2 "const_0_to_255_mul_8_operand"))) > > (set (match_operand:VI_128 0 "register_operand") (match_dup 4))] > > @@ -10791,48 +10798,26 @@ (define_expand "vec_shl_<mode>" > > operands[4] = gen_lowpart (<MODE>mode, operands[3]); > > }) > > > > -(define_insn "<sse2_avx2>_ashl<mode>3" > > - [(set (match_operand:VIMAX_AVX2 0 "register_operand" "=x,v") > > - (ashift:VIMAX_AVX2 > > - (match_operand:VIMAX_AVX2 1 "register_operand" "0,v") > > - (match_operand:SI 2 "const_0_to_255_mul_8_operand" "n,n")))] > > - "TARGET_SSE2" > > +(define_insn "avx512bw_<shift_insn><mode>3" > > + [(set (match_operand:VIMAX_AVX512VL 0 "register_operand" "=v") > > + (any_lshift:VIMAX_AVX512VL > > + (match_operand:VIMAX_AVX512VL 1 "nonimmediate_operand" "vm") > > + (match_operand:SI 2 "const_0_to_255_mul_8_operand" "n")))] > > + "TARGET_AVX512BW" > > { > > operands[2] = GEN_INT (INTVAL (operands[2]) / 8); > > - > > - switch (which_alternative) > > - { > > - case 0: > > - return "pslldq\t{%2, %0|%0, %2}"; > > - case 1: > > - return "vpslldq\t{%2, %1, %0|%0, %1, %2}"; > > - default: > > - gcc_unreachable (); > > - } > > + return "vp<vshift>dq\t{%2, %1, %0|%0, %1, %2}"; > > } > > - [(set_attr "isa" "noavx,avx") > > - (set_attr "type" "sseishft") > > + [(set_attr "type" "sseishft") > > (set_attr "length_immediate" "1") > > - (set_attr "prefix_data16" "1,*") > > - (set_attr "prefix" "orig,vex") > > + (set_attr "prefix" "maybe_evex") > > (set_attr "mode" "<sseinsnmode>")]) > > > > -(define_expand "vec_shr_<mode>" > > - [(set (match_dup 3) > > - (lshiftrt:V1TI > > - (match_operand:VI_128 1 "register_operand") > > - (match_operand:SI 2 "const_0_to_255_mul_8_operand"))) > > - (set (match_operand:VI_128 0 "register_operand") (match_dup 4))] > > - "TARGET_SSE2" > > -{ > > - operands[1] = gen_lowpart (V1TImode, operands[1]); > > - operands[3] = gen_reg_rtx (V1TImode); > > - operands[4] = gen_lowpart (<MODE>mode, operands[3]); > > -}) > > +(define_code_attr atom_shift_unit [(ashift "*") (lshiftrt "sishuf")]) > > > > -(define_insn "<sse2_avx2>_lshr<mode>3" > > +(define_insn "<sse2_avx2>_<shift_insn><mode>3" > > [(set (match_operand:VIMAX_AVX2 0 "register_operand" "=x,v") > > - (lshiftrt:VIMAX_AVX2 > > + (any_lshift:VIMAX_AVX2 > > (match_operand:VIMAX_AVX2 1 "register_operand" "0,v") > > (match_operand:SI 2 "const_0_to_255_mul_8_operand" "n,n")))] > > "TARGET_SSE2" > > @@ -10842,9 +10827,9 @@ (define_insn "<sse2_avx2>_lshr<mode>3" > > switch (which_alternative) > > { > > case 0: > > - return "psrldq\t{%2, %0|%0, %2}"; > > + return "p<vshift>dq\t{%2, %0|%0, %2}"; > > case 1: > > - return "vpsrldq\t{%2, %1, %0|%0, %1, %2}"; > > + return "vp<vshift>dq\t{%2, %1, %0|%0, %1, %2}"; > > default: > > gcc_unreachable (); > > } > > @@ -10852,7 +10837,7 @@ (define_insn "<sse2_avx2>_lshr<mode>3" > > [(set_attr "isa" "noavx,avx") > > (set_attr "type" "sseishft") > > (set_attr "length_immediate" "1") > > - (set_attr "atom_unit" "sishuf") > > + (set_attr "atom_unit" "<atom_shift_unit>") > > (set_attr "prefix_data16" "1,*") > > (set_attr "prefix" "orig,vex") > > (set_attr "mode" "<sseinsnmode>")]) > > --- gcc/testsuite/gcc.target/i386/pr82370.c.jj 2017-10-24 > > 16:22:16.665464886 +0200 > > +++ gcc/testsuite/gcc.target/i386/pr82370.c 2017-10-24 > > 16:22:16.665464886 +0200 > > @@ -0,0 +1,18 @@ > > +/* PR target/82370 */ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -mavx512vl -mavx512bw -masm=att" } */ > > +/* { dg-final { scan-assembler-times "vpslldq\[ \t]\+\\\$5, > > \\(%\[a-z0-9,]*\\), %xmm\[0-9]\+" 1 } } */ > > +/* { dg-final { scan-assembler-times "vpsrldq\[ \t]\+\\\$5, > > \\(%\[a-z0-9,]*\\), %xmm\[0-9]\+" 1 } } */ > > +/* { dg-final { scan-assembler-times "vpslldq\[ \t]\+\\\$5, > > \\(%\[a-z0-9,]*\\), %ymm\[0-9]\+" 1 } } */ > > +/* { dg-final { scan-assembler-times "vpsrldq\[ \t]\+\\\$5, > > \\(%\[a-z0-9,]*\\), %ymm\[0-9]\+" 1 } } */ > > +/* { dg-final { scan-assembler-times "vpslldq\[ \t]\+\\\$5, > > \\(%\[a-z0-9,]*\\), %zmm\[0-9]\+" 1 } } */ > > +/* { dg-final { scan-assembler-times "vpsrldq\[ \t]\+\\\$5, > > \\(%\[a-z0-9,]*\\), %zmm\[0-9]\+" 1 } } */ > > + > > +#include <x86intrin.h> > > + > > +__m512i f1 (__m512i *x) { return _mm512_bslli_epi128 (*x, 5); } > > +__m512i f2 (__m512i *x) { return _mm512_bsrli_epi128 (*x, 5); } > > +__m256i f3 (__m256i *x) { return _mm256_bslli_epi128 (*x, 5); } > > +__m256i f4 (__m256i *x) { return _mm256_bsrli_epi128 (*x, 5); } > > +__m128i f5 (__m128i *x) { return _mm_bslli_si128 (*x, 5); } > > +__m128i f6 (__m128i *x) { return _mm_bsrli_si128 (*x, 5); } > > > > > > Jakub