On 21.06.2023 09:37, Hongtao Liu wrote:
> On Wed, Jun 21, 2023 at 2:06 PM Jan Beulich via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> Is there a reason why vec_dupv4sf uses sseshuf1 for its shuffle
>> alternatives, but *vec_dupv4si uses sselog1? I'd be happy to correct
>> this in whichever is the appropriate direction, while touching this
>> anyway.
> It should be sseshuf1(or sseshuf depending on input operands number in
> the pattern) for shufps, sselog means logical instructions.

Would you be okay for me to fold in that adjustment, or do you
insist on a separate patch?

>> I'm working from the assumption that the isa attributes to the original
>> 1st and 2nd alternatives don't need further restricting (to sse2_noavx2
>> or avx_noavx2 as applicable), as the new earlier alternatives cover all
>> operand forms already when at least AVX2 is enabled.
>>
>> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss
> According to comments, yes, no extra prefix is needed.
> 
> ;; There are also additional prefixes in 3DNOW, SSSE3.
> ;; ssemuladd,sse4arg default to 0f24/0f25 and DREX byte,
> ;; sseiadd1,ssecvt1 to 0f7a with no DREX byte.
> ;; 3DNOW has 0f0f prefix, SSSE3 and SSE4_{1,2} 0f38/0f3a.

Right, that's what triggered my question. I guess dropping these
"prefix_extra" really wants to be a separate patch (or maybe even
multiple, but it's hard to see how to split), dealing with all of the
instances which likely have accumulated simply via copy-and-paste.

>> --- a/gcc/config/i386/sse.md
>> +++ b/gcc/config/i386/sse.md
>> @@ -26141,41 +26141,64 @@
>>         (const_int 1)))])
>>
>>  (define_insn "vec_dupv4sf"
>> -  [(set (match_operand:V4SF 0 "register_operand" "=v,v,x")
>> +  [(set (match_operand:V4SF 0 "register_operand" "=v,v,v,x")
>>         (vec_duplicate:V4SF
>> -         (match_operand:SF 1 "nonimmediate_operand" "Yv,m,0")))]
>> +         (match_operand:SF 1 "nonimmediate_operand" "Yv,v,m,0")))]
>>    "TARGET_SSE"
>>    "@
>> -   vshufps\t{$0, %1, %1, %0|%0, %1, %1, 0}
>> +   * return TARGET_AVX2 ? \"vbroadcastss\t{%1, %0|%0, %1}\" : 
>> \"vshufps\t{$0, %d1, %0|%0, %d1, 0}\";
>> +   vbroadcastss\t{%1, %g0|%g0, %1}
>>     vbroadcastss\t{%1, %0|%0, %1}
>>     shufps\t{$0, %0, %0|%0, %0, 0}"
>> -  [(set_attr "isa" "avx,avx,noavx")
>> -   (set_attr "type" "sseshuf1,ssemov,sseshuf1")
>> -   (set_attr "length_immediate" "1,0,1")
>> -   (set_attr "prefix_extra" "0,1,*")
>> -   (set_attr "prefix" "maybe_evex,maybe_evex,orig")
>> -   (set_attr "mode" "V4SF")])
>> +  [(set_attr "isa" "avx,*,avx,noavx")
>> +   (set (attr "type")
>> +       (cond [(and (eq_attr "alternative" "0")
>> +                   (match_test "!TARGET_AVX2"))
>> +                (const_string "sseshuf1")
>> +              (eq_attr "alternative" "3")
>> +                (const_string "sseshuf1")
>> +             ]
>> +             (const_string "ssemov")))
>> +   (set (attr "length_immediate")
>> +       (if_then_else (eq_attr "type" "sseshuf1")
>> +                     (const_string "1")
>> +                     (const_string "0")))
>> +   (set_attr "prefix_extra" "0,0,1,*")
>> +   (set_attr "prefix" "maybe_evex,evex,maybe_evex,orig")
>> +   (set_attr "mode" "V4SF,V16SF,V4SF,V4SF")
>> +   (set (attr "enabled")
>> +       (if_then_else (eq_attr "alternative" "1")
>> +                     (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
>> +                                  && !TARGET_PREFER_AVX256")
>> +                     (const_string "*")))])
>>
>>  (define_insn "*vec_dupv4si"
>> -  [(set (match_operand:V4SI 0 "register_operand"     "=v,v,x")
>> +  [(set (match_operand:V4SI 0 "register_operand"     "=v,v,v,v,x")
>>         (vec_duplicate:V4SI
>> -         (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]
>> +         (match_operand:SI 1 "nonimmediate_operand" "Yvm,v,Yv,m,0")))]
>>    "TARGET_SSE"
>>    "@
>> +   vpbroadcastd\t{%1, %0|%0, %1}
>> +   vpbroadcastd\t{%1, %g0|%g0, %1}
>>     %vpshufd\t{$0, %1, %0|%0, %1, 0}
>>     vbroadcastss\t{%1, %0|%0, %1}
>>     shufps\t{$0, %0, %0|%0, %0, 0}"
>> -  [(set_attr "isa" "sse2,avx,noavx")
>> -   (set_attr "type" "sselog1,ssemov,sselog1")
>> -   (set_attr "length_immediate" "1,0,1")
>> -   (set_attr "prefix_extra" "0,1,*")
>> -   (set_attr "prefix" "maybe_vex,maybe_evex,orig")
>> -   (set_attr "mode" "TI,V4SF,V4SF")
>> +  [(set_attr "isa" "avx2,*,sse2,avx,noavx")
>> +   (set_attr "type" "ssemov,ssemov,sselog1,ssemov,sselog1")
>> +   (set_attr "length_immediate" "0,0,1,0,1")
>> +   (set_attr "prefix_extra" "0,0,0,1,*")
>> +   (set_attr "prefix" "maybe_evex,evex,maybe_vex,maybe_evex,orig")
>> +   (set_attr "mode" "TI,XI,TI,V4SF,V4SF")
>>     (set (attr "preferred_for_speed")
>> -     (cond [(eq_attr "alternative" "1")
>> +     (cond [(eq_attr "alternative" "3")
>>               (symbol_ref "!TARGET_INTER_UNIT_MOVES_TO_VEC")
>>            ]
>> -          (symbol_ref "true")))])
>> +          (symbol_ref "true")))
>> +   (set (attr "enabled")
>> +       (if_then_else (eq_attr "alternative" "1")
>> +                     (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
>> +                                  && !TARGET_PREFER_AVX256")
>> +                     (const_string "*")))])
>>
>>  (define_insn "*vec_dupv2di"
>>    [(set (match_operand:V2DI 0 "register_operand"     "=x,v,v,v,x")
> Could you write a testcase for the newly added constraint?
> 
> You can use an inline assembly to explicitly use an evex xmm register.
> .i.e.
> 
> typedef float v4sf __attribute__((vector_size(16)));
> typedef int v4si __attribute__((vector_size(16)));
> v4sf
> foo (float a)
> {
> register float c __asm("xmm16") = a;
> asm volatile ("": "+v"(c));
> return __extension__(v4sf) {c, c, c, c};
> }
> 
> 
> v4si
> foo1 (int a)
> {
> register int c __asm("xmm16") = a;
> asm volatile ("": "+v"(c));
> return __extension__(v4si) {c, c, c, c};
> }
> 
> with your patch, the above testcase should generate
> vbroadcastss/vpbroadcastd %xmm16, %zmm0 with -mavx512f -mno-avx512vl
> -mprefer-vector-width=512.
> 
> Refer to  gcc/testsuite/gcc.target/i386/avx512bw-pack-2.c

I can certainly do that. Not having had a need to adjust any existing
tests made me conclude that the earlier alternatives also don't (all)
have a corresponding testcase.

Jan

Reply via email to