On Thu, May 28, 2020 at 11:37 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > On Thu, May 28, 2020 at 8:00 AM Richard Sandiford > <richard.sandif...@arm.com> wrote: > > > > "Yangfei (Felix)" <felix.y...@huawei.com> writes: > > > Thanks for reviewing this. > > > Attached please find the v5 patch. > > > Note: we also need to modify local variable "mode" once we catch one > > > case. I see test failure without this change. > > > > Looks good. Patch is OK assuming the x86 folks don't want to rewrite > > gcc.target/i386/pr67609.c to avoid the new optimisation. I'll hold off > > applying until the AVX512 thing is sorted. > > > > > Bootstrapped and tested on aarch64-linux-gnu. > > > Also bootstrapped on x86_64-linux-gnu. Regression test show two failed > > > tests on this platform: > > > > Thanks for the extra testing. > > > > > 1> FAIL: gcc.target/i386/avx512f-vcvtps2ph-2.c (test for excess errors) > > > 2> FAIL: gcc.target/i386/pr67609.c scan-assembler movdqa > > > > > > I have adjust the second one in the v4 patch. > > > > So this is: > > > > movdqa reg(%rip), %xmm1 > > movaps %xmm1, -24(%rsp) > > movsd %xmm0, -24(%rsp) > > movapd -24(%rsp), %xmm2 > > movaps %xmm2, reg(%rip) > > ret > > > > to: > > > > movq %xmm0, reg(%rip) > > ret > > > > Nice. I think it's safe to say that's an improvement :-) > > > > I don't know whether this means we're no longer testing what the test > > was intended to test. Maybe one of the x86 folks has an opinion about > > whether we should instead rewrite the test somehow. > > > > > But The first one looks strange to me. > > > I see gcc emits invalid x86 vcvtps2ph instrunctions which looks like: > > > > > > 125 vcvtps2ph $0, %zmm0, -112(%rbp){%k1} > > > 126 vcvtps2ph $0, %zmm0, -80(%rbp){%k1}{z} > > > > > > This happens in the combine phase, where an combined insn looks like: > > > > > > 1989 Trying 31 -> 33: > > > 1990 31: r106:V16HI=vec_merge(unspec[r103:V16SF,0] > > > 133,[frame:DI-0x60],r109:HI) > > > 1991 33: [frame:DI-0x60]=r106:V16HI > > > 1992 REG_DEAD r106:V16HI > > > 1993 Successfully matched this instruction: > > > 1994 (set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame) > > > 1995 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 S32 > > > A256]) > > > 1996 (vec_merge:V16HI (unspec:V16HI [ > > > 1997 (reg:V16SF 103) > > > 1998 (const_int 0 [0]) > > > 1999 ] UNSPEC_VCVTPS2PH) > > > 2000 (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame) > > > 2001 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 > > > S32 A256]) > > > 2002 (reg:HI 109))) > > > 2003 allowing combination of insns 31 and 33 > > > 2004 original costs 16 + 4 = 20 > > > 2005 replacement cost 16 > > > 2006 deferring deletion of insn with uid = 31. > > > 2007 modifying insn i3 33: > > > [frame:DI-0x60]=vec_merge(unspec[r103:V16SF,0] > > > 133,[frame:DI-0x60],r109:HI) > > > 2008 deferring rescan insn with uid = 33. > > > > > > And this can be matched with pattern: avx512f_vcvtps2ph512_mask > > > 2282 (insn 33 31 37 4 (set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame) > > > 2283 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 > > > S32 A256]) > > > 2284 (vec_merge:V16HI (unspec:V16HI [ > > > 2285 (reg:V16SF 103) > > > 2286 (const_int 0 [0]) > > > 2287 ] UNSPEC_VCVTPS2PH) > > > 2288 (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame) > > > 2289 (const_int -96 [0xffffffffffffffa0])) [2 > > > res2.x+0 S32 A256]) > > > 2290 (reg:HI 109))) "avx512f-vcvtps2ph-2.c":80:10 5324 > > > {avx512f_vcvtps2ph512_mask} > > > 2291 (nil)) > > > > > > gcc/config/i386/sse.md: > > > 21663 (define_insn "<mask_codefor>avx512f_vcvtps2ph512<mask_name>" > > > 21664 [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm") > > > 21665 (unspec:V16HI > > > 21666 [(match_operand:V16SF 1 "register_operand" "v") > > > 21667 (match_operand:SI 2 "const_0_to_255_operand" "N")] > > > 21668 UNSPEC_VCVTPS2PH))] > > > 21669 "TARGET_AVX512F" > > > 21670 "vcvtps2ph\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}" > > > 21671 [(set_attr "type" "ssecvt") > > > 21672 (set_attr "prefix" "evex") > > > 21673 (set_attr "mode" "V16SF")]) > > > > > > How can that happen? > > > > This is due to define_subst magic. The generators automatically > > create a vec_merge form of the instruction based on the information > > in the <mode_*> attributes. > > > > AFAICT the rtl above is for the line-125 instruction, which looks ok. > > The problem is the line-126 instruction, since vcvtps2ph doesn't > > AIUI allow zero masking. > >
zero masking is not allowed for mem_operand here, but available for register_operand. there's something wrong in the pattern, we need to fix it. (define_insn "<mask_codefor>avx512f_vcvtps2ph512<mask_name>" > > The "mask" define_subst allows both zeroing and merging, > > so I guess this means that the pattern should either be using > > a different define_subst, or should be enforcing merging in > > some other way. Please could one of the x86 devs take a look? > > > > Hongtao, can you take a look? > > Thanks. > > > -- > H.J. BTW, i failed to build gcc when apply pr95254-v4.txt. gcc configure: Using built-in specs. COLLECT_GCC=./gcc/xgcc Target: x86_64-pc-linux-gnu Configured with: ../../gcc/gnu-toolchain/gcc/configure --enable-languages=c,c++,fortran --disable-bootstrap Thread model: posix Supported LTO compression algorithms: zlib gcc version 11.0.0 20200528 (experimental) (GCC) host on x86_64 rel8. error message: during RTL pass: expand ../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_eq_dd.c: In function ‘__bid_eqdd2’: ../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_eq_dd.c:35:9: internal compiler error: in emit_single_push_insn, at expr.c:4405 35 | res = __bid64_quiet_equal (ux.i, uy.i); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ during RTL pass: expand ../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_gt_dd.c: In function ‘__bid_gtdd2’: ../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_gt_dd.c:35:9: internal compiler error: in emit_single_push_insn, at expr.c:4405 35 | res = __bid64_quiet_greater (ux.i, uy.i); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ during RTL pass: expand ../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_dd_to_di.c: In function ‘__bid_fixdddi’: ../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_dd_to_di.c:34:9: internal compiler error: in emit_single_push_insn, at expr.c:4405 34 | res = __bid64_to_int64_xint (ux.i); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ during RTL pass: expand ../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_ge_dd.c: In function ‘__bid_gedd2’: ../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_ge_dd.c:35:9: internal compiler error: in emit_single_push_insn, at expr.c:4405 35 | res = __bid64_quiet_greater_equal (ux.i, uy.i); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ during RTL pass: expand ../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_le_dd.c: In function ‘__bid_ledd2’: ../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_le_dd.c:35:9: internal compiler error: in emit_single_push_insn, at expr.c:4405 35 | res = __bid64_quiet_less_equal (ux.i, uy.i); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- BR, Hongtao