On Thu, May 28, 2020 at 11:37 PM H.J. Lu <[email protected]> wrote:
>
> On Thu, May 28, 2020 at 8:00 AM Richard Sandiford
> <[email protected]> wrote:
> >
> > "Yangfei (Felix)" <[email protected]> writes:
> > > Thanks for reviewing this.
> > > Attached please find the v5 patch.
> > > Note: we also need to modify local variable "mode" once we catch one
> > > case. I see test failure without this change.
> >
> > Looks good. Patch is OK assuming the x86 folks don't want to rewrite
> > gcc.target/i386/pr67609.c to avoid the new optimisation. I'll hold off
> > applying until the AVX512 thing is sorted.
> >
> > > Bootstrapped and tested on aarch64-linux-gnu.
> > > Also bootstrapped on x86_64-linux-gnu. Regression test show two failed
> > > tests on this platform:
> >
> > Thanks for the extra testing.
> >
> > > 1> FAIL: gcc.target/i386/avx512f-vcvtps2ph-2.c (test for excess errors)
> > > 2> FAIL: gcc.target/i386/pr67609.c scan-assembler movdqa
> > >
> > > I have adjust the second one in the v4 patch.
> >
> > So this is:
> >
> > movdqa reg(%rip), %xmm1
> > movaps %xmm1, -24(%rsp)
> > movsd %xmm0, -24(%rsp)
> > movapd -24(%rsp), %xmm2
> > movaps %xmm2, reg(%rip)
> > ret
> >
> > to:
> >
> > movq %xmm0, reg(%rip)
> > ret
> >
> > Nice. I think it's safe to say that's an improvement :-)
> >
> > I don't know whether this means we're no longer testing what the test
> > was intended to test. Maybe one of the x86 folks has an opinion about
> > whether we should instead rewrite the test somehow.
> >
> > > But The first one looks strange to me.
> > > I see gcc emits invalid x86 vcvtps2ph instrunctions which looks like:
> > >
> > > 125 vcvtps2ph $0, %zmm0, -112(%rbp){%k1}
> > > 126 vcvtps2ph $0, %zmm0, -80(%rbp){%k1}{z}
> > >
> > > This happens in the combine phase, where an combined insn looks like:
> > >
> > > 1989 Trying 31 -> 33:
> > > 1990 31: r106:V16HI=vec_merge(unspec[r103:V16SF,0]
> > > 133,[frame:DI-0x60],r109:HI)
> > > 1991 33: [frame:DI-0x60]=r106:V16HI
> > > 1992 REG_DEAD r106:V16HI
> > > 1993 Successfully matched this instruction:
> > > 1994 (set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> > > 1995 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0 S32
> > > A256])
> > > 1996 (vec_merge:V16HI (unspec:V16HI [
> > > 1997 (reg:V16SF 103)
> > > 1998 (const_int 0 [0])
> > > 1999 ] UNSPEC_VCVTPS2PH)
> > > 2000 (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> > > 2001 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0
> > > S32 A256])
> > > 2002 (reg:HI 109)))
> > > 2003 allowing combination of insns 31 and 33
> > > 2004 original costs 16 + 4 = 20
> > > 2005 replacement cost 16
> > > 2006 deferring deletion of insn with uid = 31.
> > > 2007 modifying insn i3 33:
> > > [frame:DI-0x60]=vec_merge(unspec[r103:V16SF,0]
> > > 133,[frame:DI-0x60],r109:HI)
> > > 2008 deferring rescan insn with uid = 33.
> > >
> > > And this can be matched with pattern: avx512f_vcvtps2ph512_mask
> > > 2282 (insn 33 31 37 4 (set (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> > > 2283 (const_int -96 [0xffffffffffffffa0])) [2 res2.x+0
> > > S32 A256])
> > > 2284 (vec_merge:V16HI (unspec:V16HI [
> > > 2285 (reg:V16SF 103)
> > > 2286 (const_int 0 [0])
> > > 2287 ] UNSPEC_VCVTPS2PH)
> > > 2288 (mem/j/c:V16HI (plus:DI (reg/f:DI 19 frame)
> > > 2289 (const_int -96 [0xffffffffffffffa0])) [2
> > > res2.x+0 S32 A256])
> > > 2290 (reg:HI 109))) "avx512f-vcvtps2ph-2.c":80:10 5324
> > > {avx512f_vcvtps2ph512_mask}
> > > 2291 (nil))
> > >
> > > gcc/config/i386/sse.md:
> > > 21663 (define_insn "<mask_codefor>avx512f_vcvtps2ph512<mask_name>"
> > > 21664 [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm")
> > > 21665 (unspec:V16HI
> > > 21666 [(match_operand:V16SF 1 "register_operand" "v")
> > > 21667 (match_operand:SI 2 "const_0_to_255_operand" "N")]
> > > 21668 UNSPEC_VCVTPS2PH))]
> > > 21669 "TARGET_AVX512F"
> > > 21670 "vcvtps2ph\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
> > > 21671 [(set_attr "type" "ssecvt")
> > > 21672 (set_attr "prefix" "evex")
> > > 21673 (set_attr "mode" "V16SF")])
> > >
> > > How can that happen?
> >
> > This is due to define_subst magic. The generators automatically
> > create a vec_merge form of the instruction based on the information
> > in the <mode_*> attributes.
> >
> > AFAICT the rtl above is for the line-125 instruction, which looks ok.
> > The problem is the line-126 instruction, since vcvtps2ph doesn't
> > AIUI allow zero masking.
> >
zero masking is not allowed for mem_operand here, but available for
register_operand.
there's something wrong in the pattern, we need to fix it.
(define_insn "<mask_codefor>avx512f_vcvtps2ph512<mask_name>"
> > The "mask" define_subst allows both zeroing and merging,
> > so I guess this means that the pattern should either be using
> > a different define_subst, or should be enforcing merging in
> > some other way. Please could one of the x86 devs take a look?
> >
>
> Hongtao, can you take a look?
>
> Thanks.
>
>
> --
> H.J.
BTW, i failed to build gcc when apply pr95254-v4.txt.
gcc configure:
Using built-in specs.
COLLECT_GCC=./gcc/xgcc
Target: x86_64-pc-linux-gnu
Configured with: ../../gcc/gnu-toolchain/gcc/configure
--enable-languages=c,c++,fortran --disable-bootstrap
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.0 20200528 (experimental) (GCC)
host on x86_64 rel8.
error message:
during RTL pass: expand
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_eq_dd.c: In
function ‘__bid_eqdd2’:
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_eq_dd.c:35:9:
internal compiler error: in emit_single_push_insn, at expr.c:4405
35 | res = __bid64_quiet_equal (ux.i, uy.i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
during RTL pass: expand
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_gt_dd.c: In
function ‘__bid_gtdd2’:
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_gt_dd.c:35:9:
internal compiler error: in emit_single_push_insn, at expr.c:4405
35 | res = __bid64_quiet_greater (ux.i, uy.i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
during RTL pass: expand
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_dd_to_di.c:
In function ‘__bid_fixdddi’:
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_dd_to_di.c:34:9:
internal compiler error: in emit_single_push_insn, at expr.c:4405
34 | res = __bid64_to_int64_xint (ux.i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
during RTL pass: expand
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_ge_dd.c: In
function ‘__bid_gedd2’:
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_ge_dd.c:35:9:
internal compiler error: in emit_single_push_insn, at expr.c:4405
35 | res = __bid64_quiet_greater_equal (ux.i, uy.i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
during RTL pass: expand
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_le_dd.c: In
function ‘__bid_ledd2’:
../../../../../gcc/gnu-toolchain/gcc/libgcc/config/libbid/_le_dd.c:35:9:
internal compiler error: in emit_single_push_insn, at expr.c:4405
35 | res = __bid64_quiet_less_equal (ux.i, uy.i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--
BR,
Hongtao