On Thu, May 27, 2021 at 7:03 AM Hongtao Liu <crazy...@gmail.com> wrote: > > Hi: > This is an updated patch which implements vzeroupper as call_insn > which has a special vzeroupper ABI, also in this patch i reverted > r11-7684, r10-6451, r10-3677 which seems to fix the same issue but in > a different way. > Bootstrapped and regtested on x86_64-linux-gnux{-m32,} and > x86_64-linux-gnux{-m32 \-march=cascadelake,-march=cascadelake}. > Also test the patch on SPEC2017 and eembc, no performance impact as > expected. > Ok for trunk? > > gcc/ChangeLog: > > PR target/82735 > * config/i386/i386-expand.c (ix86_expand_builtin): Remove > assignment of cfun->machine->has_explicit_vzeroupper. > * config/i386/i386-features.c > (ix86_add_reg_usage_to_vzerouppers): Delete. > (ix86_add_reg_usage_to_vzeroupper): Ditto. > (rest_of_handle_insert_vzeroupper): Remove > ix86_add_reg_usage_to_vzerouppers, add df_analyze at the end > of the function. > (gate): Remove cfun->machine->has_explicit_vzeroupper. > * config/i386/i386-protos.h (ix86_expand_avx_vzeroupper): > Declared. > * config/i386/i386.c (ix86_insn_callee_abi): New function. > (ix86_initialize_callee_abi): Ditto. > (ix86_expand_avx_vzeroupper): Ditto. > (ix86_hard_regno_call_part_clobbered): Adjust for vzeroupper > ABI. > (TARGET_INSN_CALLEE_ABI): Define as ix86_insn_callee_abi. > * config/i386/i386.h (enum i386_insn_callee_abi_index): New. > (struct GTY(()) machine_function): Delete > has_explicit_vzeroupper. > * config/i386/i386.md (enum unspec): New member > UNSPEC_CALLEE_ABI. > * config/i386/predicates.md (vzeroupper_pattern): Adjust. > * config/i386/sse.md (avx_vzeroupper): Call > ix86_expand_avx_vzeroupper. > (*avx_vzeroupper): Rename to .. > (avx_vzeroupper_callee_abi): .. this, and adjust pattern as > call_insn which has a special vzeroupper ABI. > (*avx_vzeroupper_1): Deleted. > * df-scan.c (df_get_call_refs): When call_insn is a fake call, > it won't use stack pointer reg. > * final.c (leaf_function_p): When call_insn is a fake call, it > won't affect caller as a leaf function. > * reg-stack.c (callee_clobbers_any_stack_reg): New. > (subst_stack_regs): When call_insn doesn't clobber any stack > reg, don't clear the arguments. > * rtl.c (shallow_copy_rtx): Don't clear flag used when orig is > a insn. > * shrink-wrap.c (requires_stack_frame_p): No need for stack > frame for a fake call. > > gcc/testsuite/ChangeLog: > > PR target/82735 > * gcc.target/i386/pr82735-1.c: New test. > * gcc.target/i386/pr82735-2.c: New test. > * gcc.target/i386/pr82735-3.c: New test. > * gcc.target/i386/pr82735-4.c: New test. > * gcc.target/i386/pr82735-5.c: New test.
Please split the patch to middle-end and target part. The middle-end should be approved first. (define_expand "avx_vzeroupper" - [(parallel [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])] - "TARGET_AVX") + [(parallel [(call (mem:QI (unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)) + (const_int 0)) + (unspec [(const_int 1)] UNSPEC_CALLEE_ABI)])] The call insn doesn't look like a valid RTX. Why not just: + [(parallel [(call (mem:QI (const_int 0) + (const_int 0)) for a fake call? Also, UNSPEC_VZEROUPPER can be removed this way since the const_int 1 of UNSPEC_CALLEE_ABI is now used to detect vzeroupper. Also, you don't need the avx_vzeroupper pattern to just call ix86_expand_avx_vzeroupper. Just call the function directly from the call site: case AVX_U128: if (mode == AVX_U128_CLEAN) emit_insn (gen_avx_vzeroupper ()); break; + (unspec [(const_int 1)] UNSPEC_CALLEE_ABI)])] Can this const_int 1 be somehow more descriptive? Perhaps use define_constant to define I386_VZEROUPPER ABI and use it in .md as well as .c files. Uros.