[Bug target/90878] [8/9/10 Regression] integer -> SSE register move isn't generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90878 --- Comment #8 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed Sep 18 19:50:45 2019 New Revision: 275906 URL: https://gcc.gnu.org/viewcvs?rev=275906=gcc=rev Log: i386: Restore Skylake SImode hard register store cost On Skylake, we should move integer register to SSE register without going through memory. This patch restores Skylake SImode hard register store cost to 6. gcc/ PR target/90878 * config/i386/x86-tune-costs.h (skylake_cost): Restore SImode hard register store cost to 6. gcc/testsuite/ PR target/90878 * gcc.target/i386/pr90878.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr90878.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/x86-tune-costs.h trunk/gcc/testsuite/ChangeLog
[Bug target/91446] Wrong cost for scalar_load/scalar_store of vector type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91446 --- Comment #3 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed Sep 18 19:49:19 2019 New Revision: 275905 URL: https://gcc.gnu.org/viewcvs?rev=275905=gcc=rev Log: i386: Increase Skylake SImode pseudo register store cost On Skylake, SImode store cost isn't less than half the cost of a 128-bit vector store. This patch increases Skylake SImode pseudo register store cost to make it the same as QImode and HImode. gcc/ PR target/91446 * config/i386/x86-tune-costs.h (skylake_cost): Increase SImode pseudo register store cost from 3 to 6 to make it the same as QImode and HImode. gcc/testsuite/ PR target/91446 * gcc.target/i386/pr91446.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr91446.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/x86-tune-costs.h trunk/gcc/testsuite/ChangeLog
[Bug target/90878] [8/9/10 Regression] integer -> SSE register move isn't generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90878 --- Comment #7 from hjl at gcc dot gnu.org --- Author: hjl Date: Thu Aug 15 18:15:33 2019 New Revision: 274543 URL: https://gcc.gnu.org/viewcvs?rev=274543=gcc=rev Log: i386: Separate costs of pseudo registers from hard registers processor_costs has costs of RTL expressions with pseudo registers and costs of hard register moves: 1. Costs of RTL expressions are used to generate the most efficient RTL operations with pseudo registers. 2. Costs of hard register moves are used by the register allocator to decide how to allocate and move hard registers. Since relative costs of pseudo register load and store versus pseudo register moves in RTL expressions can be different from relative costs of hard registers, we should separate costs of RTL expressions with pseudo registers from costs of hard registers so that the register allocator and RTL expressions can be improved independently. This patch moves costs of hard register moves to the new hard_register field and duplicates costs of moves which are also used for costs of RTL expressions. PR target/90878 * config/i386/i386.c (inline_memory_move_cost): Use hard_register for costs of hard register moves. (ix86_register_move_cost): Likewise. * config/i386/i386.h (processor_costs): Move costs of hard register moves to hard_register. Add int_load, int_store, xmm_move, ymm_move, zmm_move, sse_to_integer, integer_to_sse, sse_load, sse_store, sse_unaligned_load and sse_unaligned_store for costs of RTL expressions. * config/i386/x86-tune-costs.h: Move costs of hard register moves to hard_register. Duplicate int_load, int_store, xmm_move, ymm_move, zmm_move, sse_to_integer, integer_to_sse, sse_load, sse_store for costs of RTL expressions. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/i386.h trunk/gcc/config/i386/x86-tune-costs.h
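The split described in revision 274543 can be pictured with a small standalone model. This is only a sketch: the struct names `cost_table` and `hard_register_costs`, the two accessor functions, and every number are made up for illustration. Only the field names `hard_register`, `int_store` and `integer_to_sse` come from the log, and the QImode/HImode/SImode array order is an assumption.

```c
#include <assert.h>

/* Hypothetical, heavily simplified cost table. The real processor_costs
   in config/i386/i386.h has many more fields. */
struct hard_register_costs {
  int int_store[3];   /* assumed order: QImode, HImode, SImode */
  int integer_to_sse; /* integer -> SSE hard register move */
};

struct cost_table {
  /* Costs of hard register moves, consulted by the register allocator. */
  struct hard_register_costs hard_register;
  /* Duplicated entries, consulted when costing RTL expressions
     that still use pseudo registers. */
  int int_store[3];
  int integer_to_sse;
};

/* Placeholder numbers only; they are NOT copied from x86-tune-costs.h. */
const struct cost_table example_cost = {
  .hard_register = { .int_store = {6, 6, 6}, .integer_to_sse = 6 },
  .int_store = {6, 6, 3},
  .integer_to_sse = 2,
};

/* Store cost as the register allocator would read it in this model. */
int hard_reg_store_cost(const struct cost_table *c, int mode_index) {
  return c->hard_register.int_store[mode_index];
}

/* Store cost as RTL expression costing would read it in this model. */
int pseudo_reg_store_cost(const struct cost_table *c, int mode_index) {
  return c->int_store[mode_index];
}
```

Because the two sets of numbers live in separate fields, one can be tuned without disturbing the other, which is the independence the log message asks for.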
[Bug target/90980] Missing AVX512 unaligned intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90980 --- Comment #1 from hjl at gcc dot gnu.org --- Author: hjl Date: Thu Jul 11 19:09:37 2019 New Revision: 273416 URL: https://gcc.gnu.org/viewcvs?rev=273416=gcc=rev Log: i386: Add AVX512 unaligned intrinsics __m512i _mm512_loadu_epi64( void * sa); void _mm512_storeu_epi64(void * d, __m512i a); __m512i _mm512_loadu_epi32( void * sa); void _mm512_storeu_epi32(void * d, __m512i a); void _mm256_storeu_epi64(void * d, __m256i a); void _mm_storeu_epi64(void * d, __m128i a); void _mm256_storeu_epi32(void * d, __m256i a); void _mm_storeu_epi32(void * d, __m128i a); Tested on x86-64. 2019-07-11 Sunil K Pandey gcc/ PR target/90980 * config/i386/avx512fintrin.h (_mm512_loadu_epi64): New. (_mm512_storeu_epi64): Likewise. (_mm512_loadu_epi32): Likewise. (_mm512_storeu_epi32): Likewise. * config/i386/avx512vlintrin.h (_mm256_storeu_epi64): New. (_mm_storeu_epi64): Likewise. (_mm256_storeu_epi32): Likewise. (_mm_storeu_epi32): Likewise. gcc/testsuite/ PR target/90980 * gcc.target/i386/pr90980-1.c: New test. * gcc.target/i386/pr90980-2.c: Likewise. * gcc.target/i386/pr90980-3.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/i386/pr90980-1.c trunk/gcc/testsuite/gcc.target/i386/pr90980-2.c trunk/gcc/testsuite/gcc.target/i386/pr90980-3.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/avx512fintrin.h trunk/gcc/config/i386/avx512vlintrin.h trunk/gcc/testsuite/ChangeLog
[Bug tree-optimization/54855] Unnecessary duplication when performing scalar operation on vector element
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855 --- Comment #10 from hjl at gcc dot gnu.org --- Author: hjl Date: Thu Jun 20 15:30:54 2019 New Revision: 272511 URL: https://gcc.gnu.org/viewcvs?rev=272511=gcc=rev Log: i386: Generate standard floating point scalar operation patterns Standard floating point scalar operation patterns for combiner, which preserve the rest of the vector, look like (vec_merge:V2DF (vec_duplicate:V2DF (reg:DF 87)) (reg/v:V2DF 85 [ x ]) (const_int 1 [0x1])])) and (vec_merge:V2DF (vec_duplicate:V2DF (op:DF (vec_select:DF (reg/v:V2DF 85 [ x ]) (parallel [ (const_int 0 [0])])) (reg:DF 87)) (reg/v:V2DF 85 [ x ]) (const_int 1 [0x1])])) This patch adds and generates such standard floating point scalar operation patterns for +, -, *, /, > and <. Tested on x86-64. gcc/ PR target/54855 * config/i386/i386-expand.c (ix86_expand_vector_set): Generate standard scalar operation pattern for V2DF. * config/i386/sse.md (*_vm3): New. (*_vm3): Likewise. (*ieee_3): Likewise. (vec_setv2df_0): Likewise. gcc/testsuite/ PR target/54855 * gcc.target/i386/pr54855-1.c: New test. * gcc.target/i386/pr54855-2.c: Likewise. * gcc.target/i386/pr54855-3.c: Likewise. * gcc.target/i386/pr54855-4.c: Likewise. * gcc.target/i386/pr54855-5.c: Likewise. * gcc.target/i386/pr54855-6.c: Likewise. * gcc.target/i386/pr54855-7.c: Likewise. * gcc.target/i386/pr54855-8.c: Likewise. * gcc.target/i386/pr54855-9.c: Likewise. * gcc.target/i386/pr54855-10.c: Likewise. 
Added: trunk/gcc/testsuite/gcc.target/i386/pr54855-1.c trunk/gcc/testsuite/gcc.target/i386/pr54855-10.c trunk/gcc/testsuite/gcc.target/i386/pr54855-2.c trunk/gcc/testsuite/gcc.target/i386/pr54855-3.c trunk/gcc/testsuite/gcc.target/i386/pr54855-4.c trunk/gcc/testsuite/gcc.target/i386/pr54855-5.c trunk/gcc/testsuite/gcc.target/i386/pr54855-6.c trunk/gcc/testsuite/gcc.target/i386/pr54855-7.c trunk/gcc/testsuite/gcc.target/i386/pr54855-8.c trunk/gcc/testsuite/gcc.target/i386/pr54855-9.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386-expand.c trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog
[Bug rtl-optimization/90765] preferred_stack_boundary is updated for callee
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90765 --- Comment #1 from hjl at gcc dot gnu.org --- Author: hjl Date: Fri Jun 14 16:24:56 2019 New Revision: 272296 URL: https://gcc.gnu.org/viewcvs?rev=272296=gcc=rev Log: Update preferred_stack_boundary only when expanding function call locate_and_pad_parm is called when expanding function call from initialize_argument_information and when generating function body from assign_parm_find_entry_rtl: /* Remember if the outgoing parameter requires extra alignment on the calling function side. */ if (crtl->stack_alignment_needed < boundary) crtl->stack_alignment_needed = boundary; if (crtl->preferred_stack_boundary < boundary) crtl->preferred_stack_boundary = boundary; stack_alignment_needed and preferred_stack_boundary should be updated only when expanding function call, not when generating function body. Add update_stack_alignment_for_call to update stack alignment when outgoing parameter is passed in the stack. gcc/ PR rtl-optimization/90765 * calls.c (update_stack_alignment_for_call): New function. (expand_call): Call update_stack_alignment_for_call when outgoing parameter is passed in the stack. (emit_library_call_value_1): Likewise. * function.c (locate_and_pad_parm): Don't update stack_alignment_needed and preferred_stack_boundary. gcc/testsuite/ PR rtl-optimization/90765 * gcc.target/i386/pr90765-1.c: New test. * gcc.target/i386/pr90765-2.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/i386/pr90765-1.c trunk/gcc/testsuite/gcc.target/i386/pr90765-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/calls.c trunk/gcc/function.c trunk/gcc/testsuite/ChangeLog
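A standalone model of the update that revision 272296 moves out of locate_and_pad_parm might look like the following. The function name is taken from the log; the `stack_info` struct (a stand-in for the two crtl fields) and the signature are assumptions made so the sketch compiles on its own.

```c
#include <assert.h>

/* Hypothetical stand-in for the two crtl fields named in the log. */
struct stack_info {
  unsigned int stack_alignment_needed;   /* in bits */
  unsigned int preferred_stack_boundary; /* in bits */
};

/* Raise both fields to at least `boundary`. In this model the helper is
   meant to be called only on the caller side, when an outgoing argument
   is passed on the stack, and never while generating the callee body. */
void update_stack_alignment_for_call(struct stack_info *info,
                                     unsigned int boundary) {
  if (info->stack_alignment_needed < boundary)
    info->stack_alignment_needed = boundary;
  if (info->preferred_stack_boundary < boundary)
    info->preferred_stack_boundary = boundary;
}
```

The helper only ever raises the two values, so calling it with a smaller boundary leaves them unchanged.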
[Bug target/90877] Dead codes in ix86_register_move_cost
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90877 --- Comment #1 from hjl at gcc dot gnu.org --- Author: hjl Date: Fri Jun 14 15:41:43 2019 New Revision: 272294 URL: https://gcc.gnu.org/viewcvs?rev=272294=gcc=rev Log: i386: Update SSE <-> integer move costs Since inline_secondary_memory_needed has /* ??? This is a lie. We do have moves between mmx/general, and for mmx/sse2. But by saying we need secondary memory we discourage the register allocator from using the mmx registers unless needed. */ if (MMX_CLASS_P (class1) != MMX_CLASS_P (class2)) return true; moves between MMX and non-MMX units require secondary memory. There is no need to check moves between MMX and integer units. struct processor_costs has: const int mmxsse_to_integer; /* cost of moving mmxsse register to integer. */ const int ssemmx_to_integer; /* cost of moving integer to mmxsse register. */ This patch also renames mmxsse_to_integer to sse_to_integer and ssemmx_to_integer to integer_to_sse. Tested on Linux/x86-64. PR target/90877 * config/i386/i386-features.c (dimode_scalar_chain::compute_convert_gain): Replace mmxsse_to_integer with sse_to_integer. * config/i386/i386.c (ix86_register_move_cost): Verify that moves between MMX and non-MMX units require secondary memory. Correct costs of moves between SSE and integer units. * config/i386/i386.h (processor_costs): Rename cost of moving SSE register to integer to sse_to_integer. Rename cost of moving integer register to SSE to integer_to_sse. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386-features.c trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/i386.h
[Bug target/89355] Unnecessary ENDBR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89355 --- Comment #6 from hjl at gcc dot gnu.org --- Author: hjl Date: Fri May 31 23:59:16 2019 New Revision: 271828 URL: https://gcc.gnu.org/viewcvs?rev=271828=gcc=rev Log: i386: Don't insert ENDBR after NOTE_INSN_DELETED_LABEL NOTE_INSN_DELETED_LABEL is used to mark what used to be a 'code_label', but was not used for other purposes than taking its address which cannot be used as target for indirect jumps. Tested on Linux/x86-64 with -fcf-protection. For x86-64 libc.so on glibc master branch (commit f43b8dd55588c3), Before: 2961 endbr64 After: 2943 endbr64 gcc/ PR target/89355 * config/i386/i386-features.c (rest_of_insert_endbranch): Remove NOTE_INSN_DELETED_LABEL check. gcc/testsuite/ PR target/89355 * gcc.target/i386/cet-label-3.c: New test. * gcc.target/i386/cet-label-4.c: Likewise. * gcc.target/i386/cet-label-5.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/i386/cet-label-3.c trunk/gcc/testsuite/gcc.target/i386/cet-label-4.c trunk/gcc/testsuite/gcc.target/i386/cet-label-5.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386-features.c trunk/gcc/testsuite/ChangeLog
[Bug target/88483] Unnecessary stack alignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88483 --- Comment #5 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 22 18:53:37 2019 New Revision: 271517 URL: https://gcc.gnu.org/viewcvs?rev=271517=gcc=rev Log: x86: Don't allocate stack frame nor align stack if not needed get_frame_size () returns used stack slots during compilation, which may be optimized out later. This patch does the following: 1. Add stack_frame_required to machine_function to indicate that the function needs a stack frame. 2. Change ix86_find_max_used_stack_alignment to set stack_frame_required. 3. Always call ix86_find_max_used_stack_alignment to check if stack frame is needed. Tested on i686 and x86-64 with --with-arch=native --with-cpu=native Tested on AVX512 machine configured with --with-arch=native --with-cpu=native gcc/ PR target/88483 * config/i386/i386-options.c (ix86_init_machine_status): Set stack_frame_required to true. * config/i386/i386.c (ix86_get_frame_size): New function. (ix86_frame_pointer_required): Replace get_frame_size with ix86_get_frame_size. (ix86_compute_frame_layout): Likewise. (ix86_find_max_used_stack_alignment): Changed to void. Set stack_frame_required. (ix86_finalize_stack_frame_flags): Always call ix86_find_max_used_stack_alignment. Replace get_frame_size with ix86_get_frame_size. * config/i386/i386.h (machine_function): Add stack_frame_required. gcc/testsuite/ PR target/88483 * gcc.target/i386/stackalign/pr88483-1.c: New test. * gcc.target/i386/stackalign/pr88483-2.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/i386/stackalign/pr88483-1.c trunk/gcc/testsuite/gcc.target/i386/stackalign/pr88483-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386-options.c trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/i386.h trunk/gcc/testsuite/ChangeLog
[Bug bootstrap/90497] [10 Regression] Broken bootstrap on i686-linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90497 --- Comment #7 from hjl at gcc dot gnu.org --- Author: hjl Date: Fri May 17 14:48:37 2019 New Revision: 271328 URL: https://gcc.gnu.org/viewcvs?rev=271328=gcc=rev Log: i386: Enable MMX intrinsics without SSE/SSE2/SSSE3 Since MMX intrinsics are marked with SSE/SSE2/SSSE3 for SSE emulation, enable them without SSE/SSE2/SSSE3 if MMX is enabled. Restore TARGET_3DNOW check, which was changed to TARGET_3DNOW_A by revision 271235. gcc/ PR target/90497 * config/i386/i386-expand.c (ix86_expand_builtin): Enable MMX intrinsics without SSE/SSE2/SSSE3. * config/i386/mmx.md (mmx_uavgv8qi3): Restore TARGET_3DNOW check. (*mmx_uavgv8qi3): Likewise. gcc/testsuite/ PR target/90497 * gcc.target/i386/pr90497-1.c: New test. * gcc.target/i386/pr90497-2.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/i386/pr90497-1.c trunk/gcc/testsuite/gcc.target/i386/pr90497-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386-expand.c trunk/gcc/config/i386/mmx.md trunk/gcc/testsuite/ChangeLog
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #46 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:39:38 2019 New Revision: 271254 URL: https://gcc.gnu.org/viewcvs?rev=271254=gcc=rev Log: i386: Add tests for MMX intrinsic emulations with SSE Test MMX intrinsics with -msse2 in 32-bit mode and -msse2 -mno-mmx in 64-bit mode. PR target/89021 * gcc.target/i386/mmx-vals.h: New file. * gcc.target/i386/sse2-mmx-2.c: Likewise. * gcc.target/i386/sse2-mmx-3.c: Likewise. * gcc.target/i386/sse2-mmx-4.c: Likewise. * gcc.target/i386/sse2-mmx-5.c: Likewise. * gcc.target/i386/sse2-mmx-6.c: Likewise. * gcc.target/i386/sse2-mmx-7.c: Likewise. * gcc.target/i386/sse2-mmx-8.c: Likewise. * gcc.target/i386/sse2-mmx-9.c: Likewise. * gcc.target/i386/sse2-mmx-10.c: Likewise. * gcc.target/i386/sse2-mmx-11.c: Likewise. * gcc.target/i386/sse2-mmx-12.c: Likewise. * gcc.target/i386/sse2-mmx-13.c: Likewise. * gcc.target/i386/sse2-mmx-14.c: Likewise. * gcc.target/i386/sse2-mmx-15.c: Likewise. * gcc.target/i386/sse2-mmx-16.c: Likewise. * gcc.target/i386/sse2-mmx-17.c: Likewise. * gcc.target/i386/sse2-mmx-18a.c: Likewise. * gcc.target/i386/sse2-mmx-18b.c: Likewise. * gcc.target/i386/sse2-mmx-18c.c: Likewise. * gcc.target/i386/sse2-mmx-19a.c: Likewise. * gcc.target/i386/sse2-mmx-18b.c: Likewise. * gcc.target/i386/sse2-mmx-19c.c: Likewise. * gcc.target/i386/sse2-mmx-19d.c: Likewise. * gcc.target/i386/sse2-mmx-19e.c: Likewise. * gcc.target/i386/sse2-mmx-20.c: Likewise. * gcc.target/i386/sse2-mmx-21.c: Likewise. * gcc.target/i386/sse2-mmx-22.c: Likewise. * gcc.target/i386/sse2-mmx-cvtpi2ps.c: Likewise. * gcc.target/i386/sse2-mmx-cvtps2pi.c: Likewise. * gcc.target/i386/sse2-mmx-cvttps2pi.c: Likewise. * gcc.target/i386/sse2-mmx-maskmovq.c: Likewise. * gcc.target/i386/sse2-mmx-packssdw.c: Likewise. * gcc.target/i386/sse2-mmx-packsswb.c: Likewise. * gcc.target/i386/sse2-mmx-packuswb.c: Likewise. * gcc.target/i386/sse2-mmx-paddb.c: Likewise. 
* gcc.target/i386/sse2-mmx-paddd.c: Likewise. * gcc.target/i386/sse2-mmx-paddq.c: Likewise. * gcc.target/i386/sse2-mmx-paddsb.c: Likewise. * gcc.target/i386/sse2-mmx-paddsw.c: Likewise. * gcc.target/i386/sse2-mmx-paddusb.c: Likewise. * gcc.target/i386/sse2-mmx-paddusw.c: Likewise. * gcc.target/i386/sse2-mmx-paddw.c: Likewise. * gcc.target/i386/sse2-mmx-pand.c: Likewise. * gcc.target/i386/sse2-mmx-pandn.c: Likewise. * gcc.target/i386/sse2-mmx-pavgb.c: Likewise. * gcc.target/i386/sse2-mmx-pavgw.c: Likewise. * gcc.target/i386/sse2-mmx-pcmpeqb.c: Likewise. * gcc.target/i386/sse2-mmx-pcmpeqd.c: Likewise. * gcc.target/i386/sse2-mmx-pcmpeqw.c: Likewise. * gcc.target/i386/sse2-mmx-pcmpgtb.c: Likewise. * gcc.target/i386/sse2-mmx-pcmpgtd.c: Likewise. * gcc.target/i386/sse2-mmx-pcmpgtw.c: Likewise. * gcc.target/i386/sse2-mmx-pextrw.c: Likewise. * gcc.target/i386/sse2-mmx-pinsrw.c: Likewise. * gcc.target/i386/sse2-mmx-pmaddwd.c: Likewise. * gcc.target/i386/sse2-mmx-pmaxsw.c: Likewise. * gcc.target/i386/sse2-mmx-pmaxub.c: Likewise. * gcc.target/i386/sse2-mmx-pminsw.c: Likewise. * gcc.target/i386/sse2-mmx-pminub.c: Likewise. * gcc.target/i386/sse2-mmx-pmovmskb.c: Likewise. * gcc.target/i386/sse2-mmx-pmulhuw.c: Likewise. * gcc.target/i386/sse2-mmx-pmulhw.c: Likewise. * gcc.target/i386/sse2-mmx-pmullw.c: Likewise. * gcc.target/i386/sse2-mmx-pmuludq.c: Likewise. * gcc.target/i386/sse2-mmx-por.c: Likewise. * gcc.target/i386/sse2-mmx-psadbw.c: Likewise. * gcc.target/i386/sse2-mmx-pshufw.c: Likewise. * gcc.target/i386/sse2-mmx-pslld.c: Likewise. * gcc.target/i386/sse2-mmx-pslldi.c: Likewise. * gcc.target/i386/sse2-mmx-psllq.c: Likewise. * gcc.target/i386/sse2-mmx-psllqi.c: Likewise. * gcc.target/i386/sse2-mmx-psllw.c: Likewise. * gcc.target/i386/sse2-mmx-psllwi.c: Likewise. * gcc.target/i386/sse2-mmx-psrad.c: Likewise. * gcc.target/i386/sse2-mmx-psradi.c: Likewise. * gcc.target/i386/sse2-mmx-psraw.c: Likewise. * gcc.target/i386/sse2-mmx-psrawi.c: Likewise. 
* gcc.target/i386/sse2-mmx-psrld.c: Likewise. * gcc.target/i386/sse2-mmx-psrldi.c: Likewise. * gcc.target/i386/sse2-mmx-psrlq.c: Likewise. * gcc.target/i386/sse2-mmx-psrlqi.c: Likewise. * gcc.target/i386/sse2-mmx-psrlw.c: Likewise. * gcc.target/i386/sse2-mmx-psrlwi.c: Likewise. * gcc.target/i386/sse2-mmx-psubb.c: Likewise. * gcc.target/i386/sse2-mmx-psubd.c
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #45 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:33:43 2019 New Revision: 271253 URL: https://gcc.gnu.org/viewcvs?rev=271253=gcc=rev Log: i386: Enable TM MMX intrinsics with SSE2 This patch enables TM MMX intrinsics with SSE2 when MMX is disabled. PR target/89021 * config/i386/i386-builtins.c (bdesc_tm): Enable MMX intrinsics with SSE2. Modified: trunk/gcc/config/i386/i386-builtins.c trunk/gcc/testsuite/ChangeLog
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #44 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:32:33 2019 New Revision: 271252 URL: https://gcc.gnu.org/viewcvs?rev=271252=gcc=rev Log: i386: Allow MMX intrinsic emulation with SSE Allow MMX intrinsic emulation with SSE/SSE2/SSSE3. Don't enable MMX ISA by default with TARGET_MMX_WITH_SSE. For pr82483-1.c and pr82483-2.c, "-mssse3 -mno-mmx" compiles in 64-bit mode since MMX intrinsics can be emulated with SSE. gcc/ PR target/89021 * config/i386/i386-builtin.def: Enable MMX intrinsics with SSE/SSE2/SSSE3. * config/i386/i386-builtins.c (ix86_init_mmx_sse_builtins): Likewise. * config/i386/i386-expand.c (ix86_expand_builtin): Allow SSE/SSE2/SSSE3 to emulate MMX intrinsics with TARGET_MMX_WITH_SSE. * config/i386/mmintrin.h: Only require SSE2 if __MMX_WITH_SSE__ is defined. gcc/testsuite/ PR target/89021 * gcc.target/i386/pr82483-1.c: Error only on ia32. * gcc.target/i386/pr82483-2.c: Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386-builtin.def trunk/gcc/config/i386/i386-builtins.c trunk/gcc/config/i386/i386-expand.c trunk/gcc/config/i386/mmintrin.h trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/pr82483-1.c trunk/gcc/testsuite/gcc.target/i386/pr82483-2.c
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #43 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:31:18 2019 New Revision: 271251 URL: https://gcc.gnu.org/viewcvs?rev=271251=gcc=rev Log: i386: Allow MMX vector expanders with TARGET_MMX_WITH_SSE PR target/89021 * config/i386/mmx.md (*vec_dupv2sf): Changed to define_insn_and_split to support SSE emulation. (*vec_extractv2sf_0): Likewise. (*vec_extractv2sf_1): Likewise. (*vec_extractv2si_0): Likewise. (*vec_extractv2si_1): Likewise. (*vec_extractv2si_zext_mem): Likewise. (vec_setv2sf): Also allow TARGET_MMX_WITH_SSE. (vec_extractv2sf_1 splitter): Likewise. (vec_extractv2sfsf): Likewise. (vec_setv2si): Likewise. (vec_extractv2si_1 splitter): Likewise. (vec_extractv2sisi): Likewise. (vec_setv4hi): Likewise. (vec_extractv4hihi): Likewise. (vec_setv8qi): Likewise. (vec_extractv8qiqi): Likewise. (vec_extractv2sfsf): Also allow TARGET_MMX_WITH_SSE. Pass TARGET_MMX_WITH_SSE to ix86_expand_vector_extract. (vec_extractv2sisi): Likewise. (vec_extractv4hihi): Likewise. (vec_extractv8qiqi): Likewise. (vec_initv2sfsf): Also allow TARGET_MMX_WITH_SSE. Pass TARGET_MMX_WITH_SSE to ix86_expand_vector_init. (vec_initv2sisi): Likewise. (vec_initv4hihi): Likewise. (vec_initv8qiqi): Likewise. (vec_setv2si): Also allow TARGET_MMX_WITH_SSE. Pass TARGET_MMX_WITH_SSE to ix86_expand_vector_set. (vec_setv4hi): Likewise. (vec_setv8qi): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #42 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:30:32 2019 New Revision: 271250 URL: https://gcc.gnu.org/viewcvs?rev=271250=gcc=rev Log: i386: Allow MMXMODE moves with TARGET_MMX_WITH_SSE PR target/89021 * config/i386/mmx.md (MMXMODE:mov): Also allow TARGET_MMX_WITH_SSE. (MMXMODE:*mov_internal): Likewise. (MMXMODE:movmisalign): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #41 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:29:28 2019 New Revision: 271249 URL: https://gcc.gnu.org/viewcvs?rev=271249=gcc=rev Log: Prevent allocation of MMX registers with TARGET_MMX_WITH_SSE 2019-05-15 Uroš Bizjak PR target/89021 * config/i386/i386.md (*zero_extendsidi2): Add mmx_isa attribute. * config/i386/sse.md (sse2_cvtpi2pd): Ditto. (sse2_cvtpd2pi): Ditto. (sse2_cvttpd2pi): Ditto. (*vec_concatv2sf_sse4_1): Ditto. (*vec_concatv2sf_sse): Ditto. (*vec_concatv2si_sse4_1): Ditto. (*vec_concatv2si): Ditto. (*vec_concatv4si_0): Ditto. (*vec_concatv2di_0): Ditto. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.md trunk/gcc/config/i386/sse.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #39 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:27:33 2019 New Revision: 271247 URL: https://gcc.gnu.org/viewcvs?rev=271247=gcc=rev Log: i386: Emulate MMX ssse3_palignrdi with SSE Emulate MMX version of palignrq with SSE version by concatenating 2 64-bit MMX operands into a single 128-bit SSE operand, followed by SSE psrldq. Only SSE register source operand is allowed. PR target/89021 * config/i386/sse.md (ssse3_palignrdi): Changed to define_insn_and_split to support SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md
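The concatenate-then-shift idea in revision 271247 can be checked with a portable scalar model. This is a sketch, not the sse.md splitter: `u128`, `psrldq_model` and `palignr_mmx_via_sse` are hypothetical names, and the 128-bit register is modelled as two 64-bit halves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit SSE register as two 64-bit halves. */
struct u128 { uint64_t lo, hi; };

/* Model of SSE psrldq: shift the whole 128-bit value right by n bytes,
   shifting in zeros. Counts above 15 give an all-zero result. */
struct u128 psrldq_model(struct u128 v, unsigned n) {
  struct u128 r = { 0, 0 };
  if (n >= 16) return r;
  if (n == 0) return v;
  if (n >= 8) {
    r.lo = v.hi >> (8 * (n - 8));
  } else {
    r.lo = (v.lo >> (8 * n)) | (v.hi << (64 - 8 * n));
    r.hi = v.hi >> (8 * n);
  }
  return r;
}

/* 64-bit palignr modelled as in the log: the first operand becomes the
   upper half, the second the lower half, then one byte shift and the
   low 64 bits are kept. */
uint64_t palignr_mmx_via_sse(uint64_t first, uint64_t second, unsigned n) {
  struct u128 cat = { .lo = second, .hi = first };
  return psrldq_model(cat, n).lo;
}
```

With first = 0x100F0E0D0C0B0A09 and second = 0x0807060504030201, a shift of 3 bytes yields 0x0B0A090807060504: the top five bytes of the second operand followed by the bottom three bytes of the first.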
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #40 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:28:04 2019 New Revision: 271248 URL: https://gcc.gnu.org/viewcvs?rev=271248=gcc=rev Log: i386: Emulate MMX abs2 with SSE Emulate MMX abs2 with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/sse.md (abs2): Add SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #38 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:26:59 2019 New Revision: 271246 URL: https://gcc.gnu.org/viewcvs?rev=271246=gcc=rev Log: i386: Emulate MMX ssse3_psign3 with SSE Emulate MMX ssse3_psign3 with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/sse.md (ssse3_psign3): Add SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #37 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:26:19 2019 New Revision: 271245 URL: https://gcc.gnu.org/viewcvs?rev=271245=gcc=rev Log: i386: Emulate MMX pshufb with SSE version Emulate MMX version of pshufb with SSE version by masking out the bit 3 of the shuffle control byte. Only SSE register source operand is allowed. PR target/89021 * config/i386/sse.md (ssse3_pshufbv8qi3): Changed to define_insn_and_split. Also allow TARGET_MMX_WITH_SSE. Add SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md
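Why masking bit 3 of each control byte is enough can be seen in a scalar model. The 64-bit pshufb indexes with the low 3 bits of each control byte, the 128-bit form with the low 4 bits, and both zero the result byte when bit 7 is set. The function names below are hypothetical and the code is only a sketch of the instruction semantics, not of the sse.md pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Model of 128-bit pshufb: bit 7 of a control byte zeroes the result
   byte, otherwise the low 4 bits select a source byte. */
void pshufb_sse_model(uint8_t dst[16], const uint8_t src[16],
                      const uint8_t ctrl[16]) {
  for (int i = 0; i < 16; i++)
    dst[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0F];
}

/* Reference model of 64-bit pshufb: only the low 3 bits select. */
void pshufb_mmx_reference(uint8_t dst[8], const uint8_t src[8],
                          const uint8_t ctrl[8]) {
  for (int i = 0; i < 8; i++)
    dst[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x07];
}

/* Emulation as described in the log: clear bit 3 of every control byte
   so the 4-bit index can never reach the upper 8 source bytes, run the
   128-bit shuffle, and keep the low 8 result bytes. */
void pshufb_mmx_via_sse(uint8_t dst[8], const uint8_t src[8],
                        const uint8_t ctrl[8]) {
  uint8_t wide_src[16] = { 0 }, wide_ctrl[16] = { 0 }, wide_dst[16];
  for (int i = 0; i < 8; i++) {
    wide_src[i] = src[i];
    wide_ctrl[i] = ctrl[i] & 0xF7; /* mask out bit 3 */
  }
  pshufb_sse_model(wide_dst, wide_src, wide_ctrl);
  for (int i = 0; i < 8; i++)
    dst[i] = wide_dst[i];
}
```

Control bytes such as 9 or 15, which would select from the upper half in the 128-bit form, are folded back to 1 and 7, matching what the 64-bit instruction does.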
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #36 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:24:44 2019 New Revision: 271244 URL: https://gcc.gnu.org/viewcvs?rev=271244=gcc=rev Log: i386: Emulate MMX ssse3_pmulhrswv4hi3 with SSE Emulate MMX ssse3_pmulhrswv4hi3 with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/sse.md (ssse3_pmulhrswv4hi3): Require TARGET_MMX or TARGET_MMX_WITH_SSE. (*ssse3_pmulhrswv4hi3): Add SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #35 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:23:49 2019 New Revision: 271243 URL: https://gcc.gnu.org/viewcvs?rev=271243=gcc=rev Log: i386: Emulate MMX ssse3_pmaddubsw with SSE Emulate MMX ssse3_pmaddubsw with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/sse.md (ssse3_pmaddubsw): Add SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #34 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:23:11 2019 New Revision: 271242 URL: https://gcc.gnu.org/viewcvs?rev=271242=gcc=rev Log: i386: Emulate MMX ssse3_phdv2si3 with SSE Emulate MMX ssse3_phdv2si3 with SSE by moving bits 64:95 to bits 32:63 in SSE register. Only SSE register source operand is allowed. PR target/89021 * config/i386/sse.md (ssse3_phdv2si3): Changed to define_insn_and_split to support SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md
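The "move bits 64:95 to bits 32:63" step can be illustrated with a scalar model of the horizontal add. Only the add variant (phaddd) is modelled, the function names are hypothetical, and unsigned lanes are used so that wraparound is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Model of 128-bit phaddd: adjacent 32-bit lanes are summed, first
   operand's sums in lanes 0..1, second operand's sums in lanes 2..3. */
void phaddd_sse_model(uint32_t dst[4], const uint32_t x[4],
                      const uint32_t y[4]) {
  uint32_t r[4] = { x[0] + x[1], x[2] + x[3], y[0] + y[1], y[2] + y[3] };
  for (int i = 0; i < 4; i++)
    dst[i] = r[i];
}

/* 64-bit phaddd emulated with the 128-bit form. The wanted sums land in
   lane 0 (bits 0:31) and lane 2 (bits 64:95), so lane 2 is moved down
   to lane 1 (bits 32:63) to form the 64-bit result. */
void phaddd_mmx_via_sse(uint32_t dst[2], const uint32_t a[2],
                        const uint32_t b[2]) {
  /* Upper lanes hold arbitrary values here to show that lanes 0 and 2
     of the result do not depend on them. */
  uint32_t x[4] = { a[0], a[1], 0xDEADBEEFu, 0xDEADBEEFu };
  uint32_t y[4] = { b[0], b[1], 0xDEADBEEFu, 0xDEADBEEFu };
  uint32_t r[4];
  phaddd_sse_model(r, x, y);
  dst[0] = r[0];
  dst[1] = r[2]; /* bits 64:95 moved to bits 32:63 */
}
```

For a = {1, 2} and b = {10, 20} the emulated result is {3, 30}, the same pair the 64-bit instruction would produce.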
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #33 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:22:39 2019 New Revision: 271241 URL: https://gcc.gnu.org/viewcvs?rev=271241=gcc=rev Log: i386: Emulate MMX ssse3_phwv4hi3 with SSE Emulate MMX ssse3_phwv4hi3 with SSE by moving bits 64:95 to bits 32:63 in SSE register. Only SSE register source operand is allowed. PR target/89021 * config/i386/sse.md (ssse3_phwv4hi3): Changed to define_insn_and_split to support SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #31 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:21:39 2019 New Revision: 271239 URL: https://gcc.gnu.org/viewcvs?rev=271239=gcc=rev Log: i386: Emulate MMX umulv1siv1di3 with SSE2 Emulate MMX umulv1siv1di3 with SSE2. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (sse2_umulv1siv1di3): Add SSE emulation support. (*sse2_umulv1siv1di3): Add SSE2 emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #32 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:22:08 2019 New Revision: 271240 URL: https://gcc.gnu.org/viewcvs?rev=271240=gcc=rev Log: i386: Make _mm_empty () as NOP without MMX With SSE emulation of MMX intrinsics, we should make _mm_empty () as NOP without MMX. PR target/89021 * config/i386/mmx.md (mmx_): Renamed to ... (*mmx_): This. (mmx_): New expander. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #30 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:21:04 2019 New Revision: 271238 URL: https://gcc.gnu.org/viewcvs?rev=271238=gcc=rev Log: i386: Emulate MMX movntq with SSE2 movntidi Emulate MMX movntq with SSE2 movntidi. Only register source operand is allowed. PR target/89021 * config/i386/mmx.md (sse_movntq): Add SSE2 emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #29 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:20:28 2019 New Revision: 271237 URL: https://gcc.gnu.org/viewcvs?rev=271237=gcc=rev Log: i386: Emulate MMX mmx_psadbw with SSE Emulate MMX mmx_psadbw with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (mmx_psadbw): Add SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #28 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:19:55 2019 New Revision: 271236 URL: https://gcc.gnu.org/viewcvs?rev=271236=gcc=rev Log: i386: Emulate MMX mmx_uavgv4hi3 with SSE Emulate MMX mmx_uavgv4hi3 with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (mmx_uavgv4hi3): Also check TARGET_MMX and TARGET_MMX_WITH_SSE. (*mmx_uavgv4hi3): Add SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #27 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:19:19 2019 New Revision: 271235 URL: https://gcc.gnu.org/viewcvs?rev=271235=gcc=rev Log: i386: Emulate MMX mmx_uavgv8qi3 with SSE Emulate MMX mmx_uavgv8qi3 with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (mmx_uavgv8qi3): Also check TARGET_MMX and TARGET_MMX_WITH_SSE. (*mmx_uavgv8qi3): Add SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #26 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:18:41 2019 New Revision: 271234 URL: https://gcc.gnu.org/viewcvs?rev=271234=gcc=rev Log: i386: Emulate MMX maskmovq with SSE2 maskmovdqu Emulate MMX maskmovq with SSE2 maskmovdqu for TARGET_MMX_WITH_SSE by zero-extending source and mask operands to 128 bits. Handle unmapped bits 64:127 at memory address by adjusting source and mask operands together with memory address. PR target/89021 * config/i386/xmmintrin.h: Emulate MMX maskmovq with SSE2 maskmovdqu for __MMX_WITH_SSE__. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/xmmintrin.h
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #25 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:17:25 2019 New Revision: 271233 URL: https://gcc.gnu.org/viewcvs?rev=271233=gcc=rev Log: i386: Emulate MMX mmx_umulv4hi3_highpart with SSE Emulate MMX mmx_umulv4hi3_highpart with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (mmx_umulv4hi3_highpart): Also check TARGET_MMX and TARGET_MMX_WITH_SSE. (*mmx_umulv4hi3_highpart): Add SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #24 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:16:27 2019 New Revision: 271232 URL: https://gcc.gnu.org/viewcvs?rev=271232=gcc=rev Log: i386: Emulate MMX mmx_pmovmskb with SSE Emulate MMX mmx_pmovmskb with SSE by zero-extending result of SSE pmovmskb from QImode to SImode. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (mmx_pmovmskb): Changed to define_insn_and_split to support SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
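The zero-extension described in the log above can be modelled in portable C. This is only a scalar sketch, not the mmx.md pattern: the function names sse_pmovmskb_model and mmx_pmovmskb_via_sse are hypothetical, and the upper 64 bits of the SSE register are passed in explicitly to show that they do not affect the result.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of SSE pmovmskb: collect the top bit of each of the
   16 bytes of a 128-bit value into a 16-bit mask.  */
uint32_t
sse_pmovmskb_model (const uint8_t x[16])
{
  uint32_t mask = 0;
  for (int i = 0; i < 16; i++)
    mask |= (uint32_t) (x[i] >> 7) << i;
  return mask;
}

/* Hypothetical model of the emulation: the 64-bit MMX value V sits in
   the low half of an SSE register whose upper half (UPPER) holds
   arbitrary data.  Only the low 8 mask bits (QImode) are kept and
   then zero-extended to 32 bits (SImode).  */
uint32_t
mmx_pmovmskb_via_sse (const uint8_t v[8], const uint8_t upper[8])
{
  uint8_t x[16];
  memcpy (x, v, 8);
  memcpy (x + 8, upper, 8);
  uint8_t low = (uint8_t) sse_pmovmskb_model (x); /* QImode result.  */
  return (uint32_t) low;                          /* Zero-extend.  */
}
```

With every upper byte set to 0xff, the full SSE mask has its high eight bits set, but the truncation to 8 bits discards them, so the model returns the same value the 64-bit MMX instruction would.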
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #23 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:15:44 2019 New Revision: 271231 URL: https://gcc.gnu.org/viewcvs?rev=271231=gcc=rev Log: i386: Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (mmx_v4hi3): Also check TARGET_MMX and TARGET_MMX_WITH_SSE. (mmx_v8qi3): Likewise. (smaxmin:v4hi3): New. (umaxmin:v8qi3): Likewise. (smaxmin:*mmx_v4hi3): Add SSE emulation. (umaxmin:*mmx_v8qi3): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #22 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:14:03 2019 New Revision: 271230 URL: https://gcc.gnu.org/viewcvs?rev=271230=gcc=rev Log: i386: Emulate MMX mmx_pinsrw with SSE Emulate MMX mmx_pinsrw with SSE. Only SSE register destination operand is allowed. PR target/89021 * config/i386/mmx.md (mmx_pinsrw): Also check TARGET_MMX and TARGET_MMX_WITH_SSE. (*mmx_pinsrw): Add SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #21 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:13:31 2019 New Revision: 271229 URL: https://gcc.gnu.org/viewcvs?rev=271229=gcc=rev Log: i386: Emulate MMX mmx_pextrw with SSE Emulate MMX mmx_pextrw with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (mmx_pextrw): Add SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #20 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:12:47 2019 New Revision: 271228 URL: https://gcc.gnu.org/viewcvs?rev=271228=gcc=rev Log: i386: Emulate MMX sse_cvtpi2ps with SSE Emulate MMX sse_cvtpi2ps with SSE2 cvtdq2ps, preserving upper 64 bits of destination XMM register. Only SSE register source operand is allowed. PR target/89021 * config/i386/sse.md (sse_cvtpi2ps): Changed to define_insn_and_split. Also allow TARGET_MMX_WITH_SSE. Add SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #19 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:12:14 2019 New Revision: 271227 URL: https://gcc.gnu.org/viewcvs?rev=271227=gcc=rev Log: i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE. PR target/89021 * config/i386/sse.md (sse_cvtps2pi): Add SSE emulation. (sse_cvttps2pi): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #18 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:11:41 2019 New Revision: 271226 URL: https://gcc.gnu.org/viewcvs?rev=271226=gcc=rev Log: i386: Emulate MMX pshufw with SSE Emulate MMX pshufw with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (mmx_pshufw): Also check TARGET_MMX and TARGET_MMX_WITH_SSE. (mmx_pshufw_1): Add SSE emulation. (*vec_dupv4hi): Changed to define_insn_and_split and also allow TARGET_MMX_WITH_SSE to support SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #17 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:11:07 2019 New Revision: 271225 URL: https://gcc.gnu.org/viewcvs?rev=271225=gcc=rev Log: i386: Emulate MMX vec_dupv2si with SSE Emulate MMX vec_dupv2si with SSE. Add the "Yw" constraint to allow broadcast from integer register for AVX512BW with TARGET_AVX512VL. Only SSE register source operand is allowed. PR target/89021 * config/i386/constraints.md (Yw): New constraint. * config/i386/mmx.md (*vec_dupv2si): Changed to define_insn_and_split and also allow TARGET_MMX_WITH_SSE to support SSE emulation. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/constraints.md trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #16 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:10:32 2019 New Revision: 271224 URL: https://gcc.gnu.org/viewcvs?rev=271224=gcc=rev Log: i386: Emulate MMX mmx_eq/mmx_gt3 with SSE Emulate MMX mmx_eq/mmx_gt3 with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (mmx_eq3): Also allow TARGET_MMX_WITH_SSE. (*mmx_eq3): Also allow TARGET_MMX_WITH_SSE. Add SSE support. (mmx_gt3): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #15 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:09:50 2019 New Revision: 271223 URL: https://gcc.gnu.org/viewcvs?rev=271223=gcc=rev Log: i386: Emulate MMX mmx_andnot3 with SSE Emulate MMX mmx_andnot3 with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (mmx_andnot3): Also allow TARGET_MMX_WITH_SSE. Add SSE support. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #14 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:09:19 2019 New Revision: 271222 URL: https://gcc.gnu.org/viewcvs?rev=271222=gcc=rev Log: i386: Emulate MMX 3 with SSE Emulate MMX 3 with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (any_logic:mmx_3): Also allow TARGET_MMX_WITH_SSE. (any_logic:3): New. (any_logic:*mmx_3): Also allow TARGET_MMX_WITH_SSE. Add SSE support. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #13 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:08:38 2019 New Revision: 271221 URL: https://gcc.gnu.org/viewcvs?rev=271221=gcc=rev Log: i386: Emulate MMX ashr3/3 with SSE Emulate MMX ashr3/3 with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (mmx_ashr3): Also allow TARGET_MMX_WITH_SSE. Add SSE emulation. (mmx_3): Likewise. (ashr3): New. (3): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #12 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:08:04 2019 New Revision: 271220 URL: https://gcc.gnu.org/viewcvs?rev=271220=gcc=rev Log: i386: Emulate MMX mmx_pmaddwd with SSE Emulate MMX pmaddwd with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (mmx_pmaddwd): Also allow TARGET_MMX_WITH_SSE. (*mmx_pmaddwd): Also allow TARGET_MMX_WITH_SSE. Add SSE support. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #11 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:07:04 2019 New Revision: 271219 URL: https://gcc.gnu.org/viewcvs?rev=271219=gcc=rev Log: i386: Emulate MMX smulv4hi3_highpart with SSE Emulate MMX smulv4hi3_highpart with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (mmx_smulv4hi3_highpart): Also allow TARGET_MMX_WITH_SSE. (*mmx_smulv4hi3_highpart): Also allow TARGET_MMX_WITH_SSE. Add SSE support. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #10 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:06:28 2019 New Revision: 271218 URL: https://gcc.gnu.org/viewcvs?rev=271218=gcc=rev Log: i386: Emulate MMX mulv4hi3 with SSE Emulate MMX mulv4hi3 with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (mmx_mulv4hi3): Also allow TARGET_MMX_WITH_SSE. (mulv4hi3): New. (*mmx_mulv4hi3): Also allow TARGET_MMX_WITH_SSE. Add SSE support. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #9 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:05:48 2019 New Revision: 271217 URL: https://gcc.gnu.org/viewcvs?rev=271217=gcc=rev Log: i386: Emulate MMX plusminus/sat_plusminus with SSE Emulate MMX plusminus/sat_plusminus with SSE. Only SSE register source operand is allowed. PR target/89021 * config/i386/mmx.md (MMXMODEI8): Require TARGET_SSE2 for V1DI. (plusminus:mmx_3): Check TARGET_MMX_WITH_SSE. (sat_plusminus:mmx_3): Likewise. (3): New. (*mmx_3): Add SSE emulation. (*mmx_3): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/mmx.md
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #8 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:05:07 2019 New Revision: 271216 URL: https://gcc.gnu.org/viewcvs?rev=271216=gcc=rev Log: i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX. For MMX punpckhXX, move bits 64:127 to bits 0:63 in SSE register. Only SSE register source operand is allowed. PR target/89021 * config/i386/i386-expand.c (ix86_split_mmx_punpck): New function. * config/i386/i386-protos.h (ix86_split_mmx_punpck): New prototype. * config/i386/mmx.md (mmx_punpckhbw): Changed to define_insn_and_split to support SSE emulation. (mmx_punpcklbw): Likewise. (mmx_punpckhwd): Likewise. (mmx_punpcklwd): Likewise. (mmx_punpckhdq): Likewise. (mmx_punpckldq): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386-expand.c trunk/gcc/config/i386/i386-protos.h trunk/gcc/config/i386/mmx.md
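The punpckhXX trick in the log above can be illustrated with a scalar C model for the byte variant. This is a sketch under assumptions, not the code in ix86_split_mmx_punpck: all function names are hypothetical, and the upper halves of the SSE registers are simply zeroed here because SSE punpcklbw ignores them.

```c
#include <stdint.h>
#include <string.h>

/* Reference semantics of MMX punpckhbw: interleave the high four
   bytes of the two 64-bit operands.  */
void
mmx_punpckhbw_ref (const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
  for (int i = 0; i < 4; i++)
    {
      out[2 * i] = a[4 + i];
      out[2 * i + 1] = b[4 + i];
    }
}

/* Scalar model of SSE2 punpcklbw: interleave the low eight bytes of
   two 128-bit operands into a 128-bit result.  */
void
sse_punpcklbw_model (const uint8_t a[16], const uint8_t b[16],
                     uint8_t out[16])
{
  for (int i = 0; i < 8; i++)
    {
      out[2 * i] = a[i];
      out[2 * i + 1] = b[i];
    }
}

/* Hypothetical model of the emulation: run the 128-bit punpcklbw on
   the 64-bit values held in the low halves, then move bits 64:127 of
   the result down to bits 0:63.  */
void
mmx_punpckhbw_via_sse (const uint8_t a[8], const uint8_t b[8],
                       uint8_t out[8])
{
  uint8_t xa[16] = { 0 }, xb[16] = { 0 }, wide[16];
  memcpy (xa, a, 8);
  memcpy (xb, b, 8);
  sse_punpcklbw_model (xa, xb, wide);
  memcpy (out, wide + 8, 8); /* Bits 64:127 -> bits 0:63.  */
}
```

The 128-bit interleave of the low eight bytes places the pairs built from bytes 4..7 in its upper half, which is exactly what the 64-bit punpckhbw produces, so a single move of the upper half completes the emulation.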
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #7 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:04:08 2019 New Revision: 271215 URL: https://gcc.gnu.org/viewcvs?rev=271215=gcc=rev Log: i386: Emulate MMX packsswb/packssdw/packuswb with SSE2 Emulate MMX packsswb/packssdw/packuswb with SSE packsswb/packssdw/packuswb plus moving bits 64:95 to bits 32:63 in SSE register. Only SSE register source operand is allowed. PR target/89021 * config/i386/i386-expand.c (ix86_move_vector_high_sse_to_mmx): New function. (ix86_split_mmx_pack): Likewise. * config/i386/i386-protos.h (ix86_move_vector_high_sse_to_mmx): New prototype. (ix86_split_mmx_pack): Likewise. * config/i386/i386.md (mmx_isa): New. (enabled): Also check mmx_isa. * config/i386/mmx.md (any_s_truncate): New code iterator. (s_trunsuffix): New code attr. (mmx_packsswb): Removed. (mmx_packssdw): Likewise. (mmx_packuswb): Likewise. (mmx_packswb): New define_insn_and_split to emulate MMX packsswb/packuswb with SSE2. (mmx_packssdw): Likewise. * config/i386/predicates.md (register_mmxmem_operand): New. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386-expand.c trunk/gcc/config/i386/i386-protos.h trunk/gcc/config/i386/i386.md trunk/gcc/config/i386/mmx.md trunk/gcc/config/i386/predicates.md
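The pack emulation in the log above can be sketched as a scalar C model for packsswb. It is an illustration only, not the code in ix86_split_mmx_pack or ix86_move_vector_high_sse_to_mmx: the function names are hypothetical, and the upper 64 bits of each SSE source are assumed to be zero for simplicity.

```c
#include <stdint.h>
#include <string.h>

/* Signed saturation of a 16-bit value to 8 bits.  */
static int8_t
sat8 (int16_t v)
{
  return v > 127 ? 127 : v < -128 ? -128 : (int8_t) v;
}

/* Reference semantics of MMX packsswb: four words from each 64-bit
   operand are saturated into eight bytes.  */
void
mmx_packsswb_ref (const int16_t a[4], const int16_t b[4], int8_t out[8])
{
  for (int i = 0; i < 4; i++)
    {
      out[i] = sat8 (a[i]);
      out[i + 4] = sat8 (b[i]);
    }
}

/* Scalar model of SSE2 packsswb on 128-bit operands.  */
void
sse_packsswb_model (const int16_t a[8], const int16_t b[8], int8_t out[16])
{
  for (int i = 0; i < 8; i++)
    {
      out[i] = sat8 (a[i]);
      out[i + 8] = sat8 (b[i]);
    }
}

/* Hypothetical model of the emulation: the useful bytes of the
   128-bit result land in bits 0:31 and bits 64:95, so bits 64:95 are
   moved to bits 32:63 to form the 64-bit MMX result.  */
void
mmx_packsswb_via_sse (const int16_t a[4], const int16_t b[4], int8_t out[8])
{
  int16_t xa[8] = { 0 }, xb[8] = { 0 };
  int8_t wide[16];
  memcpy (xa, a, 4 * sizeof (int16_t));
  memcpy (xb, b, 4 * sizeof (int16_t));
  sse_packsswb_model (xa, xb, wide);
  memcpy (wide + 4, wide + 8, 4); /* Bits 64:95 -> bits 32:63.  */
  memcpy (out, wide, 8);
}
```

Bytes 4..7 of the wide result come from the unused upper words of the first source, which is why they can be overwritten by the packed bytes of the second source.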
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #6 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed May 15 15:02:54 2019 New Revision: 271213 URL: https://gcc.gnu.org/viewcvs?rev=271213=gcc=rev Log: i386: Allow MMX register modes in SSE registers In 64-bit mode, SSE2 can be used to emulate MMX instructions without 3DNOW. We can use SSE2 to support MMX register modes. PR target/89021 * config/i386/i386-c.c (ix86_target_macros_internal): Define __MMX_WITH_SSE__ for TARGET_MMX_WITH_SSE. * config/i386/i386.c (ix86_set_reg_reg_cost): Add support for TARGET_MMX_WITH_SSE with VALID_MMX_REG_MODE. (ix86_vector_mode_supported_p): Likewise. * config/i386/i386.h (TARGET_MMX_WITH_SSE): New. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386-c.c trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/i386.h
[Bug target/89929] __attribute__((target("avx512bw"))) doesn't work on non avx512bw systems
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89929 --- Comment #25 from hjl at gcc dot gnu.org --- Author: hjl Date: Thu Apr 25 17:00:28 2019 New Revision: 270578 URL: https://gcc.gnu.org/viewcvs?rev=270578=gcc=rev Log: x86: Update message for target_clones and unsupported ISAs Before AVX512F, processors with the newer ISAs also support the older ISAs, i.e., AVX2 processors also support AVX and SSE4, SSE4 processors also support SSSE3, ... After AVX512F, an AVX512XX processor may not support AVX512YY. It means AVX512XX features, except for AVX512F, can't be used to decide priority in target_clones. This patch updates error message for ISAs with P_ZERO priority. It also merges _feature_list into _isa_names_table and marks ISAs, which have unknown priority, with P_ZERO so that we only need to update one place to add a new ISA feature. gcc/ 2019-04-25 H.J. Lu PR target/89929 * config/i386/i386.c (feature_priority): Moved to file scope. (processor_features): Likewise. (processor_model): Likewise. (_arch_names_table): Likewise. (arch_names_table): Likewise. (_feature_list): Removed. (feature_list): Likewise. (_isa_names_table): Moved to file scope. Add priority. (isa_names_table): Likewise. (get_builtin_code_for_version): Replace feature_list with isa_names_table. Update error message for P_ZERO priority. gcc/testsuite/ 2019-04-25 Martin Liska H.J. Lu PR target/89929 * g++.target/i386/mv28.C: New test. * gcc.target/i386/mvc14.c: Likewise. * g++.target/i386/pr57362.C: Updated. Added: trunk/gcc/testsuite/g++.target/i386/mv28.C trunk/gcc/testsuite/gcc.target/i386/mvc14.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/g++.target/i386/pr57362.C
[Bug target/90178] [9 Regression] Missed optimization: duplicated terminal basic block with -mavx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90178 --- Comment #8 from hjl at gcc dot gnu.org --- Author: hjl Date: Sun Apr 21 18:23:00 2019 New Revision: 270484 URL: https://gcc.gnu.org/viewcvs?rev=270484=gcc=rev Log: LRA: Revert "Remove useless move insns" Useless move insn removal was added to LRA just to avoid wasting CPU cycles on such insn processing afterwards. Such insns are removed anyway later in the pass pipeline. The CPU time savings are tiny but the removal creates too many problems including PR target/90178. Vladimir pre-approved the patch to remove the code: https://gcc.gnu.org/ml/gcc-patches/2019-04/msg00834.html gcc/ PR target/90178 Revert: 2018-11-21 Uros Bizjak Revert the revert: 2013-10-26 Vladimir Makarov Revert: 2013-10-25 Vladimir Makarov * lra-spills.c (lra_final_code_change): Remove useless move insns. gcc/testsuite/ PR target/90178 * gcc.target/i386/pr90178.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr90178.c Modified: trunk/gcc/ChangeLog trunk/gcc/lra-spills.c trunk/gcc/testsuite/ChangeLog
[Bug testsuite/89907] Incorrect check_avx2_available
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89907 --- Comment #1 from hjl at gcc dot gnu.org --- Author: hjl Date: Mon Apr 1 18:34:00 2019 New Revision: 270066 URL: https://gcc.gnu.org/viewcvs?rev=270066=gcc=rev Log: Check avx2_available in check_avx2_available check_avx2_available should check avx2_available, instead of avx_available. Otherwise, check_avx2_available may use result from check_avx_available. PR testsuite/89907 * lib/target-supports.exp (check_avx2_available): Replace avx_available with avx2_available. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/lib/target-supports.exp
[Bug c++/89630] [9 Regression] FAIL: g++.dg/cpp0x/alias-decl-64.C
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89630 --- Comment #10 from hjl at gcc dot gnu.org --- Author: hjl Date: Mon Mar 18 21:22:30 2019 New Revision: 269781 URL: https://gcc.gnu.org/viewcvs?rev=269781=gcc=rev Log: Add a test for PR c++/89630 PR c++/89630 * g++.target/i386/pr89630.C: New test. Added: trunk/gcc/testsuite/g++.target/i386/pr89630.C Modified: trunk/gcc/testsuite/ChangeLog
[Bug target/89523] Incorrect AVX instructions with VSIB address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89523 --- Comment #3 from hjl at gcc dot gnu.org --- Author: hjl Date: Sun Mar 17 09:27:56 2019 New Revision: 269739 URL: https://gcc.gnu.org/viewcvs?rev=269739=gcc=rev Log: x32: Add addr32 prefix to VSIB address 32-bit indices in VSIB address are sign-extended to 64 bits. In x32, when 32-bit indices are used as addresses, like in vgatherdps %ymm7, 0(,%ymm9,1), %ymm6 a 32-bit index such as 0xf7fa3010 is sign-extended to 0xfffffffff7fa3010, which is an invalid address. Add addr32 prefix to UNSPEC_VSIBADDR instructions for x32 if there is neither a base register nor a symbol. This fixes 175.vpr and 254.gap in SPEC CPU 2000 on x32 with -Ofast -funroll-loops -march=haswell gcc/ Backport from mainline 2019-03-14 H.J. Lu PR target/89523 * config/i386/i386.c (ix86_print_operand): Handle 'M' to add addr32 prefix to VSIB address for X32. * config/i386/sse.md (*avx512pf_gatherpfsf_mask): Prepend "%M2" to opcode. (*avx512pf_gatherpfdf_mask): Likewise. (*avx512pf_scatterpfsf_mask): Likewise. (*avx512pf_scatterpfdf_mask): Likewise. (*avx2_gathersi): Prepend "%M3" to opcode. (*avx2_gathersi_2): Prepend "%M2" to opcode. (*avx2_gatherdi): Prepend "%M3" to opcode. (*avx2_gatherdi_2): Prepend "%M2" to opcode. (*avx2_gatherdi_3): Prepend "%M3" to opcode. (*avx2_gatherdi_4): Prepend "%M2" to opcode. (*avx512f_gathersi): Prepend "%M4" to opcode. (*avx512f_gathersi_2): Prepend "%M3" to opcode. (*avx512f_gatherdi): Prepend "%M4" to opcode. (*avx512f_gatherdi_2): Prepend "%M3" to opcode. (*avx512f_scattersi): Prepend "%M0" to opcode. (*avx512f_scatterdi): Likewise. gcc/testsuite/ Backport from mainline 2019-03-14 H.J. Lu PR target/89523 * gcc.target/i386/pr89523-1a.c: New test. * gcc.target/i386/pr89523-1b.c: Likewise. * gcc.target/i386/pr89523-2.c: Likewise. * gcc.target/i386/pr89523-3.c: Likewise. * gcc.target/i386/pr89523-4.c: Likewise. * gcc.target/i386/pr89523-5.c: Likewise. * gcc.target/i386/pr89523-6.c: Likewise. * gcc.target/i386/pr89523-7.c: Likewise.
* gcc.target/i386/pr89523-8.c: Likewise. * gcc.target/i386/pr89523-9.c: Likewise. Added: branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr89523-1a.c branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr89523-1b.c branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr89523-2.c branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr89523-3.c branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr89523-4.c branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr89523-5.c branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr89523-6.c branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr89523-7.c branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr89523-8.c branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr89523-9.c Modified: branches/gcc-7-branch/gcc/ChangeLog branches/gcc-7-branch/gcc/config/i386/i386.c branches/gcc-7-branch/gcc/config/i386/sse.md branches/gcc-7-branch/gcc/testsuite/ChangeLog
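The address arithmetic behind this fix can be checked with a few lines of C. This is a minimal model of the widening only, not compiler code: the function names are hypothetical, and the cast from uint32_t to int32_t relies on the modulo conversion that GCC documents for out-of-range values.

```c
#include <stdint.h>

/* Model of a 32-bit VSIB index being widened for a 64-bit address
   calculation: the index is interpreted as signed and sign-extended.  */
uint64_t
vsib_index_sign_extended (uint32_t index)
{
  return (uint64_t) (int64_t) (int32_t) index;
}

/* Model of the same calculation with a 32-bit address size (the
   addr32 prefix), assuming no base register and no displacement: the
   effective address is truncated to 32 bits.  */
uint64_t
vsib_address_addr32 (uint32_t index)
{
  return (uint64_t) (uint32_t) vsib_index_sign_extended (index);
}
```

For the index 0xf7fa3010 the first function yields 0xfffffffff7fa3010, which lies outside the 4 GiB x32 address space, while the second yields 0xf7fa3010 again.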
[Bug target/89523] Incorrect AVX instructions with VSIB address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89523 --- Comment #2 from hjl at gcc dot gnu.org --- Author: hjl Date: Sun Mar 17 09:11:22 2019 New Revision: 269738 URL: https://gcc.gnu.org/viewcvs?rev=269738=gcc=rev Log: x32: Add addr32 prefix to VSIB address 32-bit indices in VSIB address are sign-extended to 64 bits. In x32, when 32-bit indices are used as addresses, like in vgatherdps %ymm7, 0(,%ymm9,1), %ymm6 a 32-bit index such as 0xf7fa3010 is sign-extended to 0xfffffffff7fa3010, which is an invalid address. Add addr32 prefix to UNSPEC_VSIBADDR instructions for x32 if there is neither a base register nor a symbol. This fixes 175.vpr and 254.gap in SPEC CPU 2000 on x32 with -Ofast -funroll-loops -march=haswell gcc/ Backport from mainline 2019-03-14 H.J. Lu PR target/89523 * config/i386/i386.c (ix86_print_operand): Handle 'M' to add addr32 prefix to VSIB address for X32. * config/i386/sse.md (*avx512pf_gatherpfsf_mask): Prepend "%M2" to opcode. (*avx512pf_gatherpfdf_mask): Likewise. (*avx512pf_scatterpfsf_mask): Likewise. (*avx512pf_scatterpfdf_mask): Likewise. (*avx2_gathersi): Prepend "%M3" to opcode. (*avx2_gathersi_2): Prepend "%M2" to opcode. (*avx2_gatherdi): Prepend "%M3" to opcode. (*avx2_gatherdi_2): Prepend "%M2" to opcode. (*avx2_gatherdi_3): Prepend "%M3" to opcode. (*avx2_gatherdi_4): Prepend "%M2" to opcode. (*avx512f_gathersi): Prepend "%M4" to opcode. (*avx512f_gathersi_2): Prepend "%M3" to opcode. (*avx512f_gatherdi): Prepend "%M4" to opcode. (*avx512f_gatherdi_2): Prepend "%M3" to opcode. (*avx512f_scattersi): Prepend "%M0" to opcode. (*avx512f_scatterdi): Likewise. gcc/testsuite/ Backport from mainline 2019-03-14 H.J. Lu PR target/89523 * gcc.target/i386/pr89523-1a.c: New test. * gcc.target/i386/pr89523-1b.c: Likewise. * gcc.target/i386/pr89523-2.c: Likewise. * gcc.target/i386/pr89523-3.c: Likewise. * gcc.target/i386/pr89523-4.c: Likewise. * gcc.target/i386/pr89523-5.c: Likewise. * gcc.target/i386/pr89523-6.c: Likewise. * gcc.target/i386/pr89523-7.c: Likewise.
* gcc.target/i386/pr89523-8.c: Likewise. * gcc.target/i386/pr89523-9.c: Likewise. Added: branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/pr89523-1a.c branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/pr89523-1b.c branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/pr89523-2.c branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/pr89523-3.c branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/pr89523-4.c branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/pr89523-5.c branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/pr89523-6.c branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/pr89523-7.c branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/pr89523-8.c branches/gcc-8-branch/gcc/testsuite/gcc.target/i386/pr89523-9.c Modified: branches/gcc-8-branch/gcc/ChangeLog branches/gcc-8-branch/gcc/config/i386/i386.c branches/gcc-8-branch/gcc/config/i386/sse.md branches/gcc-8-branch/gcc/testsuite/ChangeLog
[Bug target/89650] [9 Regression] ICE in pre_and_rev_post_order_compute, at cfganal.c:1055 since r269119
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89650 --- Comment #4 from hjl at gcc dot gnu.org --- Author: hjl Date: Thu Mar 14 20:38:52 2019 New Revision: 269694 URL: https://gcc.gnu.org/viewcvs?rev=269694=gcc=rev Log: i386: Handle REG_EH_REGION note When we split: (insn 18 17 76 2 (set (reg:SF 88 [ _19 ]) (float:SF (mem/c:SI (symbol_ref:DI ("d") [flags 0x2] ) [1 d+0 S4 A32]))) "x.ii":4:20 170 {*floatsisf2} (expr_list:REG_EH_REGION (const_int 2 [0x2]) (nil))) to (insn 94 17 18 2 (set (reg:V4SF 115) (vec_merge:V4SF (vec_duplicate:V4SF (float:SF (mem/c:SI (symbol_ref:DI ("d") [flags 0x2] ) [1 d+0 S4 A32]))) (reg:V4SF 114) (const_int 1 [0x1]))) "x.ii":4:20 -1 (nil)) (insn 18 94 76 2 (set (reg:SF 88 [ _19 ]) (subreg:SF (reg:V4SF 115) 0)) "x.ii":4:20 112 {*movsf_internal} (expr_list:REG_EH_REGION (const_int 2 [0x2]) (nil))) we must copy the REG_EH_REGION note to the first insn and split the block after the newly added insn. The REG_EH_REGION on the second insn will be removed later since it no longer traps. gcc/ PR target/89650 * config/i386/i386.c (remove_partial_avx_dependency): Handle REG_EH_REGION note. gcc/testsuite/ PR target/89650 * g++.target/i386/pr89650.C: New test. Added: trunk/gcc/testsuite/g++.target/i386/pr89650.C Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/testsuite/ChangeLog
[Bug target/89523] Incorrect AVX instructions with VSIB address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89523 --- Comment #1 from hjl at gcc dot gnu.org --- Author: hjl Date: Thu Mar 14 08:49:54 2019 New Revision: 269673 URL: https://gcc.gnu.org/viewcvs?rev=269673=gcc=rev Log: x32: Add addr32 prefix to VSIB address 32-bit indices in VSIB address are sign-extended to 64 bits. In x32, when 32-bit indices are used as addresses, like in vgatherdps %ymm7, 0(,%ymm9,1), %ymm6 a 32-bit index such as 0xf7fa3010 is sign-extended to 0xfffffffff7fa3010, which is an invalid address. Add addr32 prefix to UNSPEC_VSIBADDR instructions for x32 if there is neither a base register nor a symbol. This fixes 175.vpr and 254.gap in SPEC CPU 2000 on x32 with -Ofast -funroll-loops -march=haswell gcc/ PR target/89523 * config/i386/i386.c (ix86_print_operand): Handle 'M' to add addr32 prefix to VSIB address for X32. * config/i386/sse.md (*avx512pf_gatherpfsf_mask): Prepend "%M2" to opcode. (*avx512pf_gatherpfdf_mask): Likewise. (*avx512pf_scatterpfsf_mask): Likewise. (*avx512pf_scatterpfdf_mask): Likewise. (*avx2_gathersi): Prepend "%M3" to opcode. (*avx2_gathersi_2): Prepend "%M2" to opcode. (*avx2_gatherdi): Prepend "%M3" to opcode. (*avx2_gatherdi_2): Prepend "%M2" to opcode. (*avx2_gatherdi_3): Prepend "%M3" to opcode. (*avx2_gatherdi_4): Prepend "%M2" to opcode. (*avx512f_gathersi): Prepend "%M4" to opcode. (*avx512f_gathersi_2): Prepend "%M3" to opcode. (*avx512f_gatherdi): Prepend "%M4" to opcode. (*avx512f_gatherdi_2): Prepend "%M3" to opcode. (*avx512f_scattersi): Prepend "%M0" to opcode. (*avx512f_scatterdi): Likewise. gcc/testsuite/ PR target/89523 * gcc.target/i386/pr89523-1a.c: New test. * gcc.target/i386/pr89523-1b.c: Likewise. * gcc.target/i386/pr89523-2.c: Likewise. * gcc.target/i386/pr89523-3.c: Likewise. * gcc.target/i386/pr89523-4.c: Likewise. * gcc.target/i386/pr89523-5.c: Likewise. * gcc.target/i386/pr89523-6.c: Likewise. * gcc.target/i386/pr89523-7.c: Likewise. * gcc.target/i386/pr89523-8.c: Likewise. * gcc.target/i386/pr89523-9.c: Likewise.
Added: trunk/gcc/testsuite/gcc.target/i386/pr89523-1a.c trunk/gcc/testsuite/gcc.target/i386/pr89523-1b.c trunk/gcc/testsuite/gcc.target/i386/pr89523-2.c trunk/gcc/testsuite/gcc.target/i386/pr89523-3.c trunk/gcc/testsuite/gcc.target/i386/pr89523-4.c trunk/gcc/testsuite/gcc.target/i386/pr89523-5.c trunk/gcc/testsuite/gcc.target/i386/pr89523-6.c trunk/gcc/testsuite/gcc.target/i386/pr89523-7.c trunk/gcc/testsuite/gcc.target/i386/pr89523-8.c trunk/gcc/testsuite/gcc.target/i386/pr89523-9.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog
[Bug target/89455] [9 Regression] FAIL: g++.target/i386/mv16.C on Westmere
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89455 --- Comment #1 from hjl at gcc dot gnu.org --- Author: hjl Date: Thu Feb 28 14:24:52 2019 New Revision: 269281 URL: https://gcc.gnu.org/viewcvs?rev=269281=gcc=rev Log: i386: Identify Westmere from PCLMUL Since AES has been removed from Westmere in GCC 9, we should identify Westmere from PCLMUL, instead of AES. PR target/89455 * config/i386/i386.c (get_builtin_code_for_version): Identify Westmere from PCLMUL, instead of AES. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c
[Bug target/87007] [8 Regression] 10% slowdown with -march=skylake-avx512
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87007 --- Comment #10 from hjl at gcc dot gnu.org --- Author: hjl Date: Sun Feb 24 22:41:55 2019 New Revision: 269178 URL: https://gcc.gnu.org/viewcvs?rev=269178=gcc=rev Log: i386: Compile PR target/87007 tests with -mfpmath=sse -mfpmath=sse is needed to enable SSE for FP math in 32-bit. PR target/87007 * gcc.target/i386/pr87007-1.c: Compile with -mfpmath=sse. * gcc.target/i386/pr87007-2.c: Likewise. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/pr87007-1.c trunk/gcc/testsuite/gcc.target/i386/pr87007-2.c
[Bug fortran/89476] FAIL: gfortran.dg/ISO_Fortran_binding_5.f90
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89476 --- Comment #3 from hjl at gcc dot gnu.org --- Author: hjl Date: Sat Feb 23 19:58:52 2019 New Revision: 269165 URL: https://gcc.gnu.org/viewcvs?rev=269165=gcc=rev Log: Include "../../../libgfortran/ISO_Fortran_binding.h" PR testsuite/89476 * gfortran.dg/ISO_Fortran_binding_5.c: Include "../../../libgfortran/ISO_Fortran_binding.h". * gfortran.dg/ISO_Fortran_binding_6.c: Likewise. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_5.c trunk/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_6.c
[Bug driver/69471] "-march=native" unintentionally breaks further -march/-mtune flags
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69471 --- Comment #10 from hjl at gcc dot gnu.org --- Author: hjl Date: Sat Feb 23 19:39:35 2019 New Revision: 269164 URL: https://gcc.gnu.org/viewcvs?rev=269164=gcc=rev Log: driver: Also prune joined switches with negation When -march=native is passed to host_detect_local_cpu to the backend, it overrides all command-line options after it. That means $ gcc -march=native -march=skylake-avx512 is treated as $ gcc -march=skylake-avx512 -march=native Prune joined switches with Negative and RejectNegative to allow -march=skylake-avx512 to override previous -march=native on the command line. gcc/ PR driver/69471 * opts-common.c (prune_options): Also prune joined switches with Negative and RejectNegative. * config/i386/i386.opt (march=): Add Negative(march=). (mtune=): Add Negative(mtune=). * doc/options.texi: Document Negative used together with Joined and RejectNegative. gcc/testsuite/ PR driver/69471 * gcc.dg/pr69471-1.c: New test. * gcc.dg/pr69471-2.c: Likewise. * gcc.target/i386/pr69471-3.c: Likewise. Added: trunk/gcc/testsuite/gcc.dg/pr69471-1.c trunk/gcc/testsuite/gcc.dg/pr69471-2.c trunk/gcc/testsuite/gcc.target/i386/pr69471-3.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.opt trunk/gcc/doc/options.texi trunk/gcc/opts-common.c trunk/gcc/testsuite/ChangeLog
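The intended "last one wins" behaviour for joined switches can be sketched in a few lines of C. This is a simplified model and not the logic of prune_options in opts-common.c: the function prune_joined and its prefix-matching rule are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: among the arguments that start with PREFIX
   (for example "-march="), keep only the last one.  The array is
   compacted in place and the new argument count is returned.  */
size_t
prune_joined (const char **argv, size_t argc, const char *prefix)
{
  size_t plen = strlen (prefix);
  size_t last = argc; /* Index of the last match; argc means none.  */

  for (size_t i = 0; i < argc; i++)
    if (strncmp (argv[i], prefix, plen) == 0)
      last = i;

  size_t out = 0;
  for (size_t i = 0; i < argc; i++)
    {
      int match = strncmp (argv[i], prefix, plen) == 0;
      if (match && i != last)
        continue; /* Overridden by a later switch with the same prefix.  */
      argv[out++] = argv[i];
    }
  return out;
}
```

Applied to "gcc -march=native -O2 -march=skylake-avx512" with the prefix "-march=", the model drops -march=native before anything expands it, so the later -march=skylake-avx512 takes effect.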
[Bug target/89071] AVX vcvtsd2ss lets us avoid PXOR dependency breaking for scalar float<->double and other scalar xmm,xmm instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071 --- Comment #21 from hjl at gcc dot gnu.org --- Author: hjl Date: Fri Feb 22 15:54:08 2019 New Revision: 269119 URL: https://gcc.gnu.org/viewcvs?rev=269119=gcc=rev Log: i386: Add pass_remove_partial_avx_dependency

With -mavx, for

$ cat foo.i
extern float f;
extern double d;
extern int i;

void
foo (void)
{
  d = f;
  f = i;
}

we need to generate

  vxorp[ds] %xmmN, %xmmN, %xmmN
  ...
  vcvtss2sd f(%rip), %xmmN, %xmmX
  ...
  vcvtsi2ss i(%rip), %xmmN, %xmmY

to avoid partial XMM register stall. This patch adds a pass to generate a single

  vxorps %xmmN, %xmmN, %xmmN

at entry of the nearest dominator for basic blocks with SF/DF conversions, which is in the fake loop that contains the whole function, instead of generating one

  vxorp[ds] %xmmN, %xmmN, %xmmN

for each SF/DF conversion.

NB: The LCM algorithm isn't appropriate here since it may place a vxorps inside the loop. A simple testcase shows this:

$ cat badcase.c
extern float f;
extern double d;

void
foo (int n, int k)
{
  for (int j = 0; j != n; j++)
    if (j < k)
      d = f;
}

It generates

  ...
  loop:
    if (j < k)
      vxorps %xmm0, %xmm0, %xmm0
      vcvtss2sd f(%rip), %xmm0, %xmm0
    ...
  loopend
  ...

This is because LCM only works when there is a certain benefit. But for a conditional branch, LCM wouldn't move

  vxorps %xmm0, %xmm0, %xmm0

out of the loop.

SPEC CPU 2017 on Intel Xeon with AVX512 shows:

1. The nearest dominator

|RATE           |Improvement|
|500.perlbench_r|  0.55%    |
|538.imagick_r  |  8.43%    |
|544.nab_r      |  0.71%    |

2. LCM

|RATE           |Improvement|
|500.perlbench_r| -0.76%    |
|538.imagick_r  |  7.96%    |
|544.nab_r      | -0.13%    |

Performance impacts of SPEC CPU 2017 rate on Intel Xeon with AVX512 using -Ofast -flto -march=skylake-avx512 -funroll-loops before

commit e739972ad6ad05e32a1dd5c29c0b950a4c4bd576
Author: uros
Date: Thu Jan 31 20:06:42 2019 +

  PR target/89071
  * config/i386/i386.md (*extendsfdf2): Split out reg->reg alternative to avoid partial SSE register stall for TARGET_AVX.
  (truncdfsf2): Ditto.
  (sse4_1_round2): Ditto.

  git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@268427 138bc75d-0d04-0410-961f-82ee72b054a4

are:

|INT RATE       |Improvement|
|500.perlbench_r|  0.55%    |
|502.gcc_r      |  0.14%    |
|505.mcf_r      |  0.08%    |
|523.xalancbmk_r|  0.18%    |
|525.x264_r     | -0.49%    |
|531.deepsjeng_r| -0.04%    |
|541.leela_r    | -0.26%    |
|548.exchange2_r| -0.3%     |
|557.xz_r       | BuildSame |

|FP RATE        |Improvement|
|503.bwaves_r   | -0.29%    |
|507.cactuBSSN_r|  0.04%    |
|508.namd_r     | -0.74%    |
|510.parest_r   | -0.01%    |
|511.povray_r   |  2.23%    |
|519.lbm_r      |  0.1%     |
|521.wrf_r      |  0.49%    |
|526.blender_r  |  0.13%    |
|527.cam4_r     |  0.65%    |
|538.imagick_r  |  8.43%    |
|544.nab_r      |  0.71%    |
|549.fotonik3d_r|  0.15%    |
|554.roms_r     |  0.08%    |

After commit e739972ad6ad05e32a1dd5c29c0b950a4c4bd576, on Skylake client, impacts on 538.imagick_r with -fno-unsafe-math-optimizations -march=native -Ofast -funroll-loops -flto

1. Size comparison:

before:
   text    data     bss     dec     hex filename
2436377    8352    4528 2449257  255f69 imagick_r
after:
   text    data     bss     dec     hex filename
2425249    8352    4528 2438129  2533f1 imagick_r

2. Number of vxorps:

before after difference
4948   4135  -19.66%

3. Performance improvement:

|RATE           |Improvement|
|538.imagick_r  |  5.5%     |

gcc/

2019-02-22 H.J. Lu Hongtao Liu Sunil K Pandey

PR target/87007
* config/i386/i386-passes.def: Add pass_remove_partial_avx_dependency.
* config/i386/i386-protos.h (make_pass_remove_partial_avx_dependency): New.
* config/i386/i386.c (make_pass_remove_partial_avx_dependency): New function.
(pass_data_remove_partial_avx_dependency): New.
(pass_remove_partial_avx_dependency): Likewise.
(make_pass_remove_partial_avx_dependency): Likewise.
* config/i386/i386.md (avx_partial_xmm_update): New attribute.
(*extendsfdf2): Add avx_partial_xmm_update.
(truncdfsf2): Likewise.
(*float2): Likewise.
(SF/DF conversion splitters): Disabled for TARGET_AVX.

gcc/testsuite/

2019-02-22 H.J. Lu Hongtao Liu Sunil K Pandey

PR target/87007
* gcc.target/i386/pr87007-1.c: New test.
* gcc.target/i386/pr87007-2.c: Likewise.

Added: trunk/gcc/testsuite/gcc.target/i386/pr87007-1.c
[Bug target/87007] [8/9 Regression] 10% slowdown with -march=skylake-avx512
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87007 --- Comment #7 from hjl at gcc dot gnu.org --- Author: hjl Date: Fri Feb 22 15:54:08 2019 New Revision: 269119 URL: https://gcc.gnu.org/viewcvs?rev=269119=gcc=rev Log: i386: Add pass_remove_partial_avx_dependency

With -mavx, for

$ cat foo.i
extern float f;
extern double d;
extern int i;

void
foo (void)
{
  d = f;
  f = i;
}

we need to generate

  vxorp[ds] %xmmN, %xmmN, %xmmN
  ...
  vcvtss2sd f(%rip), %xmmN, %xmmX
  ...
  vcvtsi2ss i(%rip), %xmmN, %xmmY

to avoid partial XMM register stall. This patch adds a pass to generate a single

  vxorps %xmmN, %xmmN, %xmmN

at entry of the nearest dominator for basic blocks with SF/DF conversions, which is in the fake loop that contains the whole function, instead of generating one

  vxorp[ds] %xmmN, %xmmN, %xmmN

for each SF/DF conversion.

NB: The LCM algorithm isn't appropriate here since it may place a vxorps inside the loop. A simple testcase shows this:

$ cat badcase.c
extern float f;
extern double d;

void
foo (int n, int k)
{
  for (int j = 0; j != n; j++)
    if (j < k)
      d = f;
}

It generates

  ...
  loop:
    if (j < k)
      vxorps %xmm0, %xmm0, %xmm0
      vcvtss2sd f(%rip), %xmm0, %xmm0
    ...
  loopend
  ...

This is because LCM only works when there is a certain benefit. But for a conditional branch, LCM wouldn't move

  vxorps %xmm0, %xmm0, %xmm0

out of the loop.

SPEC CPU 2017 on Intel Xeon with AVX512 shows:

1. The nearest dominator

|RATE           |Improvement|
|500.perlbench_r|  0.55%    |
|538.imagick_r  |  8.43%    |
|544.nab_r      |  0.71%    |

2. LCM

|RATE           |Improvement|
|500.perlbench_r| -0.76%    |
|538.imagick_r  |  7.96%    |
|544.nab_r      | -0.13%    |

Performance impacts of SPEC CPU 2017 rate on Intel Xeon with AVX512 using -Ofast -flto -march=skylake-avx512 -funroll-loops before

commit e739972ad6ad05e32a1dd5c29c0b950a4c4bd576
Author: uros
Date: Thu Jan 31 20:06:42 2019 +

  PR target/89071
  * config/i386/i386.md (*extendsfdf2): Split out reg->reg alternative to avoid partial SSE register stall for TARGET_AVX.
  (truncdfsf2): Ditto.
  (sse4_1_round2): Ditto.

  git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@268427 138bc75d-0d04-0410-961f-82ee72b054a4

are:

|INT RATE       |Improvement|
|500.perlbench_r|  0.55%    |
|502.gcc_r      |  0.14%    |
|505.mcf_r      |  0.08%    |
|523.xalancbmk_r|  0.18%    |
|525.x264_r     | -0.49%    |
|531.deepsjeng_r| -0.04%    |
|541.leela_r    | -0.26%    |
|548.exchange2_r| -0.3%     |
|557.xz_r       | BuildSame |

|FP RATE        |Improvement|
|503.bwaves_r   | -0.29%    |
|507.cactuBSSN_r|  0.04%    |
|508.namd_r     | -0.74%    |
|510.parest_r   | -0.01%    |
|511.povray_r   |  2.23%    |
|519.lbm_r      |  0.1%     |
|521.wrf_r      |  0.49%    |
|526.blender_r  |  0.13%    |
|527.cam4_r     |  0.65%    |
|538.imagick_r  |  8.43%    |
|544.nab_r      |  0.71%    |
|549.fotonik3d_r|  0.15%    |
|554.roms_r     |  0.08%    |

After commit e739972ad6ad05e32a1dd5c29c0b950a4c4bd576, on Skylake client, impacts on 538.imagick_r with -fno-unsafe-math-optimizations -march=native -Ofast -funroll-loops -flto

1. Size comparison:

before:
   text    data     bss     dec     hex filename
2436377    8352    4528 2449257  255f69 imagick_r
after:
   text    data     bss     dec     hex filename
2425249    8352    4528 2438129  2533f1 imagick_r

2. Number of vxorps:

before after difference
4948   4135  -19.66%

3. Performance improvement:

|RATE           |Improvement|
|538.imagick_r  |  5.5%     |

gcc/

2019-02-22 H.J. Lu Hongtao Liu Sunil K Pandey

PR target/87007
* config/i386/i386-passes.def: Add pass_remove_partial_avx_dependency.
* config/i386/i386-protos.h (make_pass_remove_partial_avx_dependency): New.
* config/i386/i386.c (make_pass_remove_partial_avx_dependency): New function.
(pass_data_remove_partial_avx_dependency): New.
(pass_remove_partial_avx_dependency): Likewise.
(make_pass_remove_partial_avx_dependency): Likewise.
* config/i386/i386.md (avx_partial_xmm_update): New attribute.
(*extendsfdf2): Add avx_partial_xmm_update.
(truncdfsf2): Likewise.
(*float2): Likewise.
(SF/DF conversion splitters): Disabled for TARGET_AVX.

gcc/testsuite/

2019-02-22 H.J. Lu Hongtao Liu Sunil K Pandey

PR target/87007
* gcc.target/i386/pr87007-1.c: New test.
* gcc.target/i386/pr87007-2.c: Likewise.

Added: trunk/gcc/testsuite/gcc.target/i386/pr87007-1.c
[Bug target/89444] [9 Regression] r264052 removed PTA_AES from PTA_SKYLAKE/PTA_GOLDMONT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89444 --- Comment #2 from hjl at gcc dot gnu.org --- Author: hjl Date: Fri Feb 22 12:49:21 2019 New Revision: 269114 URL: https://gcc.gnu.org/viewcvs?rev=269114=gcc=rev Log: x86: (Reapply) Move AESNI generation to Skylake and Goldmont This is a repeat of commit r263989, which commit r264052 accidentally reverted. 2019-02-22 Thiago Macieira PR target/89444 * config/i386/i386.h (PTA_WESTMERE): Remove PTA_AES. (PTA_SKYLAKE): Add PTA_AES. (PTA_GOLDMONT): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.h
[Bug target/87412] -fcf-protection and -mindirect-branch=thunk are incompatible on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87412 --- Comment #5 from hjl at gcc dot gnu.org --- Author: hjl Date: Thu Feb 21 18:59:00 2019 New Revision: 269074 URL: https://gcc.gnu.org/viewcvs?rev=269074=gcc=rev Log: i386: Replace -fcf-protection with -fcf-protection=branch Since -mindirect-branch is incompatible with -fcf-protection=return and -fcf-protection, replace -fcf-protection with -fcf-protection=branch. PR target/87412 * gcc.target/i386/indirect-thunk-attr-14.c: Replace -fcf-protection with -fcf-protection=branch. * gcc.target/i386/indirect-thunk-attr-15.c: Likewise. * gcc.target/i386/indirect-thunk-attr-16.c: Likewise. * gcc.target/i386/indirect-thunk-extern-8.c: Likewise. * gcc.target/i386/indirect-thunk-extern-9.c: Likewise. * gcc.target/i386/indirect-thunk-extern-10.c: Likewise. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-14.c trunk/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-15.c trunk/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-16.c trunk/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-10.c trunk/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-8.c trunk/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-9.c
[Bug target/87412] -fcf-protection and -mindirect-branch=thunk are incompatible on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87412 --- Comment #3 from hjl at gcc dot gnu.org --- Author: hjl Date: Thu Feb 21 13:41:53 2019 New Revision: 269068 URL: https://gcc.gnu.org/viewcvs?rev=269068=gcc=rev Log: i386: Check -mindirect-branch/-mfunction-return with -fcf-protection Issue an error when -mindirect-branch or -mfunction-return are used with incompatible -fcf-protection. gcc/ PR target/87412 * config/i386/i386.c (ix86_set_indirect_branch_type): Issue an error for -mindirect-branch/-mfunction-return with incompatible -fcf-protection. gcc/testsuite/ PR target/87412 * gcc.target/i386/pr87412-1.c: New file. * gcc.target/i386/pr87412-2.c: Likewise. * gcc.target/i386/pr87412-3.c: Likewise. * gcc.target/i386/pr87412-4.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/i386/pr87412-1.c trunk/gcc/testsuite/gcc.target/i386/pr87412-2.c trunk/gcc/testsuite/gcc.target/i386/pr87412-3.c trunk/gcc/testsuite/gcc.target/i386/pr87412-4.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/testsuite/ChangeLog
[Bug sanitizer/89409] [9 Regression] FAIL: c-c++-common/ubsan/div-by-zero-[67].c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89409 --- Comment #5 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed Feb 20 16:20:50 2019 New Revision: 269042 URL: https://gcc.gnu.org/viewcvs?rev=269042=gcc=rev Log: libsanitizer: Restore internal_readlink for x32 Cherry-pick compiler-rt revision 354451: r316591 has @@ -389,13 +383,11 @@ uptr internal_dup2(int oldfd, int newfd) { } uptr internal_readlink(const char *path, char *buf, uptr bufsize) { -#if SANITIZER_NETBSD - return internal_syscall_ptr(SYSCALL(readlink), path, buf, bufsize); -#elif SANITIZER_USES_CANONICAL_LINUX_SYSCALLS +#if SANITIZER_USES_CANONICAL_LINUX_SYSCALLS return internal_syscall(SYSCALL(readlinkat), AT_FDCWD, (uptr)path, (uptr)buf, bufsize); #else - return internal_syscall(SYSCALL(readlink), (uptr)path, (uptr)buf, bufsize); + return internal_syscall_ptr(SYSCALL(readlink), path, buf, bufsize); #endif } which dropped the (uptr) cast and broke x32. This patch puts back the (uptr) cast to restore x32 and fixes: https://bugs.llvm.org/show_bug.cgi?id=40783 Differential Revision: https://reviews.llvm.org/D58413 PR sanitizer/89409 * sanitizer_common/sanitizer_linux.cc (internal_readlink): Cherry-pick compiler-rt r354451. Modified: trunk/libsanitizer/ChangeLog trunk/libsanitizer/sanitizer_common/sanitizer_linux.cc
[Bug target/89397] [7/8/9 Regression] ICE in build_call_expr_loc_array at gcc/tree.c:11563 since r229082
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89397 --- Comment #2 from hjl at gcc dot gnu.org --- Author: hjl Date: Tue Feb 19 14:19:33 2019 New Revision: 269017 URL: https://gcc.gnu.org/viewcvs?rev=269017=gcc=rev Log: i386: Set ix86_fpmath to FPMATH_387 without SSE ix86_fpmath should be set to a combination of FPMATH_387 and FPMATH_SSE. When SSE is disabled, it should be set to FPMATH_387, and 387 codegen is also controlled by -msoft-float. gcc/ PR target/89397 * config/i386/i386.c (ix86_option_override_internal): Set opts->x_ix86_fpmath to FPMATH_387 when SSE is disabled. gcc/testsuite/ PR target/89397 * gcc.target/i386/pr89397.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr89397.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/testsuite/ChangeLog
[Bug target/89372] Incorrect PMULHRSW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89372 --- Comment #1 from hjl at gcc dot gnu.org --- Author: hjl Date: Sat Feb 16 22:16:04 2019 New Revision: 268964 URL: https://gcc.gnu.org/viewcvs?rev=268964=gcc=rev Log: i386: Add ssse3_pmulhrswv4hi3 expander There is no V4HI pmulhrsw in AVX512BW and V4HI/V8HI pmulhrsw don't require AVX2. PR target/89372 * config/i386/sse.md (ssedoublemode): Remove V4HI. (PMULHRSW): Likewise. (_pmulhrsw3): Require TARGET_SSSE3, not TARGET_AVX2. (ssse3_pmulhrswv4hi3): New expander. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md
[Bug target/89229] [7/8 Regression] Unnecessary ZMM in movoi_internal_avx/movti_internal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89229 --- Comment #8 from hjl at gcc dot gnu.org --- Author: hjl Date: Tue Feb 12 19:00:35 2019 New Revision: 268811 URL: https://gcc.gnu.org/viewcvs?rev=268811=gcc=rev Log: i386: Revert revision 268678 and revision 268657 The i386 backend has INT_MODE (OI, 32); INT_MODE (XI, 64); So, XI_MODE represents 64 INTEGER bytes = 64 * 8 = 512 bit operation, in case of const_1, all 512 bits set. We can load zeros with a narrower instruction (e.g. 256 bit by inherent zeroing of highpart in case of 128 bit xor), so TImode in this case. Some targets prefer V4SF mode, so they will emit float xorps for zeroing. Then the introduction of AVX512F fubared everything by overloading the meaning of insn mode. How should we use INSN mode, MODE_XI, in standard_sse_constant_opcode and patterns which use standard_sse_constant_opcode? 2 options: 1. MODE_XI should only be used to check if EXT_REX_SSE_REG_P is true in any register operand. The operand size must be determined by the operand itself, not by MODE_XI. The operand encoding size should be determined by the operand size, EXT_REX_SSE_REG_P and AVX512VL. 2. MODE_XI should be used to determine the operand encoding size. EXT_REX_SSE_REG_P and AVX512VL should be checked for encoding instructions. gcc/ PR target/89229 * config/i386/i386.md (*movoi_internal_avx): Revert revision 268678 and revision 268657. (*movti_internal): Likewise. gcc/testsuite/ PR target/89229 * gcc.target/i386/pr89229-1.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr89229-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.md trunk/gcc/testsuite/ChangeLog
[Bug target/89229] [7/8/9 Regression] Unnecessary ZMM in movoi_internal_avx/movti_internal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89229 --- Comment #5 from hjl at gcc dot gnu.org --- Author: hjl Date: Fri Feb 8 11:30:53 2019 New Revision: 268678 URL: https://gcc.gnu.org/viewcvs?rev=268678=gcc=rev Log: i386: Use OI/TImode in *mov[ot]i_internal_avx with AVX512VL OImode and TImode moves must be done in XImode to access upper 16 vector registers without AVX512VL. With AVX512VL, we can access upper 16 vector registers in OImode and TImode. PR target/89229 * config/i386/i386.md (*movoi_internal_avx): Set mode to XI for upper 16 vector registers without TARGET_AVX512VL. (*movti_internal): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.md
[Bug target/89229] [7/8/9 Regression] Unnecessary ZMM in movoi_internal_avx/movti_internal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89229 --- Comment #3 from hjl at gcc dot gnu.org --- Author: hjl Date: Thu Feb 7 17:58:19 2019 New Revision: 268657 URL: https://gcc.gnu.org/viewcvs?rev=268657=gcc=rev Log: i386: Fix typo in *movoi_internal_avx/movti_internal PR target/89229 * config/i386/i386.md (*movoi_internal_avx): Set mode to OI for TARGET_AVX512VL. (*movti_internal): Set mode to TI for TARGET_AVX512VL. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.md
[Bug libgcc/88931] Inaccurate DI/TI -> SF/DF conversions in libgcc2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88931 --- Comment #5 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed Jan 23 21:41:59 2019 New Revision: 268216 URL: https://gcc.gnu.org/viewcvs?rev=268216=gcc=rev Log: libgcc2.c: Correct DI/TI -> SF/DF conversions FSTYPE FUNC (DWtype u) in libgcc2.c, which converts DI/TI to SF/DF, has /* No leading bits means u == minimum. */ if (count == 0) return -(Wtype_MAXp1_F * (Wtype_MAXp1_F / 2)); in the third case (where actually count == 0 only means the high part is minimum). It should be: /* No leading bits means u == minimum. */ if (count == 0) return Wtype_MAXp1_F * (FSTYPE) (hi | ((UWtype) u != 0)); instead. gcc/testsuite/ 2019-01-23 H.J. Lu PR libgcc/88931 * gcc.dg/torture/fp-int-convert-timode-1.c: New test. * gcc.dg/torture/fp-int-convert-timode-2.c: Likewise. * gcc.dg/torture/fp-int-convert-timode-3.c: Likewise. * gcc.dg/torture/fp-int-convert-timode-4.c: Likewise. libgcc/ 2019-01-23 Joseph Myers PR libgcc/88931 * libgcc2.c (FSTYPE FUNC (DWtype u)): Correct no leading bits case. Added: trunk/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-1.c trunk/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-2.c trunk/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-3.c trunk/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-4.c Modified: trunk/gcc/testsuite/ChangeLog trunk/libgcc/ChangeLog trunk/libgcc/libgcc2.c
[Bug target/88909] struct builtin_description doesn't support ix86_isa_flags2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88909 --- Comment #1 from hjl at gcc dot gnu.org --- Author: hjl Date: Tue Jan 22 16:20:25 2019 New Revision: 268155 URL: https://gcc.gnu.org/viewcvs?rev=268155=gcc=rev Log: i386: Add mask2 to builtin_description We have struct builtin_description { const HOST_WIDE_INT mask; const enum insn_code icode; const char *const name; const enum ix86_builtins code; const enum rtx_code comparison; const int flag; }; Since "mask" is used for both ix86_isa_flags and ix86_isa_flags2, builtins with both flags can't be handled easily. This patch adds mask2 to builtin_description to handle it properly. 2019-01-22 Hongtao Liu H.J. Lu PR target/88909 * config/i386/i386-builtin.def: Add mask2 to all builtin initializations. Merge ARGS2 and SPECIAL_ARGS2 into ARGS and SPECIAL_ARGS. * config/i386/i386.c (BDESC): Add mask2 to the definition. (BDESC_FIRST): Likewise. (define_builtin): Add an argument for mask2. Updated to handle both ix86_isa_flags and ix86_isa_flags2. (define_builtin_const): Likewise. (define_builtin_pure): Likewise. (define_builtin2): Deleted. (define_builtin_const2): Likewise. (builtin_description): Add a member, mask2. (bdesc_*): Add mask2 to builtin initializations. (ix86_init_mmx_sse_builtins): Update calls to def_builtin, def_builtin_const and def_builtin_pure. Remove SPECIAL_ARGS2 support. (ix86_get_builtin_func_type): Remove SPECIAL_ARGS2 support. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386-builtin.def trunk/gcc/config/i386/i386.c
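The idea of carrying one mask per ISA flag word can be sketched as follows. Everything here is an assumption made for illustration: struct builtin_desc_sketch is a cut-down stand-in for struct builtin_description, builtin_enabled_p is a hypothetical predicate, and the "all bits of both masks must be enabled" rule is chosen for simplicity rather than taken from i386.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model of a builtin table entry.  The real
   struct builtin_description has more members (icode, code, ...).  */
struct builtin_desc_sketch
{
  int64_t mask;     /* Bits required from the first ISA flag word.  */
  int64_t mask2;    /* Bits required from the second ISA flag word.  */
  const char *name;
};

/* Assumed rule for this sketch only: the builtin is usable when every
   bit of MASK is set in ISA_FLAGS and every bit of MASK2 is set in
   ISA_FLAGS2.  */
int
builtin_enabled_p (const struct builtin_desc_sketch *d,
                   int64_t isa_flags, int64_t isa_flags2)
{
  assert (d != NULL);
  return (d->mask & isa_flags) == d->mask
         && (d->mask2 & isa_flags2) == d->mask2;
}
```

With a single mask field, an entry that needs bits from both flag words has nowhere to record the second requirement, which is the limitation the log describes.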
[Bug target/88954] __attribute__((noplt)) doesn't work with function pointers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88954 --- Comment #6 from hjl at gcc dot gnu.org --- Author: hjl Date: Tue Jan 22 14:53:41 2019 New Revision: 268152 URL: https://gcc.gnu.org/viewcvs?rev=268152=gcc=rev Log: i386: Load external function address via GOT slot With the noplt attribute, we load the external function address via the GOT slot so that the linker won't create a PLT entry for the external function address. gcc/ PR target/88954 * config/i386/i386.c (ix86_force_load_from_GOT_p): Also check noplt attribute. gcc/testsuite/ PR target/88954 * gcc.target/i386/pr88954-1.c: New test. * gcc.target/i386/pr88954-2.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/i386/pr88954-1.c trunk/gcc/testsuite/gcc.target/i386/pr88954-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/testsuite/ChangeLog
[Bug target/71659] _xgetbv intrinsic missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71659 --- Comment #6 from hjl at gcc dot gnu.org --- Author: hjl Date: Mon Jan 21 12:23:49 2019 New Revision: 268113 URL: https://gcc.gnu.org/viewcvs?rev=268113=gcc=rev Log: i386: Move Intel intrinsics header files to <immintrin.h> According to Intel Intrinsics Guide: https://software.intel.com/sites/landingpage/IntrinsicsGuide/ Intel intrinsics should be available by including <immintrin.h>. This patch moves the remaining Intel intrinsics header files from <x86intrin.h> to <immintrin.h>. PR target/71659 * config/i386/adxintrin.h: Just check _IMMINTRIN_H_INCLUDED. * config/i386/clflushoptintrin.h: Check _IMMINTRIN_H_INCLUDED instead of _X86INTRIN_H_INCLUDED. * config/i386/clwbintrin.h: Likewise. * config/i386/pkuintrin.h: Likewise. * config/i386/prfchwintrin.h: Likewise. * config/i386/rdseedintrin.h: Likewise. * config/i386/wbnoinvdintrin.h: Likewise. * config/i386/xsavecintrin.h: Likewise. * config/i386/xsavesintrin.h: Likewise. * config/i386/fxsrintrin.h: Enable _IMMINTRIN_H_INCLUDED check. * config/i386/xsaveintrin.h: Likewise. * config/i386/xsaveoptintrin.h: Likewise. * config/i386/x86intrin.h: Move "#include" , , , , , , , , , , and to ... * config/i386/immintrin.h: Here. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/adxintrin.h trunk/gcc/config/i386/clflushoptintrin.h trunk/gcc/config/i386/clwbintrin.h trunk/gcc/config/i386/fxsrintrin.h trunk/gcc/config/i386/immintrin.h trunk/gcc/config/i386/pkuintrin.h trunk/gcc/config/i386/prfchwintrin.h trunk/gcc/config/i386/rdseedintrin.h trunk/gcc/config/i386/wbnoinvdintrin.h trunk/gcc/config/i386/x86intrin.h trunk/gcc/config/i386/xsavecintrin.h trunk/gcc/config/i386/xsaveintrin.h trunk/gcc/config/i386/xsaveoptintrin.h trunk/gcc/config/i386/xsavesintrin.h
[Bug middle-end/88587] ICE in expand_debug_locations, at cfgexpand.c:5450
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88587 --- Comment #14 from hjl at gcc dot gnu.org --- Author: hjl Date: Fri Jan 18 14:33:46 2019 New Revision: 268079 URL: https://gcc.gnu.org/viewcvs?rev=268079=gcc=rev Log: Update PR middle-end/88587 tests It is wrong to use -m32 in dg-options. { target ia32 } should be used instead. Also add -fno-pic to g++.target/i386/pr88587.C since it is invalid with PIC. PR middle-end/88587 * g++.target/i386/pr88587.C (dg-do): Add { target ia32 }. (dg-options): Replace -m32 with -fno-pic. * gcc.target/i386/mvc13.c (dg-do): Add { target ia32 }. (dg-options): Remove -m32. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/g++.target/i386/pr88587.C trunk/gcc/testsuite/gcc.target/i386/mvc13.c
[Bug c/51628] __attribute__((packed)) is unsafe in some cases (i.e. add -Waddress-of-packed-member, etc.)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51628 --- Comment #61 from hjl at gcc dot gnu.org --- Author: hjl Date: Fri Jan 18 13:05:18 2019 New Revision: 268075 URL: https://gcc.gnu.org/viewcvs?rev=268075=gcc=rev Log: c-family: Update unaligned address of packed member check Check unaligned pointer conversion and strip NOPS. gcc/c-family/ PR c/51628 PR c/88664 * c-common.h (warn_for_address_or_pointer_of_packed_member): Remove the boolean argument. * c-warn.c (check_address_of_packed_member): Renamed to ... (check_address_or_pointer_of_packed_member): This. Also warn pointer conversion. (check_and_warn_address_of_packed_member): Renamed to ... (check_and_warn_address_or_pointer_of_packed_member): This. Also warn pointer conversion. (warn_for_address_or_pointer_of_packed_member): Remove the boolean argument. Don't check pointer conversion here. gcc/c PR c/51628 PR c/88664 * c-typeck.c (convert_for_assignment): Update the warn_for_address_or_pointer_of_packed_member call. gcc/cp PR c/51628 PR c/88664 * call.c (convert_for_arg_passing): Update the warn_for_address_or_pointer_of_packed_member call. * typeck.c (convert_for_assignment): Likewise. gcc/testsuite/ PR c/51628 PR c/88664 * c-c++-common/pr51628-33.c: New test. * c-c++-common/pr51628-35.c: New test. * c-c++-common/pr88664-1.c: Likewise. * c-c++-common/pr88664-2.c: Likewise. * gcc.dg/pr51628-34.c: Likewise. Added: trunk/gcc/testsuite/c-c++-common/pr51628-33.c trunk/gcc/testsuite/c-c++-common/pr51628-35.c trunk/gcc/testsuite/c-c++-common/pr88664-1.c trunk/gcc/testsuite/c-c++-common/pr88664-2.c trunk/gcc/testsuite/gcc.dg/pr51628-34.c Modified: trunk/gcc/c-family/ChangeLog trunk/gcc/c-family/c-common.h trunk/gcc/c-family/c-warn.c trunk/gcc/c/ChangeLog trunk/gcc/c/c-typeck.c trunk/gcc/cp/ChangeLog trunk/gcc/cp/call.c trunk/gcc/cp/typeck.c trunk/gcc/testsuite/ChangeLog
[Bug c++/88664] [9 Regression] False positive -Waddress-of-packed-member
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88664 --- Comment #8 from hjl at gcc dot gnu.org --- Author: hjl Date: Fri Jan 18 13:05:18 2019 New Revision: 268075 URL: https://gcc.gnu.org/viewcvs?rev=268075=gcc=rev Log: c-family: Update unaligned address of packed member check Check unaligned pointer conversion and strip NOPS. gcc/c-family/ PR c/51628 PR c/88664 * c-common.h (warn_for_address_or_pointer_of_packed_member): Remove the boolean argument. * c-warn.c (check_address_of_packed_member): Renamed to ... (check_address_or_pointer_of_packed_member): This. Also warn pointer conversion. (check_and_warn_address_of_packed_member): Renamed to ... (check_and_warn_address_or_pointer_of_packed_member): This. Also warn pointer conversion. (warn_for_address_or_pointer_of_packed_member): Remove the boolean argument. Don't check pointer conversion here. gcc/c PR c/51628 PR c/88664 * c-typeck.c (convert_for_assignment): Update the warn_for_address_or_pointer_of_packed_member call. gcc/cp PR c/51628 PR c/88664 * call.c (convert_for_arg_passing): Update the warn_for_address_or_pointer_of_packed_member call. * typeck.c (convert_for_assignment): Likewise. gcc/testsuite/ PR c/51628 PR c/88664 * c-c++-common/pr51628-33.c: New test. * c-c++-common/pr51628-35.c: New test. * c-c++-common/pr88664-1.c: Likewise. * c-c++-common/pr88664-2.c: Likewise. * gcc.dg/pr51628-34.c: Likewise. Added: trunk/gcc/testsuite/c-c++-common/pr51628-33.c trunk/gcc/testsuite/c-c++-common/pr51628-35.c trunk/gcc/testsuite/c-c++-common/pr88664-1.c trunk/gcc/testsuite/c-c++-common/pr88664-2.c trunk/gcc/testsuite/gcc.dg/pr51628-34.c Modified: trunk/gcc/c-family/ChangeLog trunk/gcc/c-family/c-common.h trunk/gcc/c-family/c-warn.c trunk/gcc/c/ChangeLog trunk/gcc/c/c-typeck.c trunk/gcc/cp/ChangeLog trunk/gcc/cp/call.c trunk/gcc/cp/typeck.c trunk/gcc/testsuite/ChangeLog
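-Waddress-of-packed-member is about forming a pointer such as int * to a member that a packed layout may leave unaligned. One common way to sidestep that, shown here purely as an illustration and not as anything taken from the patch, is to copy the member's bytes into an aligned local. struct packed_pair and read_packed_i are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packed layout: 'i' sits at offset 1, so an int * to it
   would not be suitably aligned on most targets.  */
struct packed_pair
{
  char c;
  int i;
} __attribute__ ((packed));

/* Read the member by copying its bytes.  No int * to the packed member
   is ever formed, and memcpy makes no alignment assumption about its
   source.  */
int
read_packed_i (const struct packed_pair *p)
{
  int value;

  assert (p != NULL);
  memcpy (&value, (const char *) p + offsetof (struct packed_pair, i),
          sizeof value);
  return value;
}
```

The caller can then take the address of the returned copy freely, since an ordinary int object is always suitably aligned.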
[Bug target/88717] Unnecessary vzeroupper
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88717 --- Comment #8 from hjl at gcc dot gnu.org --- Author: hjl Date: Tue Jan 8 17:40:18 2019 New Revision: 267732 URL: https://gcc.gnu.org/viewcvs?rev=267732=gcc=rev Log: x86: Don't generate vzeroupper if caller passes AVX/AVX512 registers There is no need to generate vzeroupper if caller passes arguments in AVX/AVX512 registers. Tested on i686 and x86-64 with and without --with-arch=native. gcc/ PR target/88717 * config/i386/i386.c (ix86_avx_u128_mode_exit): Call ix86_avx_u128_mode_entry. gcc/testsuite/ PR target/88717 * gcc.target/i386/pr88717.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr88717.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/testsuite/ChangeLog
[Bug testsuite/88639] FAIL: gfortran.dg/ieee/ieee_10.f90 -O0 (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88639 --- Comment #3 from hjl at gcc dot gnu.org --- Author: hjl Date: Sun Dec 30 17:36:05 2018 New Revision: 267478 URL: https://gcc.gnu.org/viewcvs?rev=267478=gcc=rev Log: Use dg-additional-options instead of dg-options Use dg-additional-options rather than dg-options to avoid overwriting the default IEEE options which are passed by ieee.exp and necessary. PR testsuite/88639 * gfortran.dg/ieee/ieee_10.f90: Use dg-additional-options instead of dg-options. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gfortran.dg/ieee/ieee_10.f90
[Bug c/51628] __attribute__((packed)) is unsafe in some cases (i.e. add -Waddress-of-packed-member, etc.)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51628 --- Comment #54 from hjl at gcc dot gnu.org --- Author: hjl Date: Thu Dec 20 21:41:48 2018 New Revision: 267313 URL: https://gcc.gnu.org/viewcvs?rev=267313=gcc=rev Log: C/C++: Add -Waddress-of-packed-member When the address of a packed member of a struct or union is taken, it may result in an unaligned pointer value. This patch adds -Waddress-of-packed-member to check alignment at pointer assignment and warn about unaligned addresses as well as unaligned pointers: $ cat x.i struct pair_t { char c; int i; } __attribute__ ((packed)); extern struct pair_t p; int *addr = &p.i; $ gcc -O2 -S x.i x.i:8:13: warning: taking address of packed member of ‘struct pair_t’ may result in an unaligned pointer value [-Waddress-of-packed-member] 8 | int *addr = &p.i; | ^ $ cat c.i struct B { int i; }; struct C { struct B b; } __attribute__ ((packed)); long* g8 (struct C *p) { return p; } $ gcc -O2 -S c.i -Wno-incompatible-pointer-types c.i: In function ‘g8’: c.i:4:18: warning: converting a packed ‘struct C *’ pointer (alignment 1) to ‘long int *’ (alignment 8) may result in an unaligned pointer value [-Waddress-of-packed-member] 4 | long* g8 (struct C *p) { return p; } | ^ c.i:2:8: note: defined here 2 | struct C { struct B b; } __attribute__ ((packed)); | ^ $ This warning is enabled by default. Since read_encoded_value_with_base in unwind-pe.h has union unaligned { void *ptr; unsigned u2 __attribute__ ((mode (HI))); unsigned u4 __attribute__ ((mode (SI))); unsigned u8 __attribute__ ((mode (DI))); signed s2 __attribute__ ((mode (HI))); signed s4 __attribute__ ((mode (SI))); signed s8 __attribute__ ((mode (DI))); } __attribute__((__packed__)); _Unwind_Internal_Ptr result; and GCC warns: gcc/libgcc/unwind-pe.h:210:37: warning: taking address of packed member of 'union unaligned' may result in an unaligned pointer value [-Waddress-of-packed-member] result = (_Unwind_Internal_Ptr) u->ptr; ^ we need to add a GCC pragma to ignore -Waddress-of-packed-member. 
gcc/ PR c/51628 * doc/invoke.texi: Document -Wno-address-of-packed-member. gcc/c-family/ PR c/51628 * c-common.h (warn_for_address_or_pointer_of_packed_member): New. * c-warn.c (check_alignment_of_packed_member): New function. (check_address_of_packed_member): Likewise. (check_and_warn_address_of_packed_member): Likewise. (warn_for_address_or_pointer_of_packed_member): Likewise. * c.opt: Add -Wno-address-of-packed-member. gcc/c/ PR c/51628 * c-typeck.c (convert_for_assignment): Call warn_for_address_or_pointer_of_packed_member. gcc/cp/ PR c/51628 * call.c (convert_for_arg_passing): Call warn_for_address_or_pointer_of_packed_member. * typeck.c (convert_for_assignment): Likewise. gcc/testsuite/ PR c/51628 * c-c++-common/pr51628-1.c: New test. * c-c++-common/pr51628-2.c: Likewise. * c-c++-common/pr51628-3.c: Likewise. * c-c++-common/pr51628-4.c: Likewise. * c-c++-common/pr51628-5.c: Likewise. * c-c++-common/pr51628-6.c: Likewise. * c-c++-common/pr51628-7.c: Likewise. * c-c++-common/pr51628-8.c: Likewise. * c-c++-common/pr51628-9.c: Likewise. * c-c++-common/pr51628-10.c: Likewise. * c-c++-common/pr51628-11.c: Likewise. * c-c++-common/pr51628-12.c: Likewise. * c-c++-common/pr51628-13.c: Likewise. * c-c++-common/pr51628-14.c: Likewise. * c-c++-common/pr51628-15.c: Likewise. * c-c++-common/pr51628-26.c: Likewise. * c-c++-common/pr51628-27.c: Likewise. * c-c++-common/pr51628-28.c: Likewise. * c-c++-common/pr51628-29.c: Likewise. * c-c++-common/pr51628-30.c: Likewise. * c-c++-common/pr51628-31.c: Likewise. * c-c++-common/pr51628-32.c: Likewise. * gcc.dg/pr51628-17.c: Likewise. * gcc.dg/pr51628-18.c: Likewise. * gcc.dg/pr51628-19.c: Likewise. * gcc.dg/pr51628-20.c: Likewise. * gcc.dg/pr51628-21.c: Likewise. * gcc.dg/pr51628-22.c: Likewise. * gcc.dg/pr51628-23.c: Likewise. * gcc.dg/pr51628-24.c: Likewise. * gcc.dg/pr51628-25.c: Likewise. * c-c++-common/asan/misalign-1.c: Add -Wno-address-of-packed-member. * c-c++-common/asan/misalign-2.c: Likewise. 
* c-c++-common/ubsan/align-2.c: Likewise. * c-c++-common/ubsan/align-4.c: Likewise. * c-c++-common/ubsan/align-6.c: Likewise. * c-c++-common/ubsan/align-7.c: Likewise. * c-c++-common/ubsan/align-8.c: Likewise. * c-c++-common/ubsan/align-10.c: Likewise. * g++.dg/ubsan/align-2.C: Likewise. * gcc.target/i386/avx512bw-vmovdqu16-2.c: Likewise. * gcc.target/i386/avx512f-vmovdqu32-
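One way to avoid forming an unaligned pointer to a packed member, in the spirit of the warning described in the log above, is to copy the member in and out by byte offset. This is only a sketch: the helper names read_i and write_i are hypothetical and are not part of the patch; struct pair_t is taken from the x.i example in the log.

```c
#include <stddef.h>
#include <string.h>

/* Same layout as the x.i example in the log: i sits at offset 1.  */
struct pair_t
{
  char c;
  int i;
} __attribute__ ((packed));

/* Hypothetical helper: read the packed member without creating an int *.  */
static int
read_i (const struct pair_t *p)
{
  int value;
  memcpy (&value, (const char *) p + offsetof (struct pair_t, i),
          sizeof value);
  return value;
}

/* Hypothetical helper: write the packed member through a byte copy.  */
static void
write_i (struct pair_t *p, int value)
{
  memcpy ((char *) p + offsetof (struct pair_t, i), &value, sizeof value);
}
```

Callers then use read_i (&p) and write_i (&p, v) in place of dereferencing an int * derived from the packed member.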
[Bug debug/79342] [6 Regression] ICE in output_index_string, at dwarf2out.c:25635 with -gsplit-dwarf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79342 --- Comment #14 from hjl at gcc dot gnu.org --- Author: hjl Date: Mon Dec 17 13:49:16 2018 New Revision: 267202 URL: https://gcc.gnu.org/viewcvs?rev=267202=gcc=rev Log: DWARF: Don't expand hash table when no insertion is needed dwarf2out_finish performs: 1. save_macinfo_strings 2. hash table traverse of index_string 3. output_macinfo -> output_macinfo_op 4. output_indirect_strings -> hash table traverse of output_index_string find_slot_with_hash has if (insert == INSERT && m_size * 3 <= m_n_elements * 4) expand (); which may expand hash table even if no insertion is needed and change hash table traverse order. When output_macinfo_op is called, all index strings have been added to hash table by save_macinfo_strings and we shouldn't expand index string hash table. Otherwise find_slot_with_hash will expand hash table when hash table has the right size and hash table traverse of output_index_string will have a different traverse order from index_string. PR debug/79342 * dwarf2out.c (find_AT_string_in_table): Add insert argument defaulting to INSERT and replace INSERT. (find_AT_string): Likewise. (output_macinfo_op): Pass NO_INSERT to find_AT_string. Modified: trunk/gcc/ChangeLog trunk/gcc/dwarf2out.c
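The effect described in the log above can be reproduced with a toy open-addressing table. This is a simplified model, not GCC's hash_table: struct table, table_init, probe and find_slot are hypothetical names, keys are nonzero ints, and only the load-factor check is copied from the log.

```c
#include <stddef.h>
#include <stdlib.h>

enum insert_option { NO_INSERT, INSERT };

struct table
{
  int *slots;         /* 0 marks an empty slot, so keys must be nonzero.  */
  size_t size;        /* Number of slots.  */
  size_t n_elements;  /* Number of occupied slots.  */
};

static void
table_init (struct table *t, size_t size)
{
  t->slots = calloc (size, sizeof (int));
  t->size = size;
  t->n_elements = 0;
}

/* Linear probing: return the slot holding KEY or the first empty slot.  */
static int *
probe (struct table *t, int key)
{
  size_t i = (size_t) key % t->size;
  while (t->slots[i] != 0 && t->slots[i] != key)
    i = (i + 1) % t->size;
  return &t->slots[i];
}

/* Double the table and rehash; this changes the traversal order.  */
static void
expand (struct table *t)
{
  struct table bigger;
  table_init (&bigger, t->size * 2);
  for (size_t i = 0; i < t->size; i++)
    if (t->slots[i] != 0)
      {
        *probe (&bigger, t->slots[i]) = t->slots[i];
        bigger.n_elements++;
      }
  free (t->slots);
  *t = bigger;
}

static int *
find_slot (struct table *t, int key, enum insert_option insert)
{
  /* Same check as quoted in the log: with INSERT the table may grow
     before we know whether KEY is already present.  */
  if (insert == INSERT && t->size * 3 <= t->n_elements * 4)
    expand (t);
  int *slot = probe (t, key);
  if (*slot == 0)
    {
      if (insert == NO_INSERT)
        return NULL;
      *slot = key;
      t->n_elements++;
    }
  return slot;
}
```

With three keys in a four-slot table, looking up an existing key with INSERT doubles the table although nothing is added, while NO_INSERT leaves the size and slot order alone.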
[Bug target/88483] Unnecessary stack alignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88483 --- Comment #3 from hjl at gcc dot gnu.org --- Author: hjl Date: Sun Dec 16 13:55:18 2018 New Revision: 267186 URL: https://gcc.gnu.org/viewcvs?rev=267186=gcc=rev Log: x86: Revert revision 267133 Revert commit: commit 76c21b271247ccbd681bdb4530426d2fe35dbfa5 Author: hjl Date: Fri Dec 14 12:38:04 2018 + x86: Don't use get_frame_size when finalizing stack frame gcc/ PR target/88483 * config/i386/i386.c (ix86_finalize_stack_frame_flags): Revert revision 267133. gcc/testsuite/ PR target/88483 * gcc.target/i386/stackalign/pr88483.c: Removed. Revert revision 267133. Removed: trunk/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/testsuite/ChangeLog
[Bug target/88483] Unnecessary stack alignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88483 --- Comment #1 from hjl at gcc dot gnu.org --- Author: hjl Date: Fri Dec 14 12:38:04 2018 New Revision: 267133 URL: https://gcc.gnu.org/viewcvs?rev=267133=gcc=rev Log: x86: Don't use get_frame_size when finalizing stack frame get_frame_size () returns used stack slots during compilation, which may be optimized out later. Since ix86_find_max_used_stack_alignment is called by ix86_finalize_stack_frame_flags to check if stack frame is required, there is no need to call get_frame_size () which may give inaccurate final stack frame size. Tested on AVX512 machine configured with --with-arch=native --with-cpu=native gcc/ PR target/88483 * config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't use get_frame_size (). gcc/testsuite/ PR target/88483 * gcc.target/i386/stackalign/pr88483.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/testsuite/ChangeLog
[Bug target/87853] _mm_cmpgt_epi8 broken with -funsigned-char
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87853 --- Comment #12 from hjl at gcc dot gnu.org --- Author: hjl Date: Thu Dec 13 14:42:43 2018 New Revision: 267095 URL: https://gcc.gnu.org/viewcvs?rev=267095=gcc=rev Log: x86: Fix _mm_cmpgt_epi8 with -funsigned-char Backport from mainline 2018-11-05 Xuepeng Guo PR target/87853 * config/i386/emmintrin.h (__v16qs): New to cope with option -funsigned-char. (_mm_cmpeq_epi8): Replace __v16qi with __v16qs. (_mm_cmplt_epi8): Likewise. (_mm_cmpgt_epi8): Likewise. Modified: branches/gcc-8-branch/gcc/ChangeLog branches/gcc-8-branch/gcc/config/i386/emmintrin.h
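Why the element type matters can be seen in a scalar model of a 16-lane signed byte compare. The sketch below is an illustration only: cmpgt_epi8_model is a hypothetical name, and the conversion of values above 127 to signed char relies on the implementation-defined wraparound that GCC provides. Writing signed char explicitly keeps the result independent of -funsigned-char, which is the idea behind the __v16qs type in the log above.

```c
/* Scalar model of a 16-lane signed greater-than compare: each result
   lane is 0xFF when a > b as signed bytes, else 0x00.  */
static void
cmpgt_epi8_model (const unsigned char a[16], const unsigned char b[16],
                  unsigned char out[16])
{
  for (int i = 0; i < 16; i++)
    {
      /* Explicitly signed, so plain char signedness plays no role.  */
      signed char x = (signed char) a[i];
      signed char y = (signed char) b[i];
      out[i] = x > y ? 0xFF : 0x00;
    }
}
```

If plain char were used for x and y, the lane holding 0x80 would compare as 128 under -funsigned-char and as -128 otherwise.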
[Bug target/87317] Missed optimisation: merging VMOVQ with operations that only use the low 8 bytes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87317 --- Comment #6 from hjl at gcc dot gnu.org --- Author: hjl Date: Wed Nov 21 13:18:54 2018 New Revision: 266342 URL: https://gcc.gnu.org/viewcvs?rev=266342=gcc=rev Log: x86: Add pmovzx/pmovsx patterns with memory operands Many x86 pmovzx/pmovsx instructions with memory operands are modeled in a wrong way. For example: (define_insn "sse4_1_v8qiv8hi2" [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") (any_extend:V8HI (vec_select:V8QI (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*xm,vm") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3) (const_int 4) (const_int 5) (const_int 6) (const_int 7)]] should be defined for memory operands as: (define_insn "sse4_1_v8qiv8hi2" [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") (any_extend:V8HI (match_operand:V8QI "memory_operand" "m,m,m")))] This patch updates them to (define_insn "sse4_1_v8qiv8hi2" [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") (any_extend:V8HI (vec_select:V8QI (match_operand:V16QI 1 "register_operand" "Yr,*x,v") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3) (const_int 4) (const_int 5) (const_int 6) (const_int 7)]] (define_insn "*sse4_1_v8qiv8hi2_1" [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") (any_extend:V8HI (match_operand:V8QI "subreg_memory_operand" "m,m,m")))] with a splitter: (define_insn_and_split "*sse4_1_v8qiv8hi2_2" [(set (match_operand:V8HI 0 "register_operand") (any_extend:V8HI (vec_select:V8QI (subreg:V16QI (vec_concat:V2DI (match_operand:DI 1 "memory_operand") (const_int 0)) 0) (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3) (const_int 4) (const_int 5) (const_int 6) (const_int 7)]] "TARGET_SSE4_1 && && "&& can_create_pseudo_p ()" "#" "&& 1" [(set (match_dup 0) (any_extend:V8HI (match_dup 1)))] "operands[1] = adjust_address_nv (operands[1], V8QImode, 0);") This patch requires updating apply_subst_iterator to handle define_insn_and_split. 
gcc/ PR target/87317 * config/i386/sse.md (sse4_1_v8qiv8hi2): Replace nonimmediate_operand with register_operand. (avx2_v8qiv8si2): Likewise. (sse4_1_v4qiv4si2): Likewise. (sse4_1_v4hiv4si2): Likewise. (sse4_1_v2qiv2di2): Likewise. (avx512f_v8qiv8di2): Likewise. (avx2_v4qiv4di2): Likewise. (avx2_v4hiv4di2): Likewise. (sse4_1_v2hiv2di2): Likewise. (sse4_1_v2siv2di2): Likewise. (*sse4_1_v8qiv8hi2_1): New pattern. (*sse4_1_v8qiv8hi2_2): Likewise. (*avx2_v8qiv8si2_1): Likewise. (*avx2_v8qiv8si2_2): Likewise. (*sse4_1_v4qiv4si2_1): Likewise. (*sse4_1_v4qiv4si2_2): Likewise. (*sse4_1_v4hiv4si2_1): Likewise. (*sse4_1_v4hiv4si2_2): Likewise. (*avx512f_v8qiv8di2_1): Likewise. (*avx512f_v8qiv8di2_2): Likewise. (*avx2_v4qiv4di2_1): Likewise. (*avx2_v4qiv4di2_2): Likewise. (*avx2_v4hiv4di2_1): Likewise. (*avx2_v4hiv4di2_2): Likewise. (*sse4_1_v2hiv2di2_1): Likewise. (*sse4_1_v2hiv2di2_2): Likewise. (*sse4_1_v2siv2di2_1): Likewise. (*sse4_1_v2siv2di2_2): Likewise. gcc/testsuite/ PR target/87317 * gcc.target/i386/pr87317-1.c: New file. * gcc.target/i386/pr87317-2.c: Likewise. * gcc.target/i386/pr87317-3.c: Likewise. * gcc.target/i386/pr87317-4.c: Likewise. * gcc.target/i386/pr87317-5.c: Likewise. * gcc.target/i386/pr87317-6.c: Likewise. * gcc.target/i386/pr87317-7.c: Likewise. * gcc.target/i386/pr87317-8.c: Likewise. * gcc.target/i386/pr87317-9.c: Likewise. * gcc.target/i386/pr87317-10.c: Likewise. * gcc.target/i386/pr87317-11.c: Likewise. * gcc.target/i386/pr87317-12.c: Likewise. * gcc.target/i386/pr87317-13.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/i386/pr87317-1.c trunk/gcc/testsuite/gcc.target/i386/pr87317-10.c trunk/gcc/testsuite/gcc.target/i386/pr87317-11.c trunk/gcc/testsuite/gcc.target/i386/pr87317-12.c trunk/gcc/testsuite/gcc.target/i386/pr87317-13.c trunk/gcc/testsuite/gcc.target/i386/pr87317-2.c trunk/gcc/testsuite/gcc.target/i386/pr87317-3.c
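The source-level shape these patterns are aimed at is an 8-byte load feeding a widening conversion. The sketch below is an assumption about that shape rather than a copy of the new tests: zext_u8_to_u16_ref and zext_u8_to_u16_sse41 are hypothetical names, and the intrinsic version is compiled only when SSE4.1 is enabled (for example with -msse4.1). Whether the load and PMOVZXBW are merged into one instruction depends on the compiler version and options.

```c
#include <stdint.h>

/* Portable reference: zero-extend 8 bytes into 8 16-bit lanes.  */
static void
zext_u8_to_u16_ref (uint16_t out[8], const uint8_t *src)
{
  for (int i = 0; i < 8; i++)
    out[i] = src[i];
}

#ifdef __SSE4_1__
#include <immintrin.h>

/* Intrinsic form: only the low 8 bytes of the source are used, so the
   load and the zero extension can share a single memory operand.  */
static __m128i
zext_u8_to_u16_sse41 (const uint8_t *src)
{
  __m128i low = _mm_loadl_epi64 ((const __m128i *) src);
  return _mm_cvtepu8_epi16 (low);
}
#endif
```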
[Bug target/72782] AVX512: No support for scalar broadcasts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72782 --- Comment #14 from hjl at gcc dot gnu.org --- Author: hjl Date: Mon Oct 22 07:35:48 2018 New Revision: 265370 URL: https://gcc.gnu.org/viewcvs?rev=265370=gcc=rev Log: i386: Enable AVX512 memory broadcast for INT andnot Many AVX512 vector operations can broadcast from a scalar memory source. This patch enables memory broadcast for INT andnot operations. gcc/ PR target/72782 * config/i386/sse.md (*andnot3_bcst): New. gcc/testsuite/ PR target/72782 * gcc.target/i386/avx512f-andn-di-zmm-1.c: New test. * gcc.target/i386/avx512f-andn-si-zmm-1.c: Likewise. * gcc.target/i386/avx512f-andn-si-zmm-2.c: Likewise. * gcc.target/i386/avx512f-andn-si-zmm-3.c: Likewise. * gcc.target/i386/avx512f-andn-si-zmm-4.c: Likewise. * gcc.target/i386/avx512f-andn-si-zmm-5.c: Likewise. * gcc.target/i386/avx512vl-andn-si-xmm-1.c: Likewise. * gcc.target/i386/avx512vl-andn-si-ymm-1.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/i386/avx512f-andn-di-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-andn-si-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-andn-si-zmm-2.c trunk/gcc/testsuite/gcc.target/i386/avx512f-andn-si-zmm-3.c trunk/gcc/testsuite/gcc.target/i386/avx512f-andn-si-zmm-4.c trunk/gcc/testsuite/gcc.target/i386/avx512f-andn-si-zmm-5.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-andn-si-xmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-andn-si-ymm-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog
[Bug target/72782] AVX512: No support for scalar broadcasts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72782 --- Comment #13 from hjl at gcc dot gnu.org --- Author: hjl Date: Mon Oct 22 07:29:03 2018 New Revision: 265369 URL: https://gcc.gnu.org/viewcvs?rev=265369=gcc=rev Log: i386: Enable AVX512 memory broadcast for INT logic Many AVX512 vector operations can broadcast from a scalar memory source. This patch enables memory broadcast for INT logic operations. gcc/ PR target/72782 * config/i386/sse.md (*3_bcst): New. gcc/testsuite/ PR target/72782 * gcc.target/i386/avx512f-and-di-zmm-1.c: New test. * gcc.target/i386/avx512f-and-si-zmm-1.c: Likewise. * gcc.target/i386/avx512f-and-si-zmm-2.c: Likewise. * gcc.target/i386/avx512f-and-si-zmm-3.c: Likewise. * gcc.target/i386/avx512f-and-si-zmm-4.c: Likewise. * gcc.target/i386/avx512f-and-si-zmm-5.c: Likewise. * gcc.target/i386/avx512f-and-si-zmm-6.c: Likewise. * gcc.target/i386/avx512f-or-di-zmm-1.c: Likewise. * gcc.target/i386/avx512f-or-si-zmm-1.c: Likewise. * gcc.target/i386/avx512f-or-si-zmm-2.c: Likewise. * gcc.target/i386/avx512f-or-si-zmm-3.c: Likewise. * gcc.target/i386/avx512f-or-si-zmm-4.c: Likewise. * gcc.target/i386/avx512f-or-si-zmm-5.c: Likewise. * gcc.target/i386/avx512f-or-si-zmm-6.c: Likewise. * gcc.target/i386/avx512f-xor-di-zmm-1.c: Likewise. * gcc.target/i386/avx512f-xor-si-zmm-1.c: Likewise. * gcc.target/i386/avx512f-xor-si-zmm-2.c: Likewise. * gcc.target/i386/avx512f-xor-si-zmm-3.c: Likewise. * gcc.target/i386/avx512f-xor-si-zmm-4.c: Likewise. * gcc.target/i386/avx512f-xor-si-zmm-5.c: Likewise. * gcc.target/i386/avx512f-xor-si-zmm-6.c: Likewise. * gcc.target/i386/avx512vl-and-si-xmm-1.c: Likewise. * gcc.target/i386/avx512vl-and-si-ymm-1.c: Likewise. * gcc.target/i386/avx512vl-or-si-xmm-1.c: Likewise. * gcc.target/i386/avx512vl-or-si-ymm-1.c: Likewise. * gcc.target/i386/avx512vl-xor-si-xmm-1.c: Likewise. * gcc.target/i386/avx512vl-xor-si-ymm-1.c: Likewise. 
Added: trunk/gcc/testsuite/gcc.target/i386/avx512f-and-di-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-and-si-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-and-si-zmm-2.c trunk/gcc/testsuite/gcc.target/i386/avx512f-and-si-zmm-3.c trunk/gcc/testsuite/gcc.target/i386/avx512f-and-si-zmm-4.c trunk/gcc/testsuite/gcc.target/i386/avx512f-and-si-zmm-5.c trunk/gcc/testsuite/gcc.target/i386/avx512f-and-si-zmm-6.c trunk/gcc/testsuite/gcc.target/i386/avx512f-or-di-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-or-si-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-or-si-zmm-2.c trunk/gcc/testsuite/gcc.target/i386/avx512f-or-si-zmm-3.c trunk/gcc/testsuite/gcc.target/i386/avx512f-or-si-zmm-4.c trunk/gcc/testsuite/gcc.target/i386/avx512f-or-si-zmm-5.c trunk/gcc/testsuite/gcc.target/i386/avx512f-or-si-zmm-6.c trunk/gcc/testsuite/gcc.target/i386/avx512f-xor-di-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-xor-si-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-xor-si-zmm-2.c trunk/gcc/testsuite/gcc.target/i386/avx512f-xor-si-zmm-3.c trunk/gcc/testsuite/gcc.target/i386/avx512f-xor-si-zmm-4.c trunk/gcc/testsuite/gcc.target/i386/avx512f-xor-si-zmm-5.c trunk/gcc/testsuite/gcc.target/i386/avx512f-xor-si-zmm-6.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-and-si-xmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-and-si-ymm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-or-si-xmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-or-si-ymm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-xor-si-xmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-xor-si-ymm-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog
[Bug target/72782] AVX512: No support for scalar broadcasts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72782 --- Comment #12 from hjl at gcc dot gnu.org --- Author: hjl Date: Mon Oct 22 07:25:51 2018 New Revision: 265368 URL: https://gcc.gnu.org/viewcvs?rev=265368=gcc=rev Log: i386: Enable AVX512 memory broadcast for INT add Many AVX512 vector operations can broadcast from a scalar memory source. This patch enables memory broadcast for INT add operations. gcc/ PR target/72782 * config/i386/sse.md (avx512bcst): Updated for V4SI, V2DI, V8SI, V4DI, V16SI and V8DI. (*sub3_bcst): New. (*add3_bcst): Likewise. gcc/testsuite/ PR target/72782 * gcc.target/i386/avx512f-add-di-zmm-1.c: New test. * gcc.target/i386/avx512f-add-si-zmm-1.c: Likewise. * gcc.target/i386/avx512f-add-si-zmm-2.c: Likewise. * gcc.target/i386/avx512f-add-si-zmm-3.c: Likewise. * gcc.target/i386/avx512f-add-si-zmm-4.c: Likewise. * gcc.target/i386/avx512f-add-si-zmm-5.c: Likewise. * gcc.target/i386/avx512f-add-si-zmm-6.c: Likewise. * gcc.target/i386/avx512f-sub-di-zmm-1.c: Likewise. * gcc.target/i386/avx512f-sub-si-zmm-1.c: Likewise. * gcc.target/i386/avx512f-sub-si-zmm-2.c: Likewise. * gcc.target/i386/avx512f-sub-si-zmm-3.c: Likewise. * gcc.target/i386/avx512f-sub-si-zmm-4.c: Likewise. * gcc.target/i386/avx512f-sub-si-zmm-5.c: Likewise. * gcc.target/i386/avx512vl-add-si-xmm-1.c: Likewise. * gcc.target/i386/avx512vl-add-si-ymm-1.c: Likewise. * gcc.target/i386/avx512vl-sub-si-xmm-1.c: Likewise. * gcc.target/i386/avx512vl-sub-si-ymm-1.c: Likewise. 
Added: trunk/gcc/testsuite/gcc.target/i386/avx512f-add-di-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-2.c trunk/gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-3.c trunk/gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-4.c trunk/gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-5.c trunk/gcc/testsuite/gcc.target/i386/avx512f-add-si-zmm-6.c trunk/gcc/testsuite/gcc.target/i386/avx512f-sub-di-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-sub-si-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-sub-si-zmm-2.c trunk/gcc/testsuite/gcc.target/i386/avx512f-sub-si-zmm-3.c trunk/gcc/testsuite/gcc.target/i386/avx512f-sub-si-zmm-4.c trunk/gcc/testsuite/gcc.target/i386/avx512f-sub-si-zmm-5.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-add-si-xmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-add-si-ymm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-sub-si-xmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-sub-si-ymm-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog
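A memory broadcast in this sense means that one scalar loaded from memory is used for every lane of the vector operation. The following is a minimal sketch under stated assumptions: add_bcst_ref and add_bcst_avx512 are hypothetical names, not taken from the new tests, and the intrinsic version is compiled only when AVX512F is enabled (for example with -mavx512f). Whether the compiler folds the broadcast into the add as an embedded-broadcast memory operand depends on its version and options.

```c
#include <stdint.h>

/* Portable reference: add the same scalar from memory to all 16 lanes,
   with wraparound as in the vector instruction.  */
static void
add_bcst_ref (int32_t out[16], const int32_t x[16], const int32_t *scalar)
{
  for (int i = 0; i < 16; i++)
    out[i] = (int32_t) ((uint32_t) x[i] + (uint32_t) *scalar);
}

#ifdef __AVX512F__
#include <immintrin.h>

/* Intrinsic form: the set1 of a loaded scalar is the broadcast that
   the add may take directly from memory.  */
static __m512i
add_bcst_avx512 (__m512i x, const int32_t *scalar)
{
  return _mm512_add_epi32 (x, _mm512_set1_epi32 (*scalar));
}
#endif
```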
[Bug target/72782] AVX512: No support for scalar broadcasts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72782 --- Comment #11 from hjl at gcc dot gnu.org --- Author: hjl Date: Sun Oct 21 20:30:06 2018 New Revision: 265359 URL: https://gcc.gnu.org/viewcvs?rev=265359=gcc=rev Log: i386: Update AVX512 FMSUB/FNMADD/FNMSUB tests Update AVX512 tests to test the newly added FMSUB, FNMADD and FNMSUB builtin functions. PR target/72782 * gcc.target/i386/avx-1.c (__builtin_ia32_vfmsubpd512_mask): New. (__builtin_ia32_vfmsubpd512_maskz): Likewise. (__builtin_ia32_vfmsubps512_mask): Likewise. (__builtin_ia32_vfmsubps512_maskz): Likewise. (__builtin_ia32_vfnmaddpd512_mask3): Likewise. (__builtin_ia32_vfnmaddpd512_maskz): Likewise. (__builtin_ia32_vfnmaddps512_mask3): Likewise. (__builtin_ia32_vfnmaddps512_maskz): Likewise. (__builtin_ia32_vfnmsubpd512_maskz): Likewise. (__builtin_ia32_vfnmsubps512_maskz): Likewise. * testsuite/gcc.target/i386/sse-13.c (__builtin_ia32_vfmsubpd512_mask): Likewise. (__builtin_ia32_vfmsubpd512_maskz): Likewise. (__builtin_ia32_vfmsubps512_mask): Likewise. (__builtin_ia32_vfmsubps512_maskz): Likewise. (__builtin_ia32_vfnmaddpd512_mask3): Likewise. (__builtin_ia32_vfnmaddpd512_maskz): Likewise. (__builtin_ia32_vfnmaddps512_mask3): Likewise. (__builtin_ia32_vfnmaddps512_maskz): Likewise. (__builtin_ia32_vfnmsubpd512_maskz): Likewise. (__builtin_ia32_vfnmsubps512_maskz): Likewise. * testsuite/gcc.target/i386/sse-23.c (__builtin_ia32_vfmsubpd512_mask): Likewise. (__builtin_ia32_vfmsubpd512_maskz): Likewise. (__builtin_ia32_vfmsubps512_mask): Likewise. (__builtin_ia32_vfmsubps512_maskz): Likewise. (__builtin_ia32_vfnmaddpd512_mask3): Likewise. (__builtin_ia32_vfnmaddpd512_maskz): Likewise. (__builtin_ia32_vfnmaddps512_mask3): Likewise. (__builtin_ia32_vfnmaddps512_maskz): Likewise. (__builtin_ia32_vfnmsubpd512_maskz): Likewise. (__builtin_ia32_vfnmsubps512_maskz): Likewise. 
Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/gcc.target/i386/avx-1.c trunk/gcc/testsuite/gcc.target/i386/sse-13.c trunk/gcc/testsuite/gcc.target/i386/sse-23.c
[Bug target/72782] AVX512: No support for scalar broadcasts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72782 --- Comment #10 from hjl at gcc dot gnu.org --- Author: hjl Date: Sun Oct 21 20:28:56 2018 New Revision: 265358 URL: https://gcc.gnu.org/viewcvs?rev=265358=gcc=rev Log: i386: Enable AVX512 memory broadcast for FNMSUB Many AVX512 vector operations can broadcast from a scalar memory source. This patch enables memory broadcast for FNMSUB operations. In order to support AVX512 memory broadcast for FNMSUB, FNMSUB builtin functions are also added, instead of passing the negated value to FMA builtin functions. gcc/ PR target/72782 * config/i386/avx512fintrin.h (_mm512_fnmsub_round_pd): Use __builtin_ia32_vfnmsubpd512_mask. (_mm512_mask_fnmsub_round_pd): Likewise. (_mm512_fnmsub_pd): Likewise. (_mm512_mask_fnmsub_pd): Likewise. (_mm512_maskz_fnmsub_round_pd): Use __builtin_ia32_vfnmsubpd512_maskz. (_mm512_maskz_fnmsub_pd): Likewise. (_mm512_fnmsub_round_ps): Use __builtin_ia32_vfnmsubps512_mask. (_mm512_mask_fnmsub_round_ps): Likewise. (_mm512_fnmsub_ps): Likewise. (_mm512_mask_fnmsub_ps): Likewise. (_mm512_maskz_fnmsub_round_ps): Use __builtin_ia32_vfnmsubps512_maskz. (_mm512_maskz_fnmsub_ps): Likewise. * config/i386/avx512vlintrin.h (_mm256_mask_fnmsub_pd): Use __builtin_ia32_vfnmsubpd256_mask. (_mm256_maskz_fnmsub_pd): Use __builtin_ia32_vfnmsubpd256_maskz. (_mm_mask_fnmsub_pd): Use __builtin_ia32_vfmaddpd128_mask (_mm_maskz_fnmsub_pd): Use __builtin_ia32_vfnmsubpd128_maskz. (_mm256_mask_fnmsub_ps): Use __builtin_ia32_vfnmsubps256_mask. (_mm256_mask_fnmsub_ps): Use __builtin_ia32_vfnmsubps256_mask. (_mm256_maskz_fnmsub_ps): Use __builtin_ia32_vfnmsubps256_maskz. (_mm_mask_fnmsub_ps): Use __builtin_ia32_vfnmsubps128_mask. (_mm_maskz_fnmsub_ps): Use __builtin_ia32_vfnmsubps128_maskz. * config/i386/fmaintrin.h (_mm_fnmsub_pd): Use __builtin_ia32_vfnmsubpd. (_mm256_fnmsub_pd): Use __builtin_ia32_vfnmsubpd256. (_mm_fnmsub_ps): Use __builtin_ia32_vfnmsubps. (_mm256_fnmsub_ps): Use __builtin_ia32_vfnmsubps256. 
(_mm_fnmsub_sd): Use __builtin_ia32_vfnmsubsd3. (_mm_fnmsub_ss): Use __builtin_ia32_vfnmsubss3. * config/i386/i386-builtin.def: Add __builtin_ia32_vfnmsubpd256_mask, __builtin_ia32_vfnmsubpd256_maskz, __builtin_ia32_vfnmsubpd128_mask, __builtin_ia32_vfnmsubpd128_maskz, __builtin_ia32_vfnmsubps256_mask, __builtin_ia32_vfnmsubps256_maskz, __builtin_ia32_vfnmsubps128_mask, __builtin_ia32_vfnmsubps128_maskz, __builtin_ia32_vfnmsubpd512_mask, __builtin_ia32_vfnmsubpd512_maskz, __builtin_ia32_vfnmsubps512_mask, __builtin_ia32_vfnmsubps512_maskz, __builtin_ia32_vfnmsubss3, __builtin_ia32_vfnmsubsd3, __builtin_ia32_vfnmsubps, __builtin_ia32_vfnmsubpd, __builtin_ia32_vfnmsubps256 and __builtin_ia32_vfnmsubpd256. * config/i386/sse.md (fma4i_fnmsub_): New. (_fnmsub__maskz): Likewise. (*fma_fnmsub__bcst_1): Likewise. (*fma_fnmsub__bcst_2): Likewise. (*fma_fnmsub__bcst_3): Likewise. (fmai_vmfnmsub_): Likewise. gcc/testsuite/ PR target/72782 * gcc.target/i386/avx512f-fnmsub-df-zmm-1.c: New test. * gcc.target/i386/avx512f-fnmsub-sf-zmm-1.c: Likewise. * gcc.target/i386/avx512f-fnmsub-sf-zmm-2.c: Likewise. * gcc.target/i386/avx512f-fnmsub-sf-zmm-3.c: Likewise. * gcc.target/i386/avx512f-fnmsub-sf-zmm-4.c: Likewise. * gcc.target/i386/avx512f-fnmsub-sf-zmm-5.c: Likewise. * gcc.target/i386/avx512f-fnmsub-sf-zmm-6.c: Likewise. * gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c: Likewise. * gcc.target/i386/avx512f-fnmsub-sf-zmm-8.c: Likewise. * gcc.target/i386/avx512vl-fnmsub-sf-xmm-1.c: Likewise. * gcc.target/i386/avx512vl-fnmsub-sf-ymm-1.c: Likewise. 
Added: trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-df-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-2.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-3.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-4.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-5.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-6.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-8.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-fnmsub-sf-xmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-fnmsub-sf-ymm-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/avx512fintrin.h trunk/gcc/config/i386/avx512vlintrin.h trunk/gcc/config/i386/fmaintrin.h
[Bug target/72782] AVX512: No support for scalar broadcasts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72782 --- Comment #9 from hjl at gcc dot gnu.org --- Author: hjl Date: Sun Oct 21 20:27:09 2018 New Revision: 265357 URL: https://gcc.gnu.org/viewcvs?rev=265357=gcc=rev Log: i386: Enable AVX512 memory broadcast for FNMADD Many AVX512 vector operations can broadcast from a scalar memory source. This patch enables memory broadcast for FNMADD operations. In order to support AVX512 memory broadcast for FNMADD, FNMADD builtin functions are also added, instead of passing the negated value to FMA builtin functions. gcc/ PR target/72782 * config/i386/avx512fintrin.h (_mm512_fnmadd_round_pd): Use __builtin_ia32_vfnmaddpd512_mask. (_mm512_mask_fnmadd_round_pd): Likewise. (_mm512_fnmadd_pd): Likewise. (_mm512_mask_fnmadd_pd): Likewise. (_mm512_maskz_fnmadd_round_pd): Use __builtin_ia32_vfnmaddpd512_maskz. (_mm512_maskz_fnmadd_pd): Likewise. (_mm512_fnmadd_round_ps): Use __builtin_ia32_vfnmaddps512_mask. (_mm512_mask_fnmadd_round_ps): Likewise. (_mm512_fnmadd_ps): Likewise. (_mm512_mask_fnmadd_ps): Likewise. (_mm512_maskz_fnmadd_round_ps): Use __builtin_ia32_vfnmaddps512_maskz. (_mm512_maskz_fnmadd_ps): Likewise. * config/i386/avx512vlintrin.h (_mm256_mask_fnmadd_pd): Use __builtin_ia32_vfnmaddpd256_mask. (_mm256_maskz_fnmadd_pd): Use __builtin_ia32_vfnmaddpd256_maskz. (_mm_mask_fnmadd_pd): Use __builtin_ia32_vfmaddpd128_mask (_mm_maskz_fnmadd_pd): Use __builtin_ia32_vfnmaddpd128_maskz. (_mm256_mask_fnmadd_ps): Use __builtin_ia32_vfnmaddps256_mask. (_mm256_mask_fnmadd_ps): Use __builtin_ia32_vfnmaddps256_mask. (_mm256_maskz_fnmadd_ps): Use __builtin_ia32_vfnmaddps256_maskz. (_mm_mask_fnmadd_ps): Use __builtin_ia32_vfnmaddps128_mask. (_mm_maskz_fnmadd_ps): Use __builtin_ia32_vfnmaddps128_maskz. * config/i386/fmaintrin.h (_mm_fnmadd_pd): Use __builtin_ia32_vfnmaddpd. (_mm256_fnmadd_pd): Use __builtin_ia32_vfnmaddpd256. (_mm_fnmadd_ps): Use __builtin_ia32_vfnmaddps. (_mm256_fnmadd_ps): Use __builtin_ia32_vfnmaddps256. 
(_mm_fnmadd_sd): Use __builtin_ia32_vfnmaddsd3. (_mm_fnmadd_ss): Use __builtin_ia32_vfnmaddss3. * config/i386/i386-builtin.def: Add __builtin_ia32_vfnmaddpd256_mask, __builtin_ia32_vfnmaddpd256_maskz, __builtin_ia32_vfnmaddpd128_mask, __builtin_ia32_vfnmaddpd128_maskz, __builtin_ia32_vfnmaddps256_mask, __builtin_ia32_vfnmaddps256_maskz, __builtin_ia32_vfnmaddps128_mask, __builtin_ia32_vfnmaddps128_maskz, __builtin_ia32_vfnmaddpd512_mask, __builtin_ia32_vfnmaddpd512_maskz, __builtin_ia32_vfnmaddps512_mask, __builtin_ia32_vfnmaddps512_maskz, __builtin_ia32_vfnmaddss3, __builtin_ia32_vfnmaddsd3, __builtin_ia32_vfnmaddps, __builtin_ia32_vfnmaddpd, __builtin_ia32_vfnmaddps256 and __builtin_ia32_vfnmaddpd256. * config/i386/sse.md (fma4i_fnmadd_): New. (_fnmadd__maskz): Likewise. (*fma_fnmadd__bcst_1): Likewise. (*fma_fnmadd__bcst_2): Likewise. (*fma_fnmadd__bcst_3): Likewise. (fmai_vmfnmadd_): Likewise. gcc/testsuite/ PR target/72782 * gcc.target/i386/avx512f-fnmadd-df-zmm-1.c: New test. * gcc.target/i386/avx512f-fnmadd-sf-zmm-1.c: Likewise. * gcc.target/i386/avx512f-fnmadd-sf-zmm-2.c: Likewise. * gcc.target/i386/avx512f-fnmadd-sf-zmm-3.c: Likewise. * gcc.target/i386/avx512f-fnmadd-sf-zmm-4.c: Likewise. * gcc.target/i386/avx512f-fnmadd-sf-zmm-5.c: Likewise. * gcc.target/i386/avx512f-fnmadd-sf-zmm-6.c: Likewise. * gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c: Likewise. * gcc.target/i386/avx512f-fnmadd-sf-zmm-8.c: Likewise. * gcc.target/i386/avx512vl-fnmadd-sf-xmm-1.c: Likewise. * gcc.target/i386/avx512vl-fnmadd-sf-ymm-1.c: Likewise. 
Added: trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-df-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-2.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-3.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-4.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-5.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-6.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-8.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-fnmadd-sf-xmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-fnmadd-sf-ymm-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/avx512fintrin.h trunk/gcc/config/i386/avx512vlintrin.h trunk/gcc/config/i386/fmaintrin.h
[Bug target/72782] AVX512: No support for scalar broadcasts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72782 --- Comment #8 from hjl at gcc dot gnu.org --- Author: hjl Date: Sun Oct 21 20:24:50 2018 New Revision: 265356 URL: https://gcc.gnu.org/viewcvs?rev=265356=gcc=rev Log: Enable AVX512 memory broadcast for FMSUB Many AVX512 vector operations can broadcast from a scalar memory source. This patch enables memory broadcast for FMSUB operations. In order to support AVX512 memory broadcast for FMSUB, FMSUB builtin functions are also added, instead of passing the negated value to FMA builtin functions. gcc/ PR target/72782 * config/i386/avx512fintrin.h (_mm512_fmsub_round_pd): Use __builtin_ia32_vfmsubpd512_mask. (_mm512_mask_fmsub_round_pd): Likewise. (_mm512_fmsub_pd): Likewise. (_mm512_mask_fmsub_pd): Likewise. (_mm512_maskz_fmsub_round_pd): Use __builtin_ia32_vfmsubpd512_maskz. (_mm512_maskz_fmsub_pd): Likewise. (_mm512_fmsub_round_ps): Use __builtin_ia32_vfmsubps512_mask. (_mm512_mask_fmsub_round_ps): Likewise. (_mm512_fmsub_ps): Likewise. (_mm512_mask_fmsub_ps): Likewise. (_mm512_maskz_fmsub_round_ps): Use __builtin_ia32_vfmsubps512_maskz. (_mm512_maskz_fmsub_ps): Likewise. * config/i386/avx512vlintrin.h (_mm256_mask_fmsub_pd): Use __builtin_ia32_vfmsubpd256_mask. (_mm256_maskz_fmsub_pd): Use __builtin_ia32_vfmsubpd256_maskz. (_mm_mask_fmsub_pd): Use __builtin_ia32_vfmaddpd128_mask (_mm_maskz_fmsub_pd): Use __builtin_ia32_vfmsubpd128_maskz. (_mm256_mask_fmsub_ps): Use __builtin_ia32_vfmsubps256_mask. (_mm256_mask_fmsub_ps): Use __builtin_ia32_vfmsubps256_mask. (_mm256_maskz_fmsub_ps): Use __builtin_ia32_vfmsubps256_maskz. (_mm_mask_fmsub_ps): Use __builtin_ia32_vfmsubps128_mask. (_mm_maskz_fmsub_ps): Use __builtin_ia32_vfmsubps128_maskz. * config/i386/fmaintrin.h (_mm_fmsub_pd): Use __builtin_ia32_vfmsubpd. (_mm256_fmsub_pd): Use __builtin_ia32_vfmsubpd256. (_mm_fmsub_ps): Use __builtin_ia32_vfmsubps. (_mm256_fmsub_ps): Use __builtin_ia32_vfmsubps256. (_mm_fmsub_sd): Use __builtin_ia32_vfmsubsd3. 
(_mm_fmsub_ss): Use __builtin_ia32_vfmsubss3. * config/i386/i386-builtin.def: Add __builtin_ia32_vfmsubpd256_mask, __builtin_ia32_vfmsubpd256_maskz, __builtin_ia32_vfmsubpd128_mask, __builtin_ia32_vfmsubpd128_maskz, __builtin_ia32_vfmsubps256_mask, __builtin_ia32_vfmsubps256_maskz, __builtin_ia32_vfmsubps128_mask, __builtin_ia32_vfmsubps128_maskz, __builtin_ia32_vfmsubpd512_mask, __builtin_ia32_vfmsubpd512_maskz, __builtin_ia32_vfmsubps512_mask, __builtin_ia32_vfmsubps512_maskz, __builtin_ia32_vfmsubss3, __builtin_ia32_vfmsubsd3, __builtin_ia32_vfmsubps, __builtin_ia32_vfmsubpd, __builtin_ia32_vfmsubps256 and __builtin_ia32_vfmsubpd256. * config/i386/sse.md (fma4i_fmsub_): New. (_fmsub__maskz): Likewise. (*fma_fmsub__bcst_1): Likewise. (*fma_fmsub__bcst_2): Likewise. (*fma_fmsub__bcst_3): Likewise. (fmai_vmfmsub_): Likewise. gcc/testsuite/ PR target/72782 * gcc.target/i386/avx512f-fmsub-df-zmm-1.c: New test. * gcc.target/i386/avx512f-fmsub-sf-zmm-1.c: Likewise. * gcc.target/i386/avx512f-fmsub-sf-zmm-2.c: Likewise. * gcc.target/i386/avx512f-fmsub-sf-zmm-3.c: Likewise. * gcc.target/i386/avx512f-fmsub-sf-zmm-4.c: Likewise. * gcc.target/i386/avx512f-fmsub-sf-zmm-5.c: Likewise. * gcc.target/i386/avx512f-fmsub-sf-zmm-6.c: Likewise. * gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Likewise. * gcc.target/i386/avx512f-fmsub-sf-zmm-8.c: Likewise. * gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c: Likewise. * gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c: Likewise. 
Added: trunk/gcc/testsuite/gcc.target/i386/avx512f-fmsub-df-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-2.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-3.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-4.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-5.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-6.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c trunk/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-8.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-xmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-fmsub-sf-ymm-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/avx512fintrin.h trunk/gcc/config/i386/avx512vlintrin.h trunk/gcc/config/i386/fmaintrin.h trunk/gcc/config/i386/i386-builtin.def trunk/gcc/config/i386/sse.md trunk/gcc/testsuite
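The relationship between the four operations, and what "passing the negated value to FMA builtin functions" amounts to, can be written down per lane. This is a scalar sketch with hypothetical function names; it computes the product and sum with two roundings, whereas the hardware instructions round once, so it models only the sign conventions and not the precision.

```c
/* Per-lane sign conventions of the fused multiply-add family.  */
static double fmadd_model (double a, double b, double c)  { return a * b + c; }
static double fmsub_model (double a, double b, double c)  { return a * b - c; }
static double fnmadd_model (double a, double b, double c) { return -(a * b) + c; }
static double fnmsub_model (double a, double b, double c) { return -(a * b) - c; }

/* FMSUB expressed through FMADD by negating the addend, i.e. the
   approach the log above says is being replaced by dedicated builtins.  */
static double
fmsub_via_fmadd (double a, double b, double c)
{
  return fmadd_model (a, b, -c);
}
```

With a dedicated FMSUB operation the addend reaches the instruction unmodified, which is what allows it to be a broadcast memory operand.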
[Bug target/72782] AVX512: No support for scalar broadcasts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72782 --- Comment #7 from hjl at gcc dot gnu.org --- Author: hjl Date: Sun Oct 21 10:35:36 2018 New Revision: 265351 URL: https://gcc.gnu.org/viewcvs?rev=265351=gcc=rev Log: i386: Enable AVX512 memory broadcast for FP mul Many AVX512 vector operations can broadcast from a scalar memory source. This patch enables memory broadcast for FP mul operations. gcc/ PR target/72782 * config/i386/sse.md (*mul3_bcst): New. gcc/testsuite/ PR target/72782 * gcc.target/i386/avx512f-mul-df-zmm-1.c: New test. * gcc.target/i386/avx512f-mul-sf-zmm-1.c: Likewise. * gcc.target/i386/avx512f-mul-sf-zmm-2.c: Likewise. * gcc.target/i386/avx512f-mul-sf-zmm-3.c: Likewise. * gcc.target/i386/avx512f-mul-sf-zmm-4.c: Likewise. * gcc.target/i386/avx512f-mul-sf-zmm-5.c: Likewise. * gcc.target/i386/avx512f-mul-sf-zmm-6.c: Likewise. * gcc.target/i386/avx512vl-mul-sf-xmm-1.c: Likewise. * gcc.target/i386/avx512vl-mul-sf-ymm-1.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/i386/avx512f-mul-df-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-mul-sf-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-mul-sf-zmm-2.c trunk/gcc/testsuite/gcc.target/i386/avx512f-mul-sf-zmm-3.c trunk/gcc/testsuite/gcc.target/i386/avx512f-mul-sf-zmm-4.c trunk/gcc/testsuite/gcc.target/i386/avx512f-mul-sf-zmm-5.c trunk/gcc/testsuite/gcc.target/i386/avx512f-mul-sf-zmm-6.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-mul-sf-xmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-mul-sf-ymm-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog
[Bug target/87662] Missing AVX512VL intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87662 --- Comment #1 from hjl at gcc dot gnu.org --- Author: hjl Date: Sun Oct 21 10:23:58 2018 New Revision: 265350 URL: https://gcc.gnu.org/viewcvs?rev=265350&root=gcc&view=rev Log: i386: Add missing AVX512VL or/xor intrinsics gcc/ PR target/87662 * i386/avx512vlintrin.h (_mm256_or_epi32): New. (_mm_or_epi32): Likewise. (_mm256_xor_epi32): Likewise. (_mm_xor_epi32): Likewise. (_mm256_or_epi64): Likewise. (_mm_or_epi64): Likewise. (_mm256_xor_epi64): Likewise. (_mm_xor_epi64): Likewise. gcc/testsuite/ PR target/87662 * gcc.target/i386/pr87662.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr87662.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/avx512vlintrin.h trunk/gcc/testsuite/ChangeLog
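For reference, a short sketch of how two of the newly added intrinsics might be used (this is not the committed pr87662.c; the helper name `or_then_xor_epi32x8` is invented for the example). `_mm256_or_epi32` and `_mm256_xor_epi32` operate lane-wise on eight 32-bit integers. The intrinsic path assumes the build enables AVX512VL and that the compiler's headers ship these intrinsics; otherwise the scalar fallback computes the same result.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: dst[i] = (a[i] | b[i]) ^ c[i] for 8 lanes. */
void or_then_xor_epi32x8(uint32_t dst[8], const uint32_t a[8],
                         const uint32_t b[8], const uint32_t c[8])
{
    assert(dst && a && b && c);
#if defined(__AVX512VL__)
    /* Unaligned 256-bit loads of the three inputs. */
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    __m256i vc = _mm256_loadu_si256((const __m256i *)c);
    /* Lane-wise OR followed by lane-wise XOR on 32-bit elements. */
    __m256i r = _mm256_xor_epi32(_mm256_or_epi32(va, vb), vc);
    _mm256_storeu_si256((__m256i *)dst, r);
#else
    /* Portable fallback with the same per-lane semantics. */
    for (int i = 0; i < 8; i++)
        dst[i] = (a[i] | b[i]) ^ c[i];
#endif
}
```

The 128-bit (`_mm_or_epi32`, `_mm_xor_epi32`) and 64-bit-lane (`_epi64`) variants listed in the log follow the same shape with `__m128i` operands or 64-bit elements.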
[Bug target/72782] AVX512: No support for scalar broadcasts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72782 --- Comment #6 from hjl at gcc dot gnu.org --- Author: hjl Date: Sat Oct 20 20:41:10 2018 New Revision: 265345 URL: https://gcc.gnu.org/viewcvs?rev=265345&root=gcc&view=rev Log: i386: Enable AVX512 memory broadcast for FP div Many AVX512 vector operations can broadcast from a scalar memory source. This patch enables memory broadcast for FP div operations. gcc/ PR target/72782 * config/i386/sse.md (*_div3_bcst): New. gcc/testsuite/ PR target/72782 * gcc.target/i386/avx512f-div-df-zmm-1.c: New test. * gcc.target/i386/avx512f-div-sf-zmm-1.c: Likewise. * gcc.target/i386/avx512f-div-sf-zmm-2.c: Likewise. * gcc.target/i386/avx512f-div-sf-zmm-3.c: Likewise. * gcc.target/i386/avx512f-div-sf-zmm-4.c: Likewise. * gcc.target/i386/avx512f-div-sf-zmm-5.c: Likewise. * gcc.target/i386/avx512vl-div-sf-xmm-1.c: Likewise. * gcc.target/i386/avx512vl-div-sf-ymm-1.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/i386/avx512f-div-df-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-div-sf-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-div-sf-zmm-2.c trunk/gcc/testsuite/gcc.target/i386/avx512f-div-sf-zmm-3.c trunk/gcc/testsuite/gcc.target/i386/avx512f-div-sf-zmm-4.c trunk/gcc/testsuite/gcc.target/i386/avx512f-div-sf-zmm-5.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-div-sf-xmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-div-sf-ymm-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog
[Bug target/72782] AVX512: No support for scalar broadcasts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72782 --- Comment #5 from hjl at gcc dot gnu.org --- Author: hjl Date: Fri Oct 19 09:13:34 2018 New Revision: 265311 URL: https://gcc.gnu.org/viewcvs?rev=265311&root=gcc&view=rev Log: i386: Enable AVX512 memory broadcast for FP add Many AVX512 vector operations can broadcast from a scalar memory source. This patch enables memory broadcast for FP add operations. gcc/ PR target/72782 * config/i386/sse.md (*3_bcst_1): New. (*add3_bcst_2): Likewise. gcc/testsuite/ PR target/72782 * gcc.target/i386/avx512-binop-1.h: New file. * gcc.target/i386/avx512-binop-2.h: Likewise. * gcc.target/i386/avx512-binop-3.h: Likewise. * gcc.target/i386/avx512-binop-4.h: Likewise. * gcc.target/i386/avx512-binop-5.h: Likewise. * gcc.target/i386/avx512-binop-6.h: Likewise. * gcc.target/i386/avx512f-add-df-zmm-1.c: Likewise. * gcc.target/i386/avx512f-add-sf-zmm-1.c: Likewise. * gcc.target/i386/avx512f-add-sf-zmm-2.c: Likewise. * gcc.target/i386/avx512f-add-sf-zmm-3.c: Likewise. * gcc.target/i386/avx512f-add-sf-zmm-4.c: Likewise. * gcc.target/i386/avx512f-add-sf-zmm-5.c: Likewise. * gcc.target/i386/avx512f-add-sf-zmm-6.c: Likewise. * gcc.target/i386/avx512f-sub-df-zmm-1.c: Likewise. * gcc.target/i386/avx512f-sub-sf-zmm-1.c: Likewise. * gcc.target/i386/avx512f-sub-sf-zmm-2.c: Likewise. * gcc.target/i386/avx512f-sub-sf-zmm-3.c: Likewise. * gcc.target/i386/avx512f-sub-sf-zmm-4.c: Likewise. * gcc.target/i386/avx512f-sub-sf-zmm-5.c: Likewise. * gcc.target/i386/avx512vl-add-sf-xmm-1.c: Likewise. * gcc.target/i386/avx512vl-add-sf-ymm-1.c: Likewise. * gcc.target/i386/avx512vl-sub-sf-xmm-1.c: Likewise. * gcc.target/i386/avx512vl-sub-sf-ymm-1.c: Likewise.
Added: trunk/gcc/testsuite/gcc.target/i386/avx512-binop-1.h trunk/gcc/testsuite/gcc.target/i386/avx512-binop-2.h trunk/gcc/testsuite/gcc.target/i386/avx512-binop-3.h trunk/gcc/testsuite/gcc.target/i386/avx512-binop-4.h trunk/gcc/testsuite/gcc.target/i386/avx512-binop-5.h trunk/gcc/testsuite/gcc.target/i386/avx512-binop-6.h trunk/gcc/testsuite/gcc.target/i386/avx512f-add-df-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-2.c trunk/gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-3.c trunk/gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-4.c trunk/gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-5.c trunk/gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-6.c trunk/gcc/testsuite/gcc.target/i386/avx512f-sub-df-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-sub-sf-zmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512f-sub-sf-zmm-2.c trunk/gcc/testsuite/gcc.target/i386/avx512f-sub-sf-zmm-3.c trunk/gcc/testsuite/gcc.target/i386/avx512f-sub-sf-zmm-4.c trunk/gcc/testsuite/gcc.target/i386/avx512f-sub-sf-zmm-5.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-xmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-add-sf-ymm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-sub-sf-xmm-1.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-sub-sf-ymm-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog