On Tue, Jan 28, 2020 at 6:51 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Tue, Jan 28, 2020 at 9:12 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> >
> > On Tue, Jan 28, 2020 at 4:34 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > > > You could move
> > > >
> > > >   (match_test "TARGET_AVX")
> > > >     (const_string "TI")
> > > >
> > > > up to bypass the cases below.
> > > >
> > >
> > > I don't think we can do that.  There are 2 cases where we prefer
> > > movaps/movups:
> > >
> > > /* Use packed single precision instructions where possible.  I.e.
> > >    movups instead of movupd.  */
> > > DEF_TUNE (X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL,
> > >           "sse_packed_single_insn_optimal",
> > >           m_BDVER | m_ZNVER)
> > >
> > > /* X86_TUNE_SSE_TYPELESS_STORES: Always movaps/movups for 128bit stores.  */
> > > DEF_TUNE (X86_TUNE_SSE_TYPELESS_STORES, "sse_typeless_stores",
> > >           m_AMD_MULTIPLE | m_CORE_ALL | m_GENERIC)
> > >
> > > We should always use movaps/movups for
> > > TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL.  It is wrong to bypass
> > > TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL with TARGET_AVX, as
> > > m_BDVER | m_ZNVER support AVX.
> >
> > The reason for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL on AMD targets is
> > only insn size, as advised in e.g. Software Optimization Guide for the
> > AMD Family 15h Processors [1], section 7.1.2, where it is said:
> >
> > --quote--
> > 7.1.2 Reduce Instruction Size
> >
> > Optimization
> >
> > Reduce the size of instructions when possible.
> >
> > Rationale
> >
> > Using smaller instruction sizes improves instruction fetch throughput.
> > Specific examples include the following:
> >
> > * In SIMD code, use the single-precision (PS) form of instructions
> > instead of the double-precision (PD) form. For example, for register
> > to register moves, MOVAPS achieves the same result as MOVAPD, but uses
> > one less byte to encode the instruction and has no prefix byte. Other
> > examples in which single-precision forms can be substituted for
> > double-precision forms include MOVUPS, MOVNTPS, XORPS, ORPS, ANDPS,
> > and SHUFPS.
> > ...
> > --/quote--
> >
> > Please note that this optimization applies only to non-AVX forms, as
> > demonstrated by:
> >
> >    0:   0f 28 c8                movaps %xmm0,%xmm1
> >    3:   66 0f 28 c8             movapd %xmm0,%xmm1
> >    7:   c5 f8 28 d1             vmovaps %xmm1,%xmm2
> >    b:   c5 f9 28 d1             vmovapd %xmm1,%xmm2
> >
> > Also note that MOVDQA is missing from the above optimization. It is
> > harmful to substitute MOVDQA with MOVAPS, as it can (and does)
> > introduce a +1 cycle forwarding penalty between the FLT (FPA/FPM) and
> > INT (VALU) FP clusters.
> >
> > Following the above optimization, it is obvious that
> > TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL handling was cargo-culted from
> > one pattern to another. Its use should be reviewed and fixed where not
> > appropriate.
> >
> > [1] https://www.amd.com/system/files/TechDocs/47414_15h_sw_opt_guide.pdf
> >
> > Uros.
>
> Here is the updated patch which moves TARGET_AVX before
> TARGET_SSE_TYPELESS_STORES.  OK for master if there is
> no regression?
>
> Thanks.
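
As a quick sanity check of the size argument, the encodings from the
disassembly quoted earlier in this thread can be counted mechanically. The
following is only a small Python sketch; the table and the helper name
insn_length are made up for illustration, and the byte strings are copied
from that disassembly, not generated by an assembler.

```python
# Encodings copied verbatim from the disassembly quoted in this thread.
# The dictionary and helper names are illustrative, not part of GCC.
ENCODINGS = {
    "movaps %xmm0,%xmm1": "0f 28 c8",
    "movapd %xmm0,%xmm1": "66 0f 28 c8",
    "vmovaps %xmm1,%xmm2": "c5 f8 28 d1",
    "vmovapd %xmm1,%xmm2": "c5 f9 28 d1",
}


def insn_length(hex_bytes: str) -> int:
    """Return the number of bytes in a space-separated hex encoding."""
    return len(bytes.fromhex(hex_bytes))


# Map each instruction to its encoded length in bytes.
LENGTHS = {insn: insn_length(enc) for insn, enc in ENCODINGS.items()}

for insn, length in LENGTHS.items():
    print(f"{length} bytes  {insn}")
```

The legacy SSE forms differ by the 0x66 prefix byte, while the two
VEX-encoded forms come out at the same length, which is why the PS-over-PD
size advice does not carry over to the AVX forms.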
+	      (match_test "TARGET_AVX")
+		(const_string "<sseinsnmode>")
 	      (and (match_test "<MODE_SIZE> == 16")

Only MODE_SIZE == 16 cases will be left here, since TARGET_AVX is
necessary for MODE_SIZE > 16. This test can be removed.

OK with the above change.

Thanks,
Uros.

> --
> H.J.
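
To spell out why the "<MODE_SIZE> == 16" test becomes redundant once the
TARGET_AVX clause comes first, here is a toy enumeration in Python. It is
not GCC code: the function names and the list of mode sizes are assumptions
made for illustration. The only fact taken from the review is that modes
wider than 16 bytes require TARGET_AVX.

```python
from itertools import product

# Assumed vector mode sizes in bytes; illustrative only.
MODE_SIZES = (16, 32, 64)


def is_valid(target_avx: bool, mode_size: int) -> bool:
    """Modes wider than 16 bytes are only available with TARGET_AVX."""
    return mode_size == 16 or target_avx


def sizes_reaching_later_clauses() -> set:
    """Collect mode sizes that fall through a leading TARGET_AVX clause."""
    remaining = set()
    for target_avx, mode_size in product((False, True), MODE_SIZES):
        if not is_valid(target_avx, mode_size):
            continue  # combination cannot occur
        if target_avx:
            continue  # the first clause already matched
        remaining.add(mode_size)
    return remaining


print(sizes_reaching_later_clauses())
```

Under these assumptions only 16-byte modes ever reach the clauses after
TARGET_AVX, so testing the mode size there adds nothing.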