Resending in plain text:
On 11 July 2011 23:50, Michael Zolotukhin <michael.v.zolotuk...@gmail.com> wrote:
>
> The attached patch enables the use of vector instructions when expanding
> memmove/memset.
>
> A new algorithm for move-mode selection is implemented for move_by_pieces
> and store_by_pieces.
> The x86-specific ix86_expand_movmem and ix86_expand_setmem are changed in a
> similar way, and the x86 cost-model parameters are slightly changed to
> support this.
> This implementation checks whether the array's alignment is known at compile
> time and chooses the expansion algorithm and move mode accordingly.
>
> Bootstrapped; there are two new failures due to incorrect tests (see
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503). The new implementation
> gives quite a big performance gain on memset/memcpy in some cases.
>
> A bunch of new tests are added to verify the implementation.
>
> Is it OK for trunk?
>
> Changelog:
>
> 2011-07-11  Zolotukhin Michael  <michael.v.zolotuk...@intel.com>
>
>     * config/i386/i386.h (processor_costs): Add second dimension to
>     stringop_algs array.
>     (clear_ratio): Tune value to improve performance.
>     * config/i386/i386.c (cost models): Initialize second dimension of
>     stringop_algs arrays.  Tune cost model in atom_cost, generic32_cost
>     and generic64_cost.
>     (ix86_expand_move): Add support for vector moves that use half of
>     a vector register.
>     (expand_set_or_movmem_via_loop_with_iter): New function.
>     (expand_set_or_movmem_via_loop): Enable reuse of the same iters in
>     different loops, produced by this function.
>     (emit_strset): New function.
>     (promote_duplicated_reg): Add support for vector modes, add
>     declaration.
>     (promote_duplicated_reg_to_size): Likewise.
>     (expand_movmem_epilogue): Add epilogue generation for bigger sizes.
>     (expand_setmem_epilogue): Likewise.
>     (expand_movmem_prologue): Likewise for prologue.
>     (expand_setmem_prologue): Likewise.
>     (expand_constant_movmem_prologue): Likewise.
>     (expand_constant_setmem_prologue): Likewise.
>     (decide_alg): Add new argument align_unknown.  Fix algorithm of
>     strategy selection if TARGET_INLINE_ALL_STRINGOPS is set.
>     (decide_alignment): Update desired alignment according to chosen move
>     mode.
>     (ix86_expand_movmem): Change unrolled_loop strategy to use SSE-moves.
>     (ix86_expand_setmem): Likewise.
>     (ix86_slow_unaligned_access): Implementation of new hook
>     slow_unaligned_access.
>     (ix86_promote_rtx_for_memset): Implementation of new hook
>     promote_rtx_for_memset.
>     * config/i386/sse.md (sse2_loadq): Add expand for sse2_loadq.
>     (vec_dupv4si): Add expand for vec_dupv4si.
>     (vec_dupv2di): Add expand for vec_dupv2di.
>     * emit-rtl.c (adjust_address_1): Improve algorithm for determining
>     alignment of address+offset.
>     (get_mem_align_offset): Add handling of MEM_REFs.
>     * expr.c (compute_align_by_offset): New function.
>     (move_by_pieces_insn): New function.
>     (widest_mode_for_unaligned_mov): New function.
>     (widest_mode_for_aligned_mov): New function.
>     (widest_int_mode_for_size): Change type of size from int to
>     HOST_WIDE_INT.
>     (set_by_pieces_1): New function (new algorithm of memset expanding).
>     (set_by_pieces_2): New function.
>     (generate_move_with_mode): New function for set_by_pieces.
>     (alignment_for_piecewise_move): Use hook slow_unaligned_access instead
>     of macro SLOW_UNALIGNED_ACCESS.
>     (emit_group_load_1): Likewise.
>     (emit_group_store): Likewise.
>     (emit_push_insn): Likewise.
>     (store_field): Likewise.
>     (expand_expr_real_1): Likewise.
>     (compute_aligned_cost): New function.
>     (compute_unaligned_cost): New function.
>     (vector_mode_for_mode): New function.
>     (vector_extensions_used_for_mode): New function.
>     (move_by_pieces): New algorithm of memmove expanding.
>     (move_by_pieces_ninsns): Update according to changes in
>     move_by_pieces.
>     (move_by_pieces_1): Remove as unused.
>     (store_by_pieces): New algorithm for memset expanding.
>     (clear_by_pieces): Likewise.
>     (store_by_pieces_1): Remove incorrect parameters' attributes.
>     * expr.h (compute_align_by_offset): Add declaration.
>     * rtl.h (vector_extensions_used_for_mode): Add declaration.
>     * builtins.c (expand_builtin_memset_args): Update according to changes
>     in set_by_pieces.
>     * target.def (DEFHOOK): Add hooks slow_unaligned_access and
>     promote_rtx_for_memset.
>     * targhooks.c (default_slow_unaligned_access): Add default hook
>     implementation.
>     (default_promote_rtx_for_memset): Likewise.
>     * targhooks.h (default_slow_unaligned_access): Add prototype.
>     (default_promote_rtx_for_memset): Likewise.
>     * cse.c (cse_insn): Stop forward propagation of vector constants.
>     * fwprop.c (forward_propagate_and_simplify): Likewise.
>     * doc/tm.texi (SLOW_UNALIGNED_ACCESS): Remove documentation for deleted
>     macro SLOW_UNALIGNED_ACCESS.
>     (TARGET_SLOW_UNALIGNED_ACCESS): Add documentation on new hook.
>     (TARGET_PROMOTE_RTX_FOR_MEMSET): Likewise.
>     * doc/tm.texi.in (SLOW_UNALIGNED_ACCESS): Likewise.
>     (TARGET_SLOW_UNALIGNED_ACCESS): Likewise.
>     (TARGET_PROMOTE_RTX_FOR_MEMSET): Likewise.
>
> 2011-07-11  Zolotukhin Michael  <michael.v.zolotuk...@intel.com>
>
>     * testsuite/gcc.target/i386/memset-s64-a0-1.c: New testcase.
>     * testsuite/gcc.target/i386/memset-s64-a0-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s768-a0-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s768-a0-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s16-a1-1.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s16-a1-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-3.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-a0-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a1-1.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-a1-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-au-1.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-au-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-a0-1.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-a0-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-a1-1.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-a1-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-au-1.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-au-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s3072-a1-1.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s3072-a1-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s3072-au-1.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s3072-au-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-4.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-5.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s768-a0-3.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s768-a0-4.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s16-a1-2.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s16-a1-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-6.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-a0-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a1-2.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-a1-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-au-2.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-au-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-a0-2.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-a0-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-a1-2.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-a1-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-au-2.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-au-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s3072-a1-2.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s3072-a1-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s3072-au-2.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s3072-au-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-7.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-8.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s768-a0-5.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s768-a0-6.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s16-a1-3.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s16-a1-3.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-9.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-a0-3.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a1-3.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-a1-3.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-au-3.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-au-3.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-a0-3.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-a0-3.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-a1-3.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-a1-3.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-au-3.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-au-3.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-10.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-11.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s768-a0-7.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s768-a0-8.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s16-a1-4.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s16-a1-4.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-12.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-a0-4.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a1-4.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-a1-4.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-au-4.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-au-4.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-a0-4.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-a0-4.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-a1-4.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-a1-4.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-au-4.c: Ditto.