Resending in plain text:

On 11 July 2011 23:50, Michael Zolotukhin
<michael.v.zolotuk...@gmail.com> wrote:
>
> The attached patch enables the use of vector instructions in memmove/memset
> expansion.
>
> A new move-mode selection algorithm is implemented for move_by_pieces and
> store_by_pieces.
> The x86-specific ix86_expand_movmem and ix86_expand_setmem are changed in a
> similar way, and the x86 cost-model parameters are slightly adjusted to
> support this. The implementation checks whether the array's alignment is
> known at compile time and chooses the expansion algorithm and move mode
> accordingly.
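A minimal sketch of the idea in the paragraph above, written in plain C rather than GCC internals: given the block size and, when it is known at compile time, the destination misalignment, pick the widest move that is legal at each step. The width table (16 for one SSE register, then 8/4/2/1), the names plan_moves and slow_unaligned, and the byte-only fallback for unknown alignment on targets with slow unaligned access are illustrative assumptions, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical set of move widths in bytes, widest first:
   16 stands for one SSE register, 8/4/2/1 for integer moves.  */
static const size_t move_widths[] = { 16, 8, 4, 2, 1 };
#define N_MOVE_WIDTHS (sizeof move_widths / sizeof move_widths[0])

/* Split a SIZE-byte memset/memcpy into a sequence of moves.
   MISALIGN is the destination address modulo 16 if it is known at
   compile time, or -1 if it is unknown.  SLOW_UNALIGNED models a
   target on which unaligned wide moves are expensive.  The chosen
   widths are stored in OUT (capacity CAP); the count is returned.  */
static size_t
plan_moves (size_t size, int misalign, bool slow_unaligned,
            size_t *out, size_t cap)
{
  assert (misalign < 16);
  /* Offset of the current position from a 16-byte boundary.  */
  size_t pos = misalign < 0 ? 0 : (size_t) misalign;
  size_t n = 0;

  while (size > 0 && n < cap)
    {
      size_t width = 1;
      for (size_t i = 0; i < N_MOVE_WIDTHS; i++)
        {
          size_t w = move_widths[i];
          if (w > size)
            continue;
          bool ok;
          if (misalign >= 0)
            /* Known alignment: only use a width the address is
               aligned to.  */
            ok = pos % w == 0;
          else
            /* Unknown alignment: take the widest move unless unaligned
               access is slow, in which case fall back to bytes.  */
            ok = !slow_unaligned || w == 1;
          if (ok)
            {
              width = w;
              break;
            }
        }
      out[n++] = width;
      pos += width;
      size -= width;
    }
  return n;
}
```

For instance, 32 bytes starting 1 byte past a 16-byte boundary come out as 1, 2, 4, 8, 16, 1: a short prologue that reaches 16-byte alignment, one aligned SSE-sized move, and a one-byte epilogue. With unknown alignment and fast unaligned access, the same 32 bytes become two 16-byte moves.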
>
> Bootstrapped; there are two new failures, caused by incorrect tests (see
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503). The new implementation
> gives a significant performance gain on memset/memcpy in some cases.
>
> A number of new tests are added to verify the implementation.
>
> Is it ok for trunk?
>
> Changelog:
>
> 2011-07-11  Zolotukhin Michael  <michael.v.zolotuk...@intel.com>
>
>     * config/i386/i386.h (processor_costs): Add second dimension to
>     stringop_algs array.
>     (clear_ratio): Tune value to improve performance.
>     * config/i386/i386.c (cost models): Initialize second dimension of
>     stringop_algs arrays.  Tune cost model in atom_cost, generic32_cost
>     and generic64_cost.
>     (ix86_expand_move): Add support for vector moves that use half of a
>     vector register.
>     (expand_set_or_movmem_via_loop_with_iter): New function.
>     (expand_set_or_movmem_via_loop): Enable reuse of the same iters in
>     different loops, produced by this function.
>     (emit_strset): New function.
>     (promote_duplicated_reg): Add support for vector modes, add
>     declaration.
>     (promote_duplicated_reg_to_size): Likewise.
>     (expand_movmem_epilogue): Add epilogue generation for bigger sizes.
>     (expand_setmem_epilogue): Likewise.
>     (expand_movmem_prologue): Likewise for prologue.
>     (expand_setmem_prologue): Likewise.
>     (expand_constant_movmem_prologue): Likewise.
>     (expand_constant_setmem_prologue): Likewise.
>     (decide_alg): Add new argument align_unknown.  Fix the strategy
>     selection algorithm when TARGET_INLINE_ALL_STRINGOPS is set.
>     (decide_alignment): Update desired alignment according to chosen move
>     mode.
>     (ix86_expand_movmem): Change unrolled_loop strategy to use SSE moves.
>     (ix86_expand_setmem): Likewise.
>     (ix86_slow_unaligned_access): New function, implementing the new hook
>     slow_unaligned_access.
>     (ix86_promote_rtx_for_memset): New function, implementing the new hook
>     promote_rtx_for_memset.
>     * config/i386/sse.md (sse2_loadq): Add expander for sse2_loadq.
>     (vec_dupv4si): Add expander for vec_dupv4si.
>     (vec_dupv2di): Add expander for vec_dupv2di.
>     * emit-rtl.c (adjust_address_1): Improve algorithm for determining
>     alignment of address+offset.
>     (get_mem_align_offset): Add handling of MEM_REFs.
>     * expr.c (compute_align_by_offset): New function.
>     (move_by_pieces_insn): New function.
>     (widest_mode_for_unaligned_mov): New function.
>     (widest_mode_for_aligned_mov): New function.
>     (widest_int_mode_for_size): Change type of size from int to
>     HOST_WIDE_INT.
>     (set_by_pieces_1): New function (new memset expansion algorithm).
>     (set_by_pieces_2): New function.
>     (generate_move_with_mode): New function for set_by_pieces.
>     (alignment_for_piecewise_move): Use hook slow_unaligned_access instead
>     of macro SLOW_UNALIGNED_ACCESS.
>     (emit_group_load_1): Likewise.
>     (emit_group_store): Likewise.
>     (emit_push_insn): Likewise.
>     (store_field): Likewise.
>     (expand_expr_real_1): Likewise.
>     (compute_aligned_cost): New function.
>     (compute_unaligned_cost): New function.
>     (vector_mode_for_mode): New function.
>     (vector_extensions_used_for_mode): New function.
>     (move_by_pieces): New memmove expansion algorithm.
>     (move_by_pieces_ninsns): Update according to changes in
>     move_by_pieces.
>     (move_by_pieces_1): Remove as unused.
>     (store_by_pieces): New memset expansion algorithm.
>     (clear_by_pieces): Likewise.
>     (store_by_pieces_1): Remove incorrect attributes from parameters.
>     * expr.h (compute_align_by_offset): Add declaration.
>     * rtl.h (vector_extensions_used_for_mode): Add declaration.
>     * builtins.c (expand_builtin_memset_args): Update according to changes
>     in set_by_pieces.
>     * target.def (DEFHOOK): Add hooks slow_unaligned_access and
>     promote_rtx_for_memset.
>     * targhooks.c (default_slow_unaligned_access): Add default hook
>     implementation.
>     (default_promote_rtx_for_memset): Likewise.
>     * targhooks.h (default_slow_unaligned_access): Add prototype.
>     (default_promote_rtx_for_memset): Likewise.
>     * cse.c (cse_insn): Stop forward propagation of vector constants.
>     * fwprop.c (forward_propagate_and_simplify): Likewise.
>     * doc/tm.texi (SLOW_UNALIGNED_ACCESS): Remove documentation for deleted
>     macro SLOW_UNALIGNED_ACCESS.
>     (TARGET_SLOW_UNALIGNED_ACCESS): Add documentation on new hook.
>     (TARGET_PROMOTE_RTX_FOR_MEMSET): Likewise.
>     * doc/tm.texi.in (SLOW_UNALIGNED_ACCESS): Likewise.
>     (TARGET_SLOW_UNALIGNED_ACCESS): Likewise.
>     (TARGET_PROMOTE_RTX_FOR_MEMSET): Likewise.
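Expanding memset with wide or vector stores first requires the fill byte to be replicated across the whole register, which is the job the entries above give to promote_duplicated_reg and the new promote_rtx_for_memset hook. The plain-C sketch below shows two common ways to build such a value for a 64-bit word; it illustrates the idea only and is not the code in the patch. The function names are hypothetical, and the 16-byte pattern is simply two copies of the 64-bit word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Replicate BYTE into all eight bytes of a 64-bit word with
   shift-and-or steps: 1 -> 2 -> 4 -> 8 copies.  */
static uint64_t
broadcast_byte_shift (unsigned char byte)
{
  uint64_t v = byte;
  v |= v << 8;
  v |= v << 16;
  v |= v << 32;
  return v;
}

/* Same result with one multiplication: each 0x01 byte of the
   constant places a copy of BYTE, and no carries occur because
   BYTE < 256.  */
static uint64_t
broadcast_byte_mul (unsigned char byte)
{
  return (uint64_t) byte * UINT64_C (0x0101010101010101);
}

/* Build a 16-byte fill pattern (the width of one SSE register) from
   two copies of the 64-bit word.  Byte order does not matter here
   because every byte of the word is identical.  */
static void
fill_pattern16 (unsigned char byte, unsigned char out[16])
{
  assert (out != NULL);
  uint64_t word = broadcast_byte_mul (byte);
  memcpy (out, &word, sizeof word);
  memcpy (out + 8, &word, sizeof word);
}
```

With such a pattern in a register, a 64-byte memset with a known 16-byte-aligned destination could be emitted as four 16-byte stores instead of sixty-four byte stores.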
>
> 2011-07-11  Zolotukhin Michael  <michael.v.zolotuk...@intel.com>
>
>     * testsuite/gcc.target/i386/memset-s64-a0-1.c: New testcase.
>     * testsuite/gcc.target/i386/memset-s64-a0-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s768-a0-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s768-a0-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s16-a1-1.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s16-a1-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-3.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-a0-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a1-1.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-a1-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-au-1.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-au-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-a0-1.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-a0-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-a1-1.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-a1-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-au-1.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-au-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s3072-a1-1.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s3072-a1-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s3072-au-1.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s3072-au-1.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-4.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-5.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s768-a0-3.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s768-a0-4.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s16-a1-2.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s16-a1-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-6.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-a0-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a1-2.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-a1-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-au-2.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-au-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-a0-2.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-a0-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-a1-2.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-a1-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-au-2.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-au-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s3072-a1-2.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s3072-a1-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s3072-au-2.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s3072-au-2.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-7.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-8.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s768-a0-5.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s768-a0-6.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s16-a1-3.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s16-a1-3.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-9.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-a0-3.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a1-3.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-a1-3.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-au-3.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-au-3.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-a0-3.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-a0-3.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-a1-3.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-a1-3.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-au-3.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-au-3.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-10.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-11.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s768-a0-7.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s768-a0-8.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s16-a1-4.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s16-a1-4.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a0-12.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-a0-4.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-a1-4.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-a1-4.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s64-au-4.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s64-au-4.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-a0-4.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-a0-4.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-a1-4.c: Ditto.
>     * testsuite/gcc.target/i386/memcpy-s512-a1-4.c: Ditto.
>     * testsuite/gcc.target/i386/memset-s512-au-4.c: Ditto.
