> >>> 2013-08-02 Xinliang David Li <davi...@google.com> > >>> > >>> * config/i386/stringop.def: New file. > >>> * config/i386/stringop.opt: New file. > >>> * config/i386/i386-opts.h: Include stringop.def. > >>> * config/i386/i386.opt: Include stringop.opt. > >>> * config/i386/i386.c (ix86_option_override_internal): > >>> Override default size based stringop inline strategies > >>> with options. > >>> * config/i386/i386.c (ix86_parse_stringop_strategy_string): > >>> New function. > >>> > >>> 2013-08-04 Xinliang David Li <davi...@google.com> > >>> > >>> * testsuite/gcc.target/i386/memcpy-strategy-1.c: New test. > >>> * testsuite/gcc.target/i386/memcpy-strategy-2.c: Ditto. > >>> * testsuite/gcc.target/i386/memset-strategy-1.c: Ditto. > >>> * testsuite/gcc.target/i386/memcpy-strategy-3.c: Ditto.

The patch looks reasonable to me in general. I wonder why we need to make all the cost tables non-const instead of just having writable storage for the "current strategy", like we do with other flags anyway. Your strings are definitely more readable than the in-memory representation I came up with. Perhaps we can even turn the cost tables into strings for easier maintenance? I guess they are a bit confusing for people not familiar with the code.

Honza

> >>> > >>> > >>> > >>> > >>> On Fri, Aug 2, 2013 at 9:21 PM, Xinliang David Li <davi...@google.com> > >>> wrote: > >>> > On x86_64, when the expected size of memcpy/memset is known (e.g, with > >>> > FDO), libcall strategy is used with the size is > 8192. This value is > >>> > hard coded, which makes it hard to do performance tuning. This patch > >>> > adds two new parameters to do that. Potential usage includes > >>> > per-application libcall strategy min-size tuning based on summary data > >>> > with FDO (e.g, instruction workset size). > >>> > > >>> > Bootstrap and tested on x86_64/linux. Ok for trunk? > >>> > > >>> > thanks, > >>> > > >>> > David > >>> > > >>> > > >>> > 2013-08-02 Xinliang David Li <davi...@google.com> > >>> > > >>> > * params.def: New parameters. > >>> > * config/i386/i386.c (ix86_option_override_internal): > >>> > Override default libcall size limit with parameters. > >> > >>> Index: config/i386/stringop.def > >>> =================================================================== > >>> --- config/i386/stringop.def (revision 0) > >>> +++ config/i386/stringop.def (revision 0) > >>> @@ -0,0 +1,42 @@ > >>> +/* Definitions for option handling for IA-32. > >>> + Copyright (C) 2013 Free Software Foundation, Inc. > >>> + > >>> +This file is part of GCC. > >>> + > >>> +GCC is free software; you can redistribute it and/or modify > >>> +it under the terms of the GNU General Public License as published by > >>> +the Free Software Foundation; either version 3, or (at your option) > >>> +any later version. 
> >>> + > >>> +GCC is distributed in the hope that it will be useful, > >>> +but WITHOUT ANY WARRANTY; without even the implied warranty of > >>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > >>> +GNU General Public License for more details. > >>> + > >>> +Under Section 7 of GPL version 3, you are granted additional > >>> +permissions described in the GCC Runtime Library Exception, version > >>> +3.1, as published by the Free Software Foundation. > >>> + > >>> +You should have received a copy of the GNU General Public License and > >>> +a copy of the GCC Runtime Library Exception along with this program; > >>> +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > >>> +<http://www.gnu.org/licenses/>. */ > >>> + > >>> +DEF_ENUM > >>> +DEF_ALG (no_stringop, no_stringop) > >>> +DEF_ENUM > >>> +DEF_ALG (libcall, libcall) > >>> +DEF_ENUM > >>> +DEF_ALG (rep_prefix_1_byte, rep_byte) > >>> +DEF_ENUM > >>> +DEF_ALG (rep_prefix_4_byte, rep_4byte) > >>> +DEF_ENUM > >>> +DEF_ALG (rep_prefix_8_byte, rep_8byte) > >>> +DEF_ENUM > >>> +DEF_ALG (loop_1_byte, byte_loop) > >>> +DEF_ENUM > >>> +DEF_ALG (loop, loop) > >>> +DEF_ENUM > >>> +DEF_ALG (unrolled_loop, unrolled_loop) > >>> +DEF_ENUM > >>> +DEF_ALG (vector_loop, vector_loop) > >>> Index: config/i386/i386.opt > >>> =================================================================== > >>> --- config/i386/i386.opt (revision 201458) > >>> +++ config/i386/i386.opt (working copy) > >>> @@ -316,6 +316,14 @@ mstack-arg-probe > >>> Target Report Mask(STACK_PROBE) Save > >>> Enable stack probing > >>> > >>> +mmemcpy-strategy= > >>> +Target RejectNegative Joined Var(ix86_tune_memcpy_strategy) > >>> +Specify memcpy expansion strategy when expected size is known > >>> + > >>> +mmemset-strategy= > >>> +Target RejectNegative Joined Var(ix86_tune_memset_strategy) > >>> +Specify memset expansion strategy when expected size is known > >>> + > >>> mstringop-strategy= > >>> Target RejectNegative Joined 
Enum(stringop_alg) Var(ix86_stringop_alg) > >>> Init(no_stringop) > >>> Chose strategy to generate stringop using > >>> Index: config/i386/stringop.opt > >>> =================================================================== > >>> --- config/i386/stringop.opt (revision 0) > >>> +++ config/i386/stringop.opt (revision 0) > >>> @@ -0,0 +1,36 @@ > >>> +/* Definitions for option handling for IA-32. > >>> + Copyright (C) 2013 Free Software Foundation, Inc. > >>> + > >>> +This file is part of GCC. > >>> + > >>> +GCC is free software; you can redistribute it and/or modify > >>> +it under the terms of the GNU General Public License as published by > >>> +the Free Software Foundation; either version 3, or (at your option) > >>> +any later version. > >>> + > >>> +GCC is distributed in the hope that it will be useful, > >>> +but WITHOUT ANY WARRANTY; without even the implied warranty of > >>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > >>> +GNU General Public License for more details. > >>> + > >>> +Under Section 7 of GPL version 3, you are granted additional > >>> +permissions described in the GCC Runtime Library Exception, version > >>> +3.1, as published by the Free Software Foundation. > >>> + > >>> +You should have received a copy of the GNU General Public License and > >>> +a copy of the GCC Runtime Library Exception along with this program; > >>> +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > >>> +<http://www.gnu.org/licenses/>. 
*/ > >>> + > >>> +Enum(stringop_alg) String(rep_byte) Value(rep_prefix_1_byte) > >>> + > >>> +#undef DEF_ENUM > >>> +#define DEF_ENUM EnumValue > >>> + > >>> +#undef DEF_ALG > >>> +#define DEF_ALG(alg, name) Enum(stringop_alg) String(name) Value(alg) > >>> + > >>> +#include "stringop.def" > >>> + > >>> +#undef DEF_ENUM > >>> +#undef DEF_ALG > >>> Index: config/i386/i386.c > >>> =================================================================== > >>> --- config/i386/i386.c (revision 201458) > >>> +++ config/i386/i386.c (working copy) > >>> @@ -156,7 +156,7 @@ struct processor_costs ix86_size_cost = > >>> }; > >>> > >>> /* Processor costs (relative to an add) */ > >>> -static const > >>> +static > >>> struct processor_costs i386_cost = { /* 386 specific costs */ > >>> COSTS_N_INSNS (1), /* cost of an add instruction */ > >>> COSTS_N_INSNS (1), /* cost of a lea instruction */ > >>> @@ -226,7 +226,7 @@ struct processor_costs i386_cost = { /* > >>> 1, /* cond_not_taken_branch_cost. */ > >>> }; > >>> > >>> -static const > >>> +static > >>> struct processor_costs i486_cost = { /* 486 specific costs */ > >>> COSTS_N_INSNS (1), /* cost of an add instruction */ > >>> COSTS_N_INSNS (1), /* cost of a lea instruction */ > >>> @@ -298,7 +298,7 @@ struct processor_costs i486_cost = { /* > >>> 1, /* cond_not_taken_branch_cost. */ > >>> }; > >>> > >>> -static const > >>> +static > >>> struct processor_costs pentium_cost = { > >>> COSTS_N_INSNS (1), /* cost of an add instruction */ > >>> COSTS_N_INSNS (1), /* cost of a lea instruction */ > >>> @@ -368,7 +368,7 @@ struct processor_costs pentium_cost = { > >>> 1, /* cond_not_taken_branch_cost. */ > >>> }; > >>> > >>> -static const > >>> +static > >>> struct processor_costs pentiumpro_cost = { > >>> COSTS_N_INSNS (1), /* cost of an add instruction */ > >>> COSTS_N_INSNS (1), /* cost of a lea instruction */ > >>> @@ -447,7 +447,7 @@ struct processor_costs pentiumpro_cost = > >>> 1, /* cond_not_taken_branch_cost. 
*/ > >>> }; > >>> > >>> -static const > >>> +static > >>> struct processor_costs geode_cost = { > >>> COSTS_N_INSNS (1), /* cost of an add instruction */ > >>> COSTS_N_INSNS (1), /* cost of a lea instruction */ > >>> @@ -518,7 +518,7 @@ struct processor_costs geode_cost = { > >>> 1, /* cond_not_taken_branch_cost. */ > >>> }; > >>> > >>> -static const > >>> +static > >>> struct processor_costs k6_cost = { > >>> COSTS_N_INSNS (1), /* cost of an add instruction */ > >>> COSTS_N_INSNS (2), /* cost of a lea instruction */ > >>> @@ -591,7 +591,7 @@ struct processor_costs k6_cost = { > >>> 1, /* cond_not_taken_branch_cost. */ > >>> }; > >>> > >>> -static const > >>> +static > >>> struct processor_costs athlon_cost = { > >>> COSTS_N_INSNS (1), /* cost of an add instruction */ > >>> COSTS_N_INSNS (2), /* cost of a lea instruction */ > >>> @@ -664,7 +664,7 @@ struct processor_costs athlon_cost = { > >>> 1, /* cond_not_taken_branch_cost. */ > >>> }; > >>> > >>> -static const > >>> +static > >>> struct processor_costs k8_cost = { > >>> COSTS_N_INSNS (1), /* cost of an add instruction */ > >>> COSTS_N_INSNS (2), /* cost of a lea instruction */ > >>> @@ -1265,7 +1265,7 @@ struct processor_costs btver2_cost = { > >>> 1, /* cond_not_taken_branch_cost. */ > >>> }; > >>> > >>> -static const > >>> +static > >>> struct processor_costs pentium4_cost = { > >>> COSTS_N_INSNS (1), /* cost of an add instruction */ > >>> COSTS_N_INSNS (3), /* cost of a lea instruction */ > >>> @@ -1336,7 +1336,7 @@ struct processor_costs pentium4_cost = { > >>> 1, /* cond_not_taken_branch_cost. */ > >>> }; > >>> > >>> -static const > >>> +static > >>> struct processor_costs nocona_cost = { > >>> COSTS_N_INSNS (1), /* cost of an add instruction */ > >>> COSTS_N_INSNS (1), /* cost of a lea instruction */ > >>> @@ -1409,7 +1409,7 @@ struct processor_costs nocona_cost = { > >>> 1, /* cond_not_taken_branch_cost. 
*/ > >>> }; > >>> > >>> -static const > >>> +static > >>> struct processor_costs atom_cost = { > >>> COSTS_N_INSNS (1), /* cost of an add instruction */ > >>> COSTS_N_INSNS (1) + 1, /* cost of a lea instruction */ > >>> @@ -1556,7 +1556,7 @@ struct processor_costs slm_cost = { > >>> }; > >>> > >>> /* Generic64 should produce code tuned for Nocona and K8. */ > >>> -static const > >>> +static > >>> struct processor_costs generic64_cost = { > >>> COSTS_N_INSNS (1), /* cost of an add instruction */ > >>> /* On all chips taken into consideration lea is 2 cycles and more. > >>> With > >>> @@ -1635,7 +1635,7 @@ struct processor_costs generic64_cost = > >>> }; > >>> > >>> /* core_cost should produce code tuned for Core familly of CPUs. */ > >>> -static const > >>> +static > >>> struct processor_costs core_cost = { > >>> COSTS_N_INSNS (1), /* cost of an add instruction */ > >>> /* On all chips taken into consideration lea is 2 cycles and more. > >>> With > >>> @@ -1717,7 +1717,7 @@ struct processor_costs core_cost = { > >>> > >>> /* Generic32 should produce code tuned for PPro, Pentium4, Nocona, > >>> Athlon and K8. */ > >>> -static const > >>> +static > >>> struct processor_costs generic32_cost = { > >>> COSTS_N_INSNS (1), /* cost of an add instruction */ > >>> COSTS_N_INSNS (1) + 1, /* cost of a lea instruction */ > >>> @@ -2900,6 +2900,150 @@ ix86_debug_options (void) > >>> > >>> return; > >>> } > >>> + > >>> +static const char *stringop_alg_names[] = { > >>> +#define DEF_ENUM > >>> +#define DEF_ALG(alg, name) #name, > >>> +#include "stringop.def" > >>> +#undef DEF_ENUM > >>> +#undef DEF_ALG > >>> +}; > >>> + > >>> +/* Parse parameter string passed to -mmemcpy-strategy= or > >>> -mmemset-strategy=. 
> >>> + The string is of the following form (or comma separated list of it): > >>> + > >>> + strategy_alg:max_size:[align|noalign] > >>> + > >>> + where the full size range for the strategy is either [0, max_size] or > >>> + [min_size, max_size], in which min_size is the max_size + 1 of the > >>> + preceding range. The last size range must have max_size == -1. > >>> + > >>> + Examples: > >>> + > >>> + 1. > >>> + -mmemcpy-strategy=libcall:-1:noalign > >>> + > >>> + this is equivalent to (for known size memcpy) > >>> -mstringop-strategy=libcall > >>> + > >>> + > >>> + 2. > >>> + > >>> -mmemset-strategy=rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign > >>> + > >>> + This is to tell the compiler to use the following strategy for > >>> memset > >>> + 1) when the expected size is between [1, 16], use rep_8byte > >>> strategy; > >>> + 2) when the size is between [17, 2048], use vector_loop; > >>> + 3) when the size is > 2048, use libcall. > >>> + > >>> +*/ > >>> + > >>> +struct stringop_size_range > >>> +{ > >>> + int min; > >>> + int max; > >>> + stringop_alg alg; > >>> + bool noalign; > >>> +}; > >>> + > >>> +static void > >>> +ix86_parse_stringop_strategy_string (char *strategy_str, bool is_memset) > >>> +{ > >>> + const struct stringop_algs *default_algs; > >>> + stringop_size_range input_ranges[MAX_STRINGOP_ALGS]; > >>> + char *curr_range_str, *next_range_str; > >>> + int i = 0, n = 0; > >>> + > >>> + if (is_memset) > >>> + default_algs = &ix86_cost->memset[TARGET_64BIT != 0]; > >>> + else > >>> + default_algs = &ix86_cost->memcpy[TARGET_64BIT != 0]; > >>> + > >>> + curr_range_str = strategy_str; > >>> + > >>> + do { > >>> + > >>> + int mins, maxs; > >>> + stringop_alg alg; > >>> + char alg_name[128]; > >>> + char align[16]; > >>> + > >>> + next_range_str = strchr (curr_range_str, ','); > >>> + if (next_range_str) > >>> + *next_range_str++ = '\0'; > >>> + > >>> + if (3 != sscanf (curr_range_str, "%[^:]:%d:%s", alg_name, &maxs, > >>> align)) > >>> + { > 
>>> + warning (0, "Wrong arg %s to option %s", curr_range_str, > >>> + is_memset ? "-mmemset_strategy=" : > >>> "-mmemcpy_strategy="); > >>> + return; > >>> + } > >>> + > >>> + if (n > 0 && (maxs < (mins = input_ranges[n - 1].max + 1) && maxs != > >>> -1)) > >>> + { > >>> + warning (0, "Size ranges of option %s should be increasing", > >>> + is_memset ? "-mmemset_strategy=" : > >>> "-mmemcpy_strategy="); > >>> + return; > >>> + } > >>> + > >>> + for (i = 0; i < last_alg; i++) > >>> + { > >>> + if (!strcmp (alg_name, stringop_alg_names[i])) > >>> + { > >>> + alg = (stringop_alg) i; > >>> + break; > >>> + } > >>> + } > >>> + > >>> + if (i == last_alg) > >>> + { > >>> + warning (0, "Wrong stringop strategy name %s specified for > >>> option %s", > >>> + alg_name, > >>> + is_memset ? "-mmemset_strategy=" : > >>> "-mmemcpy_strategy="); > >>> + return; > >>> + } > >>> + > >>> + input_ranges[n].min = mins; > >>> + input_ranges[n].max = maxs; > >>> + input_ranges[n].alg = alg; > >>> + if (!strcmp (align, "align")) > >>> + input_ranges[n].noalign = false; > >>> + else if (!strcmp (align, "noalign")) > >>> + input_ranges[n].noalign = true; > >>> + else > >>> + { > >>> + warning (0, "Unknown alignment %s specified for option %s", > >>> + align, is_memset ? "-mmemset_strategy=" : > >>> "-mmemcpy_strategy="); > >>> + return; > >>> + } > >>> + n++; > >>> + curr_range_str = next_range_str; > >>> + } while (curr_range_str); > >>> + > >>> + if (input_ranges[n - 1].max != -1) > >>> + { > >>> + warning (0, "The max value for the last size range should be -1" > >>> + " for option %s", > >>> + is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy="); > >>> + return; > >>> + } > >>> + > >>> + if (n > MAX_STRINGOP_ALGS) > >>> + { > >>> + warning (0, "Too many size ranges specified in option %s", > >>> + is_memset ? 
"-mmemset_strategy=" : "-mmemcpy_strategy="); > >>> + return; > >>> + } > >>> + > >>> + /* Now override the default algs array */ > >>> + for (i = 0; i < n; i++) > >>> + { > >>> + *const_cast<int *>(&default_algs->size[i].max) = > >>> input_ranges[i].max; > >>> + *const_cast<stringop_alg *>(&default_algs->size[i].alg) > >>> + = input_ranges[i].alg; > >>> + *const_cast<int *>(&default_algs->size[i].noalign) > >>> + = input_ranges[i].noalign; > >>> + } > >>> +} > >>> + > >>> > >>> /* Override various settings based on options. If MAIN_ARGS_P, the > >>> options are from the command line, otherwise they are from > >>> @@ -4021,6 +4165,21 @@ ix86_option_override_internal (bool main > >>> /* Handle stack protector */ > >>> if (!global_options_set.x_ix86_stack_protector_guard) > >>> ix86_stack_protector_guard = TARGET_HAS_BIONIC ? SSP_GLOBAL : > >>> SSP_TLS; > >>> + > >>> + /* Handle -mmemcpy-strategy= and -mmemset-strategy= */ > >>> + if (ix86_tune_memcpy_strategy) > >>> + { > >>> + char *str = xstrdup (ix86_tune_memcpy_strategy); > >>> + ix86_parse_stringop_strategy_string (str, false); > >>> + free (str); > >>> + } > >>> + > >>> + if (ix86_tune_memset_strategy) > >>> + { > >>> + char *str = xstrdup (ix86_tune_memset_strategy); > >>> + ix86_parse_stringop_strategy_string (str, true); > >>> + free (str); > >>> + } > >>> } > >>> > >>> /* Implement the TARGET_OPTION_OVERRIDE hook. 
*/ > >>> @@ -22903,6 +23062,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt > >>> { > >>> case libcall: > >>> case no_stringop: > >>> + case last_alg: > >>> gcc_unreachable (); > >>> case loop_1_byte: > >>> need_zero_guard = true; > >>> @@ -23093,6 +23253,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt > >>> { > >>> case libcall: > >>> case no_stringop: > >>> + case last_alg: > >>> gcc_unreachable (); > >>> case loop_1_byte: > >>> case loop: > >>> @@ -23304,6 +23465,7 @@ ix86_expand_setmem (rtx dst, rtx count_e > >>> { > >>> case libcall: > >>> case no_stringop: > >>> + case last_alg: > >>> gcc_unreachable (); > >>> case loop: > >>> need_zero_guard = true; > >>> @@ -23481,6 +23643,7 @@ ix86_expand_setmem (rtx dst, rtx count_e > >>> { > >>> case libcall: > >>> case no_stringop: > >>> + case last_alg: > >>> gcc_unreachable (); > >>> case loop_1_byte: > >>> case loop: > >>> Index: config/i386/i386-opts.h > >>> =================================================================== > >>> --- config/i386/i386-opts.h (revision 201458) > >>> +++ config/i386/i386-opts.h (working copy) > >>> @@ -28,15 +28,17 @@ see the files COPYING3 and COPYING.RUNTI > >>> /* Algorithm to expand string function with. */ > >>> enum stringop_alg > >>> { > >>> - no_stringop, > >>> - libcall, > >>> - rep_prefix_1_byte, > >>> - rep_prefix_4_byte, > >>> - rep_prefix_8_byte, > >>> - loop_1_byte, > >>> - loop, > >>> - unrolled_loop, > >>> - vector_loop > >>> +#undef DEF_ENUM > >>> +#define DEF_ENUM > >>> + > >>> +#undef DEF_ALG > >>> +#define DEF_ALG(alg, name) alg, > >>> + > >>> +#include "stringop.def" > >>> +last_alg > >>> + > >>> +#undef DEF_ENUM > >>> +#undef DEF_ALG > >>> }; > >>> > >>> /* Available call abi. */ > >>> Index: doc/invoke.texi > >>> =================================================================== > >>> --- doc/invoke.texi (revision 201458) > >>> +++ doc/invoke.texi (working copy) > >>> @@ -649,6 +649,7 @@ Objective-C and Objective-C++ Dialects}. 
> >>> -mbmi2 -mrtm -mlwp -mthreads @gol > >>> -mno-align-stringops -minline-all-stringops @gol > >>> -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol > >>> +-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} > >>> -mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol > >>> -m96bit-long-double -mlong-double-64 -mlong-double-80 @gol > >>> -mregparm=@var{num} -msseregparm @gol > >>> @@ -14598,6 +14599,24 @@ Expand into an inline loop. > >>> Always use a library call. > >>> @end table > >>> > >>> +@item -mmemcpy-strategy=@var{strategy} > >>> +@opindex mmemcpy-strategy=@var{strategy} > >>> +Override the internal decision heuristic to decide if > >>> @code{__builtin_memcpy} > >>> +should be inlined and what inline algorithm to use when the expected size > >>> +of the copy operation is known. @var{strategy} > >>> +is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} > >>> triplets. > >>> +@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} > >>> specifies > >>> +the max byte size with which inline algorithm @var{alg} is allowed. For > >>> the last > >>> +triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the > >>> triplets > >>> +in the list must be specified in increasing order. The minimal byte size > >>> for > >>> +@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + > >>> 1} of the > >>> +preceding range. > >>> + > >>> +@item -mmemset-strategy=@var{strategy} > >>> +@opindex mmemset-strategy=@var{strategy} > >>> +The option is similar to @option{-mmemcpy-strategy=} except that it is > >>> to control > >>> +@code{__builtin_memset} expansion. > >>> + > >>> @item -momit-leaf-frame-pointer > >>> @opindex momit-leaf-frame-pointer > >>> Don't keep the frame pointer in a register for leaf functions. 
This > >>> Index: testsuite/gcc.target/i386/memcpy-strategy-1.c > >>> =================================================================== > >>> --- testsuite/gcc.target/i386/memcpy-strategy-1.c (revision 0) > >>> +++ testsuite/gcc.target/i386/memcpy-strategy-1.c (revision 0) > >>> @@ -0,0 +1,12 @@ > >>> +/* { dg-do compile } */ > >>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:-1:align" > >>> } */ > >>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } > >>> } } } */ > >>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } > >>> */ > >>> + > >>> +char a[2048]; > >>> +char b[2048]; > >>> +void t (void) > >>> +{ > >>> + __builtin_memcpy (a, b, 2048); > >>> +} > >>> + > >>> Index: testsuite/gcc.target/i386/memcpy-strategy-2.c > >>> =================================================================== > >>> --- testsuite/gcc.target/i386/memcpy-strategy-2.c (revision 0) > >>> +++ testsuite/gcc.target/i386/memcpy-strategy-2.c (revision 0) > >>> @@ -0,0 +1,12 @@ > >>> +/* { dg-do compile } */ > >>> +/* { dg-options "-O2 -march=atom > >>> -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align" } */ > >>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! 
{ ia32 } } > >>> } } } */ > >>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } > >>> */ > >>> + > >>> +char a[2048]; > >>> +char b[2048]; > >>> +void t (void) > >>> +{ > >>> + __builtin_memcpy (a, b, 2048); > >>> +} > >>> + > >>> Index: testsuite/gcc.target/i386/memset-strategy-1.c > >>> =================================================================== > >>> --- testsuite/gcc.target/i386/memset-strategy-1.c (revision 0) > >>> +++ testsuite/gcc.target/i386/memset-strategy-1.c (revision 0) > >>> @@ -0,0 +1,10 @@ > >>> +/* { dg-do compile } */ > >>> +/* { dg-options "-O2 -march=atom -mmemset-strategy=libcall:-1:align" } */ > >>> +/* { dg-final { scan-assembler-times "memset" 2 } } */ > >>> + > >>> +char a[2048]; > >>> +void t (void) > >>> +{ > >>> + __builtin_memset (a, 1, 2048); > >>> +} > >>> + > >>> Index: testsuite/gcc.target/i386/memcpy-strategy-3.c > >>> =================================================================== > >>> --- testsuite/gcc.target/i386/memcpy-strategy-3.c (revision 0) > >>> +++ testsuite/gcc.target/i386/memcpy-strategy-3.c (revision 0) > >>> @@ -0,0 +1,11 @@ > >>> +/* { dg-do compile } */ > >>> +/* { dg-options "-O2 -march=atom > >>> -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align" } */ > >>> +/* { dg-final { scan-assembler-times "memcpy" 2 } } */ > >>> + > >>> +char a[2048]; > >>> +char b[2048]; > >>> +void t (void) > >>> +{ > >>> + __builtin_memcpy (a, b, 2048); > >>> +} > >>> + > >> > > > > -- > --- > Best regards, > Michael V. Zolotukhin, > Software Engineer > Intel Corporation.
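
For readers who want to experiment with the alg:max_size:[align|noalign] format outside of GCC, here is a minimal standalone sketch of a parser for it. This is not the code from the patch: parse_strategy, struct size_range and MAX_RANGES are hypothetical names made up for illustration, and the algorithm names are simply the strings listed in stringop.def above. It follows the invoke.texi wording (the first range starts at 0) and checks the capacity limit before storing each range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limit, standing in for MAX_STRINGOP_ALGS.  */
#define MAX_RANGES 4

/* Algorithm names as spelled in stringop.def in the patch.  */
static const char *const alg_names[] = {
  "no_stringop", "libcall", "rep_byte", "rep_4byte", "rep_8byte",
  "byte_loop", "loop", "unrolled_loop", "vector_loop"
};
#define NUM_ALGS ((int) (sizeof alg_names / sizeof alg_names[0]))

struct size_range
{
  int min;       /* smallest size covered by this range */
  int max;       /* largest size covered, or -1 for "no upper bound" */
  int alg;       /* index into alg_names */
  bool noalign;  /* true if "noalign" was given */
};

/* Parse STR into OUT, which must have room for MAX_RANGES entries.
   Return the number of ranges parsed, or -1 if STR is malformed.
   STR itself is not modified; OUT may be partially written on failure.  */
static int
parse_strategy (const char *str, struct size_range *out)
{
  char buf[256];
  char *cur, *next;
  int n = 0;

  assert (str != NULL && out != NULL);
  if (strlen (str) >= sizeof buf)
    return -1;
  strcpy (buf, str);

  for (cur = buf; cur != NULL; cur = next)
    {
      char name[32], align[16];
      int min, max, alg;

      /* Split off the next comma-separated triplet.  */
      next = strchr (cur, ',');
      if (next)
        *next++ = '\0';

      /* Check capacity before storing anything.  */
      if (n == MAX_RANGES)
        return -1;

      if (sscanf (cur, "%31[^:]:%d:%15s", name, &max, align) != 3)
        return -1;

      for (alg = 0; alg < NUM_ALGS; alg++)
        if (strcmp (name, alg_names[alg]) == 0)
          break;
      if (alg == NUM_ALGS)
        return -1;

      /* A range with max == -1 is open-ended and must come last.  */
      if (n > 0 && out[n - 1].max == -1)
        return -1;

      /* The first range starts at 0; later ones start just past the
         previous max, so the max values must be increasing.  */
      min = (n == 0) ? 0 : out[n - 1].max + 1;
      if (max != -1 && max < min)
        return -1;

      out[n].min = min;
      out[n].max = max;
      out[n].alg = alg;
      if (strcmp (align, "align") == 0)
        out[n].noalign = false;
      else if (strcmp (align, "noalign") == 0)
        out[n].noalign = true;
      else
        return -1;
      n++;
    }

  /* The last range must be the open-ended one.  */
  if (out[n - 1].max != -1)
    return -1;
  return n;
}
```

Calling parse_strategy ("rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign", ranges) with a struct size_range ranges[MAX_RANGES] array fills in three entries covering [0, 16], [17, 2048] and everything from 2049 up. The sketch copies the input into a local buffer so the caller does not need the xstrdup/free pair used in ix86_option_override_internal.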