An options exception
Hi! I didn't find any precedent of the following before, so this can be a start for discussion. Options are known to be different between compilers and achieve options compatibility is somewhat complex because of this. GCC can be taken as a reference point, but since other comilers still can (and expected to) have their own proprietary options there potentially can be a situation that GCC redefines the same option that was previously (in time) introuced by other compiler. To evade such a `derail` can we elaborate a rule about new options introduction and make an escape prefix, such as -q. No options under this escape should be introduced for GCC. Then other comilers will never appear in a situation their options reused by GCC and cause a conflict. Sergos
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
Done. Sergos On Tue, Nov 26, 2013 at 1:46 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Nov 26, 2013 at 10:43:32AM +0100, Richard Biener wrote: 2013-11-25 sergey.y.ostanevich sergos@gmail.com Please use your name with capital letters and spaces rather than all lowercase plus dots. Jakub 2013-11-25 Sergey Ostanevich sergos@gmail.com * c.opt: Introduced a new openmp-simd warning. * lang.opt: Ditto. * common.opt: Introduced a new option -fsimd-cost-model. * doc/invoke.texi: Introduced a new openmp-simd warning and a new -fsimd-cost-model option. * tree-vectorizer.h (unlimited_cost_model): Interface updated to rely on the particular loop info. * tree-vect-data-refs.c (vect_peeling_hash_insert): Ditto. (vect_peeling_hash_choose_best_peeling): Ditto. (vect_enhance_data_refs_alignment): Ditto. * tree-vect-slp.c (vect_slp_analyze_bb_1): Ditto. * tree-vect-loop.c (vect_estimate_min_profitable_iters): Ditto plus added openmp-simd warining. diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index ac67885..2e9a3df 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -596,6 +596,10 @@ Wold-style-definition C ObjC Var(warn_old_style_definition) Warning Warn if an old-style parameter definition is used +Wopenmp-simd +C C++ Var(warn_openmp_simd) Warning LangEnabledBy(C C++,Wall) +Warn if a simd directive is overridden by the vectorizer cost model + Woverlength-strings C ObjC C++ ObjC++ Var(warn_overlength_strings) Warning LangEnabledBy(C ObjC C++ ObjC++,Wpedantic) Warn if a string is longer than the maximum portable length specified by the standard diff --git a/gcc/common.opt b/gcc/common.opt index a7af636..9ece683 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2300,6 +2300,10 @@ fvect-cost-model= Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT) Specifies the cost model for vectorization +fsimd-cost-model= +Common Joined RejectNegative Enum(vect_cost_model) Var(flag_simd_cost_model) Init(VECT_COST_MODEL_UNLIMITED) +Specifies the vectorization cost model for code marked with a simd directive + Enum Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown vectorizer cost model %qs) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 466eee0..2ff413c 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -256,7 +256,7 @@ Objective-C and Objective-C++ Dialects}. -Wlogical-op -Wlong-long @gol -Wmain -Wmaybe-uninitialized -Wmissing-braces -Wmissing-field-initializers @gol -Wmissing-include-dirs @gol --Wno-multichar -Wnonnull -Wno-overflow @gol +-Wno-multichar -Wnonnull -Wno-overflow -Wopenmp-simd @gol -Woverlength-strings -Wpacked -Wpacked-bitfield-compat -Wpadded @gol -Wparentheses -Wpedantic-ms-format -Wno-pedantic-ms-format @gol -Wpointer-arith -Wno-pointer-to-int-cast @gol @@ -3321,6 +3321,7 @@ Options} and @ref{Objective-C and Objective-C++ Dialect Options}. -Wmaybe-uninitialized @gol -Wmissing-braces @r{(only for C/ObjC)} @gol -Wnonnull @gol +-Wopenmp-simd @gol -Wparentheses @gol -Wpointer-sign @gol -Wreorder @gol @@ -4815,6 +4816,12 @@ attribute. @opindex Woverflow Do not warn about compile-time overflow in constant expressions. +@item -Wopenmp-simd +@opindex Wopenm-simd +Warn if the vectorizer cost model overrides the OpenMP or the Cilk Plus +simd directive set by user. The @option{-fsimd-cost-model=unlimited} can +be used to relax the cost model. + @item -Woverride-init @r{(C and Objective-C only)} @opindex Woverride-init @opindex Wno-override-init @@ -8071,6 +8078,14 @@ is equal to the @code{dynamic} model. The default cost model depends on other optimization flags and is either @code{dynamic} or @code{cheap}. +@item -fsimd-cost-model=@var{model} +@opindex fsimd-cost-model +Alter the cost model used for vectorization of loops marked with the OpenMP +or Cilk Plus simd directive. The @var{model} argument should be one of +@code{unlimited}, @code{dynamic}, @code{cheap}. All values of @var{model} +have the same meaning as described in @option{fvect-cost-model} and by +default a cost model defined with @option{fvect-cost-model} is used. + @item -ftree-vrp @opindex ftree-vrp Perform Value Range Propagation on trees. This is similar to the diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt index 5e09cbd..0d328c8 100644 --- a/gcc/fortran/lang.opt +++ b/gcc/fortran/lang.opt @@ -257,6 +257,10 @@ Wintrinsics-std Fortran Warning Warn on intrinsics not part of the selected standard +Wopenmp-simd +Fortran +; Documented in C + Wreal-q-constant Fortran Warning Warn about real-literal-constants with 'q' exponent-letter diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index 8261645..d3e6e99 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -1096,7 +1096,8
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
Updated patch with spaces, etc according to check_GNU_style.sh Put guard as per Tobias' request. Is it Ok? On Thu, Nov 21, 2013 at 6:18 PM, Sergey Ostanevich sergos@gmail.com wrote: Tobias, When I understand the patch correctly, the warning is shown in two cases: a) When the loop could be vectorized but the cost model prevented it b) When the loop couldn't be vectorized because of other reasons (e.g. not vectorizable because of conditional loop exits, incomplete vectorization support by the compiler etc.) Do I correctly understand the warning? I am asking because the *opt and *texi wording suggests that only (a) is the case. - I cannot test as the patch cannot be applied with heavy editing (removal of additional line breaks, taking care of tabs converted into spaces). I believe it's only for a) case, since warning stays along with the cost model report that says only about relative scalar and vector costs of iteration. The case of exits and vectorization capabilities is handled earlier, since we have some vector code here. Will try to attach the patch instead of copy-paste here. Regarding the warning, I think it sounds a bit colloquial and as if the location information is not available. What do you think of loop with simd directive not vectorized or concise not fully correct: simd loop not vectorized? took one of yours. Additionally, shouldn't that be guarded by if (warn_openmp_simd ? Otherwise the flag status isn't used at all in the whole patch. This is strange to me, since it worked as I pass the OPT_Wopenmp_simd to the warning_at (). It does: show warinig with -Wopenmp-simd doesn't show warning with -Wall -Wno-openmp-simd +Wopenmp-simd +C C++ Var(warn_openmp_simd) Warning EnabledBy(Wall) +Warn about simd directive is overridden by vectorizer cost model Wording wise, I'd prefer something like: Warn if an simd directive is overridden by the vectorizer cost model (Or is it a simd? Where are the native speakers when one needs them?) damn, right! I believe 'a' since simd starts with consonant. However, in light of my question above, shouldn't it be Warn if a loop with simd directive is not vectorized? +fsimd-cost-model= +Common Joined RejectNegative Enum(vect_cost_model) Var(flag_simd_cost_model) Init(VECT_COST_MODEL_UNLIMITED) +Specifies the vectorization cost model for code marked with simd directive I think an article is lacking before simd. done. +@item -Wopenmp-simd +@opindex Wopenm-simd +Warn if vectorizer cost model overrides simd directive from user. I think that can be expanded a bit. One could also mention OpenMP/Cilk Plus explicitly. Maybe like: Warn if the vectorizer cost model overrides the OpenMP and Cilk Plus simd directives of the user. done. Or if my reading above is correct, how about something like: Warn if a loop with OpenMP or Cilk Plus simd directive is not vectorized. If only the cost model prevented the vectorization, the @option{-fsimd-cost-model} option can be used to force the vectorization. Which brings me to my next point: -fvect-cost-model= is not documented. I think some words would be helpful, especially about the valid arguments, the default and how it interacts with -fvect-cost-model=. done. --- a/gcc/fortran/lang.opt +++ b/gcc/fortran/lang.opt +Wopenmp-simd +Fortran Warning +; Documented in C (Warning is also not needed as it is taken from c-family/*opt, but it shouldn't harm either.) done. Sergos * common.opt: Added new option -fsimd-cost-model. * tree-vectorizer.h (unlimited_cost_model): Interface update to rely on particular loop info. * tree-vect-data-refs.c (vect_peeling_hash_insert): Update to unlimited_cost_model call according to new interface. (vect_peeling_hash_choose_best_peeling): Ditto. (vect_enhance_data_refs_alignment): Ditto. * tree-vect-slp.c: Ditto. * tree-vect-loop.c (vect_estimate_min_profitable_iters): Ditto, plus issue a warning in case cost model overrides users' directive. * c.opt: add openmp-simd warning. * lang.opt: Ditto. * doc/invoke.texi: Added new openmp-simd warning. patch6 Description: Binary data
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
Tobias, When I understand the patch correctly, the warning is shown in two cases: a) When the loop could be vectorized but the cost model prevented it b) When the loop couldn't be vectorized because of other reasons (e.g. not vectorizable because of conditional loop exits, incomplete vectorization support by the compiler etc.) Do I correctly understand the warning? I am asking because the *opt and *texi wording suggests that only (a) is the case. - I cannot test as the patch cannot be applied with heavy editing (removal of additional line breaks, taking care of tabs converted into spaces). I believe it's only for a) case, since warning stays along with the cost model report that says only about relative scalar and vector costs of iteration. The case of exits and vectorization capabilities is handled earlier, since we have some vector code here. Will try to attach the patch instead of copy-paste here. Regarding the warning, I think it sounds a bit colloquial and as if the location information is not available. What do you think of loop with simd directive not vectorized or concise not fully correct: simd loop not vectorized? took one of yours. Additionally, shouldn't that be guarded by if (warn_openmp_simd ? Otherwise the flag status isn't used at all in the whole patch. This is strange to me, since it worked as I pass the OPT_Wopenmp_simd to the warning_at (). It does: show warinig with -Wopenmp-simd doesn't show warning with -Wall -Wno-openmp-simd +Wopenmp-simd +C C++ Var(warn_openmp_simd) Warning EnabledBy(Wall) +Warn about simd directive is overridden by vectorizer cost model Wording wise, I'd prefer something like: Warn if an simd directive is overridden by the vectorizer cost model (Or is it a simd? Where are the native speakers when one needs them?) damn, right! I believe 'a' since simd starts with consonant. However, in light of my question above, shouldn't it be Warn if a loop with simd directive is not vectorized? +fsimd-cost-model= +Common Joined RejectNegative Enum(vect_cost_model) Var(flag_simd_cost_model) Init(VECT_COST_MODEL_UNLIMITED) +Specifies the vectorization cost model for code marked with simd directive I think an article is lacking before simd. done. +@item -Wopenmp-simd +@opindex Wopenm-simd +Warn if vectorizer cost model overrides simd directive from user. I think that can be expanded a bit. One could also mention OpenMP/Cilk Plus explicitly. Maybe like: Warn if the vectorizer cost model overrides the OpenMP and Cilk Plus simd directives of the user. done. Or if my reading above is correct, how about something like: Warn if a loop with OpenMP or Cilk Plus simd directive is not vectorized. If only the cost model prevented the vectorization, the @option{-fsimd-cost-model} option can be used to force the vectorization. Which brings me to my next point: -fvect-cost-model= is not documented. I think some words would be helpful, especially about the valid arguments, the default and how it interacts with -fvect-cost-model=. done. --- a/gcc/fortran/lang.opt +++ b/gcc/fortran/lang.opt +Wopenmp-simd +Fortran Warning +; Documented in C (Warning is also not needed as it is taken from c-family/*opt, but it shouldn't harm either.) done. Sergos * common.opt: Added new option -fsimd-cost-model. * tree-vectorizer.h (unlimited_cost_model): Interface update to rely on particular loop info. * tree-vect-data-refs.c (vect_peeling_hash_insert): Update to unlimited_cost_model call according to new interface. (vect_peeling_hash_choose_best_peeling): Ditto. (vect_enhance_data_refs_alignment): Ditto. * tree-vect-slp.c: Ditto. * tree-vect-loop.c (vect_estimate_min_profitable_iters): Ditto, plus issue a warning in case cost model overrides users' directive. * c.opt: add openmp-simd warning. * lang.opt: Ditto. * doc/invoke.texi: Added new openmp-simd warning. patch3 Description: Binary data
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
. */ - if (unlimited_cost_model ()) + if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))) possible_npeel_number = vf /nelements; /* Handle the aligned case. We may decide to align some other @@ -1437,7 +1438,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) if (DR_MISALIGNMENT (dr) == 0) { npeel_tmp = 0; - if (unlimited_cost_model ()) + if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))) possible_npeel_number++; } diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 86ebbd2..1fd28e3 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -2696,7 +2696,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo, void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo); /* Cost model disabled. */ - if (unlimited_cost_model ()) + if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))) { dump_printf_loc (MSG_NOTE, vect_location, cost model disabled.\n); *ret_min_profitable_niters = 0; @@ -2929,6 +2929,13 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo, /* vector version will never be profitable. */ else { + if (LOOP_VINFO_LOOP (loop_vinfo)-force_vect) +{ + warning_at (LOOP_VINFO_LOC (loop_vinfo), OPT_Wopenmp_simd, + Vectorization did not happen for + the loop labeled as simd.); +} + if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, cost model: the vector iteration cost = %d diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 247bdfd..4b25964 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -2171,7 +2171,7 @@ vect_slp_analyze_bb_1 (basic_block bb) } /* Cost model: check if the vectorization is worthwhile. */ - if (!unlimited_cost_model () + if (!unlimited_cost_model (NULL) !vect_bb_vectorization_profitable_p (bb_vinfo)) { if (dump_enabled_p ()) diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index a6c5b59..fd255db 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -919,9 +919,12 @@ known_alignment_for_access_p (struct data_reference *data_ref_info) /* Return true if the vect cost model is unlimited. */ static inline bool -unlimited_cost_model () +unlimited_cost_model (loop_p loop) { - return flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED; + return (flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED + || (loop != NULL + loop-force_vect + flag_simd_cost_model == VECT_COST_MODEL_UNLIMITED)); } /* Source location */ On Wed, Nov 20, 2013 at 12:14 AM, Tobias Burnus bur...@net-b.de wrote: I have some small comments to the patch: * You should also update gcc/doc/invoke.texi Sergey Ostanevich wrote: index 0026683..84911a0 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt ... +Wopenmp-simd +C C++ Var(openmp_simd) Warning +Warn about omp simd construct is overridden by cost model --- a/gcc/fortran/lang.opt +++ b/gcc/fortran/lang.opt ... +Wopenmp-simd +Fortran Warning +Warn about omp simd construct is overridden by cost model As the option files get merged, using Wopenmp-simd Fortran ; Documented in C is sufficient. +fsimd-cost-model= +Common Joined RejectNegative Enum(vect_cost_model) Var(flag_simd_cost_model) Init(VECT_COST_MODEL_DEFAULT) +Specifies the cost model for vectorization in loops marked with omp simd I wonder whether we need to care about Cilk Plus' #pragma simd in this summary. + if (LOOP_VINFO_LOOP (loop_vinfo)-force_vect) +{ + warning (OPT_Wopenmp_simd, Vectorization did not happen for the loop , + labeled as simd); +} + The warning line is too long. Tobias
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
Updated as per Richard and Jakub feedback - assuming the default for simd-cost-model is unlmited by default. Richard - was you Ok with it? Sergos * common.opt: Added new option -fsimd-cost-model. * tree-vectorizer.h (unlimited_cost_model): Interface update to rely on particular loop info. * tree-vect-data-refs.c (vect_peeling_hash_insert): Update to unlimited_cost_model call according to new interface. (vect_peeling_hash_choose_best_peeling): Ditto. (vect_enhance_data_refs_alignment): Ditto. * tree-vect-slp.c: Ditto. * tree-vect-loop.c (vect_estimate_min_profitable_iters): Ditto, plus issue a warning in case cost model overrides users' directive. * c.opt: add openmp-simd warning. * lang.opt: Ditto. * doc/invoke.texi: Added new openmp-simd warning. diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 0026683..6173013 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -592,6 +592,10 @@ Wold-style-definition C ObjC Var(warn_old_style_definition) Warning Warn if an old-style parameter definition is used +Wopenmp-simd +C C++ Var(warn_openmp_simd) Warning EnabledBy(Wall) +Warn about simd directive is overridden by vectorizer cost model + Woverlength-strings C ObjC C++ ObjC++ Var(warn_overlength_strings) Warning LangEnabledBy(C ObjC C++ ObjC++,Wpedantic) Warn if a string is longer than the maximum portable length specified by the standard diff --git a/gcc/common.opt b/gcc/common.opt index d5971df..6a40a5d 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2296,10 +2296,17 @@ fvect-cost-model= Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT) Specifies the cost model for vectorization +fsimd-cost-model= +Common Joined RejectNegative Enum(vect_cost_model) Var(flag_simd_cost_model) Init(VECT_COST_MODEL_UNLIMITED) +Specifies the vectorization cost model for code marked with simd directive + Enum Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown vectorizer cost model %qs) EnumValue +Enum(vect_cost_model) String(default) Value(VECT_COST_MODEL_DEFAULT) + +EnumValue Enum(vect_cost_model) String(unlimited) Value(VECT_COST_MODEL_UNLIMITED) EnumValue diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index c250385..050bd44 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -256,7 +256,7 @@ Objective-C and Objective-C++ Dialects}. -Wlogical-op -Wlong-long @gol -Wmain -Wmaybe-uninitialized -Wmissing-braces -Wmissing-field-initializers @gol -Wmissing-include-dirs @gol --Wno-multichar -Wnonnull -Wno-overflow @gol +-Wno-multichar -Wnonnull -Wno-overflow -Wopenmp-simd @gol -Woverlength-strings -Wpacked -Wpacked-bitfield-compat -Wpadded @gol -Wparentheses -Wpedantic-ms-format -Wno-pedantic-ms-format @gol -Wpointer-arith -Wno-pointer-to-int-cast @gol @@ -3318,6 +3318,7 @@ Options} and @ref{Objective-C and Objective-C++ Dialect Options}. -Wmaybe-uninitialized @gol -Wmissing-braces @r{(only for C/ObjC)} @gol -Wnonnull @gol +-Wopenmp-simd @gol -Wparentheses @gol -Wpointer-sign @gol -Wreorder @gol @@ -4804,6 +4805,10 @@ attribute. @opindex Woverflow Do not warn about compile-time overflow in constant expressions. +@item -Wopenmp-simd +@opindex Wopenm-simd +Warn if vectorizer cost model overrides simd directive from user. + @item -Woverride-init @r{(C and Objective-C only)} @opindex Woverride-init @opindex Wno-override-init diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt index 5e09cbd..b43c48c 100644 --- a/gcc/fortran/lang.opt +++ b/gcc/fortran/lang.opt @@ -257,6 +257,10 @@ Wintrinsics-std Fortran Warning Warn on intrinsics not part of the selected standard +Wopenmp-simd +Fortran Warning +; Documented in C + Wreal-q-constant Fortran Warning Warn about real-literal-constants with 'q' exponent-letter diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index 83d1f45..977db43 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -1090,7 +1090,8 @@ vect_peeling_hash_insert (loop_vec_info loop_vinfo, struct data_reference *dr, *new_slot = slot; } - if (!supportable_dr_alignment unlimited_cost_model ()) + if (!supportable_dr_alignment + unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))) slot-count += VECT_MAX_COST; } @@ -1200,7 +1201,7 @@ vect_peeling_hash_choose_best_peeling (loop_vec_info loop_vinfo, res.peel_info.dr = NULL; res.body_cost_vec = stmt_vector_for_cost (); - if (!unlimited_cost_model ()) + if (!unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))) { res.inside_cost = INT_MAX; res.outside_cost = INT_MAX; @@ -1429,7 +1430,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) vectorization factor. We do this automtically for cost model, since we calculate cost for every peeling option. */ -
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
bool -unlimited_cost_model () +unlimited_cost_model (loop_p loop) { - return flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED; + return (flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED + || (loop != NULL + loop-force_vect + flag_simd_vect_cost_model == VECT_COST_MODEL_UNLIMITED)); } /* Source location */ On Mon, Nov 18, 2013 at 7:13 PM, Richard Biener rguent...@suse.de wrote: On Mon, 18 Nov 2013, Sergey Ostanevich wrote: I would agree that the example is just for the case cost model makes correct estimation But how can we assure ourself that it won't have any mistakes in the future? We call it bugs and not mistakes and we have bugzilla for it. Richard. I believe it'll be Ok to introduce an extra flag as Jakub proposed for the dedicated simd-forced vectorization to use unlimited cost model. This can be default for -fopenmp or there should be a warning issued that compiler overrides user's request of vectorization. In such a case user can enforce vectorization (even with mentioned results :) with this unlimited cost model for simd. On Fri, Nov 15, 2013 at 6:24 PM, Richard Biener rguent...@suse.de wrote: On Fri, 15 Nov 2013, Sergey Ostanevich wrote: Richard, here's an example that causes trigger for the cost model. I hardly believe that (AVX2) .L9: vmovups (%rsi), %xmm3 addl$1, %r8d addq$256, %rsi vinsertf128 $0x1, -240(%rsi), %ymm3, %ymm1 vmovups -224(%rsi), %xmm3 vinsertf128 $0x1, -208(%rsi), %ymm3, %ymm3 vshufps $136, %ymm3, %ymm1, %ymm3 vperm2f128 $3, %ymm3, %ymm3, %ymm2 vshufps $68, %ymm2, %ymm3, %ymm1 vshufps $238, %ymm2, %ymm3, %ymm2 vmovups -192(%rsi), %xmm3 vinsertf128 $1, %xmm2, %ymm1, %ymm2 vinsertf128 $0x1, -176(%rsi), %ymm3, %ymm1 vmovups -160(%rsi), %xmm3 vinsertf128 $0x1, -144(%rsi), %ymm3, %ymm3 vshufps $136, %ymm3, %ymm1, %ymm3 vperm2f128 $3, %ymm3, %ymm3, %ymm1 vshufps $68, %ymm1, %ymm3, %ymm4 vshufps $238, %ymm1, %ymm3, %ymm1 vmovups -128(%rsi), %xmm3 vinsertf128 $1, %xmm1, %ymm4, %ymm1 vshufps $136, %ymm1, %ymm2, %ymm1 vperm2f128 $3, %ymm1, %ymm1, %ymm2 vshufps $68, %ymm2, %ymm1, %ymm4 vshufps $238, %ymm2, %ymm1, %ymm2 vinsertf128 $0x1, -112(%rsi), %ymm3, %ymm1 vmovups -96(%rsi), %xmm3 vinsertf128 $1, %xmm2, %ymm4, %ymm4 vinsertf128 $0x1, -80(%rsi), %ymm3, %ymm3 vshufps $136, %ymm3, %ymm1, %ymm3 vperm2f128 $3, %ymm3, %ymm3, %ymm2 vshufps $68, %ymm2, %ymm3, %ymm1 vshufps $238, %ymm2, %ymm3, %ymm2 vmovups -64(%rsi), %xmm3 vinsertf128 $1, %xmm2, %ymm1, %ymm2 vinsertf128 $0x1, -48(%rsi), %ymm3, %ymm1 vmovups -32(%rsi), %xmm3 vinsertf128 $0x1, -16(%rsi), %ymm3, %ymm3 cmpl%r8d, %edi vshufps $136, %ymm3, %ymm1, %ymm3 vperm2f128 $3, %ymm3, %ymm3, %ymm1 vshufps $68, %ymm1, %ymm3, %ymm5 vshufps $238, %ymm1, %ymm3, %ymm1 vinsertf128 $1, %xmm1, %ymm5, %ymm1 vshufps $136, %ymm1, %ymm2, %ymm1 vperm2f128 $3, %ymm1, %ymm1, %ymm2 vshufps $68, %ymm2, %ymm1, %ymm3 vshufps $238, %ymm2, %ymm1, %ymm2 vinsertf128 $1, %xmm2, %ymm3, %ymm1 vshufps $136, %ymm1, %ymm4, %ymm1 vperm2f128 $3, %ymm1, %ymm1, %ymm2 vshufps $68, %ymm2, %ymm1, %ymm3 vshufps $238, %ymm2, %ymm1, %ymm2 vinsertf128 $1, %xmm2, %ymm3, %ymm2 vaddps %ymm2, %ymm0, %ymm0 ja .L9 is more efficient than .L3: vaddss (%rcx,%rax), %xmm0, %xmm0 addq$32, %rax cmpq%rdx, %rax jne .L3 ;) As soon as elemental functions will appear and we update the vectorizer so it can accept an elemental function inside the loop - we will have the same situation as we have it now: cost model will bail out with profitability estimation. Yes. Still we have no chance to get info on how efficient the bar() function when it is in vector form. Well I assume you mean that the speedup when vectorizing the elemental will offset whatever wreckage we cause with vectorizing the rest of the statements. I'd say you can at least compare to unrolling by the vectorization factor, building the vector inputs to the elemental from scalars, distributing the vector result from the elemental to scalars. I believe I should repeat: #pragma omp simd is intended for introduction of an instruction-level parallel region on developer's request, hence should be treated in same manner as #pragma omp parallel. Vectorizer cost model is an obstacle
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
On Tue, Nov 19, 2013 at 6:07 PM, Richard Biener rguent...@suse.de wrote: On Tue, 19 Nov 2013, Sergey Ostanevich wrote: :) agree to you, but as soon as you're a user who tries to introduce vector code and face a bug in cost model you'd like to have a workaround until the bug will be fixed and compiler will come to you with new OS distribution, don't you? I propose the following, yet SLP have to use a NULL as a loop info which looks somewhat hacky. I think this is overengineering. -fvect-cost-model will do as workaround. And -fsimd-vect-cost-model has what I consider duplicate - simd and vect. I just wanted to separate the autovectorized loops from ones user wants to vectorize. The -fvect-cost-model will force all at once. That's the reason to introcude the simd-vect, since pragma name is simd. Richard. Sergos * common.opt: Added new option -fsimd-vect-cost-model * tree-vectorizer.h (unlimited_cost_model): Interface update to rely on particular loop info * tree-vect-data-refs.c (vect_peeling_hash_insert): Update to unlimited_cost_model call according to new interface (vect_peeling_hash_choose_best_peeling): Ditto (vect_enhance_data_refs_alignment): Ditto * tree-vect-slp.c: Ditto * tree-vect-loop.c (vect_estimate_min_profitable_iters): Ditto plus issue a warning in case cost model overrides users' directive diff --git a/gcc/common.opt b/gcc/common.opt index d5971df..87b3b37 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2296,6 +2296,10 @@ fvect-cost-model= Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT) Specifies the cost model for vectorization +fsimd-vect-cost-model= +Common Joined RejectNegative Enum(vect_cost_model) Var(flag_simd_vect_cost_model) Init(VECT_COST_MODEL_UNLIMITED) +Specifies the cost model for vectorization in loops marked with #pragma omp simd + Enum Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown vectorizer cost model %qs) diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index 83d1f45..e26f704 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -1090,7 +1090,8 @@ vect_peeling_hash_insert (loop_vec_info loop_vinfo, struct data_reference *dr, *new_slot = slot; } - if (!supportable_dr_alignment unlimited_cost_model ()) + if (!supportable_dr_alignment + unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))) slot-count += VECT_MAX_COST; } @@ -1200,7 +1201,7 @@ vect_peeling_hash_choose_best_peeling (loop_vec_info loop_vinfo, res.peel_info.dr = NULL; res.body_cost_vec = stmt_vector_for_cost (); - if (!unlimited_cost_model ()) + if (!unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))) { res.inside_cost = INT_MAX; res.outside_cost = INT_MAX; @@ -1429,7 +1430,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) vectorization factor. We do this automtically for cost model, since we calculate cost for every peeling option. */ - if (unlimited_cost_model ()) + if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))) possible_npeel_number = vf /nelements; /* Handle the aligned case. We may decide to align some other @@ -1437,7 +1438,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) if (DR_MISALIGNMENT (dr) == 0) { npeel_tmp = 0; - if (unlimited_cost_model ()) + if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))) possible_npeel_number++; } diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 86ebbd2..be66172 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -2696,7 +2696,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo, void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo); /* Cost model disabled. */ - if (unlimited_cost_model ()) + if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))) { dump_printf_loc (MSG_NOTE, vect_location, cost model disabled.\n); *ret_min_profitable_niters = 0; @@ -2929,6 +2929,11 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo, /* vector version will never be profitable. */ else { + if (LOOP_VINFO_LOOP (loop_vinfo)-force_vect) +{ + pedwarn (vect_location, 0, Vectorization did not happen for the loop); +} + if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, cost model: the vector iteration cost = %d diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 247bdfd..4b25964 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
I propose the following, yet SLP have to use a NULL as a loop info which looks somewhat hacky. I think this is overengineering. -fvect-cost-model will do as workaround. And -fsimd-vect-cost-model has what I consider duplicate - simd and vect. I think it is a good idea, though I agree about s/simd-vect/simd/ and I'd use VECT_COST_MODEL_DEFAULT as the default, which would mean just use -fvect-cost-model. that's ok, since we'd have a way to force those 'simd' loops. @@ -2929,6 +2929,11 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo, /* vector version will never be profitable. */ else { + if (LOOP_VINFO_LOOP (loop_vinfo)-force_vect) +{ + pedwarn (vect_location, 0, Vectorization did not happen for the loop); +} pedwarn isn't really desirable for this, you want just warning, but some warning you can actually also turn off. -Wopenmp-simd (and we'd use it also when we ignore #pragma omp declare simd because it wasn't useful/desirable). consider a user is interested in enabling warning-as-error for this case? can we disable the pedwarn the same way? Sergos Jakub
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, cost model: the vector iteration cost = %d diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 247bdfd..4b25964 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -2171,7 +2171,7 @@ vect_slp_analyze_bb_1 (basic_block bb) } /* Cost model: check if the vectorization is worthwhile. */ - if (!unlimited_cost_model () + if (!unlimited_cost_model (NULL) !vect_bb_vectorization_profitable_p (bb_vinfo)) { if (dump_enabled_p ()) diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index a6c5b59..fd255db 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -919,9 +919,12 @@ known_alignment_for_access_p (struct data_reference *data_ref_info) /* Return true if the vect cost model is unlimited. */ static inline bool -unlimited_cost_model () +unlimited_cost_model (loop_p loop) { - return flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED; + return (flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED + || (loop != NULL + loop-force_vect + flag_simd_cost_model == VECT_COST_MODEL_UNLIMITED)); } /* Source location */ On Tue, Nov 19, 2013 at 6:42 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Nov 19, 2013 at 06:39:48PM +0400, Sergey Ostanevich wrote: pedwarn isn't really desirable for this, you want just warning, but some warning you can actually also turn off. -Wopenmp-simd (and we'd use it also when we ignore #pragma omp declare simd because it wasn't useful/desirable). consider a user is interested in enabling warning-as-error for this case? -Werror=openmp-simd will work then, this works for any named warnings. can we disable the pedwarn the same way? pedwarn is for pedantic warnings, no standard says that #pragma omp simd must be vectorized, or that #pragma omp simd or #pragma omp declare simd is anything but an optimization hint, so pedwarn isn't what you are looking for. Jakub
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
I would agree that the example is just for the case cost model makes correct estimation But how can we assure ourself that it won't have any mistakes in the future? I believe it'll be Ok to introduce an extra flag as Jakub proposed for the dedicated simd-forced vectorization to use unlimited cost model. This can be default for -fopenmp or there should be a warning issued that compiler overrides user's request of vectorization. In such a case user can enforce vectorization (even with mentioned results :) with this unlimited cost model for simd. On Fri, Nov 15, 2013 at 6:24 PM, Richard Biener rguent...@suse.de wrote: On Fri, 15 Nov 2013, Sergey Ostanevich wrote: Richard, here's an example that causes trigger for the cost model. I hardly believe that (AVX2) .L9: vmovups (%rsi), %xmm3 addl$1, %r8d addq$256, %rsi vinsertf128 $0x1, -240(%rsi), %ymm3, %ymm1 vmovups -224(%rsi), %xmm3 vinsertf128 $0x1, -208(%rsi), %ymm3, %ymm3 vshufps $136, %ymm3, %ymm1, %ymm3 vperm2f128 $3, %ymm3, %ymm3, %ymm2 vshufps $68, %ymm2, %ymm3, %ymm1 vshufps $238, %ymm2, %ymm3, %ymm2 vmovups -192(%rsi), %xmm3 vinsertf128 $1, %xmm2, %ymm1, %ymm2 vinsertf128 $0x1, -176(%rsi), %ymm3, %ymm1 vmovups -160(%rsi), %xmm3 vinsertf128 $0x1, -144(%rsi), %ymm3, %ymm3 vshufps $136, %ymm3, %ymm1, %ymm3 vperm2f128 $3, %ymm3, %ymm3, %ymm1 vshufps $68, %ymm1, %ymm3, %ymm4 vshufps $238, %ymm1, %ymm3, %ymm1 vmovups -128(%rsi), %xmm3 vinsertf128 $1, %xmm1, %ymm4, %ymm1 vshufps $136, %ymm1, %ymm2, %ymm1 vperm2f128 $3, %ymm1, %ymm1, %ymm2 vshufps $68, %ymm2, %ymm1, %ymm4 vshufps $238, %ymm2, %ymm1, %ymm2 vinsertf128 $0x1, -112(%rsi), %ymm3, %ymm1 vmovups -96(%rsi), %xmm3 vinsertf128 $1, %xmm2, %ymm4, %ymm4 vinsertf128 $0x1, -80(%rsi), %ymm3, %ymm3 vshufps $136, %ymm3, %ymm1, %ymm3 vperm2f128 $3, %ymm3, %ymm3, %ymm2 vshufps $68, %ymm2, %ymm3, %ymm1 vshufps $238, %ymm2, %ymm3, %ymm2 vmovups -64(%rsi), %xmm3 vinsertf128 $1, %xmm2, %ymm1, %ymm2 vinsertf128 $0x1, -48(%rsi), %ymm3, %ymm1 vmovups -32(%rsi), %xmm3 vinsertf128 $0x1, -16(%rsi), %ymm3, %ymm3 cmpl%r8d, %edi vshufps $136, %ymm3, %ymm1, %ymm3 vperm2f128 $3, %ymm3, %ymm3, %ymm1 vshufps $68, %ymm1, %ymm3, %ymm5 vshufps $238, %ymm1, %ymm3, %ymm1 vinsertf128 $1, %xmm1, %ymm5, %ymm1 vshufps $136, %ymm1, %ymm2, %ymm1 vperm2f128 $3, %ymm1, %ymm1, %ymm2 vshufps $68, %ymm2, %ymm1, %ymm3 vshufps $238, %ymm2, %ymm1, %ymm2 vinsertf128 $1, %xmm2, %ymm3, %ymm1 vshufps $136, %ymm1, %ymm4, %ymm1 vperm2f128 $3, %ymm1, %ymm1, %ymm2 vshufps $68, %ymm2, %ymm1, %ymm3 vshufps $238, %ymm2, %ymm1, %ymm2 vinsertf128 $1, %xmm2, %ymm3, %ymm2 vaddps %ymm2, %ymm0, %ymm0 ja .L9 is more efficient than .L3: vaddss (%rcx,%rax), %xmm0, %xmm0 addq$32, %rax cmpq%rdx, %rax jne .L3 ;) As soon as elemental functions will appear and we update the vectorizer so it can accept an elemental function inside the loop - we will have the same situation as we have it now: cost model will bail out with profitability estimation. Yes. Still we have no chance to get info on how efficient the bar() function when it is in vector form. Well I assume you mean that the speedup when vectorizing the elemental will offset whatever wreckage we cause with vectorizing the rest of the statements. I'd say you can at least compare to unrolling by the vectorization factor, building the vector inputs to the elemental from scalars, distributing the vector result from the elemental to scalars. I believe I should repeat: #pragma omp simd is intended for introduction of an instruction-level parallel region on developer's request, hence should be treated in same manner as #pragma omp parallel. Vectorizer cost model is an obstacle here, not a help. Surely not if there isn't an elemental call in it. With it the cost model of course will have not enough information to decide. But still, what's the difference to the case where we cannot vectorize the function? What happens if we cannot vectorize the elemental? Do we have to build scalar versions for all possible vector sizes? Richard. Regards, Sergos On Fri, Nov 15, 2013 at 1:08 AM, Richard Biener rguent...@suse.de wrote: Sergey Ostanevich sergos@gmail.com wrote: this is only for the whole file? I mean to have a particular loop vectorized in a file while all others - up to compiler's cost model
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
Richard, here's an example that causes trigger for the cost model. As soon as elemental functions will appear and we update the vectorizer so it can accept an elemental function inside the loop - we will have the same situation as we have it now: cost model will bail out with profitability estimation. Still we have no chance to get info on how efficient the bar() function when it is in vector form. I believe I should repeat: #pragma omp simd is intended for introduction of an instruction-level parallel region on developer's request, hence should be treated in same manner as #pragma omp parallel. Vectorizer cost model is an obstacle here, not a help. Regards, Sergos On Fri, Nov 15, 2013 at 1:08 AM, Richard Biener rguent...@suse.de wrote: Sergey Ostanevich sergos@gmail.com wrote: this is only for the whole file? I mean to have a particular loop vectorized in a file while all others - up to compiler's cost model. is there such a machinery? No, there is not. Richard. Sergos On Thu, Nov 14, 2013 at 12:39 PM, Richard Biener rguent...@suse.de wrote: On Wed, 13 Nov 2013, Sergey Ostanevich wrote: I will get some tests. As for cost analysis - simply consider the pragma as a request to vectorize. How can I - as a developer - enforce it beyond the pragma? You can disable the cost model via -fvect-cost-model=unlimited Richard. On Wed, Nov 13, 2013 at 12:55 PM, Richard Biener rguent...@suse.de wrote: On Tue, 12 Nov 2013, Sergey Ostanevich wrote: The reason patch was in its original state is because we want to notify user that his assumption of profitability may be wrong. This is not a part of any spec and as far as I know ICC does not notify user about the case. Still it can be a good hint for those users who tries to get as much as possible performance. Richard's comment on the vectorization problems is about the same - to inform user that his attempt to force vectorization is failed. As for profitable or not - sometimes I believe it's impossible to be precise. For OMP we have case of a vector version of a function and we have no chance to figure out whether it is profitable to use it or to loose it. If we can't map the loop for any vector length other than 1 - I believe in this case we have to bail out and report. Is it about 'never profitable'? For example. I think we should report non-vectorized loops that are marked with force_vect anyway, with -Wdisabled-optimization. Another case is that a loop may be profitable to vectorize if the ISA supports a gather instruction but otherwise not. Or if the ISA supports efficient vector construction from N not loop invariant scalars (for vectorization of strided loads). Simply disregarding all of the cost analysis sounds completely bogus to me. I'd simply go for the diagnostic for now, not changing anything else. We want to have a good understanding about why the cost model is so bad that we have to force to ignore it for #pragma simd - thus we want testcases. Richard. On Tue, Nov 12, 2013 at 6:35 PM, Richard Biener rguent...@suse.de wrote: On 11/12/13 3:16 PM, Jakub Jelinek wrote: On Tue, Nov 12, 2013 at 05:46:14PM +0400, Sergey Ostanevich wrote: ivdep just substitutes all cross-iteration data analysis, nothing related to cost model. ICC does not cancel its cost model in case of #pragma ivdep as for the safelen - OMP standart treats it as a limitation for the vector length. this means if no safelen is present an arbitrary vector length can be used. I was talking about GCC loop-safelen, which is INT_MAX for #pragma omp simd without safelen clause or #pragma simd without vectorlength clause. so I believe loop-force_vect is the only trigger to disregard the cost model Anyway, in that case I think the originally posted patch is wrong, if we want to treat force_vect as disregard all the cost model and force vectorization (well, the name of the field already kind of suggest that), then IMHO we should treat it the same as -fvect-cost-model=unlimited for those loops. Err - the user may have a specific sub-architecture in mind when using #pragma simd, if you say we should completely ignore the cost model then should we also sorry () if we cannot vectorize the loop (either because of GCC deficiencies or lack of sub-target support)? That said, at least in the cases that the cost model says the loop is never profitable to vectorize we should follow its advice. Richard. Thus (untested): 2013-11-12 Jakub Jelinek ja...@redhat.com * tree-vect-loop.c (vect_estimate_min_profitable_iters): Use unlimited cost model also for force_vect loops. --- gcc/tree-vect-loop.c.jj 2013-11-12 12:09:40.0 +0100 +++ gcc/tree-vect-loop.c 2013-11-12 15:11:43.821404330 +0100 @@ -2702,7 +2702,7 @@ vect_estimate_min_profitable_iters (loop void
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
this is only for the whole file? I mean to have a particular loop vectorized in a file while all others - up to compiler's cost model. is there such a machinery? Sergos On Thu, Nov 14, 2013 at 12:39 PM, Richard Biener rguent...@suse.de wrote: On Wed, 13 Nov 2013, Sergey Ostanevich wrote: I will get some tests. As for cost analysis - simply consider the pragma as a request to vectorize. How can I - as a developer - enforce it beyond the pragma? You can disable the cost model via -fvect-cost-model=unlimited Richard. On Wed, Nov 13, 2013 at 12:55 PM, Richard Biener rguent...@suse.de wrote: On Tue, 12 Nov 2013, Sergey Ostanevich wrote: The reason patch was in its original state is because we want to notify user that his assumption of profitability may be wrong. This is not a part of any spec and as far as I know ICC does not notify user about the case. Still it can be a good hint for those users who tries to get as much as possible performance. Richard's comment on the vectorization problems is about the same - to inform user that his attempt to force vectorization is failed. As for profitable or not - sometimes I believe it's impossible to be precise. For OMP we have case of a vector version of a function and we have no chance to figure out whether it is profitable to use it or to loose it. If we can't map the loop for any vector length other than 1 - I believe in this case we have to bail out and report. Is it about 'never profitable'? For example. I think we should report non-vectorized loops that are marked with force_vect anyway, with -Wdisabled-optimization. Another case is that a loop may be profitable to vectorize if the ISA supports a gather instruction but otherwise not. Or if the ISA supports efficient vector construction from N not loop invariant scalars (for vectorization of strided loads). Simply disregarding all of the cost analysis sounds completely bogus to me. I'd simply go for the diagnostic for now, not changing anything else. We want to have a good understanding about why the cost model is so bad that we have to force to ignore it for #pragma simd - thus we want testcases. Richard. On Tue, Nov 12, 2013 at 6:35 PM, Richard Biener rguent...@suse.de wrote: On 11/12/13 3:16 PM, Jakub Jelinek wrote: On Tue, Nov 12, 2013 at 05:46:14PM +0400, Sergey Ostanevich wrote: ivdep just substitutes all cross-iteration data analysis, nothing related to cost model. ICC does not cancel its cost model in case of #pragma ivdep as for the safelen - OMP standart treats it as a limitation for the vector length. this means if no safelen is present an arbitrary vector length can be used. I was talking about GCC loop-safelen, which is INT_MAX for #pragma omp simd without safelen clause or #pragma simd without vectorlength clause. so I believe loop-force_vect is the only trigger to disregard the cost model Anyway, in that case I think the originally posted patch is wrong, if we want to treat force_vect as disregard all the cost model and force vectorization (well, the name of the field already kind of suggest that), then IMHO we should treat it the same as -fvect-cost-model=unlimited for those loops. Err - the user may have a specific sub-architecture in mind when using #pragma simd, if you say we should completely ignore the cost model then should we also sorry () if we cannot vectorize the loop (either because of GCC deficiencies or lack of sub-target support)? That said, at least in the cases that the cost model says the loop is never profitable to vectorize we should follow its advice. Richard. Thus (untested): 2013-11-12 Jakub Jelinek ja...@redhat.com * tree-vect-loop.c (vect_estimate_min_profitable_iters): Use unlimited cost model also for force_vect loops. --- gcc/tree-vect-loop.c.jj 2013-11-12 12:09:40.0 +0100 +++ gcc/tree-vect-loop.c 2013-11-12 15:11:43.821404330 +0100 @@ -2702,7 +2702,7 @@ vect_estimate_min_profitable_iters (loop void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo); /* Cost model disabled. */ - if (unlimited_cost_model ()) + if (unlimited_cost_model () || LOOP_VINFO_LOOP (loop_vinfo)-force_vect) { dump_printf_loc (MSG_NOTE, vect_location, cost model disabled.\n); *ret_min_profitable_niters = 0; Jakub -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
I will get some tests. As for cost analysis - simply consider the pragma as a request to vectorize. How can I - as a developer - enforce it beyond the pragma? On Wed, Nov 13, 2013 at 12:55 PM, Richard Biener rguent...@suse.de wrote: On Tue, 12 Nov 2013, Sergey Ostanevich wrote: The reason patch was in its original state is because we want to notify user that his assumption of profitability may be wrong. This is not a part of any spec and as far as I know ICC does not notify user about the case. Still it can be a good hint for those users who tries to get as much as possible performance. Richard's comment on the vectorization problems is about the same - to inform user that his attempt to force vectorization is failed. As for profitable or not - sometimes I believe it's impossible to be precise. For OMP we have case of a vector version of a function and we have no chance to figure out whether it is profitable to use it or to loose it. If we can't map the loop for any vector length other than 1 - I believe in this case we have to bail out and report. Is it about 'never profitable'? For example. I think we should report non-vectorized loops that are marked with force_vect anyway, with -Wdisabled-optimization. Another case is that a loop may be profitable to vectorize if the ISA supports a gather instruction but otherwise not. Or if the ISA supports efficient vector construction from N not loop invariant scalars (for vectorization of strided loads). Simply disregarding all of the cost analysis sounds completely bogus to me. I'd simply go for the diagnostic for now, not changing anything else. We want to have a good understanding about why the cost model is so bad that we have to force to ignore it for #pragma simd - thus we want testcases. Richard. On Tue, Nov 12, 2013 at 6:35 PM, Richard Biener rguent...@suse.de wrote: On 11/12/13 3:16 PM, Jakub Jelinek wrote: On Tue, Nov 12, 2013 at 05:46:14PM +0400, Sergey Ostanevich wrote: ivdep just substitutes all cross-iteration data analysis, nothing related to cost model. ICC does not cancel its cost model in case of #pragma ivdep as for the safelen - OMP standart treats it as a limitation for the vector length. this means if no safelen is present an arbitrary vector length can be used. I was talking about GCC loop-safelen, which is INT_MAX for #pragma omp simd without safelen clause or #pragma simd without vectorlength clause. so I believe loop-force_vect is the only trigger to disregard the cost model Anyway, in that case I think the originally posted patch is wrong, if we want to treat force_vect as disregard all the cost model and force vectorization (well, the name of the field already kind of suggest that), then IMHO we should treat it the same as -fvect-cost-model=unlimited for those loops. Err - the user may have a specific sub-architecture in mind when using #pragma simd, if you say we should completely ignore the cost model then should we also sorry () if we cannot vectorize the loop (either because of GCC deficiencies or lack of sub-target support)? That said, at least in the cases that the cost model says the loop is never profitable to vectorize we should follow its advice. Richard. Thus (untested): 2013-11-12 Jakub Jelinek ja...@redhat.com * tree-vect-loop.c (vect_estimate_min_profitable_iters): Use unlimited cost model also for force_vect loops. --- gcc/tree-vect-loop.c.jj 2013-11-12 12:09:40.0 +0100 +++ gcc/tree-vect-loop.c 2013-11-12 15:11:43.821404330 +0100 @@ -2702,7 +2702,7 @@ vect_estimate_min_profitable_iters (loop void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo); /* Cost model disabled. */ - if (unlimited_cost_model ()) + if (unlimited_cost_model () || LOOP_VINFO_LOOP (loop_vinfo)-force_vect) { dump_printf_loc (MSG_NOTE, vect_location, cost model disabled.\n); *ret_min_profitable_niters = 0; Jakub -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
Richard, Jakub, You are right regarding the cost model if we talk about vectorizer alone. But the #pragma omp simd goes beyond the vectorizer - it introduces parallel context in a place user defines, similar to #pragma omp parallel. Are we applying any cost model for omp parallel region? You can consider this pragma as a helper for developer for 'easily' introduce parallelism in his code, hence any type of cost model - whatever quality it is - will plays against this paradigm, forcing user to play around our cost model to let it make the loop simd-parallel. Sergos On Thu, Oct 31, 2013 at 10:36 PM, Richard Biener rguent...@suse.de wrote: Jakub Jelinek ja...@redhat.com wrote: On Thu, Oct 31, 2013 at 07:02:28PM +0400, Yuri Rumyantsev wrote: Here is a simple fix which allows to vectorize loop marked with 'pragma omp simd' even if cost model tells us that vectorization is not profitable. I checked that on simple test-case is works as expected. Is it Ok for trunk? ChangeLog: 2013-10-31 Yuri Rumyantsev ysrum...@gmail.com * tree-vect-loop.c (vect_estimate_min_profitable_iters): Override cost estimation for loops marked as vectorizable. That looks too simplistics, IMHO it is undesirable to disregard the profitability checks together. For #pragma omp simd or #pragma simd loops, I can understand that we should admit our cost model is not very high quality and so in border cases consider vectorizing rather than not vectorizing, say for force_vect by increasing the scalar cost by some factor or decreasing vector cost by some factor, but disregarding it altogether doesn't look wise. The question is what factor should we use? 150% of scalar cost, something else? Please improve the cost-model instead. Thanks, Richard. Jakub
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
yes, ICC ignores cost analysis and follows user request on introduction of simd parallelism in the loop.they follow the omp parallel semantics. On Tue, Nov 12, 2013 at 3:05 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Nov 12, 2013 at 02:48:46PM +0400, Sergey Ostanevich wrote: You are right regarding the cost model if we talk about vectorizer alone. But the #pragma omp simd goes beyond the vectorizer - it introduces parallel context in a place user defines, similar to #pragma omp parallel. Are we applying any cost model for omp parallel region? You can consider this pragma as a helper for developer for 'easily' introduce parallelism in his code, hence any type of cost model - whatever quality it is - will plays against this paradigm, forcing user to play around our cost model to let it make the loop simd-parallel. So what do other compilers do here? Does icc also totally ignore all cost analysis for #pragma omp simd or #pragma simd and vectorizes even when it would be obviously undesirable? Jakub
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
ivdep just substitutes all cross-iteration data analysis, nothing related to cost model. ICC does not cancel its cost model in case of #pragma ivdep as for the safelen - OMP standart treats it as a limitation for the vector length. this means if no safelen is present an arbitrary vector length can be used. so I believe loop-force_vect is the only trigger to disregard the cost model On Tue, Nov 12, 2013 at 4:48 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Nov 12, 2013 at 04:45:17PM +0400, Sergey Ostanevich wrote: yes, ICC ignores cost analysis and follows user request on introduction of simd parallelism in the loop.they follow the omp parallel semantics. What about #pragma ivdep? I.e. if we decided to follow ICC here, should we ignore cost model just for loop-safelen loop-force_vect loops (or only loop-force_vect), or also for any other loop-safelen loops? Jakub
Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.
The reason patch was in its original state is because we want to notify user that his assumption of profitability may be wrong. This is not a part of any spec and as far as I know ICC does not notify user about the case. Still it can be a good hint for those users who tries to get as much as possible performance. Richard's comment on the vectorization problems is about the same - to inform user that his attempt to force vectorization is failed. As for profitable or not - sometimes I believe it's impossible to be precise. For OMP we have case of a vector version of a function and we have no chance to figure out whether it is profitable to use it or to loose it. If we can't map the loop for any vector length other than 1 - I believe in this case we have to bail out and report. Is it about 'never profitable'? On Tue, Nov 12, 2013 at 6:35 PM, Richard Biener rguent...@suse.de wrote: On 11/12/13 3:16 PM, Jakub Jelinek wrote: On Tue, Nov 12, 2013 at 05:46:14PM +0400, Sergey Ostanevich wrote: ivdep just substitutes all cross-iteration data analysis, nothing related to cost model. ICC does not cancel its cost model in case of #pragma ivdep as for the safelen - OMP standart treats it as a limitation for the vector length. this means if no safelen is present an arbitrary vector length can be used. I was talking about GCC loop-safelen, which is INT_MAX for #pragma omp simd without safelen clause or #pragma simd without vectorlength clause. so I believe loop-force_vect is the only trigger to disregard the cost model Anyway, in that case I think the originally posted patch is wrong, if we want to treat force_vect as disregard all the cost model and force vectorization (well, the name of the field already kind of suggest that), then IMHO we should treat it the same as -fvect-cost-model=unlimited for those loops. Err - the user may have a specific sub-architecture in mind when using #pragma simd, if you say we should completely ignore the cost model then should we also sorry () if we cannot vectorize the loop (either because of GCC deficiencies or lack of sub-target support)? That said, at least in the cases that the cost model says the loop is never profitable to vectorize we should follow its advice. Richard. Thus (untested): 2013-11-12 Jakub Jelinek ja...@redhat.com * tree-vect-loop.c (vect_estimate_min_profitable_iters): Use unlimited cost model also for force_vect loops. --- gcc/tree-vect-loop.c.jj 2013-11-12 12:09:40.0 +0100 +++ gcc/tree-vect-loop.c 2013-11-12 15:11:43.821404330 +0100 @@ -2702,7 +2702,7 @@ vect_estimate_min_profitable_iters (loop void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo); /* Cost model disabled. */ - if (unlimited_cost_model ()) + if (unlimited_cost_model () || LOOP_VINFO_LOOP (loop_vinfo)-force_vect) { dump_printf_loc (MSG_NOTE, vect_location, cost model disabled.\n); *ret_min_profitable_niters = 0; Jakub
Re: [RFC] By default if-convert only basic blocks that will be vectorized (take 3)
ouch.. html got me! applying to the same as before commit 0e3dfadd374c3045a926afa6d06af276cee2108d Author: rguenth rguenth@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Fri Oct 18 08:36:28 2013 + trying '06 build. for take 2 on HSW there were mixed results for passed tests: 400.perlbench +0.25% 401.bzip2 -0.40% 429.mcf -1.67% 456.hmmer +0% 458.sjeng -1.65% 471.omnetpp -1.72% 483.xalancbmk -0.25% 410.bwaves+0% 433.milc -0.28% 444.namd -1.13% 450.soplex-1.01% 459.GemsFDTD -0.25% 470.lbm +1.44% 482.sphinx3 +0.61% Sergos On Tue, Oct 22, 2013 at 5:12 PM, Sergey Ostanevich sergos@gmail.com wrote: applying to the same as before commit 0e3dfadd374c3045a926afa6d06af276cee2108d Author: rguenth rguenth@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Fri Oct 18 08:36:28 2013 + trying '06 build. for take 2 on HSW there were mixed results for passed tests: 400.perlbench +0.25% 401.bzip2 -0.40% 429.mcf -1.67% 456.hmmer +0% 458.sjeng -1.65% 471.omnetpp -1.72% 483.xalancbmk -0.25% 410.bwaves+0% 433.milc -0.28% 444.namd -1.13% 450.soplex-1.01% 459.GemsFDTD -0.25% 470.lbm +1.44% 482.sphinx3 +0.61% Sergos On Tue, Oct 22, 2013 at 2:56 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! On Fri, Oct 18, 2013 at 05:45:15PM +0400, Sergey Ostanevich wrote: failed on 403 of '06: regclass.c: In function 'init_reg_sets': regclass.c:277:1: internal compiler error: tree check: expected ssa_name, have var_decl in copy_ssa_name_fn, at tree-ssanames.c:393 init_reg_sets () ^ That is because for -ftree-loop-if-convert-stores or masked loads/stores we need TODO_update_ssa_only_virtuals and loop versioning relies on no SSA updates being needed in the loop being versioned. Attached is updated patchset (the fix is in the second patch, the third one needed adjusting because of the fix and also apparently Andrew's header reshuffling requires now additional include in the third patch). Will bootstrap/regtest this after my current bootstrap/regtest finishes. Jakub
Re: [RFC] By default if-convert only basic blocks that will be vectorized (take 3)
I use -march=core-avx2 -static -Ofast -flto -funroll-loops with all 3 patches applied to git version I mentioned the same is in progress for 'take 3' On Tue, Oct 22, 2013 at 5:26 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Oct 22, 2013 at 05:16:29PM +0400, Sergey Ostanevich wrote: ouch.. html got me! applying to the same as before commit 0e3dfadd374c3045a926afa6d06af276cee2108d Author: rguenth rguenth@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Fri Oct 18 08:36:28 2013 + trying '06 build. for take 2 on HSW there were mixed results for passed tests: 400.perlbench +0.25% 401.bzip2 -0.40% 429.mcf -1.67% 456.hmmer +0% 458.sjeng -1.65% 471.omnetpp -1.72% 483.xalancbmk -0.25% 410.bwaves+0% 433.milc -0.28% 444.namd -1.13% 450.soplex-1.01% 459.GemsFDTD -0.25% 470.lbm +1.44% 482.sphinx3 +0.61% Thanks. If this is with AVX or AVX2, there can be multiple things. One is whether if-conversion in it's current form is really so harmful for generated code quality (we have various testcases where what we get with it is very bad, but perhaps there are still testcases where it improves non-vectorized loops), another one is when the masked loads/stores/gather is actually beneficial and whether we don't want some better tuning for it. The patchset as whole does both things, so determining what is what is hard. If only the first two patches are applied and not the third, then it should give a picture on whether it is a win to have if-conversion done in it's current form only for vectorized bbs. And, if all 3 patches are applied, but the condition in tree_if_conversion: if ((flag_tree_loop_vectorize || loop-force_vect) (flag_tree_loop_if_convert == -1 || any_mask_load_store) !version_loop_for_if_conversion (loop, version_outer_loop, any_mask_load_store)) goto cleanup; is changed into just if (any_mask_load_store !version_loop_for_if_conversion (loop, version_outer_loop, true)) goto cleanup; then if-conversion would happen as before, just in cases where previously we wouldn't if-convert because of conditional loads/stores, but now we could, those changes would be limited to vectorized loops only. Jakub
Re: [RFC] By default if-convert only basic blocks that will be vectorized (take 3)
still fails on 403 et al. regrename.c: In function 'regrename_optimize': regrename.c:200:1: internal compiler error: Segmentation fault regrename_optimize () ^ 0x96698f crash_signal ../../gcc/toplev.c:335 0x9a2657 ssa_default_def(function*, tree_node*) ../../gcc/tree-dfa.c:310 0x9a2c68 get_or_create_ssa_default_def(function*, tree_node*) ../../gcc/tree-dfa.c:362 0x9d0773 get_reaching_def ../../gcc/tree-into-ssa.c:1188 0x9d0773 get_reaching_def ../../gcc/tree-into-ssa.c:1175 0x9d24f8 rewrite_update_phi_arguments ../../gcc/tree-into-ssa.c:2017 0x9d24f8 rewrite_update_dom_walker::before_dom_children(basic_block_def*) ../../gcc/tree-into-ssa.c:2133 0x9d24f8 rewrite_update_dom_walker::before_dom_children(basic_block_def*) ../../gcc/tree-into-ssa.c:2069 0xddcbea dom_walker::walk(basic_block_def*) ../../gcc/domwalk.c:176 0x9ceba2 rewrite_blocks ../../gcc/tree-into-ssa.c:2188 0x9d604e update_ssa(unsigned int) ../../gcc/tree-into-ssa.c:3311 0x9b729f version_loop_for_if_conversion ../../gcc/tree-if-conv.c:1962 0x9bad10 tree_if_conversion ../../gcc/tree-if-conv.c:2042 0x9bad10 main_tree_if_conversion ../../gcc/tree-if-conv.c:2097 0x9bad10 execute ../../gcc/tree-if-conv.c:2147 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See http://gcc.gnu.org/bugs.html for instructions. specmake: *** [regrename.o] Error 1 On Tue, Oct 22, 2013 at 6:08 PM, Sergey Ostanevich sergos@gmail.com wrote: I use -march=core-avx2 -static -Ofast -flto -funroll-loops with all 3 patches applied to git version I mentioned the same is in progress for 'take 3' On Tue, Oct 22, 2013 at 5:26 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Oct 22, 2013 at 05:16:29PM +0400, Sergey Ostanevich wrote: ouch.. html got me! applying to the same as before commit 0e3dfadd374c3045a926afa6d06af276cee2108d Author: rguenth rguenth@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Fri Oct 18 08:36:28 2013 + trying '06 build. for take 2 on HSW there were mixed results for passed tests: 400.perlbench +0.25% 401.bzip2 -0.40% 429.mcf -1.67% 456.hmmer +0% 458.sjeng -1.65% 471.omnetpp -1.72% 483.xalancbmk -0.25% 410.bwaves+0% 433.milc -0.28% 444.namd -1.13% 450.soplex-1.01% 459.GemsFDTD -0.25% 470.lbm +1.44% 482.sphinx3 +0.61% Thanks. If this is with AVX or AVX2, there can be multiple things. One is whether if-conversion in it's current form is really so harmful for generated code quality (we have various testcases where what we get with it is very bad, but perhaps there are still testcases where it improves non-vectorized loops), another one is when the masked loads/stores/gather is actually beneficial and whether we don't want some better tuning for it. The patchset as whole does both things, so determining what is what is hard. If only the first two patches are applied and not the third, then it should give a picture on whether it is a win to have if-conversion done in it's current form only for vectorized bbs. And, if all 3 patches are applied, but the condition in tree_if_conversion: if ((flag_tree_loop_vectorize || loop-force_vect) (flag_tree_loop_if_convert == -1 || any_mask_load_store) !version_loop_for_if_conversion (loop, version_outer_loop, any_mask_load_store)) goto cleanup; is changed into just if (any_mask_load_store !version_loop_for_if_conversion (loop, version_outer_loop, true)) goto cleanup; then if-conversion would happen as before, just in cases where previously we wouldn't if-convert because of conditional loads/stores, but now we could, those changes would be limited to vectorized loops only. Jakub
Re: [RFC] By default if-convert only basic blocks that will be vectorized (take 2)
failed on 403 of '06: regclass.c: In function 'init_reg_sets': regclass.c:277:1: internal compiler error: tree check: expected ssa_name, have var_decl in copy_ssa_name_fn, at tree-ssanames.c:393 init_reg_sets () ^ 0xb74124 tree_check_failed(tree_node const*, char const*, int, char const*, ...) ../../gcc/tree.c:9301 0xb0ade2 tree_check ../../gcc/tree.h:2674 0xb0ade2 copy_ssa_name_fn(function*, tree_node*, gimple_statement_d*) ../../gcc/tree-ssanames.c:393 0xb0ae6c duplicate_ssa_name_fn(function*, tree_node*, gimple_statement_d*) ../../gcc/tree-ssanames.c:452 0x9ce82d duplicate_ssa_name ../../gcc/tree-ssanames.h:121 0x9ce82d create_new_def_for(tree_node*, gimple_statement_d*, tree_node**) ../../gcc/tree-into-ssa.c:2836 0x986a4e gimple_duplicate_bb ../../gcc/tree-cfg.c:5496 0x6425c9 duplicate_block(basic_block_def*, edge_def*, basic_block_def*) ../../gcc/cfghooks.c:1040 0x642bfe copy_bbs(basic_block_def**, unsigned int, basic_block_def**, edge_def**, unsigned int, edge_def**, loop*, basic_block_def*, bool) ../../gcc/cfghooks.c:1320 0x649d69 duplicate_loop_to_header_edge(loop*, edge_def*, unsigned int, simple_bitmap_def*, edge_def*, vecedge_def*, va_heap, vl_ptr*, int) ../../gcc/cfgloopmanip.c:1314 0xa83d09 gimple_duplicate_loop_to_header_edge(loop*, edge_def*, unsigned int, simple_bitmap_def*, edge_def*, vecedge_def*, va_heap, vl_ptr*, int) ../../gcc/tree-ssa-loop-manip.c:761 0x64b760 loop_version(loop*, void*, basic_block_def**, unsigned int, unsigned int, unsigned int, bool) ../../gcc/cfgloopmanip.c:1706 0x9b71ae version_loop_for_if_conversion ../../gcc/tree-if-conv.c:1943 0x9baa3e tree_if_conversion ../../gcc/tree-if-conv.c:2069 0x9baa3e main_tree_if_conversion ../../gcc/tree-if-conv.c:2089 0x9baa3e execute ../../gcc/tree-if-conv.c:2139 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See http://gcc.gnu.org/bugs.html for instructions. specmake: *** [regclass.o] Error 1 On Thu, Oct 17, 2013 at 8:55 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! On Tue, Oct 15, 2013 at 02:32:25PM +0200, Jakub Jelinek wrote: Especially on i?86/x86_64 if-conversion pass seems to be often a pessimization, but the vectorization relies on it and without it we can't vectorize a lot of the loops. Here is an updated patchset: - first patch is an updated version of the http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01058.html patch - second patch is one variant (easier to write and perhaps maintain, but perhaps with higher compile time and memory requirements) of the follow up to support if-converted inner loop in outer loop vectorization; I have started also working on variant where just the vectorizer groks outer loop with LOOP_VECTORIZED call and two inner loops, but am only about half way through the needed changes and would prefer to wait how these patches actually work out together - third patch is an updated version of the http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00202.html patch, this time without if-unconversion pass, but instead performed only in the versioned loops for vectorization purposes All 3 patches bootstrapped/regtested together on x86_64-linux and i686-linux, the first patch and first+second patch additionally tested alone with make -C gcc -j4 -k check RUNTESTFLAGS=vect.exp make -C gcc check-gfortran RUNTESTFLAGS=dg.exp='assumed_rank* pr32533*' execute.exp='forall* intrinsic_mmloc* pr54767.f90' make -C x86*/libgomp check RUNTESTFLAGS=fortran.exp=omp_parse* (the latter two are set of tests that failed at some point during the development of the patchset). If anyone has spare cycles to e.g. SPEC2k{,6} test it on his favourite platforms (both for resulting code performance and for compile time and/or compile memory usage), it would be greatly appreciated. The only targets with masked load/store support are right now i?86/x86_64 with -mavx and higher (and for masked gather load support only -mavx2 and higher), so the last patch will probably be only beneficial to those for the time being. Jakub
Re: [RFC] By default if-convert only basic blocks that will be vectorized
Jakub, Richard, I believe this patch is a good opportunity to improve the vectorization capabilities. I have the following question related to it: whether we plan to treat the #pragma omp simd as a directive to vectorize the underlying loop, hence dropping any assessment regarding profitablity? Regards, Sergos On Tue, Oct 15, 2013 at 4:32 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! Especially on i?86/x86_64 if-conversion pass seems to be often a pessimization, but the vectorization relies on it and without it we can't vectorize a lot of the loops. Here is a prototype of a patch that will by default (unless explicit -ftree-loop-if-convert) only if-convert loops internally for vectorization, so the COND_EXPRs actually only appear as VEC_COND_EXPRs in the vectorized basic blocks, but will not appear if vectorization fails, or in the scalar loop if vectorization is conditional, or in the prologue or epilogue loops around the vectorized loop. Instead of moving the ifcvt pass inside of the vectorizer, this patch during ifcvt performs loop versioning depending on a special internal call, only if the internal call returns true we go to the if-converted original loop, otherwise the non-if-converted copy of the original loop is performed. And the vectorizer is taught to fold this internal call into true resp. false depending on if the loop was vectorized or not, and vectorizer loop versioning, peeling for alignment and for bound are adjusted to also copy from the non-if-converted loop rather than if-converted one. Besides fixing the various PRs where if-conversion pessimizes code I'd like to also move forward with this with conditional loads and stores, http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00202.html where the if-unconversion approach looked like a failure. This patch doesn't yet handle if-converted inner loop in outer loop vectorization, something on my todo list (so several vect-cond-*.c tests FAIL because they are no longer vectorized) plus I had to change two SLP vectorization tests that silently relied on loop if-conversion being performed to actually optimize the basic block (if the same thing didn't appear in a loop, it wouldn't be optimized at all). On the newly added testcase on x86_64, there are before this patch 18 scalar conditional moves, with the patch just 2 (both in the checking routine). Comments? --- gcc/internal-fn.def.jj 2013-10-11 14:32:57.079909782 +0200 +++ gcc/internal-fn.def 2013-10-11 17:23:58.705526840 +0200 @@ -43,3 +43,4 @@ DEF_INTERNAL_FN (STORE_LANES, ECF_CONST DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW) +DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) --- gcc/tree-vect-loop-manip.c.jj 2013-09-30 22:13:47.0 +0200 +++ gcc/tree-vect-loop-manip.c 2013-10-15 12:57:54.854970913 +0200 @@ -374,24 +374,31 @@ LOOP- loop1 static void slpeel_update_phi_nodes_for_guard1 (edge guard_edge, struct loop *loop, + struct loop *scalar_loop, bool is_new_loop, basic_block *new_exit_bb) { - gimple orig_phi, new_phi; + gimple orig_phi, new_phi, scalar_phi = NULL; gimple update_phi, update_phi2; tree guard_arg, loop_arg; basic_block new_merge_bb = guard_edge-dest; edge e = EDGE_SUCC (new_merge_bb, 0); basic_block update_bb = e-dest; basic_block orig_bb = loop-header; - edge new_exit_e; + edge new_exit_e, scalar_e = NULL; tree current_new_name; - gimple_stmt_iterator gsi_orig, gsi_update; + gimple_stmt_iterator gsi_orig, gsi_update, gsi_scalar = gsi_none (); /* Create new bb between loop and new_merge_bb. */ *new_exit_bb = split_edge (single_exit (loop)); new_exit_e = EDGE_SUCC (*new_exit_bb, 0); + if (scalar_loop != NULL !is_new_loop) +{ + gsi_scalar = gsi_start_phis (scalar_loop-header); + scalar_e = EDGE_SUCC (scalar_loop-latch, 0); +} + for (gsi_orig = gsi_start_phis (orig_bb), gsi_update = gsi_start_phis (update_bb); !gsi_end_p (gsi_orig) !gsi_end_p (gsi_update); @@ -401,6 +408,11 @@ slpeel_update_phi_nodes_for_guard1 (edge tree new_res; orig_phi = gsi_stmt (gsi_orig); update_phi = gsi_stmt (gsi_update); + if (scalar_e != NULL) + { + scalar_phi = gsi_stmt (gsi_scalar); + gsi_next (gsi_scalar); + } /** 1. Handle new-merge-point phis **/ @@ -460,7 +472,13 @@ slpeel_update_phi_nodes_for_guard1 (edge current_new_name = loop_arg; else { - current_new_name = get_current_def (loop_arg); + if (scalar_e) + { + current_new_name = PHI_ARG_DEF_FROM_EDGE (scalar_phi, scalar_e); + current_new_name =
Re: [i386, patch, RFC] HLE support in GCC
On Tue, Apr 17, 2012 at 6:41 PM, Andi Kleen a...@firstfloor.org wrote: I also have a question regarding AS compatibility. In case one built GCC using AS with support of HLE then using this GCC on a machine with old AS will cause fail because of usupported prefix. Can we support it I don't think that's a supported use case for gcc. It also doesn't work with .cfi* intrinsics and some other things. Well, it's hard to speculate here. What I rely upon is the fact that GCC I have on my Fedora is from gcc-4.6.0-10.fc15.x86_64.rpm and the latter contains no AS within. There should be dependencies so that AS will be updated alongside with GCC? Otherwise upon update to new GCC I can see fails in my project build. compile time rather configure time? The only way to do that would be to always generate .byte, but the people who read the assembler output would hate you for it. Totally agree, it's the best way to hurt your karma. :) Although detection of AS capabilities is available at compile time and can be used to evade compfail - at least in case assembler is involved (no -S provided) Sergos
Re: [i386, patch, RFC] HLE support in GCC
Any other inputs? I would suggest to use snprintf b/gcc/config/i386/i386-c.c to avoid possible buffer overrun. I also have a question regarding AS compatibility. In case one built GCC using AS with support of HLE then using this GCC on a machine with old AS will cause fail because of usupported prefix. Can we support it compile time rather configure time? regards, Sergos
Re: [PATCH] PR target/50038 fix: redundant zero extensions removal
Eric, I will follow up while Ilya is on vacation. I can see only one patch along the dicussion so I will use it, making changes to follow phase renaming and documentation? I am covered by FSF agreement too, on the same Intel's list as Ilya. regards, Sergos On Fri, Nov 11, 2011 at 2:12 PM, Eric Botcazou ebotca...@adacore.com wrote: I have already signed copyright agreement with the FSF. Will I need the separate one for this particular commit? No, if your contributions are already covered by a copyright agreement with the FSF, nothing more needs to be done. -- Eric Botcazou
Re: [PATCH i386] PR47698 no CMOV for volatile mem
On Wed, Nov 2, 2011 at 3:42 PM, Richard Guenther rguent...@suse.de wrote: On Sat, 29 Oct 2011, Sergey Ostanevich wrote: On Fri, Oct 28, 2011 at 7:25 PM, Sergey Ostanevich sergos@gmail.com wrote: On Fri, Oct 28, 2011 at 4:52 PM, Richard Guenther rguent...@suse.de wrote: On Fri, 28 Oct 2011, Sergey Ostanevich wrote: On Fri, Oct 28, 2011 at 12:16 PM, Richard Guenther rguent...@suse.de wrote: On Thu, 27 Oct 2011, Uros Bizjak wrote: Hello! Here's a patch for PR47698, which is about CMOV should not be generated for memory address marked as volatile. Successfully bootstrapped and passed make check on x86_64-unknown-linux-gnu. PR rtl-optimization/47698 * config/i386/i386.c (ix86_expand_int_movcc) prevent CMOV generation for volatile mem PR rtl-optimization/47698 * gcc.target/i386/47698.c: New test Please use punctuation marks and correct capitalization in ChangeLog entries. OTOH, do we want to fix this per-target, or in the middle-end? The middle-end pattern documentation does not say operands 2 and 3 are not evaluated if they do not end up being stored, so a middle-end fix is more appropriate. Richard. I have two observations: - the code for CMOV is under #ifdef in the mddle-end, which is explicitly marked as have to be removed (ifcvt.c:1446) - I have no clear evidence all platforms that support conditional move have the same semantics that lead to the PR I think the best way to address both concerns is to implement code that relies on а new hookup volatile-safe CMOV that is false by default. I suppose it's never safe for all architectures that support memory operands in the source operand. Richard. ok, at least there should be no big problem of missing optimization around volatile memory. apparently the problem is here: ifcvt.c:2539 there is a test for side effects of source (which is 'a' in this case) 2539 if (! noce_operand_ok (a) || ! noce_operand_ok (b)) (gdb) p debug_rtx(a) (mem/v/c/i:DI (symbol_ref:DI (mmio) [flags 0x40] var_decl 0x71339140 mmio) [2 mmio+0 S8 A64]) but inside noce_operand_ok() there is a wrong order of tests: 2332 if (MEM_P (op)) 2333 return ! side_effects_p (XEXP (op, 0)); 2334 2335 if (side_effects_p (op)) 2336 return FALSE; 2337 where XEXP removes the memory reference leaving just symbol reference, that has no volatile attribute #0 side_effects_p (x=0x7149c660) at ../../gcc/rtlanal.c:2152 (gdb) p debug_rtx(x) (symbol_ref:DI (mmio) [flags 0x40] var_decl 0x71339140 mmio) Is the following fix is Ok? I'm testing it so far. Sergos diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 784e2e8..3b05c2a 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -2329,12 +2329,12 @@ noce_operand_ok (const_rtx op) { /* We special-case memories, so handle any of them with no address side effects. */ - if (MEM_P (op)) - return ! side_effects_p (XEXP (op, 0)); - if (side_effects_p (op)) return FALSE; + if (MEM_P (op)) + return ! side_effects_p (XEXP (op, 0)); + return ! may_trap_p (op); } diff --git a/gcc/testsuite/gcc.target/i386/47698.c b/gcc/testsuite/gcc.target/i386/47698.c new file mode 100644 index 000..2c75109 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/47698.c @@ -0,0 +1,10 @@ +/* { dg-options -Os } */ +/* { dg-final { scan-assembler-not cmov } } */ + +extern volatile unsigned long mmio; +unsigned long foo(int cond) +{ + if (cond) + return mmio; + return 0; +} bootstrapped and passed make check successfully on x86_64-unknown-linux-gnu Ok. Thanks, Richard. Could someone please commit it for me? Sergos /gcc 2011-11-06 Sergey Ostanevich sergos@gmail.com PR rtl-optimization/47698 * ifconv.c (noce_operand_ok): prevent CMOV generation for volatile mem /testsuites 2011-11-06 Sergey Ostanevich sergos@gmail.com PR rtl-optimization/47698 * gcc.target/i386/47698.c: New test 47698.patch Description: Binary data
Re: [PATCH i386] PR47698 no CMOV for volatile mem
On Fri, Oct 28, 2011 at 12:16 PM, Richard Guenther rguent...@suse.de wrote: On Thu, 27 Oct 2011, Uros Bizjak wrote: Hello! Here's a patch for PR47698, which is about CMOV should not be generated for memory address marked as volatile. Successfully bootstrapped and passed make check on x86_64-unknown-linux-gnu. PR rtl-optimization/47698 * config/i386/i386.c (ix86_expand_int_movcc) prevent CMOV generation for volatile mem PR rtl-optimization/47698 * gcc.target/i386/47698.c: New test Please use punctuation marks and correct capitalization in ChangeLog entries. OTOH, do we want to fix this per-target, or in the middle-end? The middle-end pattern documentation does not say operands 2 and 3 are not evaluated if they do not end up being stored, so a middle-end fix is more appropriate. Richard. I have two observations: - the code for CMOV is under #ifdef in the mddle-end, which is explicitly marked as have to be removed (ifcvt.c:1446) - I have no clear evidence all platforms that support conditional move have the same semantics that lead to the PR I think the best way to address both concerns is to implement code that relies on а new hookup volatile-safe CMOV that is false by default. regards, Sergos
Re: [PATCH i386] PR47698 no CMOV for volatile mem
On Fri, Oct 28, 2011 at 4:52 PM, Richard Guenther rguent...@suse.de wrote: On Fri, 28 Oct 2011, Sergey Ostanevich wrote: On Fri, Oct 28, 2011 at 12:16 PM, Richard Guenther rguent...@suse.de wrote: On Thu, 27 Oct 2011, Uros Bizjak wrote: Hello! Here's a patch for PR47698, which is about CMOV should not be generated for memory address marked as volatile. Successfully bootstrapped and passed make check on x86_64-unknown-linux-gnu. PR rtl-optimization/47698 * config/i386/i386.c (ix86_expand_int_movcc) prevent CMOV generation for volatile mem PR rtl-optimization/47698 * gcc.target/i386/47698.c: New test Please use punctuation marks and correct capitalization in ChangeLog entries. OTOH, do we want to fix this per-target, or in the middle-end? The middle-end pattern documentation does not say operands 2 and 3 are not evaluated if they do not end up being stored, so a middle-end fix is more appropriate. Richard. I have two observations: - the code for CMOV is under #ifdef in the mddle-end, which is explicitly marked as have to be removed (ifcvt.c:1446) - I have no clear evidence all platforms that support conditional move have the same semantics that lead to the PR I think the best way to address both concerns is to implement code that relies on а new hookup volatile-safe CMOV that is false by default. I suppose it's never safe for all architectures that support memory operands in the source operand. Richard. ok, at least there should be no big problem of missing optimization around volatile memory. apparently the problem is here: ifcvt.c:2539 there is a test for side effects of source (which is 'a' in this case) 2539 if (! noce_operand_ok (a) || ! noce_operand_ok (b)) (gdb) p debug_rtx(a) (mem/v/c/i:DI (symbol_ref:DI (mmio) [flags 0x40] var_decl 0x71339140 mmio) [2 mmio+0 S8 A64]) but inside noce_operand_ok() there is a wrong order of tests: 2332 if (MEM_P (op)) 2333return ! side_effects_p (XEXP (op, 0)); 2334 2335 if (side_effects_p (op)) 2336return FALSE; 2337 where XEXP removes the memory reference leaving just symbol reference, that has no volatile attribute #0 side_effects_p (x=0x7149c660) at ../../gcc/rtlanal.c:2152 (gdb) p debug_rtx(x) (symbol_ref:DI (mmio) [flags 0x40] var_decl 0x71339140 mmio) Is the following fix is Ok? I'm testing it so far. Sergos diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 784e2e8..3b05c2a 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -2329,12 +2329,12 @@ noce_operand_ok (const_rtx op) { /* We special-case memories, so handle any of them with no address side effects. */ - if (MEM_P (op)) -return ! side_effects_p (XEXP (op, 0)); - if (side_effects_p (op)) return FALSE; + if (MEM_P (op)) +return ! side_effects_p (XEXP (op, 0)); + return ! may_trap_p (op); } diff --git a/gcc/testsuite/gcc.target/i386/47698.c b/gcc/testsuite/gcc.target/i386/47698.c new file mode 100644 index 000..2c75109 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/47698.c @@ -0,0 +1,10 @@ +/* { dg-options -Os } */ +/* { dg-final { scan-assembler-not cmov } } */ + +extern volatile unsigned long mmio; +unsigned long foo(int cond) +{ + if (cond) + return mmio; +return 0; +} 47698.patch Description: Binary data
Re: [PATCH i386] PR47698 no CMOV for volatile mem
On Fri, Oct 28, 2011 at 7:25 PM, Sergey Ostanevich sergos@gmail.com wrote: On Fri, Oct 28, 2011 at 4:52 PM, Richard Guenther rguent...@suse.de wrote: On Fri, 28 Oct 2011, Sergey Ostanevich wrote: On Fri, Oct 28, 2011 at 12:16 PM, Richard Guenther rguent...@suse.de wrote: On Thu, 27 Oct 2011, Uros Bizjak wrote: Hello! Here's a patch for PR47698, which is about CMOV should not be generated for memory address marked as volatile. Successfully bootstrapped and passed make check on x86_64-unknown-linux-gnu. PR rtl-optimization/47698 * config/i386/i386.c (ix86_expand_int_movcc) prevent CMOV generation for volatile mem PR rtl-optimization/47698 * gcc.target/i386/47698.c: New test Please use punctuation marks and correct capitalization in ChangeLog entries. OTOH, do we want to fix this per-target, or in the middle-end? The middle-end pattern documentation does not say operands 2 and 3 are not evaluated if they do not end up being stored, so a middle-end fix is more appropriate. Richard. I have two observations: - the code for CMOV is under #ifdef in the mddle-end, which is explicitly marked as have to be removed (ifcvt.c:1446) - I have no clear evidence all platforms that support conditional move have the same semantics that lead to the PR I think the best way to address both concerns is to implement code that relies on а new hookup volatile-safe CMOV that is false by default. I suppose it's never safe for all architectures that support memory operands in the source operand. Richard. ok, at least there should be no big problem of missing optimization around volatile memory. apparently the problem is here: ifcvt.c:2539 there is a test for side effects of source (which is 'a' in this case) 2539 if (! noce_operand_ok (a) || ! noce_operand_ok (b)) (gdb) p debug_rtx(a) (mem/v/c/i:DI (symbol_ref:DI (mmio) [flags 0x40] var_decl 0x71339140 mmio) [2 mmio+0 S8 A64]) but inside noce_operand_ok() there is a wrong order of tests: 2332 if (MEM_P (op)) 2333 return ! side_effects_p (XEXP (op, 0)); 2334 2335 if (side_effects_p (op)) 2336 return FALSE; 2337 where XEXP removes the memory reference leaving just symbol reference, that has no volatile attribute #0 side_effects_p (x=0x7149c660) at ../../gcc/rtlanal.c:2152 (gdb) p debug_rtx(x) (symbol_ref:DI (mmio) [flags 0x40] var_decl 0x71339140 mmio) Is the following fix is Ok? I'm testing it so far. Sergos diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 784e2e8..3b05c2a 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -2329,12 +2329,12 @@ noce_operand_ok (const_rtx op) { /* We special-case memories, so handle any of them with no address side effects. */ - if (MEM_P (op)) - return ! side_effects_p (XEXP (op, 0)); - if (side_effects_p (op)) return FALSE; + if (MEM_P (op)) + return ! side_effects_p (XEXP (op, 0)); + return ! may_trap_p (op); } diff --git a/gcc/testsuite/gcc.target/i386/47698.c b/gcc/testsuite/gcc.target/i386/47698.c new file mode 100644 index 000..2c75109 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/47698.c @@ -0,0 +1,10 @@ +/* { dg-options -Os } */ +/* { dg-final { scan-assembler-not cmov } } */ + +extern volatile unsigned long mmio; +unsigned long foo(int cond) +{ + if (cond) + return mmio; + return 0; +} bootstrapped and passed make check successfully on x86_64-unknown-linux-gnu Sergos
[PATCH i386] PR47698 no CMOV for volatile mem
Hi! Here's a patch for PR47698, which is about CMOV should not be generated for memory address marked as volatile. Successfully bootstrapped and passed make check on x86_64-unknown-linux-gnu. Is it Ok? regards, Sergos /gcc 2011-10-27 Sergey Ostanevich PR rtl-optimization/47698 * config/i386/i386.c (ix86_expand_int_movcc) prevent CMOV generation for volatile mem /testsuite 2011-10-27 Sergey Ostanevich PR rtl-optimization/47698 * gcc.target/i386/47698.c: New test diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index c6e09ae..afe5de3 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -18312,6 +18312,12 @@ ix86_expand_int_movcc (rtx operands[]) rtx op0 = XEXP (operands[1], 0); rtx op1 = XEXP (operands[1], 1); + /* MOVCC semantics implies that source is always read which is wrong + for devices I/O that are defined using volatile in C. PR47698 */ + + if (MEM_P (operands[2]) MEM_VOLATILE_P (operands[2])) +return false; + start_sequence (); compare_op = ix86_expand_compare (code, op0, op1); compare_seq = get_insns (); diff --git a/gcc/testsuite/gcc.target/i386/47698.c b/gcc/testsuite/gcc.target/i386/47698.c new file mode 100644 index 000..2c75109 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/47698.c @@ -0,0 +1,10 @@ +/* { dg-options -Os } */ +/* { dg-final { scan-assembler-not cmov } } */ + +extern volatile unsigned long mmio; +unsigned long foo(int cond) +{ + if (cond) + return mmio; +return 0; +} 47698.patch Description: Binary data
Re: [PATCH PR50572] Tune loop alignment for Atom
Please provide a patch which can be applied. Cut/paste doesn't create a working patch. Please attach it. -- H.J. Will that works? Sergos. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 6c73404..e21cf86 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2011-10-20 Sergey Ostanevich sergos@gmail.com + + * config/i386/i386.c (processor_target_table): Change Atom + align_loops_max_skip to 15. + 2011-10-17 Michael Spertus mike_sper...@symantec.com * gcc/c-family/c-common.c (c_common_reswords): Add __bases, diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2c53423..8c60086 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2596,7 +2596,7 @@ static const struct ptt processor_target_table[PROCESSOR_max] = {bdver1_cost, 32, 24, 32, 7, 32}, {bdver2_cost, 32, 24, 32, 7, 32}, {btver1_cost, 32, 24, 32, 7, 32}, - {atom_cost, 16, 7, 16, 7, 16} + {atom_cost, 16, 15, 16, 7, 16} }; static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =
[PATCH PR50572] Tune loop alignment for Atom
Hi! Here is a patch to address PR50572. Successfully bootstrapped and passed make check on x86_64-linux. regards, Sergos 2011-10-19 Sergey Ostanevich sergos@gmail.com * gcc/config/i386/i386.c (ix86_option_override_internal): use loop align by 16 bytes for Atom platform diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2c53423..7a93144 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -3497,6 +3497,8 @@ ix86_option_override_internal (bool main_args_p) { align_loops = processor_target_table[ix86_tune].align_loop; align_loops_max_skip = processor_target_table[ix86_tune].align_loop_max_skip; + if (ix86_tune == PROCESSOR_ATOM optimize 1) +align_loops_max_skip = 15; } if (align_jumps == 0) {
Re: [PATCH PR50572] Tune loop alignment for Atom
On Wed, Oct 19, 2011 at 4:14 PM, Uros Bizjak ubiz...@gmail.com wrote: You can just change the default in processor_target_table. Uros. Will it be applicable during optimizations for size? It will hurt, although not much (see PR). New patch is below. Ok for trunk as obvious? Sergos 2011-10-19 Sergey Ostanevich sergos@gmail.com * gcc/config/i386/i386.c (ix86_option_override_internal): use loop align by 16 bytes for Atom platform diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2c53423..8c60086 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2596,7 +2596,7 @@ static const struct ptt processor_target_table[PROCESSOR_max] = {bdver1_cost, 32, 24, 32, 7, 32}, {bdver2_cost, 32, 24, 32, 7, 32}, {btver1_cost, 32, 24, 32, 7, 32}, - {atom_cost, 16, 7, 16, 7, 16} + {atom_cost, 16, 15, 16, 7, 16} };
Re: [PATCH PR50572] Tune loop alignment for Atom
On Wed, Oct 19, 2011 at 4:46 PM, Uros Bizjak ubiz...@gmail.com wrote: On Wed, Oct 19, 2011 at 2:26 PM, Sergey Ostanevich sergos@gmail.com wrote: You can just change the default in processor_target_table. Uros. Will it be applicable during optimizations for size? It will hurt, although not much (see PR). Looking at the code, I'd say that we don't handle -Os in different way. New patch is below. Ok for trunk as obvious? Sergos 2011-10-19 Sergey Ostanevich sergos@gmail.com * gcc/config/i386/i386.c (ix86_option_override_internal): use loop align by 16 bytes for Atom platform Please update ChangeLog, like: * gcc/config/i386/i386.c (processor_target_table): Change Atom align_loop_max_skip to 15. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2c53423..8c60086 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2596,7 +2596,7 @@ static const struct ptt processor_target_table[PROCESSOR_max] = {bdver1_cost, 32, 24, 32, 7, 32}, {bdver2_cost, 32, 24, 32, 7, 32}, {btver1_cost, 32, 24, 32, 7, 32}, - {atom_cost, 16, 7, 16, 7, 16} + {atom_cost, 16, 15, 16, 7, 16} }; OK. Thanks, Uros. Thanks for comments! I double checked: for -Os there's no .p2align appeared. For -O2 I see .p2align 4,,15 instead of .p2align 4,,7, as expected. Can someone commit it please? Regards, Sergos 2011-10-18 Sergey Ostanevich sergos@gmail.com * gcc/config/i386/i386.c (processor_target_table): Change Atom align_loops_max_skip to 15. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2c53423..8c60086 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2596,7 +2596,7 @@ static const struct ptt processor_target_table[PROCESSOR_max] = {bdver1_cost, 32, 24, 32, 7, 32}, {bdver2_cost, 32, 24, 32, 7, 32}, {btver1_cost, 32, 24, 32, 7, 32}, - {atom_cost, 16, 7, 16, 7, 16} + {atom_cost, 16, 15, 16, 7, 16} }; static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =
Re: RFA: patch to solve PR48455 and improve code size for -Os
16.08.2011, в 23:07, Vladimir Makarov vmaka...@redhat.com написал(а): After a lot of thinking and some experiments, I did not find a better solution to considerably (like on 0.2% - 0.3% on ARM SPEC2000) improve code size than use of non-regional RA for -Os. did you consider change of compile time because of single region RA? So the patch makes one region allocation is default for -Os. I don't use function optimize_function_for_size_p because I'd like to keep possibility still set up regions independedly of used -O option. This final patch removes ARM code size regression reported on http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48455. The patch was successfully bootstrapped on x86-64 and ppc64 (actually the generated code is not changed for -O2). Is it ok to commit? 2011-08-16 Vladimir Makarov vmaka...@redhat.com PR rtl-optimization/48455 * doc/invoke.texi (-fira-region): Document default values. * flags-types.h (enum ira_region): Add new value IRA_REGION_AUTODETECT. * common.opt (fira-region): Set up initial value to IRA_REGION_AUTODETECT. * toplev.c (process_options): Set up flag_ira_region depending on -O options. * ira.c (ira.c): Remove optimize guard for ira_build. code-size.patch