An options exception

2014-04-14 Thread Sergey Ostanevich
Hi!

I didn't find any precedent of the following before, so this can be a start for
discussion.

Options are known to be different between compilers and achieve options
compatibility is somewhat complex because of this. GCC can be taken as a
reference point, but since other comilers still can (and expected to) have their
own proprietary options there potentially can be a situation that GCC redefines
the same option that was previously (in time) introuced by other compiler.

To evade such a `derail` can we elaborate a rule about new options introduction
and make an escape prefix, such as -q. No options under this escape should be
introduced for GCC. Then other comilers will never appear in a situation their
options reused by GCC and cause a conflict.

Sergos


Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-26 Thread Sergey Ostanevich
Done.

Sergos

On Tue, Nov 26, 2013 at 1:46 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Nov 26, 2013 at 10:43:32AM +0100, Richard Biener wrote:
 2013-11-25  sergey.y.ostanevich  sergos@gmail.com

 Please use your name with capital letters and spaces rather than
 all lowercase plus dots.

 Jakub
2013-11-25  Sergey Ostanevich  sergos@gmail.com

* c.opt: Introduced a new openmp-simd warning.
* lang.opt: Ditto.
* common.opt: Introduced a new option -fsimd-cost-model.
* doc/invoke.texi: Introduced a new openmp-simd warning and
a new -fsimd-cost-model option.
* tree-vectorizer.h (unlimited_cost_model): Interface updated
to rely on the particular loop info.
* tree-vect-data-refs.c (vect_peeling_hash_insert): Ditto.
(vect_peeling_hash_choose_best_peeling): Ditto.
(vect_enhance_data_refs_alignment): Ditto.
* tree-vect-slp.c (vect_slp_analyze_bb_1): Ditto.
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Ditto
plus added openmp-simd warining.

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index ac67885..2e9a3df 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -596,6 +596,10 @@ Wold-style-definition
 C ObjC Var(warn_old_style_definition) Warning
 Warn if an old-style parameter definition is used
 
+Wopenmp-simd
+C C++ Var(warn_openmp_simd) Warning LangEnabledBy(C C++,Wall)
+Warn if a simd directive is overridden by the vectorizer cost model
+
 Woverlength-strings
 C ObjC C++ ObjC++ Var(warn_overlength_strings) Warning LangEnabledBy(C ObjC 
C++ ObjC++,Wpedantic)
 Warn if a string is longer than the maximum portable length specified by the 
standard
diff --git a/gcc/common.opt b/gcc/common.opt
index a7af636..9ece683 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2300,6 +2300,10 @@ fvect-cost-model=
 Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_cost_model) 
Init(VECT_COST_MODEL_DEFAULT)
 Specifies the cost model for vectorization
  
+fsimd-cost-model=
+Common Joined RejectNegative Enum(vect_cost_model) Var(flag_simd_cost_model) 
Init(VECT_COST_MODEL_UNLIMITED)
+Specifies the vectorization cost model for code marked with a simd directive
+
 Enum
 Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown 
vectorizer cost model %qs)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 466eee0..2ff413c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -256,7 +256,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wlogical-op -Wlong-long @gol
 -Wmain -Wmaybe-uninitialized -Wmissing-braces  -Wmissing-field-initializers 
@gol
 -Wmissing-include-dirs @gol
--Wno-multichar  -Wnonnull  -Wno-overflow @gol
+-Wno-multichar  -Wnonnull  -Wno-overflow -Wopenmp-simd @gol
 -Woverlength-strings  -Wpacked  -Wpacked-bitfield-compat  -Wpadded @gol
 -Wparentheses  -Wpedantic-ms-format -Wno-pedantic-ms-format @gol
 -Wpointer-arith  -Wno-pointer-to-int-cast @gol
@@ -3321,6 +3321,7 @@ Options} and @ref{Objective-C and Objective-C++ Dialect 
Options}.
 -Wmaybe-uninitialized @gol
 -Wmissing-braces @r{(only for C/ObjC)} @gol
 -Wnonnull  @gol
+-Wopenmp-simd @gol
 -Wparentheses  @gol
 -Wpointer-sign  @gol
 -Wreorder   @gol
@@ -4815,6 +4816,12 @@ attribute.
 @opindex Woverflow
 Do not warn about compile-time overflow in constant expressions.
 
+@item -Wopenmp-simd
+@opindex Wopenm-simd
+Warn if the vectorizer cost model overrides the OpenMP or the Cilk Plus
+simd directive set by user.  The @option{-fsimd-cost-model=unlimited} can
+be used to relax the cost model.
+
 @item -Woverride-init @r{(C and Objective-C only)}
 @opindex Woverride-init
 @opindex Wno-override-init
@@ -8071,6 +8078,14 @@ is equal to the @code{dynamic} model.
 The default cost model depends on other optimization flags and is
 either @code{dynamic} or @code{cheap}.
 
+@item -fsimd-cost-model=@var{model}
+@opindex fsimd-cost-model
+Alter the cost model used for vectorization of loops marked with the OpenMP
+or Cilk Plus simd directive.  The @var{model} argument should be one of
+@code{unlimited}, @code{dynamic}, @code{cheap}.  All values of @var{model}
+have the same meaning as described in @option{fvect-cost-model} and by
+default a cost model defined with @option{fvect-cost-model} is used.
+
 @item -ftree-vrp
 @opindex ftree-vrp
 Perform Value Range Propagation on trees.  This is similar to the
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index 5e09cbd..0d328c8 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -257,6 +257,10 @@ Wintrinsics-std
 Fortran Warning
 Warn on intrinsics not part of the selected standard
 
+Wopenmp-simd
+Fortran
+; Documented in C
+
 Wreal-q-constant
 Fortran Warning
 Warn about real-literal-constants with 'q' exponent-letter
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 8261645..d3e6e99 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1096,7 +1096,8

Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-25 Thread Sergey Ostanevich
Updated patch with spaces, etc according to check_GNU_style.sh

Put guard as per Tobias' request.

Is it Ok?



On Thu, Nov 21, 2013 at 6:18 PM, Sergey Ostanevich sergos@gmail.com wrote:
 Tobias,


 When I understand the patch correctly, the warning is shown in two cases:
 a) When the loop could be vectorized but the cost model prevented it
 b) When the loop couldn't be vectorized because of other reasons (e.g. not
 vectorizable because of conditional loop exits, incomplete vectorization
 support by the compiler etc.)

 Do I correctly understand the warning? I am asking because the *opt and
 *texi wording suggests that only (a) is the case. - I cannot test as the
 patch cannot be applied with heavy editing (removal of additional line
 breaks, taking care of tabs converted into spaces).

 I believe it's only for a) case, since warning stays along with the cost
 model report that says only about relative scalar and vector costs of
 iteration. The case of exits and vectorization capabilities is handled 
 earlier,
 since we have some vector code here.

 Will try to attach the patch instead of copy-paste here.


 Regarding the warning, I think it sounds a bit colloquial and as if the
 location information is not available. What do you think of loop with simd
 directive not vectorized or concise not fully correct: simd loop not
 vectorized?

 took one of yours.


 Additionally, shouldn't that be guarded by if (warn_openmp_simd ?
 Otherwise the flag status isn't used at all in the whole patch.

 This is strange to me, since it worked as I pass the OPT_Wopenmp_simd
 to the warning_at (). It does:
show warinig with -Wopenmp-simd
doesn't show warning with -Wall -Wno-openmp-simd


 +Wopenmp-simd
 +C C++ Var(warn_openmp_simd) Warning EnabledBy(Wall)
 +Warn about simd directive is overridden by vectorizer cost model


 Wording wise, I'd prefer something like:
 Warn if an simd directive is overridden by the vectorizer cost model

 (Or is it a simd? Where are the native speakers when one needs them?)

 damn, right! I believe 'a' since simd starts with consonant.


 However, in light of my question above, shouldn't it be Warn if a loop with
 simd directive is not vectorized?



 +fsimd-cost-model=
 +Common Joined RejectNegative Enum(vect_cost_model)
 Var(flag_simd_cost_model) Init(VECT_COST_MODEL_UNLIMITED)
 +Specifies the vectorization cost model for code marked with simd
 directive


 I think an article is lacking before simd.

 done.



 +@item -Wopenmp-simd
 +@opindex Wopenm-simd
 +Warn if vectorizer cost model overrides simd directive from user.


 I think that can be expanded a bit. One could also mention OpenMP/Cilk Plus
 explicitly. Maybe like:  Warn if the vectorizer cost model overrides the
 OpenMP and Cilk Plus simd directives of the user.


 done.

 Or if my reading above is correct, how about something like: Warn if a loop
 with OpenMP or Cilk Plus simd directive is not vectorized. If only the cost
 model prevented the vectorization, the @option{-fsimd-cost-model} option can
 be used to force the vectorization.

 Which brings me to my next point: -fvect-cost-model= is not documented. I
 think some words would be helpful, especially about the valid arguments, the
 default and how it interacts with -fvect-cost-model=.

 done.



 --- a/gcc/fortran/lang.opt
 +++ b/gcc/fortran/lang.opt

 +Wopenmp-simd
 +Fortran Warning
 +; Documented in C

 (Warning is also not needed as it is taken from c-family/*opt, but it
 shouldn't harm either.)

 done.

 Sergos

 * common.opt: Added new option -fsimd-cost-model.
 * tree-vectorizer.h (unlimited_cost_model): Interface update
 to rely on particular loop info.
 * tree-vect-data-refs.c (vect_peeling_hash_insert): Update to
 unlimited_cost_model call according to new interface.
 (vect_peeling_hash_choose_best_peeling): Ditto.
 (vect_enhance_data_refs_alignment): Ditto.
 * tree-vect-slp.c: Ditto.
 * tree-vect-loop.c (vect_estimate_min_profitable_iters): Ditto,
 plus issue a warning in case cost model overrides users' directive.
 * c.opt: add openmp-simd warning.
 * lang.opt: Ditto.
 * doc/invoke.texi: Added new openmp-simd warning.


patch6
Description: Binary data


Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-21 Thread Sergey Ostanevich
Tobias,


 When I understand the patch correctly, the warning is shown in two cases:
 a) When the loop could be vectorized but the cost model prevented it
 b) When the loop couldn't be vectorized because of other reasons (e.g. not
 vectorizable because of conditional loop exits, incomplete vectorization
 support by the compiler etc.)

 Do I correctly understand the warning? I am asking because the *opt and
 *texi wording suggests that only (a) is the case. - I cannot test as the
 patch cannot be applied with heavy editing (removal of additional line
 breaks, taking care of tabs converted into spaces).

I believe it's only for a) case, since warning stays along with the cost
model report that says only about relative scalar and vector costs of
iteration. The case of exits and vectorization capabilities is handled earlier,
since we have some vector code here.

Will try to attach the patch instead of copy-paste here.


 Regarding the warning, I think it sounds a bit colloquial and as if the
 location information is not available. What do you think of loop with simd
 directive not vectorized or concise not fully correct: simd loop not
 vectorized?

took one of yours.


 Additionally, shouldn't that be guarded by if (warn_openmp_simd ?
 Otherwise the flag status isn't used at all in the whole patch.

This is strange to me, since it worked as I pass the OPT_Wopenmp_simd
to the warning_at (). It does:
   show warinig with -Wopenmp-simd
   doesn't show warning with -Wall -Wno-openmp-simd


 +Wopenmp-simd
 +C C++ Var(warn_openmp_simd) Warning EnabledBy(Wall)
 +Warn about simd directive is overridden by vectorizer cost model


 Wording wise, I'd prefer something like:
 Warn if an simd directive is overridden by the vectorizer cost model

 (Or is it a simd? Where are the native speakers when one needs them?)

damn, right! I believe 'a' since simd starts with consonant.


 However, in light of my question above, shouldn't it be Warn if a loop with
 simd directive is not vectorized?



 +fsimd-cost-model=
 +Common Joined RejectNegative Enum(vect_cost_model)
 Var(flag_simd_cost_model) Init(VECT_COST_MODEL_UNLIMITED)
 +Specifies the vectorization cost model for code marked with simd
 directive


 I think an article is lacking before simd.

done.



 +@item -Wopenmp-simd
 +@opindex Wopenm-simd
 +Warn if vectorizer cost model overrides simd directive from user.


 I think that can be expanded a bit. One could also mention OpenMP/Cilk Plus
 explicitly. Maybe like:  Warn if the vectorizer cost model overrides the
 OpenMP and Cilk Plus simd directives of the user.


done.

 Or if my reading above is correct, how about something like: Warn if a loop
 with OpenMP or Cilk Plus simd directive is not vectorized. If only the cost
 model prevented the vectorization, the @option{-fsimd-cost-model} option can
 be used to force the vectorization.

 Which brings me to my next point: -fvect-cost-model= is not documented. I
 think some words would be helpful, especially about the valid arguments, the
 default and how it interacts with -fvect-cost-model=.

done.



 --- a/gcc/fortran/lang.opt
 +++ b/gcc/fortran/lang.opt

 +Wopenmp-simd
 +Fortran Warning
 +; Documented in C

 (Warning is also not needed as it is taken from c-family/*opt, but it
 shouldn't harm either.)

done.

Sergos

* common.opt: Added new option -fsimd-cost-model.
* tree-vectorizer.h (unlimited_cost_model): Interface update
to rely on particular loop info.
* tree-vect-data-refs.c (vect_peeling_hash_insert): Update to
unlimited_cost_model call according to new interface.
(vect_peeling_hash_choose_best_peeling): Ditto.
(vect_enhance_data_refs_alignment): Ditto.
* tree-vect-slp.c: Ditto.
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Ditto,
plus issue a warning in case cost model overrides users' directive.
* c.opt: add openmp-simd warning.
* lang.opt: Ditto.
* doc/invoke.texi: Added new openmp-simd warning.


patch3
Description: Binary data


Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-20 Thread Sergey Ostanevich
.  */
-  if (unlimited_cost_model ())
+  if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
 possible_npeel_number = vf /nelements;

   /* Handle the aligned case. We may decide to align some other
@@ -1437,7 +1438,7 @@ vect_enhance_data_refs_alignment (loop_vec_info
loop_vinfo)
   if (DR_MISALIGNMENT (dr) == 0)
 {
   npeel_tmp = 0;
-  if (unlimited_cost_model ())
+  if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
 possible_npeel_number++;
 }

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 86ebbd2..1fd28e3 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2696,7 +2696,7 @@ vect_estimate_min_profitable_iters
(loop_vec_info loop_vinfo,
   void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);

   /* Cost model disabled.  */
-  if (unlimited_cost_model ())
+  if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
 {
   dump_printf_loc (MSG_NOTE, vect_location, cost model disabled.\n);
   *ret_min_profitable_niters = 0;
@@ -2929,6 +2929,13 @@ vect_estimate_min_profitable_iters
(loop_vec_info loop_vinfo,
   /* vector version will never be profitable.  */
   else
 {
+  if (LOOP_VINFO_LOOP (loop_vinfo)-force_vect)
+{
+  warning_at (LOOP_VINFO_LOC (loop_vinfo), OPT_Wopenmp_simd,
+  Vectorization did not happen for 
+  the loop labeled as simd.);
+}
+
   if (dump_enabled_p ())
 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
  cost model: the vector iteration cost = %d 
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 247bdfd..4b25964 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2171,7 +2171,7 @@ vect_slp_analyze_bb_1 (basic_block bb)
 }

   /* Cost model: check if the vectorization is worthwhile.  */
-  if (!unlimited_cost_model ()
+  if (!unlimited_cost_model (NULL)
!vect_bb_vectorization_profitable_p (bb_vinfo))
 {
   if (dump_enabled_p ())
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index a6c5b59..fd255db 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -919,9 +919,12 @@ known_alignment_for_access_p (struct
data_reference *data_ref_info)

 /* Return true if the vect cost model is unlimited.  */
 static inline bool
-unlimited_cost_model ()
+unlimited_cost_model (loop_p loop)
 {
-  return flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED;
+  return (flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED
+  || (loop != NULL
+   loop-force_vect
+   flag_simd_cost_model == VECT_COST_MODEL_UNLIMITED));
 }

 /* Source location */


On Wed, Nov 20, 2013 at 12:14 AM, Tobias Burnus bur...@net-b.de wrote:
 I have some small comments to the patch:

 * You should also update gcc/doc/invoke.texi


 Sergey Ostanevich wrote:

 index 0026683..84911a0 100644
 --- a/gcc/c-family/c.opt
 +++ b/gcc/c-family/c.opt

 ...

 +Wopenmp-simd
 +C C++ Var(openmp_simd) Warning
 +Warn about omp simd construct is overridden by cost model


 --- a/gcc/fortran/lang.opt
 +++ b/gcc/fortran/lang.opt

 ...

 +Wopenmp-simd
 +Fortran Warning
 +Warn about omp simd construct is overridden by cost model


 As the option files get merged, using

 Wopenmp-simd
 Fortran
 ; Documented in C

 is sufficient.



 +fsimd-cost-model=
 +Common Joined RejectNegative Enum(vect_cost_model)
 Var(flag_simd_cost_model) Init(VECT_COST_MODEL_DEFAULT)
 +Specifies the cost model for vectorization in loops marked with omp simd


 I wonder whether we need to care about Cilk Plus' #pragma simd in this
 summary.




 +  if (LOOP_VINFO_LOOP (loop_vinfo)-force_vect)
 +{
 +  warning (OPT_Wopenmp_simd, Vectorization did not happen
 for the loop ,
 +   labeled as simd);
 +}
 +


 The warning line is too long.


 Tobias


Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-20 Thread Sergey Ostanevich
Updated as per Richard and Jakub feedback - assuming the default
for simd-cost-model is unlmited by default.
Richard - was you Ok with it?

Sergos

* common.opt: Added new option -fsimd-cost-model.
* tree-vectorizer.h (unlimited_cost_model): Interface update
to rely on particular loop info.
* tree-vect-data-refs.c (vect_peeling_hash_insert): Update to
unlimited_cost_model call according to new interface.
(vect_peeling_hash_choose_best_peeling): Ditto.
(vect_enhance_data_refs_alignment): Ditto.
* tree-vect-slp.c: Ditto.
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Ditto,
plus issue a warning in case cost model overrides users' directive.
* c.opt: add openmp-simd warning.
* lang.opt: Ditto.
* doc/invoke.texi: Added new openmp-simd warning.



diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 0026683..6173013 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -592,6 +592,10 @@ Wold-style-definition
 C ObjC Var(warn_old_style_definition) Warning
 Warn if an old-style parameter definition is used

+Wopenmp-simd
+C C++ Var(warn_openmp_simd) Warning EnabledBy(Wall)
+Warn about simd directive is overridden by vectorizer cost model
+
 Woverlength-strings
 C ObjC C++ ObjC++ Var(warn_overlength_strings) Warning
LangEnabledBy(C ObjC C++ ObjC++,Wpedantic)
 Warn if a string is longer than the maximum portable length specified
by the standard
diff --git a/gcc/common.opt b/gcc/common.opt
index d5971df..6a40a5d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2296,10 +2296,17 @@ fvect-cost-model=
 Common Joined RejectNegative Enum(vect_cost_model)
Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT)
 Specifies the cost model for vectorization

+fsimd-cost-model=
+Common Joined RejectNegative Enum(vect_cost_model)
Var(flag_simd_cost_model) Init(VECT_COST_MODEL_UNLIMITED)
+Specifies the vectorization cost model for code marked with simd directive
+
 Enum
 Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown
vectorizer cost model %qs)

 EnumValue
+Enum(vect_cost_model) String(default) Value(VECT_COST_MODEL_DEFAULT)
+
+EnumValue
 Enum(vect_cost_model) String(unlimited) Value(VECT_COST_MODEL_UNLIMITED)

 EnumValue
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c250385..050bd44 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -256,7 +256,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wlogical-op -Wlong-long @gol
 -Wmain -Wmaybe-uninitialized -Wmissing-braces
-Wmissing-field-initializers @gol
 -Wmissing-include-dirs @gol
--Wno-multichar  -Wnonnull  -Wno-overflow @gol
+-Wno-multichar  -Wnonnull  -Wno-overflow -Wopenmp-simd @gol
 -Woverlength-strings  -Wpacked  -Wpacked-bitfield-compat  -Wpadded @gol
 -Wparentheses  -Wpedantic-ms-format -Wno-pedantic-ms-format @gol
 -Wpointer-arith  -Wno-pointer-to-int-cast @gol
@@ -3318,6 +3318,7 @@ Options} and @ref{Objective-C and Objective-C++
Dialect Options}.
 -Wmaybe-uninitialized @gol
 -Wmissing-braces @r{(only for C/ObjC)} @gol
 -Wnonnull  @gol
+-Wopenmp-simd @gol
 -Wparentheses  @gol
 -Wpointer-sign  @gol
 -Wreorder   @gol
@@ -4804,6 +4805,10 @@ attribute.
 @opindex Woverflow
 Do not warn about compile-time overflow in constant expressions.

+@item -Wopenmp-simd
+@opindex Wopenm-simd
+Warn if vectorizer cost model overrides simd directive from user.
+
 @item -Woverride-init @r{(C and Objective-C only)}
 @opindex Woverride-init
 @opindex Wno-override-init
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index 5e09cbd..b43c48c 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -257,6 +257,10 @@ Wintrinsics-std
 Fortran Warning
 Warn on intrinsics not part of the selected standard

+Wopenmp-simd
+Fortran Warning
+; Documented in C
+
 Wreal-q-constant
 Fortran Warning
 Warn about real-literal-constants with 'q' exponent-letter
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 83d1f45..977db43 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1090,7 +1090,8 @@ vect_peeling_hash_insert (loop_vec_info
loop_vinfo, struct data_reference *dr,
   *new_slot = slot;
 }

-  if (!supportable_dr_alignment  unlimited_cost_model ())
+  if (!supportable_dr_alignment
+   unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
 slot-count += VECT_MAX_COST;
 }

@@ -1200,7 +1201,7 @@ vect_peeling_hash_choose_best_peeling
(loop_vec_info loop_vinfo,
res.peel_info.dr = NULL;
res.body_cost_vec = stmt_vector_for_cost ();

-   if (!unlimited_cost_model ())
+   if (!unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
  {
res.inside_cost = INT_MAX;
res.outside_cost = INT_MAX;
@@ -1429,7 +1430,7 @@ vect_enhance_data_refs_alignment (loop_vec_info
loop_vinfo)
  vectorization factor.
  We do this automtically for cost model, since we
calculate cost
  for every peeling option.  */
-   

Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-19 Thread Sergey Ostanevich
 bool
-unlimited_cost_model ()
+unlimited_cost_model (loop_p loop)
 {
-  return flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED;
+  return (flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED
+  || (loop != NULL
+   loop-force_vect
+   flag_simd_vect_cost_model == VECT_COST_MODEL_UNLIMITED));
 }

 /* Source location */

On Mon, Nov 18, 2013 at 7:13 PM, Richard Biener rguent...@suse.de wrote:
 On Mon, 18 Nov 2013, Sergey Ostanevich wrote:

 I would agree that the example is just for the case cost model makes
 correct estimation But how can we assure ourself that it won't have any
 mistakes in the future?

 We call it bugs and not mistakes and we have bugzilla for it.

 Richard.

 I believe it'll be Ok to introduce an extra flag as Jakub proposed for the
 dedicated simd-forced vectorization to use unlimited cost model. This
 can be default for -fopenmp or there should be a warning issued that
 compiler overrides user's request of vectorization. In such a case user
 can enforce vectorization (even with mentioned results :) with this
 unlimited cost model for simd.



 On Fri, Nov 15, 2013 at 6:24 PM, Richard Biener rguent...@suse.de wrote:
  On Fri, 15 Nov 2013, Sergey Ostanevich wrote:
 
  Richard,
 
  here's an example that causes trigger for the cost model.
 
  I hardly believe that (AVX2)
 
  .L9:
  vmovups (%rsi), %xmm3
  addl$1, %r8d
  addq$256, %rsi
  vinsertf128 $0x1, -240(%rsi), %ymm3, %ymm1
  vmovups -224(%rsi), %xmm3
  vinsertf128 $0x1, -208(%rsi), %ymm3, %ymm3
  vshufps $136, %ymm3, %ymm1, %ymm3
  vperm2f128  $3, %ymm3, %ymm3, %ymm2
  vshufps $68, %ymm2, %ymm3, %ymm1
  vshufps $238, %ymm2, %ymm3, %ymm2
  vmovups -192(%rsi), %xmm3
  vinsertf128 $1, %xmm2, %ymm1, %ymm2
  vinsertf128 $0x1, -176(%rsi), %ymm3, %ymm1
  vmovups -160(%rsi), %xmm3
  vinsertf128 $0x1, -144(%rsi), %ymm3, %ymm3
  vshufps $136, %ymm3, %ymm1, %ymm3
  vperm2f128  $3, %ymm3, %ymm3, %ymm1
  vshufps $68, %ymm1, %ymm3, %ymm4
  vshufps $238, %ymm1, %ymm3, %ymm1
  vmovups -128(%rsi), %xmm3
  vinsertf128 $1, %xmm1, %ymm4, %ymm1
  vshufps $136, %ymm1, %ymm2, %ymm1
  vperm2f128  $3, %ymm1, %ymm1, %ymm2
  vshufps $68, %ymm2, %ymm1, %ymm4
  vshufps $238, %ymm2, %ymm1, %ymm2
  vinsertf128 $0x1, -112(%rsi), %ymm3, %ymm1
  vmovups -96(%rsi), %xmm3
  vinsertf128 $1, %xmm2, %ymm4, %ymm4
  vinsertf128 $0x1, -80(%rsi), %ymm3, %ymm3
  vshufps $136, %ymm3, %ymm1, %ymm3
  vperm2f128  $3, %ymm3, %ymm3, %ymm2
  vshufps $68, %ymm2, %ymm3, %ymm1
  vshufps $238, %ymm2, %ymm3, %ymm2
  vmovups -64(%rsi), %xmm3
  vinsertf128 $1, %xmm2, %ymm1, %ymm2
  vinsertf128 $0x1, -48(%rsi), %ymm3, %ymm1
  vmovups -32(%rsi), %xmm3
  vinsertf128 $0x1, -16(%rsi), %ymm3, %ymm3
  cmpl%r8d, %edi
  vshufps $136, %ymm3, %ymm1, %ymm3
  vperm2f128  $3, %ymm3, %ymm3, %ymm1
  vshufps $68, %ymm1, %ymm3, %ymm5
  vshufps $238, %ymm1, %ymm3, %ymm1
  vinsertf128 $1, %xmm1, %ymm5, %ymm1
  vshufps $136, %ymm1, %ymm2, %ymm1
  vperm2f128  $3, %ymm1, %ymm1, %ymm2
  vshufps $68, %ymm2, %ymm1, %ymm3
  vshufps $238, %ymm2, %ymm1, %ymm2
  vinsertf128 $1, %xmm2, %ymm3, %ymm1
  vshufps $136, %ymm1, %ymm4, %ymm1
  vperm2f128  $3, %ymm1, %ymm1, %ymm2
  vshufps $68, %ymm2, %ymm1, %ymm3
  vshufps $238, %ymm2, %ymm1, %ymm2
  vinsertf128 $1, %xmm2, %ymm3, %ymm2
  vaddps  %ymm2, %ymm0, %ymm0
  ja  .L9
 
  is more efficient than
 
  .L3:
  vaddss  (%rcx,%rax), %xmm0, %xmm0
  addq$32, %rax
  cmpq%rdx, %rax
  jne .L3
 
  ;)
 
  As soon as
  elemental functions will appear and we update the vectorizer so it can 
  accept
  an elemental function inside the loop - we will have the same
  situation as we have
  it now: cost model will bail out with profitability estimation.
 
  Yes.
 
  Still we have no chance to get info on how efficient the bar() function 
  when it
  is in vector form.
 
  Well I assume you mean that the speedup when vectorizing the elemental
  will offset whatever wreckage we cause with vectorizing the rest of the
  statements.  I'd say you can at least compare to unrolling by
  the vectorization factor, building the vector inputs to the elemental
  from scalars, distributing the vector result from the elemental to
  scalars.
 
  I believe I should repeat: #pragma omp simd is intended for introduction 
  of an
  instruction-level parallel region on developer's request, hence should
  be treated
  in same manner as #pragma omp parallel. Vectorizer cost model is an 
  obstacle

Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-19 Thread Sergey Ostanevich
On Tue, Nov 19, 2013 at 6:07 PM, Richard Biener rguent...@suse.de wrote:
 On Tue, 19 Nov 2013, Sergey Ostanevich wrote:

 :) agree to you, but as soon as you're a user who tries to introduce
 vector code and face a bug in cost model you'd like to have a
 workaround until the bug will be fixed and compiler will come to you
 with new OS distribution, don't you?

 I propose the following, yet SLP have to use a NULL as a loop info
 which looks somewhat hacky.

 I think this is overengineering.  -fvect-cost-model will do as
 workaround.  And -fsimd-vect-cost-model has what I consider
 duplicate - simd and vect.

I just wanted to separate the autovectorized loops from ones user
wants to vectorize. The -fvect-cost-model will force all at once.
That's the reason to introcude the simd-vect, since pragma name
is simd.


 Richard.

 Sergos


 * common.opt: Added new option -fsimd-vect-cost-model
 * tree-vectorizer.h (unlimited_cost_model): Interface update
 to rely on particular loop info
 * tree-vect-data-refs.c (vect_peeling_hash_insert): Update to
 unlimited_cost_model call according to new interface
 (vect_peeling_hash_choose_best_peeling): Ditto
 (vect_enhance_data_refs_alignment): Ditto
 * tree-vect-slp.c: Ditto
 * tree-vect-loop.c (vect_estimate_min_profitable_iters): Ditto
 plus issue a warning in case cost model overrides users' directive



 diff --git a/gcc/common.opt b/gcc/common.opt
 index d5971df..87b3b37 100644
 --- a/gcc/common.opt
 +++ b/gcc/common.opt
 @@ -2296,6 +2296,10 @@ fvect-cost-model=
  Common Joined RejectNegative Enum(vect_cost_model)
 Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT)
  Specifies the cost model for vectorization

 +fsimd-vect-cost-model=
 +Common Joined RejectNegative Enum(vect_cost_model)
 Var(flag_simd_vect_cost_model) Init(VECT_COST_MODEL_UNLIMITED)
 +Specifies the cost model for vectorization in loops marked with
 #pragma omp simd
 +
  Enum
  Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown
 vectorizer cost model %qs)

 diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
 index 83d1f45..e26f704 100644
 --- a/gcc/tree-vect-data-refs.c
 +++ b/gcc/tree-vect-data-refs.c
 @@ -1090,7 +1090,8 @@ vect_peeling_hash_insert (loop_vec_info
 loop_vinfo, struct data_reference *dr,
*new_slot = slot;
  }

 -  if (!supportable_dr_alignment  unlimited_cost_model ())
 +  if (!supportable_dr_alignment
 +   unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
  slot-count += VECT_MAX_COST;
  }

 @@ -1200,7 +1201,7 @@ vect_peeling_hash_choose_best_peeling
 (loop_vec_info loop_vinfo,
 res.peel_info.dr = NULL;
 res.body_cost_vec = stmt_vector_for_cost ();

 -   if (!unlimited_cost_model ())
 +   if (!unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
   {
 res.inside_cost = INT_MAX;
 res.outside_cost = INT_MAX;
 @@ -1429,7 +1430,7 @@ vect_enhance_data_refs_alignment (loop_vec_info
 loop_vinfo)
   vectorization factor.
   We do this automtically for cost model, since we
 calculate cost
   for every peeling option.  */
 -  if (unlimited_cost_model ())
 +  if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
  possible_npeel_number = vf /nelements;

/* Handle the aligned case. We may decide to align some other
 @@ -1437,7 +1438,7 @@ vect_enhance_data_refs_alignment (loop_vec_info
 loop_vinfo)
if (DR_MISALIGNMENT (dr) == 0)
  {
npeel_tmp = 0;
 -  if (unlimited_cost_model ())
 +  if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
  possible_npeel_number++;
  }

 diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
 index 86ebbd2..be66172 100644
 --- a/gcc/tree-vect-loop.c
 +++ b/gcc/tree-vect-loop.c
 @@ -2696,7 +2696,7 @@ vect_estimate_min_profitable_iters
 (loop_vec_info loop_vinfo,
void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);

/* Cost model disabled.  */
 -  if (unlimited_cost_model ())
 +  if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
  {
dump_printf_loc (MSG_NOTE, vect_location, cost model disabled.\n);
*ret_min_profitable_niters = 0;
 @@ -2929,6 +2929,11 @@ vect_estimate_min_profitable_iters
 (loop_vec_info loop_vinfo,
/* vector version will never be profitable.  */
else
  {
 +  if (LOOP_VINFO_LOOP (loop_vinfo)-force_vect)
 +{
 +  pedwarn (vect_location, 0, Vectorization did not happen
 for the loop);
 +}
 +
if (dump_enabled_p ())
  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
   cost model: the vector iteration cost = %d 
 diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
 index 247bdfd..4b25964 100644
 --- a/gcc/tree-vect-slp.c
 +++ b/gcc/tree-vect-slp.c

Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-19 Thread Sergey Ostanevich
  I propose the following, yet SLP have to use a NULL as a loop info
  which looks somewhat hacky.

 I think this is overengineering.  -fvect-cost-model will do as
 workaround.  And -fsimd-vect-cost-model has what I consider
 duplicate - simd and vect.

 I think it is a good idea, though I agree about s/simd-vect/simd/ and
 I'd use VECT_COST_MODEL_DEFAULT as the default, which would mean
 just use -fvect-cost-model.

that's ok, since we'd have a way to force those 'simd' loops.


  @@ -2929,6 +2929,11 @@ vect_estimate_min_profitable_iters
  (loop_vec_info loop_vinfo,
 /* vector version will never be profitable.  */
 else
   {
  +  if (LOOP_VINFO_LOOP (loop_vinfo)-force_vect)
  +{
  +  pedwarn (vect_location, 0, Vectorization did not happen
  for the loop);
  +}

 pedwarn isn't really desirable for this, you want just warning,
 but some warning you can actually also turn off.
 -Wopenmp-simd (and we'd use it also when we ignore #pragma omp declare simd
 because it wasn't useful/desirable).

consider a user is interested in enabling warning-as-error for this case?
can we disable the pedwarn the same way?

Sergos


 Jakub


Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-19 Thread Sergey Ostanevich
 ())
 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 cost model: the vector iteration cost = %d 
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 247bdfd..4b25964 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2171,7 +2171,7 @@ vect_slp_analyze_bb_1 (basic_block bb)
 }

   /* Cost model: check if the vectorization is worthwhile.  */
-  if (!unlimited_cost_model ()
+  if (!unlimited_cost_model (NULL)
!vect_bb_vectorization_profitable_p (bb_vinfo))
 {
   if (dump_enabled_p ())
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index a6c5b59..fd255db 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -919,9 +919,12 @@ known_alignment_for_access_p (struct
data_reference *data_ref_info)

 /* Return true if the vect cost model is unlimited.  */
 static inline bool
-unlimited_cost_model ()
+unlimited_cost_model (loop_p loop)
 {
-  return flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED;
+  return (flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED
+  || (loop != NULL
+   loop-force_vect
+   flag_simd_cost_model == VECT_COST_MODEL_UNLIMITED));
 }

 /* Source location */

On Tue, Nov 19, 2013 at 6:42 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Nov 19, 2013 at 06:39:48PM +0400, Sergey Ostanevich wrote:
  pedwarn isn't really desirable for this, you want just warning,
  but some warning you can actually also turn off.
  -Wopenmp-simd (and we'd use it also when we ignore #pragma omp declare simd
  because it wasn't useful/desirable).

 consider a user is interested in enabling warning-as-error for this case?

 -Werror=openmp-simd will work then, this works for any named warnings.

 can we disable the pedwarn the same way?

 pedwarn is for pedantic warnings, no standard says that #pragma omp simd
 must be vectorized, or that #pragma omp simd or #pragma omp declare simd
 is anything but an optimization hint, so pedwarn isn't what you are looking
 for.

 Jakub


Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-18 Thread Sergey Ostanevich
I would agree that the example is just for the case cost model makes
correct estimation But how can we assure ourself that it won't have any
mistakes in the future?

I believe it'll be Ok to introduce an extra flag as Jakub proposed for the
dedicated simd-forced vectorization to use unlimited cost model. This
can be default for -fopenmp or there should be a warning issued that
compiler overrides user's request of vectorization. In such a case user
can enforce vectorization (even with mentioned results :) with this
unlimited cost model for simd.



On Fri, Nov 15, 2013 at 6:24 PM, Richard Biener rguent...@suse.de wrote:
 On Fri, 15 Nov 2013, Sergey Ostanevich wrote:

 Richard,

 here's an example that causes trigger for the cost model.

 I hardly believe that (AVX2)

 .L9:
 vmovups (%rsi), %xmm3
 addl$1, %r8d
 addq$256, %rsi
 vinsertf128 $0x1, -240(%rsi), %ymm3, %ymm1
 vmovups -224(%rsi), %xmm3
 vinsertf128 $0x1, -208(%rsi), %ymm3, %ymm3
 vshufps $136, %ymm3, %ymm1, %ymm3
 vperm2f128  $3, %ymm3, %ymm3, %ymm2
 vshufps $68, %ymm2, %ymm3, %ymm1
 vshufps $238, %ymm2, %ymm3, %ymm2
 vmovups -192(%rsi), %xmm3
 vinsertf128 $1, %xmm2, %ymm1, %ymm2
 vinsertf128 $0x1, -176(%rsi), %ymm3, %ymm1
 vmovups -160(%rsi), %xmm3
 vinsertf128 $0x1, -144(%rsi), %ymm3, %ymm3
 vshufps $136, %ymm3, %ymm1, %ymm3
 vperm2f128  $3, %ymm3, %ymm3, %ymm1
 vshufps $68, %ymm1, %ymm3, %ymm4
 vshufps $238, %ymm1, %ymm3, %ymm1
 vmovups -128(%rsi), %xmm3
 vinsertf128 $1, %xmm1, %ymm4, %ymm1
 vshufps $136, %ymm1, %ymm2, %ymm1
 vperm2f128  $3, %ymm1, %ymm1, %ymm2
 vshufps $68, %ymm2, %ymm1, %ymm4
 vshufps $238, %ymm2, %ymm1, %ymm2
 vinsertf128 $0x1, -112(%rsi), %ymm3, %ymm1
 vmovups -96(%rsi), %xmm3
 vinsertf128 $1, %xmm2, %ymm4, %ymm4
 vinsertf128 $0x1, -80(%rsi), %ymm3, %ymm3
 vshufps $136, %ymm3, %ymm1, %ymm3
 vperm2f128  $3, %ymm3, %ymm3, %ymm2
 vshufps $68, %ymm2, %ymm3, %ymm1
 vshufps $238, %ymm2, %ymm3, %ymm2
 vmovups -64(%rsi), %xmm3
 vinsertf128 $1, %xmm2, %ymm1, %ymm2
 vinsertf128 $0x1, -48(%rsi), %ymm3, %ymm1
 vmovups -32(%rsi), %xmm3
 vinsertf128 $0x1, -16(%rsi), %ymm3, %ymm3
 cmpl%r8d, %edi
 vshufps $136, %ymm3, %ymm1, %ymm3
 vperm2f128  $3, %ymm3, %ymm3, %ymm1
 vshufps $68, %ymm1, %ymm3, %ymm5
 vshufps $238, %ymm1, %ymm3, %ymm1
 vinsertf128 $1, %xmm1, %ymm5, %ymm1
 vshufps $136, %ymm1, %ymm2, %ymm1
 vperm2f128  $3, %ymm1, %ymm1, %ymm2
 vshufps $68, %ymm2, %ymm1, %ymm3
 vshufps $238, %ymm2, %ymm1, %ymm2
 vinsertf128 $1, %xmm2, %ymm3, %ymm1
 vshufps $136, %ymm1, %ymm4, %ymm1
 vperm2f128  $3, %ymm1, %ymm1, %ymm2
 vshufps $68, %ymm2, %ymm1, %ymm3
 vshufps $238, %ymm2, %ymm1, %ymm2
 vinsertf128 $1, %xmm2, %ymm3, %ymm2
 vaddps  %ymm2, %ymm0, %ymm0
 ja  .L9

 is more efficient than

 .L3:
 vaddss  (%rcx,%rax), %xmm0, %xmm0
 addq$32, %rax
 cmpq%rdx, %rax
 jne .L3

 ;)

 As soon as
 elemental functions will appear and we update the vectorizer so it can accept
 an elemental function inside the loop - we will have the same
 situation as we have
 it now: cost model will bail out with profitability estimation.

 Yes.

 Still we have no chance to get info on how efficient the bar() function when 
 it
 is in vector form.

 Well I assume you mean that the speedup when vectorizing the elemental
 will offset whatever wreckage we cause with vectorizing the rest of the
 statements.  I'd say you can at least compare to unrolling by
 the vectorization factor, building the vector inputs to the elemental
 from scalars, distributing the vector result from the elemental to
 scalars.

 I believe I should repeat: #pragma omp simd is intended for introduction of 
 an
 instruction-level parallel region on developer's request, hence should
 be treated
 in same manner as #pragma omp parallel. Vectorizer cost model is an obstacle
 here, not a help.

 Surely not if there isn't an elemental call in it.  With it the
 cost model of course will have not enough information to decide.

 But still, what's the difference to the case where we cannot vectorize
 the function?  What happens if we cannot vectorize the elemental?
 Do we have to build scalar versions for all possible vector sizes?

 Richard.

 Regards,
 Sergos


 On Fri, Nov 15, 2013 at 1:08 AM, Richard Biener rguent...@suse.de wrote:
  Sergey Ostanevich sergos@gmail.com wrote:
 this is only for the whole file? I mean to have a particular loop
 vectorized in a
 file while all others - up to compiler's cost model

Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-15 Thread Sergey Ostanevich
Richard,

here's an example that causes trigger for the cost model. As soon as
elemental functions will appear and we update the vectorizer so it can accept
an elemental function inside the loop - we will have the same
situation as we have
it now: cost model will bail out with profitability estimation.
Still we have no chance to get info on how efficient the bar() function when it
is in vector form.

I believe I should repeat: #pragma omp simd is intended for introduction of an
instruction-level parallel region on developer's request, hence should
be treated
in same manner as #pragma omp parallel. Vectorizer cost model is an obstacle
here, not a help.

Regards,
Sergos


On Fri, Nov 15, 2013 at 1:08 AM, Richard Biener rguent...@suse.de wrote:
 Sergey Ostanevich sergos@gmail.com wrote:
this is only for the whole file? I mean to have a particular loop
vectorized in a
file while all others - up to compiler's cost model. is there such a
machinery?

 No, there is not.

 Richard.

Sergos

On Thu, Nov 14, 2013 at 12:39 PM, Richard Biener rguent...@suse.de
wrote:
 On Wed, 13 Nov 2013, Sergey Ostanevich wrote:

 I will get some tests.
 As for cost analysis - simply consider the pragma as a request to
 vectorize. How can I - as a developer - enforce it beyond the
pragma?

 You can disable the cost model via -fvect-cost-model=unlimited

 Richard.

 On Wed, Nov 13, 2013 at 12:55 PM, Richard Biener rguent...@suse.de
wrote:
  On Tue, 12 Nov 2013, Sergey Ostanevich wrote:
 
  The reason patch was in its original state is because we want
  to notify user that his assumption of profitability may be wrong.
  This is not a part of any spec and as far as I know ICC does not
  notify user about the case. Still it can be a good hint for those
  users who tries to get as much as possible performance.
 
  Richard's comment on the vectorization problems is about the same
-
  to inform user that his attempt to force vectorization is failed.
 
  As for profitable or not - sometimes I believe it's impossible to
be
  precise. For OMP we have case of a vector version of a function
  and we have no chance to figure out whether it is profitable to
use
  it or to loose it. If we can't map the loop for any vector length
  other than 1 - I believe in this case we have to bail out and
report.
  Is it about 'never profitable'?
 
  For example.  I think we should report non-vectorized loops
  that are marked with force_vect anyway, with
-Wdisabled-optimization.
  Another case is that a loop may be profitable to vectorize if
  the ISA supports a gather instruction but otherwise not.  Or if
the
  ISA supports efficient vector construction from N not loop
  invariant scalars (for vectorization of strided loads).
 
  Simply disregarding all of the cost analysis sounds completely
  bogus to me.
 
  I'd simply go for the diagnostic for now, not changing anything
else.
  We want to have a good understanding about why the cost model is
  so bad that we have to force to ignore it for #pragma simd - thus
we
  want testcases.
 
  Richard.
 
 
  On Tue, Nov 12, 2013 at 6:35 PM, Richard Biener
rguent...@suse.de wrote:
   On 11/12/13 3:16 PM, Jakub Jelinek wrote:
   On Tue, Nov 12, 2013 at 05:46:14PM +0400, Sergey Ostanevich
wrote:
   ivdep just substitutes all cross-iteration data analysis,
   nothing related to cost model. ICC does not cancel its
   cost model in case of #pragma ivdep
  
   as for the safelen - OMP standart treats it as a limitation
   for the vector length. this means if no safelen is present
   an arbitrary vector length can be used.
  
   I was talking about GCC loop-safelen, which is INT_MAX for
#pragma omp simd
   without safelen clause or #pragma simd without vectorlength
clause.
  
   so I believe loop-force_vect is the only trigger to
disregard
   the cost model
  
   Anyway, in that case I think the originally posted patch is
wrong,
   if we want to treat force_vect as disregard all the cost model
and
   force vectorization (well, the name of the field already kind
of suggest
   that), then IMHO we should treat it the same as
-fvect-cost-model=unlimited
   for those loops.
  
   Err - the user may have a specific sub-architecture in mind
when using
   #pragma simd, if you say we should completely ignore the cost
model
   then should we also sorry () if we cannot vectorize the loop
(either
   because of GCC deficiencies or lack of sub-target support)?
  
   That said, at least in the cases that the cost model says the
loop
   is never profitable to vectorize we should follow its advice.
  
   Richard.
  
   Thus (untested):
  
   2013-11-12  Jakub Jelinek  ja...@redhat.com
  
 * tree-vect-loop.c (vect_estimate_min_profitable_iters):
Use
 unlimited cost model also for force_vect loops.
  
   --- gcc/tree-vect-loop.c.jj   2013-11-12 12:09:40.0
+0100
   +++ gcc/tree-vect-loop.c  2013-11-12 15:11:43.821404330
+0100
   @@ -2702,7 +2702,7 @@ vect_estimate_min_profitable_iters (loop
  void

Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-14 Thread Sergey Ostanevich
this is only for the whole file? I mean to have a particular loop
vectorized in a
file while all others - up to compiler's cost model. is there such a machinery?

Sergos

On Thu, Nov 14, 2013 at 12:39 PM, Richard Biener rguent...@suse.de wrote:
 On Wed, 13 Nov 2013, Sergey Ostanevich wrote:

 I will get some tests.
 As for cost analysis - simply consider the pragma as a request to
 vectorize. How can I - as a developer - enforce it beyond the pragma?

 You can disable the cost model via -fvect-cost-model=unlimited

 Richard.

 On Wed, Nov 13, 2013 at 12:55 PM, Richard Biener rguent...@suse.de wrote:
  On Tue, 12 Nov 2013, Sergey Ostanevich wrote:
 
  The reason patch was in its original state is because we want
  to notify user that his assumption of profitability may be wrong.
  This is not a part of any spec and as far as I know ICC does not
  notify user about the case. Still it can be a good hint for those
  users who tries to get as much as possible performance.
 
  Richard's comment on the vectorization problems is about the same -
  to inform user that his attempt to force vectorization is failed.
 
  As for profitable or not - sometimes I believe it's impossible to be
  precise. For OMP we have case of a vector version of a function
  and we have no chance to figure out whether it is profitable to use
  it or to loose it. If we can't map the loop for any vector length
  other than 1 - I believe in this case we have to bail out and report.
  Is it about 'never profitable'?
 
  For example.  I think we should report non-vectorized loops
  that are marked with force_vect anyway, with -Wdisabled-optimization.
  Another case is that a loop may be profitable to vectorize if
  the ISA supports a gather instruction but otherwise not.  Or if the
  ISA supports efficient vector construction from N not loop
  invariant scalars (for vectorization of strided loads).
 
  Simply disregarding all of the cost analysis sounds completely
  bogus to me.
 
  I'd simply go for the diagnostic for now, not changing anything else.
  We want to have a good understanding about why the cost model is
  so bad that we have to force to ignore it for #pragma simd - thus we
  want testcases.
 
  Richard.
 
 
  On Tue, Nov 12, 2013 at 6:35 PM, Richard Biener rguent...@suse.de wrote:
   On 11/12/13 3:16 PM, Jakub Jelinek wrote:
   On Tue, Nov 12, 2013 at 05:46:14PM +0400, Sergey Ostanevich wrote:
   ivdep just substitutes all cross-iteration data analysis,
   nothing related to cost model. ICC does not cancel its
   cost model in case of #pragma ivdep
  
   as for the safelen - OMP standart treats it as a limitation
   for the vector length. this means if no safelen is present
   an arbitrary vector length can be used.
  
   I was talking about GCC loop-safelen, which is INT_MAX for #pragma 
   omp simd
   without safelen clause or #pragma simd without vectorlength clause.
  
   so I believe loop-force_vect is the only trigger to disregard
   the cost model
  
   Anyway, in that case I think the originally posted patch is wrong,
   if we want to treat force_vect as disregard all the cost model and
   force vectorization (well, the name of the field already kind of 
   suggest
   that), then IMHO we should treat it the same as 
   -fvect-cost-model=unlimited
   for those loops.
  
   Err - the user may have a specific sub-architecture in mind when using
   #pragma simd, if you say we should completely ignore the cost model
   then should we also sorry () if we cannot vectorize the loop (either
   because of GCC deficiencies or lack of sub-target support)?
  
   That said, at least in the cases that the cost model says the loop
   is never profitable to vectorize we should follow its advice.
  
   Richard.
  
   Thus (untested):
  
   2013-11-12  Jakub Jelinek  ja...@redhat.com
  
 * tree-vect-loop.c (vect_estimate_min_profitable_iters): Use
 unlimited cost model also for force_vect loops.
  
   --- gcc/tree-vect-loop.c.jj   2013-11-12 12:09:40.0 +0100
   +++ gcc/tree-vect-loop.c  2013-11-12 15:11:43.821404330 +0100
   @@ -2702,7 +2702,7 @@ vect_estimate_min_profitable_iters (loop
  void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
  
  /* Cost model disabled.  */
   -  if (unlimited_cost_model ())
   +  if (unlimited_cost_model () || LOOP_VINFO_LOOP 
   (loop_vinfo)-force_vect)
{
  dump_printf_loc (MSG_NOTE, vect_location, cost model 
   disabled.\n);
  *ret_min_profitable_niters = 0;
  
 Jakub
  
  
 
 
 
  --
  Richard Biener rguent...@suse.de
  SUSE / SUSE Labs
  SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
  GF: Jeff Hawn, Jennifer Guild, Felix Imend



 --
 Richard Biener rguent...@suse.de
 SUSE / SUSE Labs
 SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
 GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer


Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-13 Thread Sergey Ostanevich
I will get some tests.
As for cost analysis - simply consider the pragma as a request to
vectorize. How can I - as a developer - enforce it beyond the pragma?

On Wed, Nov 13, 2013 at 12:55 PM, Richard Biener rguent...@suse.de wrote:
 On Tue, 12 Nov 2013, Sergey Ostanevich wrote:

 The reason patch was in its original state is because we want
 to notify user that his assumption of profitability may be wrong.
 This is not a part of any spec and as far as I know ICC does not
 notify user about the case. Still it can be a good hint for those
 users who tries to get as much as possible performance.

 Richard's comment on the vectorization problems is about the same -
 to inform user that his attempt to force vectorization is failed.

 As for profitable or not - sometimes I believe it's impossible to be
 precise. For OMP we have case of a vector version of a function
 and we have no chance to figure out whether it is profitable to use
 it or to loose it. If we can't map the loop for any vector length
 other than 1 - I believe in this case we have to bail out and report.
 Is it about 'never profitable'?

 For example.  I think we should report non-vectorized loops
 that are marked with force_vect anyway, with -Wdisabled-optimization.
 Another case is that a loop may be profitable to vectorize if
 the ISA supports a gather instruction but otherwise not.  Or if the
 ISA supports efficient vector construction from N not loop
 invariant scalars (for vectorization of strided loads).

 Simply disregarding all of the cost analysis sounds completely
 bogus to me.

 I'd simply go for the diagnostic for now, not changing anything else.
 We want to have a good understanding about why the cost model is
 so bad that we have to force to ignore it for #pragma simd - thus we
 want testcases.

 Richard.


 On Tue, Nov 12, 2013 at 6:35 PM, Richard Biener rguent...@suse.de wrote:
  On 11/12/13 3:16 PM, Jakub Jelinek wrote:
  On Tue, Nov 12, 2013 at 05:46:14PM +0400, Sergey Ostanevich wrote:
  ivdep just substitutes all cross-iteration data analysis,
  nothing related to cost model. ICC does not cancel its
  cost model in case of #pragma ivdep
 
  as for the safelen - OMP standart treats it as a limitation
  for the vector length. this means if no safelen is present
  an arbitrary vector length can be used.
 
  I was talking about GCC loop-safelen, which is INT_MAX for #pragma omp 
  simd
  without safelen clause or #pragma simd without vectorlength clause.
 
  so I believe loop-force_vect is the only trigger to disregard
  the cost model
 
  Anyway, in that case I think the originally posted patch is wrong,
  if we want to treat force_vect as disregard all the cost model and
  force vectorization (well, the name of the field already kind of suggest
  that), then IMHO we should treat it the same as 
  -fvect-cost-model=unlimited
  for those loops.
 
  Err - the user may have a specific sub-architecture in mind when using
  #pragma simd, if you say we should completely ignore the cost model
  then should we also sorry () if we cannot vectorize the loop (either
  because of GCC deficiencies or lack of sub-target support)?
 
  That said, at least in the cases that the cost model says the loop
  is never profitable to vectorize we should follow its advice.
 
  Richard.
 
  Thus (untested):
 
  2013-11-12  Jakub Jelinek  ja...@redhat.com
 
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Use
unlimited cost model also for force_vect loops.
 
  --- gcc/tree-vect-loop.c.jj   2013-11-12 12:09:40.0 +0100
  +++ gcc/tree-vect-loop.c  2013-11-12 15:11:43.821404330 +0100
  @@ -2702,7 +2702,7 @@ vect_estimate_min_profitable_iters (loop
 void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
 
 /* Cost model disabled.  */
  -  if (unlimited_cost_model ())
  +  if (unlimited_cost_model () || LOOP_VINFO_LOOP 
  (loop_vinfo)-force_vect)
   {
 dump_printf_loc (MSG_NOTE, vect_location, cost model 
  disabled.\n);
 *ret_min_profitable_niters = 0;
 
Jakub
 
 



 --
 Richard Biener rguent...@suse.de
 SUSE / SUSE Labs
 SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
 GF: Jeff Hawn, Jennifer Guild, Felix Imend


Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-12 Thread Sergey Ostanevich
Richard, Jakub,

You are right regarding the cost model if we talk about vectorizer alone.
But the #pragma omp simd goes beyond the vectorizer - it introduces
parallel context in a place user defines, similar to #pragma omp parallel.
Are we applying any cost model for omp parallel region?

You can consider this pragma as a helper for developer for 'easily'
introduce parallelism in his code, hence any type of cost model -
whatever quality it is - will plays against this paradigm, forcing user
to play around our cost model to let it make the loop simd-parallel.

Sergos

On Thu, Oct 31, 2013 at 10:36 PM, Richard Biener rguent...@suse.de wrote:
 Jakub Jelinek ja...@redhat.com wrote:
On Thu, Oct 31, 2013 at 07:02:28PM +0400, Yuri Rumyantsev wrote:
 Here is a simple fix which allows to vectorize loop marked with
 'pragma omp simd' even if cost model tells us that vectorization is
 not profitable.
 I checked that on simple test-case is works as expected.

 Is it Ok for trunk?

 ChangeLog:

 2013-10-31  Yuri Rumyantsev  ysrum...@gmail.com

 * tree-vect-loop.c (vect_estimate_min_profitable_iters): Override
 cost estimation for loops marked as vectorizable.

That looks too simplistics, IMHO it is undesirable to disregard the
profitability checks together.  For #pragma omp simd or #pragma simd
loops, I can understand that we should admit our cost model is not very
high
quality and so in border cases consider vectorizing rather than not
vectorizing, say for force_vect by increasing the scalar cost by some
factor or decreasing vector cost by some factor, but disregarding it
altogether doesn't look wise.  The question is what factor should we
use?
150% of scalar cost, something else?

 Please improve the cost-model instead.

 Thanks,
 Richard.

   Jakub




Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-12 Thread Sergey Ostanevich
yes, ICC ignores cost analysis and follows user request on introduction of
simd parallelism in the loop.they follow the omp parallel semantics.

On Tue, Nov 12, 2013 at 3:05 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Nov 12, 2013 at 02:48:46PM +0400, Sergey Ostanevich wrote:
 You are right regarding the cost model if we talk about vectorizer alone.
 But the #pragma omp simd goes beyond the vectorizer - it introduces
 parallel context in a place user defines, similar to #pragma omp parallel.
 Are we applying any cost model for omp parallel region?

 You can consider this pragma as a helper for developer for 'easily'
 introduce parallelism in his code, hence any type of cost model -
 whatever quality it is - will plays against this paradigm, forcing user
 to play around our cost model to let it make the loop simd-parallel.

 So what do other compilers do here?  Does icc also totally ignore all
 cost analysis for #pragma omp simd or #pragma simd and vectorizes even when
 it would be obviously undesirable?

 Jakub


Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-12 Thread Sergey Ostanevich
ivdep just substitutes all cross-iteration data analysis,
nothing related to cost model. ICC does not cancel its
cost model in case of #pragma ivdep

as for the safelen - OMP standart treats it as a limitation
for the vector length. this means if no safelen is present
an arbitrary vector length can be used.
so I believe loop-force_vect is the only trigger to disregard
the cost model

On Tue, Nov 12, 2013 at 4:48 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Nov 12, 2013 at 04:45:17PM +0400, Sergey Ostanevich wrote:
 yes, ICC ignores cost analysis and follows user request on introduction of
 simd parallelism in the loop.they follow the omp parallel semantics.

 What about #pragma ivdep?  I.e. if we decided to follow ICC here, should
 we ignore cost model just for loop-safelen  loop-force_vect loops
 (or only loop-force_vect), or also for any other loop-safelen loops?

 Jakub


Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-12 Thread Sergey Ostanevich
The reason patch was in its original state is because we want
to notify user that his assumption of profitability may be wrong.
This is not a part of any spec and as far as I know ICC does not
notify user about the case. Still it can be a good hint for those
users who tries to get as much as possible performance.

Richard's comment on the vectorization problems is about the same -
to inform user that his attempt to force vectorization is failed.

As for profitable or not - sometimes I believe it's impossible to be
precise. For OMP we have case of a vector version of a function
and we have no chance to figure out whether it is profitable to use
it or to loose it. If we can't map the loop for any vector length
other than 1 - I believe in this case we have to bail out and report.
Is it about 'never profitable'?


On Tue, Nov 12, 2013 at 6:35 PM, Richard Biener rguent...@suse.de wrote:
 On 11/12/13 3:16 PM, Jakub Jelinek wrote:
 On Tue, Nov 12, 2013 at 05:46:14PM +0400, Sergey Ostanevich wrote:
 ivdep just substitutes all cross-iteration data analysis,
 nothing related to cost model. ICC does not cancel its
 cost model in case of #pragma ivdep

 as for the safelen - OMP standart treats it as a limitation
 for the vector length. this means if no safelen is present
 an arbitrary vector length can be used.

 I was talking about GCC loop-safelen, which is INT_MAX for #pragma omp simd
 without safelen clause or #pragma simd without vectorlength clause.

 so I believe loop-force_vect is the only trigger to disregard
 the cost model

 Anyway, in that case I think the originally posted patch is wrong,
 if we want to treat force_vect as disregard all the cost model and
 force vectorization (well, the name of the field already kind of suggest
 that), then IMHO we should treat it the same as -fvect-cost-model=unlimited
 for those loops.

 Err - the user may have a specific sub-architecture in mind when using
 #pragma simd, if you say we should completely ignore the cost model
 then should we also sorry () if we cannot vectorize the loop (either
 because of GCC deficiencies or lack of sub-target support)?

 That said, at least in the cases that the cost model says the loop
 is never profitable to vectorize we should follow its advice.

 Richard.

 Thus (untested):

 2013-11-12  Jakub Jelinek  ja...@redhat.com

   * tree-vect-loop.c (vect_estimate_min_profitable_iters): Use
   unlimited cost model also for force_vect loops.

 --- gcc/tree-vect-loop.c.jj   2013-11-12 12:09:40.0 +0100
 +++ gcc/tree-vect-loop.c  2013-11-12 15:11:43.821404330 +0100
 @@ -2702,7 +2702,7 @@ vect_estimate_min_profitable_iters (loop
void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);

/* Cost model disabled.  */
 -  if (unlimited_cost_model ())
 +  if (unlimited_cost_model () || LOOP_VINFO_LOOP (loop_vinfo)-force_vect)
  {
dump_printf_loc (MSG_NOTE, vect_location, cost model disabled.\n);
*ret_min_profitable_niters = 0;

   Jakub




Re: [RFC] By default if-convert only basic blocks that will be vectorized (take 3)

2013-10-22 Thread Sergey Ostanevich
ouch.. html got me!

applying to the same as before

commit 0e3dfadd374c3045a926afa6d06af276cee2108d
Author: rguenth rguenth@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Fri Oct 18 08:36:28 2013 +

trying '06 build.


for take 2 on HSW there were mixed results for passed tests:

400.perlbench +0.25%
401.bzip2 -0.40%
429.mcf   -1.67%
456.hmmer +0%
458.sjeng -1.65%
471.omnetpp   -1.72%
483.xalancbmk -0.25%
410.bwaves+0%
433.milc  -0.28%
444.namd  -1.13%
450.soplex-1.01%
459.GemsFDTD  -0.25%
470.lbm   +1.44%
482.sphinx3   +0.61%

Sergos

On Tue, Oct 22, 2013 at 5:12 PM, Sergey Ostanevich sergos@gmail.com wrote:
 applying to the same as before

 commit 0e3dfadd374c3045a926afa6d06af276cee2108d
 Author: rguenth rguenth@138bc75d-0d04-0410-961f-82ee72b054a4
 Date:   Fri Oct 18 08:36:28 2013 +

 trying '06 build.


 for take 2 on HSW there were mixed results for passed tests:

 400.perlbench +0.25%
 401.bzip2 -0.40%
 429.mcf   -1.67%
 456.hmmer +0%
 458.sjeng -1.65%
 471.omnetpp   -1.72%
 483.xalancbmk -0.25%
 410.bwaves+0%
 433.milc  -0.28%
 444.namd  -1.13%
 450.soplex-1.01%
 459.GemsFDTD  -0.25%
 470.lbm   +1.44%
 482.sphinx3   +0.61%

 Sergos


 On Tue, Oct 22, 2013 at 2:56 PM, Jakub Jelinek ja...@redhat.com wrote:

 Hi!

 On Fri, Oct 18, 2013 at 05:45:15PM +0400, Sergey Ostanevich wrote:
  failed on 403 of '06:
  regclass.c: In function 'init_reg_sets':
  regclass.c:277:1: internal compiler error: tree check: expected
  ssa_name, have var_decl in copy_ssa_name_fn, at tree-ssanames.c:393
   init_reg_sets ()
   ^

 That is because for -ftree-loop-if-convert-stores or masked loads/stores
 we need TODO_update_ssa_only_virtuals and loop versioning relies on no
 SSA updates being needed in the loop being versioned.

 Attached is updated patchset (the fix is in the second patch, the third
 one needed adjusting because of the fix and also apparently Andrew's
 header reshuffling requires now additional include in the third patch).

 Will bootstrap/regtest this after my current bootstrap/regtest finishes.

 Jakub




Re: [RFC] By default if-convert only basic blocks that will be vectorized (take 3)

2013-10-22 Thread Sergey Ostanevich
I use
-march=core-avx2 -static -Ofast -flto -funroll-loops
with all 3 patches applied to git version I mentioned

the same is in progress for 'take 3'

On Tue, Oct 22, 2013 at 5:26 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Oct 22, 2013 at 05:16:29PM +0400, Sergey Ostanevich wrote:
 ouch.. html got me!

 applying to the same as before

 commit 0e3dfadd374c3045a926afa6d06af276cee2108d
 Author: rguenth rguenth@138bc75d-0d04-0410-961f-82ee72b054a4
 Date:   Fri Oct 18 08:36:28 2013 +

 trying '06 build.


 for take 2 on HSW there were mixed results for passed tests:

 400.perlbench +0.25%
 401.bzip2 -0.40%
 429.mcf   -1.67%
 456.hmmer +0%
 458.sjeng -1.65%
 471.omnetpp   -1.72%
 483.xalancbmk -0.25%
 410.bwaves+0%
 433.milc  -0.28%
 444.namd  -1.13%
 450.soplex-1.01%
 459.GemsFDTD  -0.25%
 470.lbm   +1.44%
 482.sphinx3   +0.61%

 Thanks.

 If this is with AVX or AVX2, there can be multiple things.

 One is whether if-conversion in it's current form is really so harmful
 for generated code quality (we have various testcases where what we get with
 it is very bad, but perhaps there are still testcases where it improves
 non-vectorized loops), another one is when the masked loads/stores/gather is
 actually beneficial and whether we don't want some better tuning for it.
 The patchset as whole does both things, so determining what is what is hard.

 If only the first two patches are applied and not the third, then it should
 give a picture on whether it is a win to have if-conversion done in it's
 current form only for vectorized bbs.

 And, if all 3 patches are applied, but the condition in tree_if_conversion:
if ((flag_tree_loop_vectorize || loop-force_vect)
 (flag_tree_loop_if_convert == -1 || any_mask_load_store)
 !version_loop_for_if_conversion (loop, version_outer_loop,
any_mask_load_store))
  goto cleanup;
 is changed into just
   if (any_mask_load_store
!version_loop_for_if_conversion (loop, version_outer_loop, true))
 goto cleanup;
 then if-conversion would happen as before, just in cases where previously
 we wouldn't if-convert because of conditional loads/stores, but now we
 could, those changes would be limited to vectorized loops only.

 Jakub


Re: [RFC] By default if-convert only basic blocks that will be vectorized (take 3)

2013-10-22 Thread Sergey Ostanevich
still fails on 403 et al.

regrename.c: In function 'regrename_optimize':
regrename.c:200:1: internal compiler error: Segmentation fault
 regrename_optimize ()
 ^
0x96698f crash_signal
../../gcc/toplev.c:335
0x9a2657 ssa_default_def(function*, tree_node*)
../../gcc/tree-dfa.c:310
0x9a2c68 get_or_create_ssa_default_def(function*, tree_node*)
../../gcc/tree-dfa.c:362
0x9d0773 get_reaching_def
../../gcc/tree-into-ssa.c:1188
0x9d0773 get_reaching_def
../../gcc/tree-into-ssa.c:1175
0x9d24f8 rewrite_update_phi_arguments
../../gcc/tree-into-ssa.c:2017
0x9d24f8 rewrite_update_dom_walker::before_dom_children(basic_block_def*)
../../gcc/tree-into-ssa.c:2133
0x9d24f8 rewrite_update_dom_walker::before_dom_children(basic_block_def*)
../../gcc/tree-into-ssa.c:2069
0xddcbea dom_walker::walk(basic_block_def*)
../../gcc/domwalk.c:176
0x9ceba2 rewrite_blocks
../../gcc/tree-into-ssa.c:2188
0x9d604e update_ssa(unsigned int)
../../gcc/tree-into-ssa.c:3311
0x9b729f version_loop_for_if_conversion
../../gcc/tree-if-conv.c:1962
0x9bad10 tree_if_conversion
../../gcc/tree-if-conv.c:2042
0x9bad10 main_tree_if_conversion
../../gcc/tree-if-conv.c:2097
0x9bad10 execute
../../gcc/tree-if-conv.c:2147
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.
specmake: *** [regrename.o] Error 1

On Tue, Oct 22, 2013 at 6:08 PM, Sergey Ostanevich sergos@gmail.com wrote:
 I use
 -march=core-avx2 -static -Ofast -flto -funroll-loops
 with all 3 patches applied to git version I mentioned

 the same is in progress for 'take 3'

 On Tue, Oct 22, 2013 at 5:26 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Oct 22, 2013 at 05:16:29PM +0400, Sergey Ostanevich wrote:
 ouch.. html got me!

 applying to the same as before

 commit 0e3dfadd374c3045a926afa6d06af276cee2108d
 Author: rguenth rguenth@138bc75d-0d04-0410-961f-82ee72b054a4
 Date:   Fri Oct 18 08:36:28 2013 +

 trying '06 build.


 for take 2 on HSW there were mixed results for passed tests:

 400.perlbench +0.25%
 401.bzip2 -0.40%
 429.mcf   -1.67%
 456.hmmer +0%
 458.sjeng -1.65%
 471.omnetpp   -1.72%
 483.xalancbmk -0.25%
 410.bwaves+0%
 433.milc  -0.28%
 444.namd  -1.13%
 450.soplex-1.01%
 459.GemsFDTD  -0.25%
 470.lbm   +1.44%
 482.sphinx3   +0.61%

 Thanks.

 If this is with AVX or AVX2, there can be multiple things.

 One is whether if-conversion in it's current form is really so harmful
 for generated code quality (we have various testcases where what we get with
 it is very bad, but perhaps there are still testcases where it improves
 non-vectorized loops), another one is when the masked loads/stores/gather is
 actually beneficial and whether we don't want some better tuning for it.
 The patchset as whole does both things, so determining what is what is hard.

 If only the first two patches are applied and not the third, then it should
 give a picture on whether it is a win to have if-conversion done in it's
 current form only for vectorized bbs.

 And, if all 3 patches are applied, but the condition in tree_if_conversion:
if ((flag_tree_loop_vectorize || loop-force_vect)
 (flag_tree_loop_if_convert == -1 || any_mask_load_store)
 !version_loop_for_if_conversion (loop, version_outer_loop,
any_mask_load_store))
  goto cleanup;
 is changed into just
   if (any_mask_load_store
!version_loop_for_if_conversion (loop, version_outer_loop, true))
 goto cleanup;
 then if-conversion would happen as before, just in cases where previously
 we wouldn't if-convert because of conditional loads/stores, but now we
 could, those changes would be limited to vectorized loops only.

 Jakub


Re: [RFC] By default if-convert only basic blocks that will be vectorized (take 2)

2013-10-18 Thread Sergey Ostanevich
failed on 403 of '06:
regclass.c: In function 'init_reg_sets':
regclass.c:277:1: internal compiler error: tree check: expected
ssa_name, have var_decl in copy_ssa_name_fn, at tree-ssanames.c:393
 init_reg_sets ()
 ^
0xb74124 tree_check_failed(tree_node const*, char const*, int, char const*, ...)
../../gcc/tree.c:9301
0xb0ade2 tree_check
../../gcc/tree.h:2674
0xb0ade2 copy_ssa_name_fn(function*, tree_node*, gimple_statement_d*)
../../gcc/tree-ssanames.c:393
0xb0ae6c duplicate_ssa_name_fn(function*, tree_node*, gimple_statement_d*)
../../gcc/tree-ssanames.c:452
0x9ce82d duplicate_ssa_name
../../gcc/tree-ssanames.h:121
0x9ce82d create_new_def_for(tree_node*, gimple_statement_d*, tree_node**)
../../gcc/tree-into-ssa.c:2836
0x986a4e gimple_duplicate_bb
../../gcc/tree-cfg.c:5496
0x6425c9 duplicate_block(basic_block_def*, edge_def*, basic_block_def*)
../../gcc/cfghooks.c:1040
0x642bfe copy_bbs(basic_block_def**, unsigned int, basic_block_def**,
edge_def**, unsigned int, edge_def**, loop*, basic_block_def*, bool)
../../gcc/cfghooks.c:1320
0x649d69 duplicate_loop_to_header_edge(loop*, edge_def*, unsigned int,
simple_bitmap_def*, edge_def*, vecedge_def*, va_heap, vl_ptr*, int)
../../gcc/cfgloopmanip.c:1314
0xa83d09 gimple_duplicate_loop_to_header_edge(loop*, edge_def*,
unsigned int, simple_bitmap_def*, edge_def*, vecedge_def*, va_heap,
vl_ptr*, int)
../../gcc/tree-ssa-loop-manip.c:761
0x64b760 loop_version(loop*, void*, basic_block_def**, unsigned int,
unsigned int, unsigned int, bool)
../../gcc/cfgloopmanip.c:1706
0x9b71ae version_loop_for_if_conversion
../../gcc/tree-if-conv.c:1943
0x9baa3e tree_if_conversion
../../gcc/tree-if-conv.c:2069
0x9baa3e main_tree_if_conversion
../../gcc/tree-if-conv.c:2089
0x9baa3e execute
../../gcc/tree-if-conv.c:2139
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.
specmake: *** [regclass.o] Error 1

On Thu, Oct 17, 2013 at 8:55 PM, Jakub Jelinek ja...@redhat.com wrote:
 Hi!

 On Tue, Oct 15, 2013 at 02:32:25PM +0200, Jakub Jelinek wrote:
 Especially on i?86/x86_64 if-conversion pass seems to be often
 a pessimization, but the vectorization relies on it and without it we can't
 vectorize a lot of the loops.

 Here is an updated patchset:
 - first patch is an updated version of the
   http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01058.html
   patch
 - second patch is one variant (easier to write and perhaps maintain,
   but perhaps with higher compile time and memory requirements) of the
   follow up to support if-converted inner loop in outer loop vectorization;
   I have started also working on variant where just the vectorizer
   groks outer loop with LOOP_VECTORIZED call and two inner loops,
   but am only about half way through the needed changes and would
   prefer to wait how these patches actually work out together
 - third patch is an updated version of the
   http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00202.html patch, this
   time without if-unconversion pass, but instead performed only in the
   versioned loops for vectorization purposes

 All 3 patches bootstrapped/regtested together on x86_64-linux and
 i686-linux, the first patch and first+second patch additionally tested
 alone with
 make -C gcc -j4 -k check RUNTESTFLAGS=vect.exp
 make -C gcc check-gfortran RUNTESTFLAGS=dg.exp='assumed_rank* pr32533*' 
 execute.exp='forall* intrinsic_mmloc* pr54767.f90'
 make -C x86*/libgomp check RUNTESTFLAGS=fortran.exp=omp_parse*
 (the latter two are set of tests that failed at some point during
 the development of the patchset).

 If anyone has spare cycles to e.g. SPEC2k{,6} test it on his favourite
 platforms (both for resulting code performance and for compile time
 and/or compile memory usage), it would be greatly appreciated.

 The only targets with masked load/store support are right now i?86/x86_64
 with -mavx and higher (and for masked gather load support only -mavx2
 and higher), so the last patch will probably be only beneficial to those for
 the time being.

 Jakub


Re: [RFC] By default if-convert only basic blocks that will be vectorized

2013-10-17 Thread Sergey Ostanevich
Jakub, Richard,

I believe this patch is a good opportunity to improve the
vectorization capabilities.
I have the following question related to it: whether we plan to treat
the #pragma omp simd as a
directive to vectorize the underlying loop, hence dropping any
assessment regarding profitablity?


Regards,
Sergos

On Tue, Oct 15, 2013 at 4:32 PM, Jakub Jelinek ja...@redhat.com wrote:
 Hi!

 Especially on i?86/x86_64 if-conversion pass seems to be often
 a pessimization, but the vectorization relies on it and without it we can't
 vectorize a lot of the loops.

 Here is a prototype of a patch that will by default (unless explicit
 -ftree-loop-if-convert) only if-convert loops internally for vectorization,
 so the COND_EXPRs actually only appear as VEC_COND_EXPRs in the vectorized
 basic blocks, but will not appear if vectorization fails, or in the
 scalar loop if vectorization is conditional, or in the prologue or epilogue
 loops around the vectorized loop.

 Instead of moving the ifcvt pass inside of the vectorizer, this patch
 during ifcvt performs loop versioning depending on a special internal
 call, only if the internal call returns true we go to the if-converted
 original loop, otherwise the non-if-converted copy of the original loop
 is performed.  And the vectorizer is taught to fold this internal call
 into true resp. false depending on if the loop was vectorized or not, and
 vectorizer loop versioning, peeling for alignment and for bound are adjusted
 to also copy from the non-if-converted loop rather than if-converted one.

 Besides fixing the various PRs where if-conversion pessimizes code I'd like
 to also move forward with this with conditional loads and stores,
 http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00202.html
 where the if-unconversion approach looked like a failure.

 This patch doesn't yet handle if-converted inner loop in outer loop
 vectorization, something on my todo list (so several vect-cond-*.c tests
 FAIL because they are no longer vectorized) plus I had to change two
 SLP vectorization tests that silently relied on loop if-conversion being
 performed to actually optimize the basic block (if the same thing didn't
 appear in a loop, it wouldn't be optimized at all).

 On the newly added testcase on x86_64, there are before this patch
 18 scalar conditional moves, with the patch just 2 (both in the checking
 routine).

 Comments?

 --- gcc/internal-fn.def.jj  2013-10-11 14:32:57.079909782 +0200
 +++ gcc/internal-fn.def 2013-10-11 17:23:58.705526840 +0200
 @@ -43,3 +43,4 @@ DEF_INTERNAL_FN (STORE_LANES, ECF_CONST
  DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW)
  DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
  DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
 +DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW)
 --- gcc/tree-vect-loop-manip.c.jj   2013-09-30 22:13:47.0 +0200
 +++ gcc/tree-vect-loop-manip.c  2013-10-15 12:57:54.854970913 +0200
 @@ -374,24 +374,31 @@ LOOP-  loop1

  static void
  slpeel_update_phi_nodes_for_guard1 (edge guard_edge, struct loop *loop,
 +   struct loop *scalar_loop,
  bool is_new_loop, basic_block 
 *new_exit_bb)
  {
 -  gimple orig_phi, new_phi;
 +  gimple orig_phi, new_phi, scalar_phi = NULL;
gimple update_phi, update_phi2;
tree guard_arg, loop_arg;
basic_block new_merge_bb = guard_edge-dest;
edge e = EDGE_SUCC (new_merge_bb, 0);
basic_block update_bb = e-dest;
basic_block orig_bb = loop-header;
 -  edge new_exit_e;
 +  edge new_exit_e, scalar_e = NULL;
tree current_new_name;
 -  gimple_stmt_iterator gsi_orig, gsi_update;
 +  gimple_stmt_iterator gsi_orig, gsi_update, gsi_scalar = gsi_none ();

/* Create new bb between loop and new_merge_bb.  */
*new_exit_bb = split_edge (single_exit (loop));

new_exit_e = EDGE_SUCC (*new_exit_bb, 0);

 +  if (scalar_loop != NULL  !is_new_loop)
 +{
 +  gsi_scalar = gsi_start_phis (scalar_loop-header);
 +  scalar_e = EDGE_SUCC (scalar_loop-latch, 0);
 +}
 +
for (gsi_orig = gsi_start_phis (orig_bb),
 gsi_update = gsi_start_phis (update_bb);
 !gsi_end_p (gsi_orig)  !gsi_end_p (gsi_update);
 @@ -401,6 +408,11 @@ slpeel_update_phi_nodes_for_guard1 (edge
tree new_res;
orig_phi = gsi_stmt (gsi_orig);
update_phi = gsi_stmt (gsi_update);
 +  if (scalar_e != NULL)
 +   {
 + scalar_phi = gsi_stmt (gsi_scalar);
 + gsi_next (gsi_scalar);
 +   }

/** 1. Handle new-merge-point phis  **/

 @@ -460,7 +472,13 @@ slpeel_update_phi_nodes_for_guard1 (edge
  current_new_name = loop_arg;
else
  {
 -  current_new_name = get_current_def (loop_arg);
 + if (scalar_e)
 +   {
 + current_new_name = PHI_ARG_DEF_FROM_EDGE (scalar_phi, scalar_e);
 + current_new_name = 

Re: [i386, patch, RFC] HLE support in GCC

2012-04-18 Thread Sergey Ostanevich
On Tue, Apr 17, 2012 at 6:41 PM, Andi Kleen a...@firstfloor.org wrote:
 I also have a question regarding AS compatibility. In case one built
 GCC using AS with support of HLE then using this GCC on a machine with
 old AS will cause fail because of usupported prefix. Can we support it

 I don't think that's a supported use case for gcc.
 It also doesn't work with .cfi* intrinsics and some other things.

Well, it's hard to speculate here. What I rely upon is the fact that
GCC I have on my Fedora is from gcc-4.6.0-10.fc15.x86_64.rpm and the
latter contains no AS within.
There should be dependencies so that AS will be updated alongside with
GCC? Otherwise upon update to new GCC I can see fails in my project
build.

 compile time rather configure time?

 The only way to do that would be to always generate .byte,
 but the people who read the assembler output would hate you
 for it.

Totally agree, it's the best way to hurt your karma. :)
Although detection of AS capabilities is available at compile time and
can be used to evade compfail - at least in case assembler is involved
(no -S provided)

Sergos


Re: [i386, patch, RFC] HLE support in GCC

2012-04-17 Thread Sergey Ostanevich

 Any other inputs?


I would suggest to use snprintf b/gcc/config/i386/i386-c.c to avoid
possible buffer overrun.

I also have a question regarding AS compatibility. In case one built
GCC using AS with support of HLE then using this GCC on a machine with
old AS will cause fail because of usupported prefix. Can we support it
compile time rather configure time?

regards,
Sergos


Re: [PATCH] PR target/50038 fix: redundant zero extensions removal

2011-11-16 Thread Sergey Ostanevich
Eric,

I will follow up while Ilya is on vacation.
I can see only one patch along the dicussion so I will use it, making
changes to follow phase renaming and documentation?

I am covered by FSF agreement too, on the same Intel's list as Ilya.

regards,
Sergos


On Fri, Nov 11, 2011 at 2:12 PM, Eric Botcazou ebotca...@adacore.com wrote:
 I have already signed copyright agreement with the FSF. Will I need
 the separate one for this particular commit?

 No, if your contributions are already covered by a copyright agreement with 
 the
 FSF, nothing more needs to be done.

 --
 Eric Botcazou



Re: [PATCH i386] PR47698 no CMOV for volatile mem

2011-11-06 Thread Sergey Ostanevich
On Wed, Nov 2, 2011 at 3:42 PM, Richard Guenther rguent...@suse.de wrote:
 On Sat, 29 Oct 2011, Sergey Ostanevich wrote:

 On Fri, Oct 28, 2011 at 7:25 PM, Sergey Ostanevich sergos@gmail.com 
 wrote:
  On Fri, Oct 28, 2011 at 4:52 PM, Richard Guenther rguent...@suse.de 
  wrote:
  On Fri, 28 Oct 2011, Sergey Ostanevich wrote:
 
  On Fri, Oct 28, 2011 at 12:16 PM, Richard Guenther rguent...@suse.de 
  wrote:
   On Thu, 27 Oct 2011, Uros Bizjak wrote:
  
   Hello!
  
Here's a patch for PR47698, which is about CMOV should not be
generated for memory address marked as volatile.
Successfully bootstrapped and passed make check on 
x86_64-unknown-linux-gnu.
  
  
         PR rtl-optimization/47698
         * config/i386/i386.c (ix86_expand_int_movcc) prevent CMOV 
   generation
         for volatile mem
  
         PR rtl-optimization/47698
         * gcc.target/i386/47698.c: New test
  
   Please use punctuation marks and correct capitalization in ChangeLog 
   entries.
  
   OTOH, do we want to fix this per-target, or in the middle-end?
  
   The middle-end pattern documentation does not say operands 2 and 3
   are not evaluated if they do not end up being stored, so a middle-end
   fix is more appropriate.
  
   Richard.
  
 
  I have two observations:
 
  - the code for CMOV is under #ifdef in the mddle-end, which is
  explicitly marked as have to be removed (ifcvt.c:1446)
  - I have no clear evidence all platforms that support conditional move
  have the same semantics that lead to the PR
 
  I think the best way to address both concerns is to implement code
  that relies on а new hookup volatile-safe CMOV that is false by
  default.
 
  I suppose it's never safe for all architectures that support
  memory operands in the source operand.
 
  Richard.
 
  ok, at least there should be no big problem of missing optimization
  around volatile memory.
 
  apparently the problem is here:
 
  ifcvt.c:2539 there is a test for side effects of source (which is 'a'
  in this case)
 
  2539      if (! noce_operand_ok (a) || ! noce_operand_ok (b))
  (gdb) p debug_rtx(a)
  (mem/v/c/i:DI (symbol_ref:DI (mmio) [flags 0x40] var_decl
  0x71339140 mmio) [2 mmio+0 S8 A64])
 
  but inside noce_operand_ok() there is a wrong order of tests:
 
  2332      if (MEM_P (op))
  2333        return ! side_effects_p (XEXP (op, 0));
  2334
  2335      if (side_effects_p (op))
  2336        return FALSE;
  2337
 
  where XEXP removes the memory reference leaving just symbol reference,
  that has no volatile attribute
  #0  side_effects_p (x=0x7149c660) at ../../gcc/rtlanal.c:2152
  (gdb) p debug_rtx(x)
  (symbol_ref:DI (mmio) [flags 0x40] var_decl 0x71339140 mmio)
 
  Is the following fix is Ok?
  I'm testing it so far.
 
  Sergos
 
  diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
  index 784e2e8..3b05c2a 100644
  --- a/gcc/ifcvt.c
  +++ b/gcc/ifcvt.c
  @@ -2329,12 +2329,12 @@ noce_operand_ok (const_rtx op)
   {
    /* We special-case memories, so handle any of them with
       no address side effects.  */
  -  if (MEM_P (op))
  -    return ! side_effects_p (XEXP (op, 0));
  -
    if (side_effects_p (op))
      return FALSE;
 
  +  if (MEM_P (op))
  +    return ! side_effects_p (XEXP (op, 0));
  +
    return ! may_trap_p (op);
   }
 
  diff --git a/gcc/testsuite/gcc.target/i386/47698.c
  b/gcc/testsuite/gcc.target/i386/47698.c
  new file mode 100644
  index 000..2c75109
  --- /dev/null
  +++ b/gcc/testsuite/gcc.target/i386/47698.c
  @@ -0,0 +1,10 @@
  +/* { dg-options -Os } */
  +/* { dg-final { scan-assembler-not cmov } } */
  +
  +extern volatile unsigned long mmio;
  +unsigned long foo(int cond)
  +{
  +      if (cond)
  +              return mmio;
  +        return 0;
  +}
 

 bootstrapped and passed make check successfully on x86_64-unknown-linux-gnu

 Ok.

 Thanks,
 Richard.

Could someone please commit it for me?

Sergos

/gcc

 2011-11-06 Sergey Ostanevich sergos@gmail.com

 PR rtl-optimization/47698
 * ifconv.c (noce_operand_ok): prevent CMOV generation
 for volatile mem

/testsuites

 2011-11-06 Sergey Ostanevich sergos@gmail.com

 PR rtl-optimization/47698
 * gcc.target/i386/47698.c: New test


47698.patch
Description: Binary data


Re: [PATCH i386] PR47698 no CMOV for volatile mem

2011-10-28 Thread Sergey Ostanevich
On Fri, Oct 28, 2011 at 12:16 PM, Richard Guenther rguent...@suse.de wrote:
 On Thu, 27 Oct 2011, Uros Bizjak wrote:

 Hello!

  Here's a patch for PR47698, which is about CMOV should not be
  generated for memory address marked as volatile.
  Successfully bootstrapped and passed make check on 
  x86_64-unknown-linux-gnu.


       PR rtl-optimization/47698
       * config/i386/i386.c (ix86_expand_int_movcc) prevent CMOV generation
       for volatile mem

       PR rtl-optimization/47698
       * gcc.target/i386/47698.c: New test

 Please use punctuation marks and correct capitalization in ChangeLog entries.

 OTOH, do we want to fix this per-target, or in the middle-end?

 The middle-end pattern documentation does not say operands 2 and 3
 are not evaluated if they do not end up being stored, so a middle-end
 fix is more appropriate.

 Richard.


I have two observations:

- the code for CMOV is under #ifdef in the mddle-end, which is
explicitly marked as have to be removed (ifcvt.c:1446)
- I have no clear evidence all platforms that support conditional move
have the same semantics that lead to the PR

I think the best way to address both concerns is to implement code
that relies on а new hookup volatile-safe CMOV that is false by
default.

regards,
Sergos


Re: [PATCH i386] PR47698 no CMOV for volatile mem

2011-10-28 Thread Sergey Ostanevich
On Fri, Oct 28, 2011 at 4:52 PM, Richard Guenther rguent...@suse.de wrote:
 On Fri, 28 Oct 2011, Sergey Ostanevich wrote:

 On Fri, Oct 28, 2011 at 12:16 PM, Richard Guenther rguent...@suse.de wrote:
  On Thu, 27 Oct 2011, Uros Bizjak wrote:
 
  Hello!
 
   Here's a patch for PR47698, which is about CMOV should not be
   generated for memory address marked as volatile.
   Successfully bootstrapped and passed make check on 
   x86_64-unknown-linux-gnu.
 
 
        PR rtl-optimization/47698
        * config/i386/i386.c (ix86_expand_int_movcc) prevent CMOV generation
        for volatile mem
 
        PR rtl-optimization/47698
        * gcc.target/i386/47698.c: New test
 
  Please use punctuation marks and correct capitalization in ChangeLog 
  entries.
 
  OTOH, do we want to fix this per-target, or in the middle-end?
 
  The middle-end pattern documentation does not say operands 2 and 3
  are not evaluated if they do not end up being stored, so a middle-end
  fix is more appropriate.
 
  Richard.
 

 I have two observations:

 - the code for CMOV is under #ifdef in the mddle-end, which is
 explicitly marked as have to be removed (ifcvt.c:1446)
 - I have no clear evidence all platforms that support conditional move
 have the same semantics that lead to the PR

 I think the best way to address both concerns is to implement code
 that relies on а new hookup volatile-safe CMOV that is false by
 default.

 I suppose it's never safe for all architectures that support
 memory operands in the source operand.

 Richard.

ok, at least there should be no big problem of missing optimization
around volatile memory.

apparently the problem is here:

ifcvt.c:2539 there is a test for side effects of source (which is 'a'
in this case)

2539  if (! noce_operand_ok (a) || ! noce_operand_ok (b))
(gdb) p debug_rtx(a)
(mem/v/c/i:DI (symbol_ref:DI (mmio) [flags 0x40] var_decl
0x71339140 mmio) [2 mmio+0 S8 A64])

but inside noce_operand_ok() there is a wrong order of tests:

2332  if (MEM_P (op))
2333return ! side_effects_p (XEXP (op, 0));
2334
2335  if (side_effects_p (op))
2336return FALSE;
2337

where XEXP removes the memory reference leaving just symbol reference,
that has no volatile attribute
#0  side_effects_p (x=0x7149c660) at ../../gcc/rtlanal.c:2152
(gdb) p debug_rtx(x)
(symbol_ref:DI (mmio) [flags 0x40] var_decl 0x71339140 mmio)

Is the following fix is Ok?
I'm testing it so far.

Sergos

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 784e2e8..3b05c2a 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2329,12 +2329,12 @@ noce_operand_ok (const_rtx op)
 {
   /* We special-case memories, so handle any of them with
  no address side effects.  */
-  if (MEM_P (op))
-return ! side_effects_p (XEXP (op, 0));
-
   if (side_effects_p (op))
 return FALSE;

+  if (MEM_P (op))
+return ! side_effects_p (XEXP (op, 0));
+
   return ! may_trap_p (op);
 }

diff --git a/gcc/testsuite/gcc.target/i386/47698.c
b/gcc/testsuite/gcc.target/i386/47698.c
new file mode 100644
index 000..2c75109
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/47698.c
@@ -0,0 +1,10 @@
+/* { dg-options -Os } */
+/* { dg-final { scan-assembler-not cmov } } */
+
+extern volatile unsigned long mmio;
+unsigned long foo(int cond)
+{
+  if (cond)
+  return mmio;
+return 0;
+}


47698.patch
Description: Binary data


Re: [PATCH i386] PR47698 no CMOV for volatile mem

2011-10-28 Thread Sergey Ostanevich
On Fri, Oct 28, 2011 at 7:25 PM, Sergey Ostanevich sergos@gmail.com wrote:
 On Fri, Oct 28, 2011 at 4:52 PM, Richard Guenther rguent...@suse.de wrote:
 On Fri, 28 Oct 2011, Sergey Ostanevich wrote:

 On Fri, Oct 28, 2011 at 12:16 PM, Richard Guenther rguent...@suse.de 
 wrote:
  On Thu, 27 Oct 2011, Uros Bizjak wrote:
 
  Hello!
 
   Here's a patch for PR47698, which is about CMOV should not be
   generated for memory address marked as volatile.
   Successfully bootstrapped and passed make check on 
   x86_64-unknown-linux-gnu.
 
 
        PR rtl-optimization/47698
        * config/i386/i386.c (ix86_expand_int_movcc) prevent CMOV 
  generation
        for volatile mem
 
        PR rtl-optimization/47698
        * gcc.target/i386/47698.c: New test
 
  Please use punctuation marks and correct capitalization in ChangeLog 
  entries.
 
  OTOH, do we want to fix this per-target, or in the middle-end?
 
  The middle-end pattern documentation does not say operands 2 and 3
  are not evaluated if they do not end up being stored, so a middle-end
  fix is more appropriate.
 
  Richard.
 

 I have two observations:

 - the code for CMOV is under #ifdef in the mddle-end, which is
 explicitly marked as have to be removed (ifcvt.c:1446)
 - I have no clear evidence all platforms that support conditional move
 have the same semantics that lead to the PR

 I think the best way to address both concerns is to implement code
 that relies on а new hookup volatile-safe CMOV that is false by
 default.

 I suppose it's never safe for all architectures that support
 memory operands in the source operand.

 Richard.

 ok, at least there should be no big problem of missing optimization
 around volatile memory.

 apparently the problem is here:

 ifcvt.c:2539 there is a test for side effects of source (which is 'a'
 in this case)

 2539      if (! noce_operand_ok (a) || ! noce_operand_ok (b))
 (gdb) p debug_rtx(a)
 (mem/v/c/i:DI (symbol_ref:DI (mmio) [flags 0x40] var_decl
 0x71339140 mmio) [2 mmio+0 S8 A64])

 but inside noce_operand_ok() there is a wrong order of tests:

 2332      if (MEM_P (op))
 2333        return ! side_effects_p (XEXP (op, 0));
 2334
 2335      if (side_effects_p (op))
 2336        return FALSE;
 2337

 where XEXP removes the memory reference leaving just symbol reference,
 that has no volatile attribute
 #0  side_effects_p (x=0x7149c660) at ../../gcc/rtlanal.c:2152
 (gdb) p debug_rtx(x)
 (symbol_ref:DI (mmio) [flags 0x40] var_decl 0x71339140 mmio)

 Is the following fix is Ok?
 I'm testing it so far.

 Sergos

 diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
 index 784e2e8..3b05c2a 100644
 --- a/gcc/ifcvt.c
 +++ b/gcc/ifcvt.c
 @@ -2329,12 +2329,12 @@ noce_operand_ok (const_rtx op)
  {
   /* We special-case memories, so handle any of them with
      no address side effects.  */
 -  if (MEM_P (op))
 -    return ! side_effects_p (XEXP (op, 0));
 -
   if (side_effects_p (op))
     return FALSE;

 +  if (MEM_P (op))
 +    return ! side_effects_p (XEXP (op, 0));
 +
   return ! may_trap_p (op);
  }

 diff --git a/gcc/testsuite/gcc.target/i386/47698.c
 b/gcc/testsuite/gcc.target/i386/47698.c
 new file mode 100644
 index 000..2c75109
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/i386/47698.c
 @@ -0,0 +1,10 @@
 +/* { dg-options -Os } */
 +/* { dg-final { scan-assembler-not cmov } } */
 +
 +extern volatile unsigned long mmio;
 +unsigned long foo(int cond)
 +{
 +      if (cond)
 +              return mmio;
 +        return 0;
 +}


bootstrapped and passed make check successfully on x86_64-unknown-linux-gnu

Sergos


[PATCH i386] PR47698 no CMOV for volatile mem

2011-10-27 Thread Sergey Ostanevich
Hi!

Here's a patch for PR47698, which is about CMOV should not be
generated for memory address marked as volatile.
Successfully bootstrapped and passed make check on x86_64-unknown-linux-gnu.

Is it Ok?

regards,
Sergos


/gcc

2011-10-27 Sergey Ostanevich

PR rtl-optimization/47698
* config/i386/i386.c (ix86_expand_int_movcc) prevent CMOV generation
for volatile mem

/testsuite

2011-10-27 Sergey Ostanevich

PR rtl-optimization/47698
* gcc.target/i386/47698.c: New test


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c6e09ae..afe5de3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -18312,6 +18312,12 @@ ix86_expand_int_movcc (rtx operands[])
   rtx op0 = XEXP (operands[1], 0);
   rtx op1 = XEXP (operands[1], 1);

+  /* MOVCC semantics implies that source is always read which is wrong
+ for devices I/O that are defined using volatile in C. PR47698 */
+
+  if (MEM_P (operands[2])  MEM_VOLATILE_P (operands[2]))
+return false;
+
   start_sequence ();
   compare_op = ix86_expand_compare (code, op0, op1);
   compare_seq = get_insns ();
diff --git a/gcc/testsuite/gcc.target/i386/47698.c
b/gcc/testsuite/gcc.target/i386/47698.c
new file mode 100644
index 000..2c75109
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/47698.c
@@ -0,0 +1,10 @@
+/* { dg-options -Os } */
+/* { dg-final { scan-assembler-not cmov } } */
+
+extern volatile unsigned long mmio;
+unsigned long foo(int cond)
+{
+  if (cond)
+  return mmio;
+return 0;
+}


47698.patch
Description: Binary data


Re: [PATCH PR50572] Tune loop alignment for Atom

2011-10-20 Thread Sergey Ostanevich
 Please provide a patch which can be applied.  Cut/paste doesn't create
 a working patch.  Please attach it.

 --
 H.J.


Will that works?
Sergos.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6c73404..e21cf86 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2011-10-20  Sergey Ostanevich  sergos@gmail.com
+
+   * config/i386/i386.c (processor_target_table): Change Atom
+   align_loops_max_skip to 15.
+
 2011-10-17  Michael Spertus  mike_sper...@symantec.com

* gcc/c-family/c-common.c (c_common_reswords): Add __bases,
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2c53423..8c60086 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2596,7 +2596,7 @@ static const struct ptt
processor_target_table[PROCESSOR_max] =
   {bdver1_cost, 32, 24, 32, 7, 32},
   {bdver2_cost, 32, 24, 32, 7, 32},
   {btver1_cost, 32, 24, 32, 7, 32},
-  {atom_cost, 16, 7, 16, 7, 16}
+  {atom_cost, 16, 15, 16, 7, 16}
 };

 static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =


[PATCH PR50572] Tune loop alignment for Atom

2011-10-19 Thread Sergey Ostanevich
Hi!

Here is a patch to address PR50572.
Successfully bootstrapped and passed make check on x86_64-linux.

regards,
Sergos



2011-10-19  Sergey Ostanevich  sergos@gmail.com

* gcc/config/i386/i386.c (ix86_option_override_internal): use loop
align by 16 bytes for Atom platform

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2c53423..7a93144 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3497,6 +3497,8 @@ ix86_option_override_internal (bool main_args_p)
 {
   align_loops = processor_target_table[ix86_tune].align_loop;
   align_loops_max_skip =
processor_target_table[ix86_tune].align_loop_max_skip;
+  if (ix86_tune == PROCESSOR_ATOM  optimize  1)
+align_loops_max_skip = 15;
 }
   if (align_jumps == 0)
 {


Re: [PATCH PR50572] Tune loop alignment for Atom

2011-10-19 Thread Sergey Ostanevich
On Wed, Oct 19, 2011 at 4:14 PM, Uros Bizjak ubiz...@gmail.com wrote:

 You can just change the default in processor_target_table.

 Uros.


Will it be applicable during optimizations for size? It will hurt,
although not much (see PR).
New patch is below. Ok for trunk as obvious?

Sergos

2011-10-19  Sergey Ostanevich  sergos@gmail.com

* gcc/config/i386/i386.c (ix86_option_override_internal): use loop
align by 16 bytes for Atom platform

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2c53423..8c60086 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2596,7 +2596,7 @@ static const struct ptt
processor_target_table[PROCESSOR_max] =
   {bdver1_cost, 32, 24, 32, 7, 32},
   {bdver2_cost, 32, 24, 32, 7, 32},
   {btver1_cost, 32, 24, 32, 7, 32},
-  {atom_cost, 16, 7, 16, 7, 16}
+  {atom_cost, 16, 15, 16, 7, 16}
 };


Re: [PATCH PR50572] Tune loop alignment for Atom

2011-10-19 Thread Sergey Ostanevich
On Wed, Oct 19, 2011 at 4:46 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Wed, Oct 19, 2011 at 2:26 PM, Sergey Ostanevich sergos@gmail.com 
 wrote:

 You can just change the default in processor_target_table.

 Uros.


 Will it be applicable during optimizations for size? It will hurt,
 although not much (see PR).

 Looking at the code, I'd say that we don't handle -Os in different way.

 New patch is below. Ok for trunk as obvious?

 Sergos

 2011-10-19  Sergey Ostanevich  sergos@gmail.com

        * gcc/config/i386/i386.c (ix86_option_override_internal): use loop
        align by 16 bytes for Atom platform

 Please update ChangeLog, like:

 * gcc/config/i386/i386.c (processor_target_table): Change Atom
 align_loop_max_skip to 15.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 2c53423..8c60086 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -2596,7 +2596,7 @@ static const struct ptt
 processor_target_table[PROCESSOR_max] =
   {bdver1_cost, 32, 24, 32, 7, 32},
   {bdver2_cost, 32, 24, 32, 7, 32},
   {btver1_cost, 32, 24, 32, 7, 32},
 -  {atom_cost, 16, 7, 16, 7, 16}
 +  {atom_cost, 16, 15, 16, 7, 16}
  };


 OK.

 Thanks,
 Uros.


Thanks for comments!
I double checked: for -Os there's no .p2align appeared.
For -O2 I see .p2align 4,,15 instead of .p2align 4,,7, as expected.

Can someone commit it please?

Regards,
Sergos


2011-10-18  Sergey Ostanevich sergos@gmail.com

* gcc/config/i386/i386.c (processor_target_table): Change Atom
align_loops_max_skip to 15.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2c53423..8c60086 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2596,7 +2596,7 @@ static const struct ptt
processor_target_table[PROCESSOR_max] =
   {bdver1_cost, 32, 24, 32, 7, 32},
   {bdver2_cost, 32, 24, 32, 7, 32},
   {btver1_cost, 32, 24, 32, 7, 32},
-  {atom_cost, 16, 7, 16, 7, 16}
+  {atom_cost, 16, 15, 16, 7, 16}
 };

 static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =


Re: RFA: patch to solve PR48455 and improve code size for -Os

2011-08-18 Thread Sergey Ostanevich


16.08.2011, в 23:07, Vladimir Makarov vmaka...@redhat.com написал(а):

 After a lot of thinking and some experiments, I did not find a better 
 solution to considerably (like on 0.2% - 0.3% on ARM SPEC2000) improve code 
 size than use of non-regional RA for -Os.
 

did you consider change of compile time because of single region RA?

 So the patch makes one region allocation is default for -Os.  I don't use 
 function optimize_function_for_size_p because I'd like to keep possibility 
 still set up regions independedly of used -O option.
 
 This final patch removes ARM code size regression reported on 
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48455.
 
 The patch was successfully bootstrapped on x86-64 and ppc64 (actually the 
 generated code is not changed for -O2).
 
 Is it ok to commit?
 
 2011-08-16  Vladimir Makarov vmaka...@redhat.com
 
PR rtl-optimization/48455
* doc/invoke.texi (-fira-region): Document default values.
 
* flags-types.h (enum ira_region): Add new value
IRA_REGION_AUTODETECT.
 
* common.opt (fira-region): Set up initial value to
IRA_REGION_AUTODETECT.
 
* toplev.c (process_options): Set up flag_ira_region depending on
-O options.
 
* ira.c (ira.c): Remove optimize guard for ira_build.
 
 code-size.patch