Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-07-11 Thread Ilya Enkovich
Ping

2016-06-23 12:54 GMT+03:00 Ilya Enkovich :
> On 22 Jun 11:42, Jeff Law wrote:
>> On 06/22/2016 10:09 AM, Ilya Enkovich wrote:
>>
>> >>Given the common structure & duplication I can't help but wonder if a 
>> >>single
>> >>function should be used for widening/narrowing.  Ultimately can't you swap
>> >>mask_elems/req_elems and always go narrower to wider (using a different
>> >>optab for the two different cases)?
>> >
>> >I think we can't always go in narrower to wider direction because widening
>> >uses two optabs wand also because the way insn_data is checked.
>> OK.  Thanks for considering.
>>
>> >>
>> >>I'm guessing Richi's comment about what tree type you're looking at refers
>> >>to this and similar instances.  Doesn't this give you the type of the 
>> >>number
>> >>of iterations rather than the type of the iteration variable itself?
>> >>
>> >>
>> >
>> >Since I build vector IV by myself and use to compare with NITERS I
>> >feel it's safe to
>> >use type of NITERS.  Do you expect NITERS and IV types differ?
>> Since you're comparing to NITERS, it sounds like you've got it right and
>> that Richi and I have it wrong.
>>
>> It's less a question of whether or not we expect NITERS and IV to have
>> different types, but more a realization that there's nothing that inherently
>> says they have to be the same.  THey probably are the same most of the time,
>> but I don't think that's something we can or should necessarily depend on.
>>
>>
>>
>> >>>@@ -1791,6 +1870,20 @@ vectorizable_mask_load_store (gimple *stmt,
>> >>>gimple_stmt_iterator *gsi,
>> >>>   && !useless_type_conversion_p (vectype, rhs_vectype)))
>> >>> return false;
>> >>>
>> >>>+  if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
>> >>>+{
>> >>>+  /* Check that mask conjuction is supported.  */
>> >>>+  optab tab;
>> >>>+  tab = optab_for_tree_code (BIT_AND_EXPR, vectype, optab_default);
>> >>>+  if (!tab || optab_handler (tab, TYPE_MODE (vectype)) ==
>> >>>CODE_FOR_nothing)
>> >>>+   {
>> >>>+ if (dump_enabled_p ())
>> >>>+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> >>>+"cannot be masked: unsupported mask
>> >>>operation\n");
>> >>>+ LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
>> >>>+   }
>> >>>+}
>> >>
>> >>Should the optab querying be in optab-query.c?
>> >
>> >We always directly call optab_handler for simple operations.  There are 
>> >dozens
>> >of such calls in vectorizer.
>> OK.  I would look favorably on a change to move those queries out into
>> optabs-query as a separate patch.
>>
>> >
>> >We don't embed masking capabilities into vectorizer.
>> >
>> >Actually we don't depend on masking capabilities so much.  We have to mask
>> >loads and stores and use can_mask_load_store for that which uses existing 
>> >optab
>> >query.  We also require masking for reductions and use VEC_COND for that
>> >(and use existing expand_vec_cond_expr_p).  Other checks are to check if we
>> >can build required masks.  So we actually don't expose any new processor
>> >masking capabilities to GIMPLE.  I.e. all this works on targets with no
>> >rich masking capabilities.  E.g. we can mask loops for quite old SSE 
>> >targets.
>> OK.  I think the key here is that load/store masking already exists and the
>> others are either VEC_COND or checking if we can build the mask rather than
>> can the operation be masked.  THanks for clarifying.
>> jeff
>
> Here is an updated version with less typos and more comments.
>
> Thanks,
> Ilya
> --
> gcc/
>
> 2016-05-23  Ilya Enkovich  
>
> * tree-vect-loop.c: Include insn-config.h and recog.h.
> (vect_check_required_masks_widening): New.
> (vect_check_required_masks_narrowing): New.
> (vect_get_masking_iv_elems): New.
> (vect_get_masking_iv_type): New.
> (vect_get_extreme_masks): New.
> (vect_check_required_masks): New.
> (vect_analyze_loop_operations): Add vect_check_required_masks
> call to compute LOOP_VINFO_CAN_BE_MASKED.
> (vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and
> LOOP_VINFO_NEED_MASKING before starting over.
> (vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and
> masking cost.
> * tree-vect-stmts.c (can_mask_load_store): New.
> (vect_model_load_masking_cost): New.
> (vect_model_store_masking_cost): New.
> (vect_model_simple_masking_cost): New.
> (vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED
> and masking cost.
> (vectorizable_simd_clone_call): Likewise.
> (vectorizable_store): Likewise.
> (vectorizable_load): Likewise.
> (vect_stmt_should_be_masked_for_epilogue): New.
> (vect_add_required_mask_for_stmt): New.
> (vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED.
> * tree-vectorizer.h (vect_model_load_masking_cost): New.
> (vect_model_st

Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-06-28 Thread Ilya Enkovich
On 23 Jun 12:54, Ilya Enkovich wrote:
> 
> Here is an updated version with less typos and more comments.
> 
> Thanks,
> Ilya
> --

Here is an updated version with trapping statements check added
to vect_analyze_stmt.

Thanks,
Ilya
--
gcc/

2016-06-28  Ilya Enkovich  

* tree-vect-loop.c: Include insn-config.h and recog.h.
(vect_check_required_masks_widening): New.
(vect_check_required_masks_narrowing): New.
(vect_get_masking_iv_elems): New.
(vect_get_masking_iv_type): New.
(vect_get_extreme_masks): New.
(vect_check_required_masks): New.
(vect_analyze_loop_operations): Add vect_check_required_masks
call to compute LOOP_VINFO_CAN_BE_MASKED.
(vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and
LOOP_VINFO_NEED_MASKING before starting over.
(vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and
masking cost.
* tree-vect-stmts.c (can_mask_load_store): New.
(vect_model_load_masking_cost): New.
(vect_model_store_masking_cost): New.
(vect_model_simple_masking_cost): New.
(vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED
and masking cost.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vect_stmt_should_be_masked_for_epilogue): New.
(vect_add_required_mask_for_stmt): New.
(vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED.
* tree-vectorizer.h (vect_model_load_masking_cost): New.
(vect_model_store_masking_cost): New.
(vect_model_simple_masking_cost): New.


diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 36e60d4..1146de9 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "optabs-tree.h"
+#include "insn-config.h"
+#include "recog.h" /* FIXME: for insn_data */
 #include "diagnostic-core.h"
 #include "fold-const.h"
 #include "stor-layout.h"
@@ -1603,6 +1605,270 @@ vect_update_vf_for_slp (loop_vec_info loop_vinfo)
 vectorization_factor);
 }
 
+/* Function vect_check_required_masks_widening.
+
+   Return 1 if vector mask of type MASK_TYPE can be widened
+   to a type having REQ_ELEMS elements in a single vector.  */
+
+static bool
+vect_check_required_masks_widening (loop_vec_info loop_vinfo,
+   tree mask_type, unsigned req_elems)
+{
+  unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type);
+
+  gcc_assert (mask_elems > req_elems);
+
+  /* Don't convert if it requires too many intermediate steps.  */
+  int steps = exact_log2 (mask_elems / req_elems);
+  if (steps > MAX_INTERM_CVT_STEPS + 1)
+return false;
+
+  /* Check we have conversion support for given mask mode.  */
+  machine_mode mode = TYPE_MODE (mask_type);
+  insn_code icode = optab_handler (vec_unpacks_lo_optab, mode);
+  if (icode == CODE_FOR_nothing
+  || optab_handler (vec_unpacks_hi_optab, mode) == CODE_FOR_nothing)
+return false;
+
+  /* Make recursive call for multi-step conversion.  */
+  if (steps > 1)
+{
+  mask_elems = mask_elems >> 1;
+  mask_type = build_truth_vector_type (mask_elems, current_vector_size);
+  if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
+   return false;
+
+  if (!vect_check_required_masks_widening (loop_vinfo, mask_type,
+  req_elems))
+   return false;
+}
+  else
+{
+  mask_type = build_truth_vector_type (req_elems, current_vector_size);
+  if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
+   return false;
+}
+
+  return true;
+}
+
+/* Function vect_check_required_masks_narrowing.
+
+   Return 1 if vector mask of type MASK_TYPE can be narrowed
+   to a type having REQ_ELEMS elements in a single vector.  */
+
+static bool
+vect_check_required_masks_narrowing (loop_vec_info loop_vinfo,
+tree mask_type, unsigned req_elems)
+{
+  unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type);
+
+  gcc_assert (req_elems > mask_elems);
+
+  /* Don't convert if it requires too many intermediate steps.  */
+  int steps = exact_log2 (req_elems / mask_elems);
+  if (steps > MAX_INTERM_CVT_STEPS + 1)
+return false;
+
+  /* Check we have conversion support for given mask mode.  */
+  machine_mode mode = TYPE_MODE (mask_type);
+  insn_code icode = optab_handler (vec_pack_trunc_optab, mode);
+  if (icode == CODE_FOR_nothing)
+return false;
+
+  /* Make recursive call for multi-step conversion.  */
+  if (steps > 1)
+{
+  mask_elems = mask_elems << 1;
+  mask_type = build_truth_vector_type (mask_elems, current_vector_size);
+  if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
+   return false;
+
+  if (!vect_check_req

Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-06-23 Thread Ilya Enkovich
On 22 Jun 11:42, Jeff Law wrote:
> On 06/22/2016 10:09 AM, Ilya Enkovich wrote:
> 
> >>Given the common structure & duplication I can't help but wonder if a single
> >>function should be used for widening/narrowing.  Ultimately can't you swap
> >>mask_elems/req_elems and always go narrower to wider (using a different
> >>optab for the two different cases)?
> >
> >I think we can't always go in narrower to wider direction because widening
> >uses two optabs wand also because the way insn_data is checked.
> OK.  Thanks for considering.
> 
> >>
> >>I'm guessing Richi's comment about what tree type you're looking at refers
> >>to this and similar instances.  Doesn't this give you the type of the number
> >>of iterations rather than the type of the iteration variable itself?
> >>
> >>
> >
> >Since I build vector IV by myself and use to compare with NITERS I
> >feel it's safe to
> >use type of NITERS.  Do you expect NITERS and IV types differ?
> Since you're comparing to NITERS, it sounds like you've got it right and
> that Richi and I have it wrong.
> 
> It's less a question of whether or not we expect NITERS and IV to have
> different types, but more a realization that there's nothing that inherently
> says they have to be the same.  THey probably are the same most of the time,
> but I don't think that's something we can or should necessarily depend on.
> 
> 
> 
> >>>@@ -1791,6 +1870,20 @@ vectorizable_mask_load_store (gimple *stmt,
> >>>gimple_stmt_iterator *gsi,
> >>>   && !useless_type_conversion_p (vectype, rhs_vectype)))
> >>> return false;
> >>>
> >>>+  if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> >>>+{
> >>>+  /* Check that mask conjuction is supported.  */
> >>>+  optab tab;
> >>>+  tab = optab_for_tree_code (BIT_AND_EXPR, vectype, optab_default);
> >>>+  if (!tab || optab_handler (tab, TYPE_MODE (vectype)) ==
> >>>CODE_FOR_nothing)
> >>>+   {
> >>>+ if (dump_enabled_p ())
> >>>+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >>>+"cannot be masked: unsupported mask
> >>>operation\n");
> >>>+ LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> >>>+   }
> >>>+}
> >>
> >>Should the optab querying be in optab-query.c?
> >
> >We always directly call optab_handler for simple operations.  There are 
> >dozens
> >of such calls in vectorizer.
> OK.  I would look favorably on a change to move those queries out into
> optabs-query as a separate patch.
> 
> >
> >We don't embed masking capabilities into vectorizer.
> >
> >Actually we don't depend on masking capabilities so much.  We have to mask
> >loads and stores and use can_mask_load_store for that which uses existing 
> >optab
> >query.  We also require masking for reductions and use VEC_COND for that
> >(and use existing expand_vec_cond_expr_p).  Other checks are to check if we
> >can build required masks.  So we actually don't expose any new processor
> >masking capabilities to GIMPLE.  I.e. all this works on targets with no
> >rich masking capabilities.  E.g. we can mask loops for quite old SSE targets.
> OK.  I think the key here is that load/store masking already exists and the
> others are either VEC_COND or checking if we can build the mask rather than
> can the operation be masked.  THanks for clarifying.
> jeff

Here is an updated version with less typos and more comments.

Thanks,
Ilya
--
gcc/

2016-05-23  Ilya Enkovich  

* tree-vect-loop.c: Include insn-config.h and recog.h.
(vect_check_required_masks_widening): New.
(vect_check_required_masks_narrowing): New.
(vect_get_masking_iv_elems): New.
(vect_get_masking_iv_type): New.
(vect_get_extreme_masks): New.
(vect_check_required_masks): New.
(vect_analyze_loop_operations): Add vect_check_required_masks
call to compute LOOP_VINFO_CAN_BE_MASKED.
(vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and
LOOP_VINFO_NEED_MASKING before starting over.
(vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and
masking cost.
* tree-vect-stmts.c (can_mask_load_store): New.
(vect_model_load_masking_cost): New.
(vect_model_store_masking_cost): New.
(vect_model_simple_masking_cost): New.
(vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED
and masking cost.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vect_stmt_should_be_masked_for_epilogue): New.
(vect_add_required_mask_for_stmt): New.
(vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED.
* tree-vectorizer.h (vect_model_load_masking_cost): New.
(vect_model_store_masking_cost): New.
(vect_model_simple_masking_cost): New.


diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c75d234..3b50168 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect

Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-06-22 Thread Jeff Law

On 06/22/2016 10:09 AM, Ilya Enkovich wrote:


Given the common structure & duplication I can't help but wonder if a single
function should be used for widening/narrowing.  Ultimately can't you swap
mask_elems/req_elems and always go narrower to wider (using a different
optab for the two different cases)?


I think we can't always go in narrower to wider direction because widening
uses two optabs wand also because the way insn_data is checked.

OK.  Thanks for considering.



I'm guessing Richi's comment about what tree type you're looking at refers
to this and similar instances.  Doesn't this give you the type of the number
of iterations rather than the type of the iteration variable itself?




Since I build vector IV by myself and use to compare with NITERS I
feel it's safe to
use type of NITERS.  Do you expect NITERS and IV types differ?
Since you're comparing to NITERS, it sounds like you've got it right and 
that Richi and I have it wrong.


It's less a question of whether or not we expect NITERS and IV to have 
different types, but more a realization that there's nothing that 
inherently says they have to be the same.  THey probably are the same 
most of the time, but I don't think that's something we can or should 
necessarily depend on.





@@ -1791,6 +1870,20 @@ vectorizable_mask_load_store (gimple *stmt,
gimple_stmt_iterator *gsi,
   && !useless_type_conversion_p (vectype, rhs_vectype)))
 return false;

+  if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+{
+  /* Check that mask conjuction is supported.  */
+  optab tab;
+  tab = optab_for_tree_code (BIT_AND_EXPR, vectype, optab_default);
+  if (!tab || optab_handler (tab, TYPE_MODE (vectype)) ==
CODE_FOR_nothing)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"cannot be masked: unsupported mask
operation\n");
+ LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+   }
+}


Should the optab querying be in optab-query.c?


We always directly call optab_handler for simple operations.  There are dozens
of such calls in vectorizer.
OK.  I would look favorably on a change to move those queries out into 
optabs-query as a separate patch.




We don't embed masking capabilities into vectorizer.

Actually we don't depend on masking capabilities so much.  We have to mask
loads and stores and use can_mask_load_store for that which uses existing optab
query.  We also require masking for reductions and use VEC_COND for that
(and use existing expand_vec_cond_expr_p).  Other checks are to check if we
can build required masks.  So we actually don't expose any new processor
masking capabilities to GIMPLE.  I.e. all this works on targets with no
rich masking capabilities.  E.g. we can mask loops for quite old SSE targets.
OK.  I think the key here is that load/store masking already exists and 
the others are either VEC_COND or checking if we can build the mask 
rather than can the operation be masked.  THanks for clarifying.

jeff


Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-06-22 Thread Jeff Law

On 06/22/2016 09:03 AM, Ilya Enkovich wrote:

2016-06-16 9:26 GMT+03:00 Jeff Law :

On 06/15/2016 05:22 AM, Richard Biener wrote:



You look at TREE_TYPE of LOOP_VINFO_NITERS (loop_vinfo) - I don't think
this is meaningful (if then only by accident).  I think you should look at
the
control IV itself, possibly it's value-range, to determine the smallest
possible
type to use.


Can we get an IV that's created after VRP?  If so, then we have to be
prepared for the case where there's no range information on the IV.  At
which point I think using type min/max of the IV is probably the right
fallback.  But I do think we should be looking at range info much more
systematically.

I can't see how TREE_TYPE of the NITERS makes sense either.


I need to build a vector {niters, ..., niters} and compare to it.  Why doesn't
it make sense to choose the same type for IV?  I agree that choosing a smaller
type may be beneficial.   Shouldn't I look at nb_iterations_upper_bound then
to check if NITERS can be casted to a smaller type?
Isn't TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)) the type of the 
constant being used to represent the number of iterations?  That is 
independent of the type of the IV.


Though I guess your argument is that since you're building a vector of 
niters, that indeed what you want is the type of that constant, not the 
type of the IV.  That might be worth a comment in the code :-)


jeff



Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-06-22 Thread Ilya Enkovich
2016-06-16 10:08 GMT+03:00 Jeff Law :
> On 05/19/2016 01:42 PM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This patch introduces analysis to determine if loop can be masked
>> (compute LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_REQUIRED_MASKS)
>> and compute how much masking costs.
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2016-05-19  Ilya Enkovich  
>>
>> * tree-vect-loop.c: Include insn-config.h and recog.h.
>> (vect_check_required_masks_widening): New.
>> (vect_check_required_masks_narrowing): New.
>> (vect_get_masking_iv_elems): New.
>> (vect_get_masking_iv_type): New.
>> (vect_get_extreme_masks): New.
>> (vect_check_required_masks): New.
>> (vect_analyze_loop_operations): Add vect_check_required_masks
>> call to compute LOOP_VINFO_CAN_BE_MASKED.
>> (vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and
>> LOOP_VINFO_NEED_MASKING before starting over.
>> (vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and
>> masking cost.
>> * tree-vect-stmts.c (can_mask_load_store): New.
>> (vect_model_load_masking_cost): New.
>> (vect_model_store_masking_cost): New.
>> (vect_model_simple_masking_cost): New.
>> (vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED
>> and masking cost.
>> (vectorizable_simd_clone_call): Likewise.
>> (vectorizable_store): Likewise.
>> (vectorizable_load): Likewise.
>> (vect_stmt_should_be_masked_for_epilogue): New.
>> (vect_add_required_mask_for_stmt): New.
>> (vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED.
>> * tree-vectorizer.h (vect_model_load_masking_cost): New.
>> (vect_model_store_masking_cost): New.
>> (vect_model_simple_masking_cost): New.
>>
>>
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index e25a0ce..31360d3 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-pass.h"
>>  #include "ssa.h"
>>  #include "optabs-tree.h"
>> +#include "insn-config.h"
>> +#include "recog.h" /* FIXME: for insn_data */
>
> Ick :(
>
>
>> +
>> +/* Function vect_check_required_masks_narowing.
>
> narrowing
>
>
>> +
>> +   Return 1 if vector mask of type MASK_TYPE can be narrowed
>> +   to a type having REQ_ELEMS elements in a single vector.  */
>> +
>> +static bool
>> +vect_check_required_masks_narrowing (loop_vec_info loop_vinfo,
>> +tree mask_type, unsigned req_elems)
>
> Given the common structure & duplication I can't help but wonder if a single
> function should be used for widening/narrowing.  Ultimately can't you swap
> mask_elems/req_elems and always go narrower to wider (using a different
> optab for the two different cases)?

I think we can't always go in narrower to wider direction because widening
uses two optabs wand also because the way insn_data is checked.

>
>
>
>
>> +
>> +/* Function vect_get_masking_iv_elems.
>> +
>> +   Return a number of elements in IV used for loop masking.  */
>> +static int
>> +vect_get_masking_iv_elems (loop_vec_info loop_vinfo)
>> +{
>> +  tree iv_type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo));
>
> I'm guessing Richi's comment about what tree type you're looking at refers
> to this and similar instances.  Doesn't this give you the type of the number
> of iterations rather than the type of the iteration variable itself?
>
>

Since I build vector IV by myself and use to compare with NITERS I
feel it's safe to
use type of NITERS.  Do you expect NITERS and IV types differ?

>
>
>  +
>>
>> +  if (!expand_vec_cmp_expr_p (iv_vectype, mask_type))
>> +{
>> +  if (dump_enabled_p ())
>> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +"cannot be masked: required vector comparison "
>> +"is not supported.\n");
>> +  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
>> +  return;
>> +}
>
> On a totally unrelated topic, I was speaking with David Malcolm earlier this
> week about how to turn this kind of missed optimization information we
> currently emit into dumps into something more easily consumed by users.
>
> The general issue is that we've got customers that want to understand why
> certain optimizations fire or do not fire.  They're by far more interested
> in the vectorizer than anything else.
>
> We have a sense that much of the information those customers desire is
> sitting in the dump files, but it's buried in there with other stuff that
> isn't generally useful to users.
>
> So we're pondering what it might take to take these glorified fprintf calls
> and turn them into a first class diagnostic that could be emitted to stderr
> or into the dump file depending (of course) on the options passed to GCC.
>
> The reason I bring this up is the hope that your team mi

Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-06-22 Thread Ilya Enkovich
2016-06-16 9:26 GMT+03:00 Jeff Law :
> On 06/15/2016 05:22 AM, Richard Biener wrote:
>>
>>
>> You look at TREE_TYPE of LOOP_VINFO_NITERS (loop_vinfo) - I don't think
>> this is meaningful (if then only by accident).  I think you should look at
>> the
>> control IV itself, possibly it's value-range, to determine the smallest
>> possible
>> type to use.
>
> Can we get an IV that's created after VRP?  If so, then we have to be
> prepared for the case where there's no range information on the IV.  At
> which point I think using type min/max of the IV is probably the right
> fallback.  But I do think we should be looking at range info much more
> systematically.
>
> I can't see how TREE_TYPE of the NITERS makes sense either.

I need to build a vector {niters, ..., niters} and compare to it.  Why doesn't
it make sense to choose the same type for IV?  I agree that choosing a smaller
type may be beneficial.   Shouldn't I look at nb_iterations_upper_bound then
to check if NITERS can be casted to a smaller type?

Thanks,
Ilya

>
>> Finally we have a related missed optimization opportunity, namely avoiding
>> peeling for gaps if we mask the last load of the group (profitability
>> depends
>> on the overhead of such masking of course as it would be done in the main
>> vectorized loop).
>
> I think that's a specific instance of a more general question -- what
> transformations can be avoided by masking and can we generate costs to
> select between those transformations and masking.  Seems like a follow-up
> item rather than a requirement for this work to go forward to me.
>
> Jeff


Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-06-16 Thread Jeff Law

On 05/19/2016 01:42 PM, Ilya Enkovich wrote:

Hi,

This patch introduces analysis to determine if loop can be masked
(compute LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_REQUIRED_MASKS)
and compute how much masking costs.

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* tree-vect-loop.c: Include insn-config.h and recog.h.
(vect_check_required_masks_widening): New.
(vect_check_required_masks_narrowing): New.
(vect_get_masking_iv_elems): New.
(vect_get_masking_iv_type): New.
(vect_get_extreme_masks): New.
(vect_check_required_masks): New.
(vect_analyze_loop_operations): Add vect_check_required_masks
call to compute LOOP_VINFO_CAN_BE_MASKED.
(vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and
LOOP_VINFO_NEED_MASKING before starting over.
(vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and
masking cost.
* tree-vect-stmts.c (can_mask_load_store): New.
(vect_model_load_masking_cost): New.
(vect_model_store_masking_cost): New.
(vect_model_simple_masking_cost): New.
(vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED
and masking cost.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vect_stmt_should_be_masked_for_epilogue): New.
(vect_add_required_mask_for_stmt): New.
(vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED.
* tree-vectorizer.h (vect_model_load_masking_cost): New.
(vect_model_store_masking_cost): New.
(vect_model_simple_masking_cost): New.


diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index e25a0ce..31360d3 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "optabs-tree.h"
+#include "insn-config.h"
+#include "recog.h"   /* FIXME: for insn_data */

Ick :(



+
+/* Function vect_check_required_masks_narowing.

narrowing



+
+   Return 1 if vector mask of type MASK_TYPE can be narrowed
+   to a type having REQ_ELEMS elements in a single vector.  */
+
+static bool
+vect_check_required_masks_narrowing (loop_vec_info loop_vinfo,
+tree mask_type, unsigned req_elems)
Given the common structure & duplication I can't help but wonder if a 
single function should be used for widening/narrowing.  Ultimately can't 
you swap  mask_elems/req_elems and always go narrower to wider (using a 
different optab for the two different cases)?






+
+/* Function vect_get_masking_iv_elems.
+
+   Return a number of elements in IV used for loop masking.  */
+static int
+vect_get_masking_iv_elems (loop_vec_info loop_vinfo)
+{
+  tree iv_type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo));
I'm guessing Richi's comment about what tree type you're looking at 
refers to this and similar instances.  Doesn't this give you the type of 
the number of iterations rather than the type of the iteration variable 
itself?





 +

+  if (!expand_vec_cmp_expr_p (iv_vectype, mask_type))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"cannot be masked: required vector comparison "
+"is not supported.\n");
+  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+  return;
+}
On a totally unrelated topic, I was speaking with David Malcolm earlier 
this week about how to turn this kind of missed optimization information 
we currently emit into dumps into something more easily consumed by users.


The general issue is that we've got customers that want to understand 
why certain optimizations fire or do not fire.  They're by far more 
interested in the vectorizer than anything else.


We have a sense that much of the information those customers desire is 
sitting in the dump files, but it's buried in there with other stuff 
that isn't generally useful to users.


So we're pondering what it might take to take these glorified fprintf 
calls and turn them into a first class diagnostic that could be emitted 
to stderr or into the dump file depending (of course) on the options 
passed to GCC.


The reason I bring this up is the hope that your team might have some 
insights based on what ICC has done the in the past for its customers.


Anyway, back to the code...



diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 9ab4af4..91ebe5a 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "builtins.h"
 #include "internal-fn.h"
+#include "tree-ssa-loop-ivopts.h"

 /* For lang_hooks.types.type_for_mode.  */
 #include "langhooks.h"
@@ -535,6 +536,38 @@ process_use (gimple *stmt, tree use, loop_vec_info 
loop_vinfo, bool live_p

Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-06-15 Thread Jeff Law

On 06/15/2016 05:22 AM, Richard Biener wrote:


You look at TREE_TYPE of LOOP_VINFO_NITERS (loop_vinfo) - I don't think
this is meaningful (if then only by accident).  I think you should look at the
control IV itself, possibly it's value-range, to determine the smallest possible
type to use.
Can we get an IV that's created after VRP?  If so, then we have to be 
prepared for the case where there's no range information on the IV.  At 
which point I think using type min/max of the IV is probably the right 
fallback.  But I do think we should be looking at range info much more 
systematically.


I can't see how TREE_TYPE of the NITERS makes sense either.


Finally we have a related missed optimization opportunity, namely avoiding
peeling for gaps if we mask the last load of the group (profitability depends
on the overhead of such masking of course as it would be done in the main
vectorized loop).
I think that's a specific instance of a more general question -- what 
transformations can be avoided by masking and can we generate costs to 
select between those transformations and masking.  Seems like a 
follow-up item rather than a requirement for this work to go forward to me.


Jeff


Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-06-15 Thread Richard Biener
On Thu, May 19, 2016 at 9:42 PM, Ilya Enkovich  wrote:
> Hi,
>
> This patch introduces analysis to determine if loop can be masked
> (compute LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_REQUIRED_MASKS)
> and compute how much masking costs.

Maybe in a different patch, but it looks like you assume say, a
division, does not need
masking.

Code-generation-wise we'd add a new iv starting with

 iv = { 0, 1, 2, 3 };

and the mask is computed by comparing that against {niter, niter, niter, niter}?

So if we need masks for different vector element counts we could also add
additional IVs rather than "widening"/"shortening" the comparison result.
cond-expr reduction does this kind of IV as well which is a chance to share
some code (eventually).

You look at TREE_TYPE of LOOP_VINFO_NITERS (loop_vinfo) - I don't think
this is meaningful (if then only by accident).  I think you should look at the
control IV itself, possibly it's value-range, to determine the smallest possible
type to use.

Finally we have a related missed optimization opportunity, namely avoiding
peeling for gaps if we mask the last load of the group (profitability depends
on the overhead of such masking of course as it would be done in the main
vectorized loop).

Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2016-05-19  Ilya Enkovich  
>
> * tree-vect-loop.c: Include insn-config.h and recog.h.
> (vect_check_required_masks_widening): New.
> (vect_check_required_masks_narrowing): New.
> (vect_get_masking_iv_elems): New.
> (vect_get_masking_iv_type): New.
> (vect_get_extreme_masks): New.
> (vect_check_required_masks): New.
> (vect_analyze_loop_operations): Add vect_check_required_masks
> call to compute LOOP_VINFO_CAN_BE_MASKED.
> (vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and
> LOOP_VINFO_NEED_MASKING before starting over.
> (vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and
> masking cost.
> * tree-vect-stmts.c (can_mask_load_store): New.
> (vect_model_load_masking_cost): New.
> (vect_model_store_masking_cost): New.
> (vect_model_simple_masking_cost): New.
> (vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED
> and masking cost.
> (vectorizable_simd_clone_call): Likewise.
> (vectorizable_store): Likewise.
> (vectorizable_load): Likewise.
> (vect_stmt_should_be_masked_for_epilogue): New.
> (vect_add_required_mask_for_stmt): New.
> (vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED.
> * tree-vectorizer.h (vect_model_load_masking_cost): New.
> (vect_model_store_masking_cost): New.
> (vect_model_simple_masking_cost): New.
>
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index e25a0ce..31360d3 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-pass.h"
>  #include "ssa.h"
>  #include "optabs-tree.h"
> +#include "insn-config.h"
> +#include "recog.h" /* FIXME: for insn_data */
>  #include "diagnostic-core.h"
>  #include "fold-const.h"
>  #include "stor-layout.h"
> @@ -1601,6 +1603,266 @@ vect_update_vf_for_slp (loop_vec_info loop_vinfo)
>  vectorization_factor);
>  }
>
> +/* Function vect_check_required_masks_widening.
> +
> +   Return 1 if vector mask of type MASK_TYPE can be widened
> +   to a type having REQ_ELEMS elements in a single vector.  */
> +
> +static bool
> +vect_check_required_masks_widening (loop_vec_info loop_vinfo,
> +   tree mask_type, unsigned req_elems)
> +{
> +  unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type);
> +
> +  gcc_assert (mask_elems > req_elems);
> +
> +  /* Don't convert if it requires too many intermediate steps.  */
> +  int steps = exact_log2 (mask_elems / req_elems);
> +  if (steps > MAX_INTERM_CVT_STEPS + 1)
> +return false;
> +
> +  /* Check we have conversion support for given mask mode.  */
> +  machine_mode mode = TYPE_MODE (mask_type);
> +  insn_code icode = optab_handler (vec_unpacks_lo_optab, mode);
> +  if (icode == CODE_FOR_nothing
> +  || optab_handler (vec_unpacks_hi_optab, mode) == CODE_FOR_nothing)
> +return false;
> +
> +  /* Make recursive call for multi-step conversion.  */
> +  if (steps > 1)
> +{
> +  mask_elems = mask_elems >> 1;
> +  mask_type = build_truth_vector_type (mask_elems, current_vector_size);
> +  if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
> +   return false;
> +
> +  if (!vect_check_required_masks_widening (loop_vinfo, mask_type,
> +  req_elems))
> +   return false;
> +}
> +  else
> +{
> +  mask_type = build_truth_vector_type (req_elems, current_vector_size);
> +  if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
> +   

[PATCH, vec-tails 05/10] Check if loop can be masked

2016-05-19 Thread Ilya Enkovich
Hi,

This patch introduces analysis to determine if loop can be masked
(compute LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_REQUIRED_MASKS)
and compute how much masking costs.

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* tree-vect-loop.c: Include insn-config.h and recog.h.
(vect_check_required_masks_widening): New.
(vect_check_required_masks_narrowing): New.
(vect_get_masking_iv_elems): New.
(vect_get_masking_iv_type): New.
(vect_get_extreme_masks): New.
(vect_check_required_masks): New.
(vect_analyze_loop_operations): Add vect_check_required_masks
call to compute LOOP_VINFO_CAN_BE_MASKED.
(vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and
LOOP_VINFO_NEED_MASKING before starting over.
(vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and
masking cost.
* tree-vect-stmts.c (can_mask_load_store): New.
(vect_model_load_masking_cost): New.
(vect_model_store_masking_cost): New.
(vect_model_simple_masking_cost): New.
(vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED
and masking cost.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vect_stmt_should_be_masked_for_epilogue): New.
(vect_add_required_mask_for_stmt): New.
(vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED.
* tree-vectorizer.h (vect_model_load_masking_cost): New.
(vect_model_store_masking_cost): New.
(vect_model_simple_masking_cost): New.


diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index e25a0ce..31360d3 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "optabs-tree.h"
+#include "insn-config.h"
+#include "recog.h" /* FIXME: for insn_data */
 #include "diagnostic-core.h"
 #include "fold-const.h"
 #include "stor-layout.h"
@@ -1601,6 +1603,266 @@ vect_update_vf_for_slp (loop_vec_info loop_vinfo)
 vectorization_factor);
 }
 
+/* Function vect_check_required_masks_widening.
+
+   Return 1 if vector mask of type MASK_TYPE can be widened
+   to a type having REQ_ELEMS elements in a single vector.  */
+
+static bool
+vect_check_required_masks_widening (loop_vec_info loop_vinfo,
+   tree mask_type, unsigned req_elems)
+{
+  unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type);
+
+  gcc_assert (mask_elems > req_elems);
+
+  /* Don't convert if it requires too many intermediate steps.  */
+  int steps = exact_log2 (mask_elems / req_elems);
+  if (steps > MAX_INTERM_CVT_STEPS + 1)
+return false;
+
+  /* Check we have conversion support for given mask mode.  */
+  machine_mode mode = TYPE_MODE (mask_type);
+  insn_code icode = optab_handler (vec_unpacks_lo_optab, mode);
+  if (icode == CODE_FOR_nothing
+  || optab_handler (vec_unpacks_hi_optab, mode) == CODE_FOR_nothing)
+return false;
+
+  /* Make recursive call for multi-step conversion.  */
+  if (steps > 1)
+{
+  mask_elems = mask_elems >> 1;
+  mask_type = build_truth_vector_type (mask_elems, current_vector_size);
+  if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
+   return false;
+
+  if (!vect_check_required_masks_widening (loop_vinfo, mask_type,
+  req_elems))
+   return false;
+}
+  else
+{
+  mask_type = build_truth_vector_type (req_elems, current_vector_size);
+  if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
+   return false;
+}
+
+  return true;
+}
+
+/* Function vect_check_required_masks_narowing.
+
+   Return 1 if vector mask of type MASK_TYPE can be narrowed
+   to a type having REQ_ELEMS elements in a single vector.  */
+
+static bool
+vect_check_required_masks_narrowing (loop_vec_info loop_vinfo,
+tree mask_type, unsigned req_elems)
+{
+  unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type);
+
+  gcc_assert (req_elems > mask_elems);
+
+  /* Don't convert if it requires too many intermediate steps.  */
+  int steps = exact_log2 (req_elems / mask_elems);
+  if (steps > MAX_INTERM_CVT_STEPS + 1)
+return false;
+
+  /* Check we have conversion support for given mask mode.  */
+  machine_mode mode = TYPE_MODE (mask_type);
+  insn_code icode = optab_handler (vec_pack_trunc_optab, mode);
+  if (icode == CODE_FOR_nothing)
+return false;
+
+  /* Make recursive call for multi-step conversion.  */
+  if (steps > 1)
+{
+  mask_elems = mask_elems << 1;
+  mask_type = build_truth_vector_type (mask_elems, current_vector_size);
+  if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
+   return false;
+
+  if (!vect_check_required_masks_narrowing (loop_vinfo, mask_type,