Re: [PATCH, vec-tails 05/10] Check if loop can be masked
Ping 2016-06-23 12:54 GMT+03:00 Ilya Enkovich : > On 22 Jun 11:42, Jeff Law wrote: >> On 06/22/2016 10:09 AM, Ilya Enkovich wrote: >> >> >>Given the common structure & duplication I can't help but wonder if a >> >>single >> >>function should be used for widening/narrowing. Ultimately can't you swap >> >>mask_elems/req_elems and always go narrower to wider (using a different >> >>optab for the two different cases)? >> > >> >I think we can't always go in narrower to wider direction because widening >> >uses two optabs wand also because the way insn_data is checked. >> OK. Thanks for considering. >> >> >> >> >>I'm guessing Richi's comment about what tree type you're looking at refers >> >>to this and similar instances. Doesn't this give you the type of the >> >>number >> >>of iterations rather than the type of the iteration variable itself? >> >> >> >> >> > >> >Since I build vector IV by myself and use to compare with NITERS I >> >feel it's safe to >> >use type of NITERS. Do you expect NITERS and IV types differ? >> Since you're comparing to NITERS, it sounds like you've got it right and >> that Richi and I have it wrong. >> >> It's less a question of whether or not we expect NITERS and IV to have >> different types, but more a realization that there's nothing that inherently >> says they have to be the same. THey probably are the same most of the time, >> but I don't think that's something we can or should necessarily depend on. >> >> >> >> >>>@@ -1791,6 +1870,20 @@ vectorizable_mask_load_store (gimple *stmt, >> >>>gimple_stmt_iterator *gsi, >> >>> && !useless_type_conversion_p (vectype, rhs_vectype))) >> >>> return false; >> >>> >> >>>+ if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)) >> >>>+{ >> >>>+ /* Check that mask conjuction is supported. */ >> >>>+ optab tab; >> >>>+ tab = optab_for_tree_code (BIT_AND_EXPR, vectype, optab_default); >> >>>+ if (!tab || optab_handler (tab, TYPE_MODE (vectype)) == >> >>>CODE_FOR_nothing) >> >>>+ { >> >>>+ if (dump_enabled_p ()) >> >>>+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> >>>+"cannot be masked: unsupported mask >> >>>operation\n"); >> >>>+ LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false; >> >>>+ } >> >>>+} >> >> >> >>Should the optab querying be in optab-query.c? >> > >> >We always directly call optab_handler for simple operations. There are >> >dozens >> >of such calls in vectorizer. >> OK. I would look favorably on a change to move those queries out into >> optabs-query as a separate patch. >> >> > >> >We don't embed masking capabilities into vectorizer. >> > >> >Actually we don't depend on masking capabilities so much. We have to mask >> >loads and stores and use can_mask_load_store for that which uses existing >> >optab >> >query. We also require masking for reductions and use VEC_COND for that >> >(and use existing expand_vec_cond_expr_p). Other checks are to check if we >> >can build required masks. So we actually don't expose any new processor >> >masking capabilities to GIMPLE. I.e. all this works on targets with no >> >rich masking capabilities. E.g. we can mask loops for quite old SSE >> >targets. >> OK. I think the key here is that load/store masking already exists and the >> others are either VEC_COND or checking if we can build the mask rather than >> can the operation be masked. THanks for clarifying. >> jeff > > Here is an updated version with less typos and more comments. > > Thanks, > Ilya > -- > gcc/ > > 2016-05-23 Ilya Enkovich > > * tree-vect-loop.c: Include insn-config.h and recog.h. > (vect_check_required_masks_widening): New. > (vect_check_required_masks_narrowing): New. > (vect_get_masking_iv_elems): New. > (vect_get_masking_iv_type): New. > (vect_get_extreme_masks): New. > (vect_check_required_masks): New. > (vect_analyze_loop_operations): Add vect_check_required_masks > call to compute LOOP_VINFO_CAN_BE_MASKED. > (vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and > LOOP_VINFO_NEED_MASKING before starting over. > (vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and > masking cost. > * tree-vect-stmts.c (can_mask_load_store): New. > (vect_model_load_masking_cost): New. > (vect_model_store_masking_cost): New. > (vect_model_simple_masking_cost): New. > (vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED > and masking cost. > (vectorizable_simd_clone_call): Likewise. > (vectorizable_store): Likewise. > (vectorizable_load): Likewise. > (vect_stmt_should_be_masked_for_epilogue): New. > (vect_add_required_mask_for_stmt): New. > (vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED. > * tree-vectorizer.h (vect_model_load_masking_cost): New. > (vect_model_st
Re: [PATCH, vec-tails 05/10] Check if loop can be masked
On 23 Jun 12:54, Ilya Enkovich wrote: > > Here is an updated version with less typos and more comments. > > Thanks, > Ilya > -- Here is an updated version with trapping statements check added to vect_analyze_stmt. Thanks, Ilya -- gcc/ 2016-06-28 Ilya Enkovich * tree-vect-loop.c: Include insn-config.h and recog.h. (vect_check_required_masks_widening): New. (vect_check_required_masks_narrowing): New. (vect_get_masking_iv_elems): New. (vect_get_masking_iv_type): New. (vect_get_extreme_masks): New. (vect_check_required_masks): New. (vect_analyze_loop_operations): Add vect_check_required_masks call to compute LOOP_VINFO_CAN_BE_MASKED. (vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_NEED_MASKING before starting over. (vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and masking cost. * tree-vect-stmts.c (can_mask_load_store): New. (vect_model_load_masking_cost): New. (vect_model_store_masking_cost): New. (vect_model_simple_masking_cost): New. (vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED and masking cost. (vectorizable_simd_clone_call): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vect_stmt_should_be_masked_for_epilogue): New. (vect_add_required_mask_for_stmt): New. (vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED. * tree-vectorizer.h (vect_model_load_masking_cost): New. (vect_model_store_masking_cost): New. (vect_model_simple_masking_cost): New. diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 36e60d4..1146de9 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -31,6 +31,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-pass.h" #include "ssa.h" #include "optabs-tree.h" +#include "insn-config.h" +#include "recog.h" /* FIXME: for insn_data */ #include "diagnostic-core.h" #include "fold-const.h" #include "stor-layout.h" @@ -1603,6 +1605,270 @@ vect_update_vf_for_slp (loop_vec_info loop_vinfo) vectorization_factor); } +/* Function vect_check_required_masks_widening. + + Return 1 if vector mask of type MASK_TYPE can be widened + to a type having REQ_ELEMS elements in a single vector. */ + +static bool +vect_check_required_masks_widening (loop_vec_info loop_vinfo, + tree mask_type, unsigned req_elems) +{ + unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type); + + gcc_assert (mask_elems > req_elems); + + /* Don't convert if it requires too many intermediate steps. */ + int steps = exact_log2 (mask_elems / req_elems); + if (steps > MAX_INTERM_CVT_STEPS + 1) +return false; + + /* Check we have conversion support for given mask mode. */ + machine_mode mode = TYPE_MODE (mask_type); + insn_code icode = optab_handler (vec_unpacks_lo_optab, mode); + if (icode == CODE_FOR_nothing + || optab_handler (vec_unpacks_hi_optab, mode) == CODE_FOR_nothing) +return false; + + /* Make recursive call for multi-step conversion. */ + if (steps > 1) +{ + mask_elems = mask_elems >> 1; + mask_type = build_truth_vector_type (mask_elems, current_vector_size); + if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode) + return false; + + if (!vect_check_required_masks_widening (loop_vinfo, mask_type, + req_elems)) + return false; +} + else +{ + mask_type = build_truth_vector_type (req_elems, current_vector_size); + if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode) + return false; +} + + return true; +} + +/* Function vect_check_required_masks_narrowing. + + Return 1 if vector mask of type MASK_TYPE can be narrowed + to a type having REQ_ELEMS elements in a single vector. */ + +static bool +vect_check_required_masks_narrowing (loop_vec_info loop_vinfo, +tree mask_type, unsigned req_elems) +{ + unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type); + + gcc_assert (req_elems > mask_elems); + + /* Don't convert if it requires too many intermediate steps. */ + int steps = exact_log2 (req_elems / mask_elems); + if (steps > MAX_INTERM_CVT_STEPS + 1) +return false; + + /* Check we have conversion support for given mask mode. */ + machine_mode mode = TYPE_MODE (mask_type); + insn_code icode = optab_handler (vec_pack_trunc_optab, mode); + if (icode == CODE_FOR_nothing) +return false; + + /* Make recursive call for multi-step conversion. */ + if (steps > 1) +{ + mask_elems = mask_elems << 1; + mask_type = build_truth_vector_type (mask_elems, current_vector_size); + if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode) + return false; + + if (!vect_check_req
Re: [PATCH, vec-tails 05/10] Check if loop can be masked
On 22 Jun 11:42, Jeff Law wrote: > On 06/22/2016 10:09 AM, Ilya Enkovich wrote: > > >>Given the common structure & duplication I can't help but wonder if a single > >>function should be used for widening/narrowing. Ultimately can't you swap > >>mask_elems/req_elems and always go narrower to wider (using a different > >>optab for the two different cases)? > > > >I think we can't always go in narrower to wider direction because widening > >uses two optabs wand also because the way insn_data is checked. > OK. Thanks for considering. > > >> > >>I'm guessing Richi's comment about what tree type you're looking at refers > >>to this and similar instances. Doesn't this give you the type of the number > >>of iterations rather than the type of the iteration variable itself? > >> > >> > > > >Since I build vector IV by myself and use to compare with NITERS I > >feel it's safe to > >use type of NITERS. Do you expect NITERS and IV types differ? > Since you're comparing to NITERS, it sounds like you've got it right and > that Richi and I have it wrong. > > It's less a question of whether or not we expect NITERS and IV to have > different types, but more a realization that there's nothing that inherently > says they have to be the same. THey probably are the same most of the time, > but I don't think that's something we can or should necessarily depend on. > > > > >>>@@ -1791,6 +1870,20 @@ vectorizable_mask_load_store (gimple *stmt, > >>>gimple_stmt_iterator *gsi, > >>> && !useless_type_conversion_p (vectype, rhs_vectype))) > >>> return false; > >>> > >>>+ if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)) > >>>+{ > >>>+ /* Check that mask conjuction is supported. */ > >>>+ optab tab; > >>>+ tab = optab_for_tree_code (BIT_AND_EXPR, vectype, optab_default); > >>>+ if (!tab || optab_handler (tab, TYPE_MODE (vectype)) == > >>>CODE_FOR_nothing) > >>>+ { > >>>+ if (dump_enabled_p ()) > >>>+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > >>>+"cannot be masked: unsupported mask > >>>operation\n"); > >>>+ LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false; > >>>+ } > >>>+} > >> > >>Should the optab querying be in optab-query.c? > > > >We always directly call optab_handler for simple operations. There are > >dozens > >of such calls in vectorizer. > OK. I would look favorably on a change to move those queries out into > optabs-query as a separate patch. > > > > >We don't embed masking capabilities into vectorizer. > > > >Actually we don't depend on masking capabilities so much. We have to mask > >loads and stores and use can_mask_load_store for that which uses existing > >optab > >query. We also require masking for reductions and use VEC_COND for that > >(and use existing expand_vec_cond_expr_p). Other checks are to check if we > >can build required masks. So we actually don't expose any new processor > >masking capabilities to GIMPLE. I.e. all this works on targets with no > >rich masking capabilities. E.g. we can mask loops for quite old SSE targets. > OK. I think the key here is that load/store masking already exists and the > others are either VEC_COND or checking if we can build the mask rather than > can the operation be masked. THanks for clarifying. > jeff Here is an updated version with less typos and more comments. Thanks, Ilya -- gcc/ 2016-05-23 Ilya Enkovich * tree-vect-loop.c: Include insn-config.h and recog.h. (vect_check_required_masks_widening): New. (vect_check_required_masks_narrowing): New. (vect_get_masking_iv_elems): New. (vect_get_masking_iv_type): New. (vect_get_extreme_masks): New. (vect_check_required_masks): New. (vect_analyze_loop_operations): Add vect_check_required_masks call to compute LOOP_VINFO_CAN_BE_MASKED. (vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_NEED_MASKING before starting over. (vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and masking cost. * tree-vect-stmts.c (can_mask_load_store): New. (vect_model_load_masking_cost): New. (vect_model_store_masking_cost): New. (vect_model_simple_masking_cost): New. (vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED and masking cost. (vectorizable_simd_clone_call): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vect_stmt_should_be_masked_for_epilogue): New. (vect_add_required_mask_for_stmt): New. (vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED. * tree-vectorizer.h (vect_model_load_masking_cost): New. (vect_model_store_masking_cost): New. (vect_model_simple_masking_cost): New. diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index c75d234..3b50168 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect
Re: [PATCH, vec-tails 05/10] Check if loop can be masked
On 06/22/2016 10:09 AM, Ilya Enkovich wrote: Given the common structure & duplication I can't help but wonder if a single function should be used for widening/narrowing. Ultimately can't you swap mask_elems/req_elems and always go narrower to wider (using a different optab for the two different cases)? I think we can't always go in narrower to wider direction because widening uses two optabs wand also because the way insn_data is checked. OK. Thanks for considering. I'm guessing Richi's comment about what tree type you're looking at refers to this and similar instances. Doesn't this give you the type of the number of iterations rather than the type of the iteration variable itself? Since I build vector IV by myself and use to compare with NITERS I feel it's safe to use type of NITERS. Do you expect NITERS and IV types differ? Since you're comparing to NITERS, it sounds like you've got it right and that Richi and I have it wrong. It's less a question of whether or not we expect NITERS and IV to have different types, but more a realization that there's nothing that inherently says they have to be the same. THey probably are the same most of the time, but I don't think that's something we can or should necessarily depend on. @@ -1791,6 +1870,20 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi, && !useless_type_conversion_p (vectype, rhs_vectype))) return false; + if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)) +{ + /* Check that mask conjuction is supported. */ + optab tab; + tab = optab_for_tree_code (BIT_AND_EXPR, vectype, optab_default); + if (!tab || optab_handler (tab, TYPE_MODE (vectype)) == CODE_FOR_nothing) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, +"cannot be masked: unsupported mask operation\n"); + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false; + } +} Should the optab querying be in optab-query.c? We always directly call optab_handler for simple operations. There are dozens of such calls in vectorizer. OK. I would look favorably on a change to move those queries out into optabs-query as a separate patch. We don't embed masking capabilities into vectorizer. Actually we don't depend on masking capabilities so much. We have to mask loads and stores and use can_mask_load_store for that which uses existing optab query. We also require masking for reductions and use VEC_COND for that (and use existing expand_vec_cond_expr_p). Other checks are to check if we can build required masks. So we actually don't expose any new processor masking capabilities to GIMPLE. I.e. all this works on targets with no rich masking capabilities. E.g. we can mask loops for quite old SSE targets. OK. I think the key here is that load/store masking already exists and the others are either VEC_COND or checking if we can build the mask rather than can the operation be masked. THanks for clarifying. jeff
Re: [PATCH, vec-tails 05/10] Check if loop can be masked
On 06/22/2016 09:03 AM, Ilya Enkovich wrote: 2016-06-16 9:26 GMT+03:00 Jeff Law : On 06/15/2016 05:22 AM, Richard Biener wrote: You look at TREE_TYPE of LOOP_VINFO_NITERS (loop_vinfo) - I don't think this is meaningful (if then only by accident). I think you should look at the control IV itself, possibly it's value-range, to determine the smallest possible type to use. Can we get an IV that's created after VRP? If so, then we have to be prepared for the case where there's no range information on the IV. At which point I think using type min/max of the IV is probably the right fallback. But I do think we should be looking at range info much more systematically. I can't see how TREE_TYPE of the NITERS makes sense either. I need to build a vector {niters, ..., niters} and compare to it. Why doesn't it make sense to choose the same type for IV? I agree that choosing a smaller type may be beneficial. Shouldn't I look at nb_iterations_upper_bound then to check if NITERS can be casted to a smaller type? Isn't TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)) the type of the constant being used to represent the number of iterations? That is independent of the type of the IV. Though I guess your argument is that since you're building a vector of niters, that indeed what you want is the type of that constant, not the type of the IV. That might be worth a comment in the code :-) jeff
Re: [PATCH, vec-tails 05/10] Check if loop can be masked
2016-06-16 10:08 GMT+03:00 Jeff Law : > On 05/19/2016 01:42 PM, Ilya Enkovich wrote: >> >> Hi, >> >> This patch introduces analysis to determine if loop can be masked >> (compute LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_REQUIRED_MASKS) >> and compute how much masking costs. >> >> Thanks, >> Ilya >> -- >> gcc/ >> >> 2016-05-19 Ilya Enkovich >> >> * tree-vect-loop.c: Include insn-config.h and recog.h. >> (vect_check_required_masks_widening): New. >> (vect_check_required_masks_narrowing): New. >> (vect_get_masking_iv_elems): New. >> (vect_get_masking_iv_type): New. >> (vect_get_extreme_masks): New. >> (vect_check_required_masks): New. >> (vect_analyze_loop_operations): Add vect_check_required_masks >> call to compute LOOP_VINFO_CAN_BE_MASKED. >> (vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and >> LOOP_VINFO_NEED_MASKING before starting over. >> (vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and >> masking cost. >> * tree-vect-stmts.c (can_mask_load_store): New. >> (vect_model_load_masking_cost): New. >> (vect_model_store_masking_cost): New. >> (vect_model_simple_masking_cost): New. >> (vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED >> and masking cost. >> (vectorizable_simd_clone_call): Likewise. >> (vectorizable_store): Likewise. >> (vectorizable_load): Likewise. >> (vect_stmt_should_be_masked_for_epilogue): New. >> (vect_add_required_mask_for_stmt): New. >> (vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED. >> * tree-vectorizer.h (vect_model_load_masking_cost): New. >> (vect_model_store_masking_cost): New. >> (vect_model_simple_masking_cost): New. >> >> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >> index e25a0ce..31360d3 100644 >> --- a/gcc/tree-vect-loop.c >> +++ b/gcc/tree-vect-loop.c >> @@ -31,6 +31,8 @@ along with GCC; see the file COPYING3. If not see >> #include "tree-pass.h" >> #include "ssa.h" >> #include "optabs-tree.h" >> +#include "insn-config.h" >> +#include "recog.h" /* FIXME: for insn_data */ > > Ick :( > > >> + >> +/* Function vect_check_required_masks_narowing. > > narrowing > > >> + >> + Return 1 if vector mask of type MASK_TYPE can be narrowed >> + to a type having REQ_ELEMS elements in a single vector. */ >> + >> +static bool >> +vect_check_required_masks_narrowing (loop_vec_info loop_vinfo, >> +tree mask_type, unsigned req_elems) > > Given the common structure & duplication I can't help but wonder if a single > function should be used for widening/narrowing. Ultimately can't you swap > mask_elems/req_elems and always go narrower to wider (using a different > optab for the two different cases)? I think we can't always go in narrower to wider direction because widening uses two optabs wand also because the way insn_data is checked. > > > > >> + >> +/* Function vect_get_masking_iv_elems. >> + >> + Return a number of elements in IV used for loop masking. */ >> +static int >> +vect_get_masking_iv_elems (loop_vec_info loop_vinfo) >> +{ >> + tree iv_type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)); > > I'm guessing Richi's comment about what tree type you're looking at refers > to this and similar instances. Doesn't this give you the type of the number > of iterations rather than the type of the iteration variable itself? > > Since I build vector IV by myself and use to compare with NITERS I feel it's safe to use type of NITERS. Do you expect NITERS and IV types differ? > > > + >> >> + if (!expand_vec_cmp_expr_p (iv_vectype, mask_type)) >> +{ >> + if (dump_enabled_p ()) >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> +"cannot be masked: required vector comparison " >> +"is not supported.\n"); >> + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false; >> + return; >> +} > > On a totally unrelated topic, I was speaking with David Malcolm earlier this > week about how to turn this kind of missed optimization information we > currently emit into dumps into something more easily consumed by users. > > The general issue is that we've got customers that want to understand why > certain optimizations fire or do not fire. They're by far more interested > in the vectorizer than anything else. > > We have a sense that much of the information those customers desire is > sitting in the dump files, but it's buried in there with other stuff that > isn't generally useful to users. > > So we're pondering what it might take to take these glorified fprintf calls > and turn them into a first class diagnostic that could be emitted to stderr > or into the dump file depending (of course) on the options passed to GCC. > > The reason I bring this up is the hope that your team mi
Re: [PATCH, vec-tails 05/10] Check if loop can be masked
2016-06-16 9:26 GMT+03:00 Jeff Law : > On 06/15/2016 05:22 AM, Richard Biener wrote: >> >> >> You look at TREE_TYPE of LOOP_VINFO_NITERS (loop_vinfo) - I don't think >> this is meaningful (if then only by accident). I think you should look at >> the >> control IV itself, possibly it's value-range, to determine the smallest >> possible >> type to use. > > Can we get an IV that's created after VRP? If so, then we have to be > prepared for the case where there's no range information on the IV. At > which point I think using type min/max of the IV is probably the right > fallback. But I do think we should be looking at range info much more > systematically. > > I can't see how TREE_TYPE of the NITERS makes sense either. I need to build a vector {niters, ..., niters} and compare to it. Why doesn't it make sense to choose the same type for IV? I agree that choosing a smaller type may be beneficial. Shouldn't I look at nb_iterations_upper_bound then to check if NITERS can be casted to a smaller type? Thanks, Ilya > >> Finally we have a related missed optimization opportunity, namely avoiding >> peeling for gaps if we mask the last load of the group (profitability >> depends >> on the overhead of such masking of course as it would be done in the main >> vectorized loop). > > I think that's a specific instance of a more general question -- what > transformations can be avoided by masking and can we generate costs to > select between those transformations and masking. Seems like a follow-up > item rather than a requirement for this work to go forward to me. > > Jeff
Re: [PATCH, vec-tails 05/10] Check if loop can be masked
On 05/19/2016 01:42 PM, Ilya Enkovich wrote: Hi, This patch introduces analysis to determine if loop can be masked (compute LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_REQUIRED_MASKS) and compute how much masking costs. Thanks, Ilya -- gcc/ 2016-05-19 Ilya Enkovich * tree-vect-loop.c: Include insn-config.h and recog.h. (vect_check_required_masks_widening): New. (vect_check_required_masks_narrowing): New. (vect_get_masking_iv_elems): New. (vect_get_masking_iv_type): New. (vect_get_extreme_masks): New. (vect_check_required_masks): New. (vect_analyze_loop_operations): Add vect_check_required_masks call to compute LOOP_VINFO_CAN_BE_MASKED. (vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_NEED_MASKING before starting over. (vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and masking cost. * tree-vect-stmts.c (can_mask_load_store): New. (vect_model_load_masking_cost): New. (vect_model_store_masking_cost): New. (vect_model_simple_masking_cost): New. (vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED and masking cost. (vectorizable_simd_clone_call): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vect_stmt_should_be_masked_for_epilogue): New. (vect_add_required_mask_for_stmt): New. (vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED. * tree-vectorizer.h (vect_model_load_masking_cost): New. (vect_model_store_masking_cost): New. (vect_model_simple_masking_cost): New. diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index e25a0ce..31360d3 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -31,6 +31,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-pass.h" #include "ssa.h" #include "optabs-tree.h" +#include "insn-config.h" +#include "recog.h" /* FIXME: for insn_data */ Ick :( + +/* Function vect_check_required_masks_narowing. narrowing + + Return 1 if vector mask of type MASK_TYPE can be narrowed + to a type having REQ_ELEMS elements in a single vector. */ + +static bool +vect_check_required_masks_narrowing (loop_vec_info loop_vinfo, +tree mask_type, unsigned req_elems) Given the common structure & duplication I can't help but wonder if a single function should be used for widening/narrowing. Ultimately can't you swap mask_elems/req_elems and always go narrower to wider (using a different optab for the two different cases)? + +/* Function vect_get_masking_iv_elems. + + Return a number of elements in IV used for loop masking. */ +static int +vect_get_masking_iv_elems (loop_vec_info loop_vinfo) +{ + tree iv_type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)); I'm guessing Richi's comment about what tree type you're looking at refers to this and similar instances. Doesn't this give you the type of the number of iterations rather than the type of the iteration variable itself? + + if (!expand_vec_cmp_expr_p (iv_vectype, mask_type)) +{ + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, +"cannot be masked: required vector comparison " +"is not supported.\n"); + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false; + return; +} On a totally unrelated topic, I was speaking with David Malcolm earlier this week about how to turn this kind of missed optimization information we currently emit into dumps into something more easily consumed by users. The general issue is that we've got customers that want to understand why certain optimizations fire or do not fire. They're by far more interested in the vectorizer than anything else. We have a sense that much of the information those customers desire is sitting in the dump files, but it's buried in there with other stuff that isn't generally useful to users. So we're pondering what it might take to take these glorified fprintf calls and turn them into a first class diagnostic that could be emitted to stderr or into the dump file depending (of course) on the options passed to GCC. The reason I bring this up is the hope that your team might have some insights based on what ICC has done the in the past for its customers. Anyway, back to the code... diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 9ab4af4..91ebe5a 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -48,6 +48,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-vectorizer.h" #include "builtins.h" #include "internal-fn.h" +#include "tree-ssa-loop-ivopts.h" /* For lang_hooks.types.type_for_mode. */ #include "langhooks.h" @@ -535,6 +536,38 @@ process_use (gimple *stmt, tree use, loop_vec_info loop_vinfo, bool live_p
Re: [PATCH, vec-tails 05/10] Check if loop can be masked
On 06/15/2016 05:22 AM, Richard Biener wrote: You look at TREE_TYPE of LOOP_VINFO_NITERS (loop_vinfo) - I don't think this is meaningful (if then only by accident). I think you should look at the control IV itself, possibly it's value-range, to determine the smallest possible type to use. Can we get an IV that's created after VRP? If so, then we have to be prepared for the case where there's no range information on the IV. At which point I think using type min/max of the IV is probably the right fallback. But I do think we should be looking at range info much more systematically. I can't see how TREE_TYPE of the NITERS makes sense either. Finally we have a related missed optimization opportunity, namely avoiding peeling for gaps if we mask the last load of the group (profitability depends on the overhead of such masking of course as it would be done in the main vectorized loop). I think that's a specific instance of a more general question -- what transformations can be avoided by masking and can we generate costs to select between those transformations and masking. Seems like a follow-up item rather than a requirement for this work to go forward to me. Jeff
Re: [PATCH, vec-tails 05/10] Check if loop can be masked
On Thu, May 19, 2016 at 9:42 PM, Ilya Enkovich wrote: > Hi, > > This patch introduces analysis to determine if loop can be masked > (compute LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_REQUIRED_MASKS) > and compute how much masking costs. Maybe in a different patch, but it looks like you assume say, a division, does not need masking. Code-generation-wise we'd add a new iv starting with iv = { 0, 1, 2, 3 }; and the mask is computed by comparing that against {niter, niter, niter, niter}? So if we need masks for different vector element counts we could also add additional IVs rather than "widening"/"shortening" the comparison result. cond-expr reduction does this kind of IV as well which is a chance to share some code (eventually). You look at TREE_TYPE of LOOP_VINFO_NITERS (loop_vinfo) - I don't think this is meaningful (if then only by accident). I think you should look at the control IV itself, possibly it's value-range, to determine the smallest possible type to use. Finally we have a related missed optimization opportunity, namely avoiding peeling for gaps if we mask the last load of the group (profitability depends on the overhead of such masking of course as it would be done in the main vectorized loop). Richard. > Thanks, > Ilya > -- > gcc/ > > 2016-05-19 Ilya Enkovich > > * tree-vect-loop.c: Include insn-config.h and recog.h. > (vect_check_required_masks_widening): New. > (vect_check_required_masks_narrowing): New. > (vect_get_masking_iv_elems): New. > (vect_get_masking_iv_type): New. > (vect_get_extreme_masks): New. > (vect_check_required_masks): New. > (vect_analyze_loop_operations): Add vect_check_required_masks > call to compute LOOP_VINFO_CAN_BE_MASKED. > (vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and > LOOP_VINFO_NEED_MASKING before starting over. > (vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and > masking cost. > * tree-vect-stmts.c (can_mask_load_store): New. > (vect_model_load_masking_cost): New. > (vect_model_store_masking_cost): New. > (vect_model_simple_masking_cost): New. > (vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED > and masking cost. > (vectorizable_simd_clone_call): Likewise. > (vectorizable_store): Likewise. > (vectorizable_load): Likewise. > (vect_stmt_should_be_masked_for_epilogue): New. > (vect_add_required_mask_for_stmt): New. > (vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED. > * tree-vectorizer.h (vect_model_load_masking_cost): New. > (vect_model_store_masking_cost): New. > (vect_model_simple_masking_cost): New. > > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index e25a0ce..31360d3 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -31,6 +31,8 @@ along with GCC; see the file COPYING3. If not see > #include "tree-pass.h" > #include "ssa.h" > #include "optabs-tree.h" > +#include "insn-config.h" > +#include "recog.h" /* FIXME: for insn_data */ > #include "diagnostic-core.h" > #include "fold-const.h" > #include "stor-layout.h" > @@ -1601,6 +1603,266 @@ vect_update_vf_for_slp (loop_vec_info loop_vinfo) > vectorization_factor); > } > > +/* Function vect_check_required_masks_widening. > + > + Return 1 if vector mask of type MASK_TYPE can be widened > + to a type having REQ_ELEMS elements in a single vector. */ > + > +static bool > +vect_check_required_masks_widening (loop_vec_info loop_vinfo, > + tree mask_type, unsigned req_elems) > +{ > + unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type); > + > + gcc_assert (mask_elems > req_elems); > + > + /* Don't convert if it requires too many intermediate steps. */ > + int steps = exact_log2 (mask_elems / req_elems); > + if (steps > MAX_INTERM_CVT_STEPS + 1) > +return false; > + > + /* Check we have conversion support for given mask mode. */ > + machine_mode mode = TYPE_MODE (mask_type); > + insn_code icode = optab_handler (vec_unpacks_lo_optab, mode); > + if (icode == CODE_FOR_nothing > + || optab_handler (vec_unpacks_hi_optab, mode) == CODE_FOR_nothing) > +return false; > + > + /* Make recursive call for multi-step conversion. */ > + if (steps > 1) > +{ > + mask_elems = mask_elems >> 1; > + mask_type = build_truth_vector_type (mask_elems, current_vector_size); > + if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode) > + return false; > + > + if (!vect_check_required_masks_widening (loop_vinfo, mask_type, > + req_elems)) > + return false; > +} > + else > +{ > + mask_type = build_truth_vector_type (req_elems, current_vector_size); > + if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode) > +
[PATCH, vec-tails 05/10] Check if loop can be masked
Hi, This patch introduces analysis to determine if loop can be masked (compute LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_REQUIRED_MASKS) and compute how much masking costs. Thanks, Ilya -- gcc/ 2016-05-19 Ilya Enkovich * tree-vect-loop.c: Include insn-config.h and recog.h. (vect_check_required_masks_widening): New. (vect_check_required_masks_narrowing): New. (vect_get_masking_iv_elems): New. (vect_get_masking_iv_type): New. (vect_get_extreme_masks): New. (vect_check_required_masks): New. (vect_analyze_loop_operations): Add vect_check_required_masks call to compute LOOP_VINFO_CAN_BE_MASKED. (vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_NEED_MASKING before starting over. (vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and masking cost. * tree-vect-stmts.c (can_mask_load_store): New. (vect_model_load_masking_cost): New. (vect_model_store_masking_cost): New. (vect_model_simple_masking_cost): New. (vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED and masking cost. (vectorizable_simd_clone_call): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vect_stmt_should_be_masked_for_epilogue): New. (vect_add_required_mask_for_stmt): New. (vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED. * tree-vectorizer.h (vect_model_load_masking_cost): New. (vect_model_store_masking_cost): New. (vect_model_simple_masking_cost): New. diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index e25a0ce..31360d3 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -31,6 +31,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-pass.h" #include "ssa.h" #include "optabs-tree.h" +#include "insn-config.h" +#include "recog.h" /* FIXME: for insn_data */ #include "diagnostic-core.h" #include "fold-const.h" #include "stor-layout.h" @@ -1601,6 +1603,266 @@ vect_update_vf_for_slp (loop_vec_info loop_vinfo) vectorization_factor); } +/* Function vect_check_required_masks_widening. + + Return 1 if vector mask of type MASK_TYPE can be widened + to a type having REQ_ELEMS elements in a single vector. */ + +static bool +vect_check_required_masks_widening (loop_vec_info loop_vinfo, + tree mask_type, unsigned req_elems) +{ + unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type); + + gcc_assert (mask_elems > req_elems); + + /* Don't convert if it requires too many intermediate steps. */ + int steps = exact_log2 (mask_elems / req_elems); + if (steps > MAX_INTERM_CVT_STEPS + 1) +return false; + + /* Check we have conversion support for given mask mode. */ + machine_mode mode = TYPE_MODE (mask_type); + insn_code icode = optab_handler (vec_unpacks_lo_optab, mode); + if (icode == CODE_FOR_nothing + || optab_handler (vec_unpacks_hi_optab, mode) == CODE_FOR_nothing) +return false; + + /* Make recursive call for multi-step conversion. */ + if (steps > 1) +{ + mask_elems = mask_elems >> 1; + mask_type = build_truth_vector_type (mask_elems, current_vector_size); + if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode) + return false; + + if (!vect_check_required_masks_widening (loop_vinfo, mask_type, + req_elems)) + return false; +} + else +{ + mask_type = build_truth_vector_type (req_elems, current_vector_size); + if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode) + return false; +} + + return true; +} + +/* Function vect_check_required_masks_narowing. + + Return 1 if vector mask of type MASK_TYPE can be narrowed + to a type having REQ_ELEMS elements in a single vector. */ + +static bool +vect_check_required_masks_narrowing (loop_vec_info loop_vinfo, +tree mask_type, unsigned req_elems) +{ + unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type); + + gcc_assert (req_elems > mask_elems); + + /* Don't convert if it requires too many intermediate steps. */ + int steps = exact_log2 (req_elems / mask_elems); + if (steps > MAX_INTERM_CVT_STEPS + 1) +return false; + + /* Check we have conversion support for given mask mode. */ + machine_mode mode = TYPE_MODE (mask_type); + insn_code icode = optab_handler (vec_pack_trunc_optab, mode); + if (icode == CODE_FOR_nothing) +return false; + + /* Make recursive call for multi-step conversion. */ + if (steps > 1) +{ + mask_elems = mask_elems << 1; + mask_type = build_truth_vector_type (mask_elems, current_vector_size); + if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode) + return false; + + if (!vect_check_required_masks_narrowing (loop_vinfo, mask_type,