[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #12 from rsandifo at gcc dot gnu.org --- Fixed.
[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596 --- Comment #11 from rsandifo at gcc dot gnu.org --- Author: rsandifo Date: Fri Nov 29 14:47:44 2019 New Revision: 278851 URL: https://gcc.gnu.org/viewcvs?rev=278851=gcc=rev Log: Don't defer choice of vector type for bools (PR 92596) Now that stmt_vec_info records the choice between vector mask types and normal nonmask types, we can use that information in vect_get_vector_types_for_stmt instead of deferring the choice of vector type till later. vect_get_mask_type_for_stmt used to check whether the boolean inputs to an operation: (a) consistently used mask types or consistently used nonmask types; and (b) agreed on the number of elements. (b) shouldn't be a problem when (a) is met. If the operation consistently uses mask types, tree-vect-patterns.c will have corrected any mismatches in mask precision. (This is because we only use mask types for a small well-known set of operations and tree-vect-patterns.c knows how to handle any that could have different mask precisions.) And if the operation consistently uses normal nonmask types, there's no reason why booleans should need extra vector compatibility checks compared to ordinary integers. So the potential difficulties all seem to come from (a). Now that we've chosen the result type ahead of time, we also have to consider whether the outputs and inputs consistently use mask types. Taking each vectorizable_* routine in turn: - vectorizable_call vect_get_vector_types_for_stmt only handled booleans specially for gassigns, so vect_get_mask_type_for_stmt never had chance to handle calls. I'm not sure we support any calls that operate on booleans, but as things stand, a boolean result would always have a nonmask type. Presumably any vector argument would also need to use nonmask types, unless it corresponds to internal_fn_mask_index (which is already a special case). For safety, I've added a check for mask/nonmask combinations here even though we didn't check this previously. - vectorizable_simd_clone_call Again, vect_get_mask_type_for_stmt never had chance to handle calls. The result of the call will always be a nonmask type and the patch for PR 92710 rejects mask arguments. So all booleans should consistently use nonmask types here. - vectorizable_conversion The function already rejects any conversion between booleans in which one type isn't a mask type. - vectorizable_operation This function definitely needs a consistency check, e.g. to handle & and | in which one operand is loaded from memory and the other is a comparison result. Ideally we'd handle this via pattern stmts instead (like we do for the all-mask case), but that's future work. - vectorizable_assignment VECT_SCALAR_BOOLEAN_TYPE_P requires single-bit precision, so the current code already rejects problematic cases. - vectorizable_load Loads always produce nonmask types and there are no relevant inputs to check against. - vectorizable_store vect_check_store_rhs already rejects mask/nonmask combinations via useless_type_conversion_p. - vectorizable_reduction - vectorizable_lc_phi PHIs always have nonmask types. After the change above, attempts to combine the PHI result with a mask type would be rejected by vectorizable_operation. (Again, it would be better to handle this using pattern stmts.) - vectorizable_induction We don't generate inductions for booleans. - vectorizable_shift The function already rejects boolean shifts via type_has_mode_precision_p. - vectorizable_condition The function already rejects mismatches via useless_type_conversion_p. - vectorizable_comparison The function already rejects comparisons between mask and nonmask types. The result is always a mask type. 2019-11-29 Richard Sandiford gcc/ PR tree-optimization/92596 * tree-vect-stmts.c (vectorizable_call): Punt on hybrid mask/nonmask operations. (vectorizable_operation): Likewise, instead of relying on vect_get_mask_type_for_stmt to do this. (vect_get_vector_types_for_stmt): Always return a vector type immediately, rather than deferring the choice for boolean results. Use a vector mask type instead of a normal vector if vect_use_mask_type_p. (vect_get_mask_type_for_stmt): Delete. * tree-vect-loop.c (vect_determine_vf_for_stmt_1): Remove mask_producers argument and special boolean_type_node handling. (vect_determine_vf_for_stmt): Remove mask_producers argument and update calls to vect_determine_vf_for_stmt_1. Remove doubled call. (vect_determine_vectorization_factor): Update call accordingly. * tree-vect-slp.c (vect_build_slp_tree_1): Remove special boolean_type_node handling. (vect_slp_analyze_node_operations_1): Likewise. gcc/testsuite/ PR
[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596 --- Comment #10 from rsandifo at gcc dot gnu.org --- Since it's been a while since the last update: I've been trying various non-invasive ways of fixing it, but even if they seem to be strict improvements, they still leave open obvious traps of a similar nature for later. I think we'll just have to get rid of the special deferred handling of boolean_type_node and stop computing so much of this stuff on the fly. I'm now testing a series of patches to do that.
[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596 rsandifo at gcc dot gnu.org changed: What|Removed |Added Assignee|rguenth at gcc dot gnu.org |rsandifo at gcc dot gnu.org --- Comment #9 from rsandifo at gcc dot gnu.org --- I'll take the second one.
[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596 --- Comment #8 from Richard Biener --- Author: rguenth Date: Thu Nov 21 15:01:17 2019 New Revision: 278555 URL: https://gcc.gnu.org/viewcvs?rev=278555=gcc=rev Log: 2019-11-21 Richard Biener PR tree-optimization/92596 * tree-vect-slp.c (vect_build_slp_tree): Fix pasto. * gcc.dg/torture/pr92596-1.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/torture/pr92596-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-slp.c
[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596 --- Comment #7 from Richard Biener --- (In reply to Richard Biener from comment #6) > The second testcase still ICEs though, so investigating that. Which is more fun. We have group_size == vf == 1 here but V2SI as vectype. t2.c:17:1: note: Build SLP for _1 = b_2(D) == 0; t2.c:17:1: note: get vectype for scalar type (group size 2): int t2.c:17:1: note: vectype: vector(2) int ... t2.c:17:1: note: Final SLP tree for instance: t2.c:17:1: note: node 0x3721280 (max_nunits=2) t2.c:17:1: note:stmt 0 a.n[0] = _4; t2.c:17:1: note:stmt 1 a.n[1] = _4; t2.c:17:1: note:children 0x38dc8c0 t2.c:17:1: note: node 0x38dc8c0 (max_nunits=2) t2.c:17:1: note:stmt 0 _4 = (long int) _1; t2.c:17:1: note:stmt 1 _4 = (long int) _1; t2.c:17:1: note:children 0x38dcb10 t2.c:17:1: note: node 0x38dcb10 (max_nunits=2) t2.c:17:1: note:stmt 0 _1 = b_2(D) == 0; t2.c:17:1: note:stmt 1 _1 = b_2(D) == 0; t2.c:17:1: note:children 0x370ef50 0x371de80 t2.c:17:1: note: node (external) 0x370ef50 (max_nunits=1) t2.c:17:1: note:{ b_2(D), b_2(D) } t2.c:17:1: note: node (constant) 0x371de80 (max_nunits=1) t2.c:17:1: note:{ 0, 0 } t2.c:17:1: note: Build SLP for _1 = b_2(D) == 0; t2.c:17:1: note: get vectype for scalar type (group size 1): int t2.c:17:1: note: vectype: vector(1) int t2.c:17:1: note: nunits = 1 ... t2.c:17:1: note: Final SLP tree for instance: t2.c:17:1: note: node 0x38e3760 (max_nunits=1) t2.c:17:1: note:stmt 0 a.n[2] = _4; t2.c:17:1: note:children 0x38e3710 t2.c:17:1: note: node 0x38e3710 (max_nunits=1) t2.c:17:1: note:stmt 0 _4 = (long int) _1; t2.c:17:1: note:children 0x38e36c0 t2.c:17:1: note: node 0x38e36c0 (max_nunits=1) t2.c:17:1: note:stmt 0 _1 = b_2(D) == 0; t2.c:17:1: note:children 0x3721340 0x38e3670 t2.c:17:1: note: node (external) 0x3721340 (max_nunits=1) t2.c:17:1: note:{ b_2(D) } t2.c:17:1: note: node (constant) 0x38e3670 (max_nunits=1) t2.c:17:1: note:{ 0 } shoudn't updating of the shared vector type have failed? Ah, it gets QImode (boolean_type_node). Isn't that a bug? Anyway, will continue to dig next week unless somebody beats me to it.
[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596 --- Comment #6 from Richard Biener --- The second testcase still ICEs though, so investigating that.
[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596 --- Comment #5 from Richard Biener --- So I managed to commit to trunk: + poly_uint64 this_max_nunits = 1; slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size, max_nunits, matches, npermutes, tree_size, bst_map); - /* Keep a reference for the bst_map use. */ if (res) while to the branch (correct): + poly_uint64 this_max_nunits = 1; + slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size, + _max_nunits, matches, npermutes, tree_size, max_tree_size, bst_map); - /* Keep a reference for the bst_map use. */ oops.
[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596 --- Comment #4 from Richard Biener --- There's also a missed optimization showing - we analyze the group_size == 3 case successfully but fail to consider splitting it as it fails the unroll check because /* We consider breaking the group only on VF boundaries from the existing start. */ for (i = 0; i < group_size; i++) if (!matches[i]) break; if (i >= const_nunits && i < group_size) { i == group_size here. If you make fn1 static you see a different thing, namely t2.c:10:1: note: Build SLP for _10 = (long int) _2; t2.c:10:1: note: get vectype for scalar type (group size 3): long int t2.c:10:1: note: vectype: vector(1) long int t2.c:10:1: note: get vectype for smallest scalar type: _Bool t2.c:10:1: note: nunits vectype: vector(2) unsigned char t2.c:10:1: note: nunits = 2 V2QI? Huh. t2.c:10:1: note: Build SLP for _2 = c.0_1 == 0; t2.c:10:1: note: get vectype for scalar type (group size 3): int t2.c:10:1: note: vectype: vector(2) int t2.c:10:1: note: nunits = 2 and V2SI. But still V1DI. I guess with all this it might be the case that the vect_update_max_nunits call in vect_build_slp_tree for the case where there is a leader doesn't work in case the local max_nunits is bigger? But this isn't the case here. So interestingly for _2 = c.0_1 == 0; we have vectype == boolean_type_node and nunits_vectype V2SI. Uh. @@ -1247,7 +1248,8 @@ vect_build_slp_tree (vec_info *vinfo, return *leader; } poly_uint64 this_max_nunits = 1; - slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size, max_nunits, + slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size, + _max_nunits, matches, npermutes, tree_size, bst_map); if (res) {
[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596 --- Comment #3 from Richard Biener --- So for patt_17 = (long int) patt_18; for example vect_get_vector_types_for_stmt now computes V2DI and V2SI as vectype and nunits_vectype. I think that's undesirable. scalar_type = vect_get_smallest_scalar_type (stmt_info, , ); if (scalar_type != TREE_TYPE (vectype)) { if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "get vectype for smallest scalar type: %T\n", scalar_type); nunits_vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size); used to result in the same sized vector type (but actually used a simple get_vectype_for_scalar_type). I'm not sure how we should handle max_nunits with the new scheme of allowing mixed-size vectors? Certainly there seem to be inconsistencies which lead to this ICE since during SLP build we always arrive at a V2{DI,SI}mode vector but during analysis we see also t2.c:10:1: note: Build SLP for _8->n[4] = _10; t2.c:10:1: note: get vectype for scalar type (group size 1): long int t2.c:10:1: note: vectype: vector(1) long int t2.c:10:1: note: nunits = 1 (fail) so eventually the vector type mismatch is introudced by t2.c:10:1: note: Split group into 2 and 1 also because t2.c:10:1: missed: Build SLP failed: incompatible vector types for: c.0_1 = c; t2.c:10:1: note: old vector type: vector(2) int t2.c:10:1: note: new vector type: vector(1) int (but that's an external node)
[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596 --- Comment #2 from Richard Biener --- So I think the issue is that we have /* Calculate the unrolling factor based on the smallest type. */ poly_uint64 unrolling_factor = calculate_unrolling_factor (max_nunits, group_size); with max_nunits == 1 and group_size == 3 because we end up with V1DImode as vectype. But then later we end up with V2SImode vectors for _2 = c.0_1 == 0; but the vectorization factor is 1.
[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #1 from Richard Biener --- Mine.
[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596 Martin Liška changed: What|Removed |Added Priority|P3 |P1 Status|UNCONFIRMED |NEW Last reconfirmed||2019-11-20 Known to work||9.2.0 Target Milestone|--- |10.0 Ever confirmed|0 |1 Known to fail||10.0