[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406

2019-11-29 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #12 from rsandifo at gcc dot gnu.org  
---
Fixed.

[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406

2019-11-29 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596

--- Comment #11 from rsandifo at gcc dot gnu.org  
---
Author: rsandifo
Date: Fri Nov 29 14:47:44 2019
New Revision: 278851

URL: https://gcc.gnu.org/viewcvs?rev=278851=gcc=rev
Log:
Don't defer choice of vector type for bools (PR 92596)

Now that stmt_vec_info records the choice between vector mask
types and normal nonmask types, we can use that information in
vect_get_vector_types_for_stmt instead of deferring the choice
of vector type till later.

vect_get_mask_type_for_stmt used to check whether the boolean inputs
to an operation:
(a) consistently used mask types or consistently used nonmask types; and
(b) agreed on the number of elements.

(b) shouldn't be a problem when (a) is met.  If the operation
consistently uses mask types, tree-vect-patterns.c will have corrected
any mismatches in mask precision.  (This is because we only use mask
types for a small well-known set of operations and tree-vect-patterns.c
knows how to handle any that could have different mask precisions.)
And if the operation consistently uses normal nonmask types, there's
no reason why booleans should need extra vector compatibility checks
compared to ordinary integers.

So the potential difficulties all seem to come from (a).  Now that
we've chosen the result type ahead of time, we also have to consider
whether the outputs and inputs consistently use mask types.

Taking each vectorizable_* routine in turn:

- vectorizable_call

vect_get_vector_types_for_stmt only handled booleans specially
for gassigns, so vect_get_mask_type_for_stmt never had chance to
handle calls.  I'm not sure we support any calls that operate on
booleans, but as things stand, a boolean result would always have
a nonmask type.  Presumably any vector argument would also need to
use nonmask types, unless it corresponds to internal_fn_mask_index
(which is already a special case).

For safety, I've added a check for mask/nonmask combinations here
even though we didn't check this previously.

- vectorizable_simd_clone_call

Again, vect_get_mask_type_for_stmt never had chance to handle calls.
The result of the call will always be a nonmask type and the patch
for PR 92710 rejects mask arguments.  So all booleans should
consistently use nonmask types here.

- vectorizable_conversion

The function already rejects any conversion between booleans in which
one type isn't a mask type.

- vectorizable_operation

This function definitely needs a consistency check, e.g. to handle
& and | in which one operand is loaded from memory and the other is
a comparison result.  Ideally we'd handle this via pattern stmts
instead (like we do for the all-mask case), but that's future work.

- vectorizable_assignment

VECT_SCALAR_BOOLEAN_TYPE_P requires single-bit precision, so the
current code already rejects problematic cases.

- vectorizable_load

Loads always produce nonmask types and there are no relevant inputs
to check against.

- vectorizable_store

vect_check_store_rhs already rejects mask/nonmask combinations
via useless_type_conversion_p.

- vectorizable_reduction
- vectorizable_lc_phi

PHIs always have nonmask types.  After the change above, attempts
to combine the PHI result with a mask type would be rejected by
vectorizable_operation.  (Again, it would be better to handle
this using pattern stmts.)

- vectorizable_induction

We don't generate inductions for booleans.

- vectorizable_shift

The function already rejects boolean shifts via type_has_mode_precision_p.

- vectorizable_condition

The function already rejects mismatches via useless_type_conversion_p.

- vectorizable_comparison

The function already rejects comparisons between mask and nonmask types.
The result is always a mask type.

2019-11-29  Richard Sandiford  

gcc/
PR tree-optimization/92596
* tree-vect-stmts.c (vectorizable_call): Punt on hybrid mask/nonmask
operations.
(vectorizable_operation): Likewise, instead of relying on
vect_get_mask_type_for_stmt to do this.
(vect_get_vector_types_for_stmt): Always return a vector type
immediately, rather than deferring the choice for boolean results.
Use a vector mask type instead of a normal vector if
vect_use_mask_type_p.
(vect_get_mask_type_for_stmt): Delete.
* tree-vect-loop.c (vect_determine_vf_for_stmt_1): Remove
mask_producers argument and special boolean_type_node handling.
(vect_determine_vf_for_stmt): Remove mask_producers argument and
update calls to vect_determine_vf_for_stmt_1.  Remove doubled call.
(vect_determine_vectorization_factor): Update call accordingly.
* tree-vect-slp.c (vect_build_slp_tree_1): Remove special
boolean_type_node handling.
(vect_slp_analyze_node_operations_1): Likewise.

gcc/testsuite/
PR 

[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406

2019-11-27 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596

--- Comment #10 from rsandifo at gcc dot gnu.org  
---
Since it's been a while since the last update: I've been trying
various non-invasive ways of fixing it, but even if they seem to be
strict improvements, they still leave open obvious traps of a similar
nature for later.

I think we'll just have to get rid of the special deferred handling
of boolean_type_node and stop computing so much of this stuff on
the fly.  I'm now testing a series of patches to do that.

[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406

2019-11-21 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

   Assignee|rguenth at gcc dot gnu.org |rsandifo at gcc dot 
gnu.org

--- Comment #9 from rsandifo at gcc dot gnu.org  
---
I'll take the second one.

[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406

2019-11-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596

--- Comment #8 from Richard Biener  ---
Author: rguenth
Date: Thu Nov 21 15:01:17 2019
New Revision: 278555

URL: https://gcc.gnu.org/viewcvs?rev=278555=gcc=rev
Log:
2019-11-21  Richard Biener  

PR tree-optimization/92596
* tree-vect-slp.c (vect_build_slp_tree): Fix pasto.

* gcc.dg/torture/pr92596-1.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.dg/torture/pr92596-1.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-slp.c

[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406

2019-11-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596

--- Comment #7 from Richard Biener  ---
(In reply to Richard Biener from comment #6)
> The second testcase still ICEs though, so investigating that.

Which is more fun.  We have group_size == vf == 1 here but V2SI as vectype.

t2.c:17:1: note:   Build SLP for _1 = b_2(D) == 0;
t2.c:17:1: note:   get vectype for scalar type (group size 2): int
t2.c:17:1: note:   vectype: vector(2) int
...
t2.c:17:1: note:   Final SLP tree for instance:
t2.c:17:1: note:   node 0x3721280 (max_nunits=2)
t2.c:17:1: note:stmt 0 a.n[0] = _4;
t2.c:17:1: note:stmt 1 a.n[1] = _4;
t2.c:17:1: note:children 0x38dc8c0
t2.c:17:1: note:   node 0x38dc8c0 (max_nunits=2)
t2.c:17:1: note:stmt 0 _4 = (long int) _1;
t2.c:17:1: note:stmt 1 _4 = (long int) _1;
t2.c:17:1: note:children 0x38dcb10
t2.c:17:1: note:   node 0x38dcb10 (max_nunits=2)
t2.c:17:1: note:stmt 0 _1 = b_2(D) == 0;
t2.c:17:1: note:stmt 1 _1 = b_2(D) == 0;
t2.c:17:1: note:children 0x370ef50 0x371de80
t2.c:17:1: note:   node (external) 0x370ef50 (max_nunits=1)
t2.c:17:1: note:{ b_2(D), b_2(D) }
t2.c:17:1: note:   node (constant) 0x371de80 (max_nunits=1)
t2.c:17:1: note:{ 0, 0 }

t2.c:17:1: note:   Build SLP for _1 = b_2(D) == 0;
t2.c:17:1: note:   get vectype for scalar type (group size 1): int
t2.c:17:1: note:   vectype: vector(1) int
t2.c:17:1: note:   nunits = 1
...
t2.c:17:1: note:   Final SLP tree for instance:
t2.c:17:1: note:   node 0x38e3760 (max_nunits=1)
t2.c:17:1: note:stmt 0 a.n[2] = _4;
t2.c:17:1: note:children 0x38e3710
t2.c:17:1: note:   node 0x38e3710 (max_nunits=1)
t2.c:17:1: note:stmt 0 _4 = (long int) _1;
t2.c:17:1: note:children 0x38e36c0
t2.c:17:1: note:   node 0x38e36c0 (max_nunits=1)
t2.c:17:1: note:stmt 0 _1 = b_2(D) == 0;
t2.c:17:1: note:children 0x3721340 0x38e3670
t2.c:17:1: note:   node (external) 0x3721340 (max_nunits=1)
t2.c:17:1: note:{ b_2(D) }
t2.c:17:1: note:   node (constant) 0x38e3670 (max_nunits=1)
t2.c:17:1: note:{ 0 }

shoudn't updating of the shared vector type have failed?

Ah, it gets QImode (boolean_type_node).  Isn't that a bug?

Anyway, will continue to dig next week unless somebody beats me to it.

[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406

2019-11-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596

--- Comment #6 from Richard Biener  ---
The second testcase still ICEs though, so investigating that.

[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406

2019-11-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596

--- Comment #5 from Richard Biener  ---
So I managed to commit to trunk:

+  poly_uint64 this_max_nunits = 1;
   slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size, max_nunits,
matches, npermutes, tree_size,
bst_map);
-  /* Keep a reference for the bst_map use.  */
   if (res)

while to the branch (correct):

+  poly_uint64 this_max_nunits = 1;
+  slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size,
+   _max_nunits,
matches, npermutes, tree_size,
max_tree_size, bst_map);
-  /* Keep a reference for the bst_map use.  */

oops.

[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406

2019-11-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596

--- Comment #4 from Richard Biener  ---
There's also a missed optimization showing - we analyze the group_size == 3
case
successfully but fail to consider splitting it as it fails the unroll check
because

  /* We consider breaking the group only on VF boundaries from the existing
 start.  */
  for (i = 0; i < group_size; i++)
if (!matches[i]) break;

  if (i >= const_nunits && i < group_size)
{

i == group_size here.

If you make fn1 static you see a different thing, namely

t2.c:10:1: note:   Build SLP for _10 = (long int) _2;
t2.c:10:1: note:   get vectype for scalar type (group size 3): long int
t2.c:10:1: note:   vectype: vector(1) long int
t2.c:10:1: note:   get vectype for smallest scalar type: _Bool
t2.c:10:1: note:   nunits vectype: vector(2) unsigned char
t2.c:10:1: note:   nunits = 2

V2QI?  Huh.

t2.c:10:1: note:   Build SLP for _2 = c.0_1 == 0;
t2.c:10:1: note:   get vectype for scalar type (group size 3): int
t2.c:10:1: note:   vectype: vector(2) int
t2.c:10:1: note:   nunits = 2

and V2SI.  But still V1DI.

I guess with all this it might be the case that the vect_update_max_nunits
call in vect_build_slp_tree for the case where there is a leader doesn't
work in case the local max_nunits is bigger?  But this isn't the case here.

So interestingly for

_2 = c.0_1 == 0;

we have vectype == boolean_type_node and nunits_vectype V2SI.

Uh.

@@ -1247,7 +1248,8 @@ vect_build_slp_tree (vec_info *vinfo,
   return *leader;
 }
   poly_uint64 this_max_nunits = 1;
-  slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size, max_nunits,
+  slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size,
+   _max_nunits,
matches, npermutes, tree_size,
bst_map);
   if (res)
 {

[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406

2019-11-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596

--- Comment #3 from Richard Biener  ---
So for

patt_17 = (long int) patt_18;

for example vect_get_vector_types_for_stmt now computes V2DI and V2SI as
vectype and nunits_vectype.  I think that's undesirable.

  scalar_type = vect_get_smallest_scalar_type (stmt_info, , );
  if (scalar_type != TREE_TYPE (vectype))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
 "get vectype for smallest scalar type: %T\n",
 scalar_type);
  nunits_vectype = get_vectype_for_scalar_type (vinfo, scalar_type,
group_size);

used to result in the same sized vector type (but actually used
a simple get_vectype_for_scalar_type).

I'm not sure how we should handle max_nunits with the new scheme of
allowing mixed-size vectors?

Certainly there seem to be inconsistencies which lead to this ICE since
during SLP build we always arrive at a V2{DI,SI}mode vector but during
analysis we see also

t2.c:10:1: note:   Build SLP for _8->n[4] = _10;
t2.c:10:1: note:   get vectype for scalar type (group size 1): long int
t2.c:10:1: note:   vectype: vector(1) long int
t2.c:10:1: note:   nunits = 1

(fail)

so eventually the vector type mismatch is introudced by

t2.c:10:1: note:   Split group into 2 and 1

also because

t2.c:10:1: missed:   Build SLP failed: incompatible vector types for: c.0_1 =
c;
t2.c:10:1: note:   old vector type: vector(2) int
t2.c:10:1: note:   new vector type: vector(1) int

(but that's an external node)

[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406

2019-11-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596

--- Comment #2 from Richard Biener  ---
So I think the issue is that we have

  /* Calculate the unrolling factor based on the smallest type.  */
  poly_uint64 unrolling_factor
= calculate_unrolling_factor (max_nunits, group_size);

with max_nunits == 1 and group_size == 3 because we end up with
V1DImode as vectype.  But then later we end up with V2SImode vectors
for

_2 = c.0_1 == 0;

but the vectorization factor is 1.

[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406

2019-11-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org

--- Comment #1 from Richard Biener  ---
Mine.

[Bug tree-optimization/92596] [10 Regression] ICE in exact_div, at poly-int.h:2162 since r278406

2019-11-20 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596

Martin Liška  changed:

   What|Removed |Added

   Priority|P3  |P1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-11-20
  Known to work||9.2.0
   Target Milestone|--- |10.0
 Ever confirmed|0   |1
  Known to fail||10.0