Re: [PATCH][RFC] Remove ifcvt_repair_bool_pattern, re-do bool patterns

2016-06-01 Thread Richard Biener
On Tue, 31 May 2016, Yuri Rumyantsev wrote:

> Richard,
> 
> I built compiler with your patch and did not find out any issues with
> vectorization of loops marked with pragma simd. I also noticed that
> the size of the vectorized loop looks smaller (I can't tell you exact
> numbers since the fresh compiler performs fool unroll even if
> "-funroll-loops" option was not passed).

Thanks for checking - and yes, I expect us to generate better code
as we can't always recover from the un-CSE ifcvt did.

I have installed the patch now, watching for fallout on other archs
(I have installed it w/o re-instantiating bool patterns on x86).

Richard.

> 2016-05-30 15:55 GMT+03:00 Ilya Enkovich :
> > 2016-05-30 14:04 GMT+03:00 Richard Biener :
> >>
> >> The following patch removes the restriction on seeing a tree of stmts
> >> in vectorizer bool pattern detection (aka single-use).  With this
> >> it is no longer necessary to unshare DEFs in ifcvt_repair_bool_pattern
> >> and that compile-time hog can go (it's now enabled unconditionally for GCC 
> >> 7).
> >>
> >> Instead the pattern detection code will now "unshare" the condition tree
> >> for each bool pattern root my means of adding all pattern stmts of the
> >> condition tree to its pattern def sequence (so we still get some
> >> unnecessary copying, worst-case quadratic rather than exponential).
> >>
> >> Ilja - I had to disable the
> >>
> >>   tree mask_type = get_mask_type_for_scalar_type (TREE_TYPE
> >> (rhs1));
> >>   if (mask_type
> >>   && expand_vec_cmp_expr_p (comp_vectype, mask_type))
> >> return false;
> >>
> >> check you added to check_bool_pattern to get any coverage for bool
> >> patterns on x86_64.  Doing that regresses
> >>
> >> FAIL: gcc.target/i386/mask-pack.c scan-tree-dump-times vect "vectorized 1
> >> loops" 10
> >> FAIL: gcc.target/i386/mask-unpack.c scan-tree-dump-times vect "vectorized
> >> 1 loops" 10
> >> FAIL: gcc.target/i386/pr70021.c scan-tree-dump-times vect "vectorized 1
> >> loops" 2
> >>
> >> so somehow bool patterns mess up things here (I didn't investigate).
> >> The final patch will enable the above path again, avoiding the regression.
> >
> > Mask conversion patterns handle some cases bool patterns don't.  So it's
> > expected we can't vectorize some loops if bool patterns are enforced.
> >
> > Thanks,
> > Ilya
> >
> >>
> >> Yuri - I suppose you have a larger set of testcases using OMP simd
> >> or other forced vectorization you added ifcvt_repair_bool_pattern for.
> >> I'd appreciate testing (and testcases if anything fails unexpectedly).
> >>
> >> Testing on other targets is of course appreciated as well.
> >>
> >> Bootstrapped and tested on x86_64-unknown-linux-gnu (with the #if 0).
> >>
> >> Comments?
> >>
> >> I agree with Ilya elsewhere to remove bool patterns completely
> >> (as a first step making them apply to individual stmts).  We do
> >> currently not support mixed cond/non-cond uses anyway, like
> >>
> >> _Bool a[64];
> >> unsigned char b[64];
> >>
> >> void foo (void)
> >> {
> >>   for (int i = 0; i < 64; ++i)
> >> {
> >>   _Bool x = a[i] && b[i] < 10;
> >>   a[i] = x;
> >> }
> >> }
> >>
> >> and stmt-local "patterns" can be added when vectorizing the stmt
> >> (like in the above case a tem = a[i] != 0 ? -1 : 0).  Doing
> >> bool "promotion" optimally requires a better vectorizer IL
> >> (similar to placing of shuffles).
> >>
> >> Thanks,
> >> Richard.
> >>
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][RFC] Remove ifcvt_repair_bool_pattern, re-do bool patterns

2016-05-31 Thread Yuri Rumyantsev
Richard,

I built compiler with your patch and did not find out any issues with
vectorization of loops marked with pragma simd. I also noticed that
the size of the vectorized loop looks smaller (I can't tell you exact
numbers since the fresh compiler performs fool unroll even if
"-funroll-loops" option was not passed).

2016-05-30 15:55 GMT+03:00 Ilya Enkovich :
> 2016-05-30 14:04 GMT+03:00 Richard Biener :
>>
>> The following patch removes the restriction on seeing a tree of stmts
>> in vectorizer bool pattern detection (aka single-use).  With this
>> it is no longer necessary to unshare DEFs in ifcvt_repair_bool_pattern
>> and that compile-time hog can go (it's now enabled unconditionally for GCC 
>> 7).
>>
>> Instead the pattern detection code will now "unshare" the condition tree
>> for each bool pattern root my means of adding all pattern stmts of the
>> condition tree to its pattern def sequence (so we still get some
>> unnecessary copying, worst-case quadratic rather than exponential).
>>
>> Ilja - I had to disable the
>>
>>   tree mask_type = get_mask_type_for_scalar_type (TREE_TYPE
>> (rhs1));
>>   if (mask_type
>>   && expand_vec_cmp_expr_p (comp_vectype, mask_type))
>> return false;
>>
>> check you added to check_bool_pattern to get any coverage for bool
>> patterns on x86_64.  Doing that regresses
>>
>> FAIL: gcc.target/i386/mask-pack.c scan-tree-dump-times vect "vectorized 1
>> loops" 10
>> FAIL: gcc.target/i386/mask-unpack.c scan-tree-dump-times vect "vectorized
>> 1 loops" 10
>> FAIL: gcc.target/i386/pr70021.c scan-tree-dump-times vect "vectorized 1
>> loops" 2
>>
>> so somehow bool patterns mess up things here (I didn't investigate).
>> The final patch will enable the above path again, avoiding the regression.
>
> Mask conversion patterns handle some cases bool patterns don't.  So it's
> expected we can't vectorize some loops if bool patterns are enforced.
>
> Thanks,
> Ilya
>
>>
>> Yuri - I suppose you have a larger set of testcases using OMP simd
>> or other forced vectorization you added ifcvt_repair_bool_pattern for.
>> I'd appreciate testing (and testcases if anything fails unexpectedly).
>>
>> Testing on other targets is of course appreciated as well.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu (with the #if 0).
>>
>> Comments?
>>
>> I agree with Ilya elsewhere to remove bool patterns completely
>> (as a first step making them apply to individual stmts).  We do
>> currently not support mixed cond/non-cond uses anyway, like
>>
>> _Bool a[64];
>> unsigned char b[64];
>>
>> void foo (void)
>> {
>>   for (int i = 0; i < 64; ++i)
>> {
>>   _Bool x = a[i] && b[i] < 10;
>>   a[i] = x;
>> }
>> }
>>
>> and stmt-local "patterns" can be added when vectorizing the stmt
>> (like in the above case a tem = a[i] != 0 ? -1 : 0).  Doing
>> bool "promotion" optimally requires a better vectorizer IL
>> (similar to placing of shuffles).
>>
>> Thanks,
>> Richard.
>>


Re: [PATCH][RFC] Remove ifcvt_repair_bool_pattern, re-do bool patterns

2016-05-30 Thread Ilya Enkovich
2016-05-30 14:04 GMT+03:00 Richard Biener :
>
> The following patch removes the restriction on seeing a tree of stmts
> in vectorizer bool pattern detection (aka single-use).  With this
> it is no longer necessary to unshare DEFs in ifcvt_repair_bool_pattern
> and that compile-time hog can go (it's now enabled unconditionally for GCC 7).
>
> Instead the pattern detection code will now "unshare" the condition tree
> for each bool pattern root my means of adding all pattern stmts of the
> condition tree to its pattern def sequence (so we still get some
> unnecessary copying, worst-case quadratic rather than exponential).
>
> Ilja - I had to disable the
>
>   tree mask_type = get_mask_type_for_scalar_type (TREE_TYPE
> (rhs1));
>   if (mask_type
>   && expand_vec_cmp_expr_p (comp_vectype, mask_type))
> return false;
>
> check you added to check_bool_pattern to get any coverage for bool
> patterns on x86_64.  Doing that regresses
>
> FAIL: gcc.target/i386/mask-pack.c scan-tree-dump-times vect "vectorized 1
> loops" 10
> FAIL: gcc.target/i386/mask-unpack.c scan-tree-dump-times vect "vectorized
> 1 loops" 10
> FAIL: gcc.target/i386/pr70021.c scan-tree-dump-times vect "vectorized 1
> loops" 2
>
> so somehow bool patterns mess up things here (I didn't investigate).
> The final patch will enable the above path again, avoiding the regression.

Mask conversion patterns handle some cases bool patterns don't.  So it's
expected we can't vectorize some loops if bool patterns are enforced.

Thanks,
Ilya

>
> Yuri - I suppose you have a larger set of testcases using OMP simd
> or other forced vectorization you added ifcvt_repair_bool_pattern for.
> I'd appreciate testing (and testcases if anything fails unexpectedly).
>
> Testing on other targets is of course appreciated as well.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu (with the #if 0).
>
> Comments?
>
> I agree with Ilya elsewhere to remove bool patterns completely
> (as a first step making them apply to individual stmts).  We do
> currently not support mixed cond/non-cond uses anyway, like
>
> _Bool a[64];
> unsigned char b[64];
>
> void foo (void)
> {
>   for (int i = 0; i < 64; ++i)
> {
>   _Bool x = a[i] && b[i] < 10;
>   a[i] = x;
> }
> }
>
> and stmt-local "patterns" can be added when vectorizing the stmt
> (like in the above case a tem = a[i] != 0 ? -1 : 0).  Doing
> bool "promotion" optimally requires a better vectorizer IL
> (similar to placing of shuffles).
>
> Thanks,
> Richard.
>


[PATCH][RFC] Remove ifcvt_repair_bool_pattern, re-do bool patterns

2016-05-30 Thread Richard Biener

The following patch removes the restriction on seeing a tree of stmts
in vectorizer bool pattern detection (aka single-use).  With this
it is no longer necessary to unshare DEFs in ifcvt_repair_bool_pattern
and that compile-time hog can go (it's now enabled unconditionally for GCC 7).

Instead the pattern detection code will now "unshare" the condition tree
for each bool pattern root my means of adding all pattern stmts of the
condition tree to its pattern def sequence (so we still get some
unnecessary copying, worst-case quadratic rather than exponential).

Ilja - I had to disable the

  tree mask_type = get_mask_type_for_scalar_type (TREE_TYPE 
(rhs1));
  if (mask_type
  && expand_vec_cmp_expr_p (comp_vectype, mask_type))
return false;

check you added to check_bool_pattern to get any coverage for bool
patterns on x86_64.  Doing that regresses

FAIL: gcc.target/i386/mask-pack.c scan-tree-dump-times vect "vectorized 1 
loops" 10
FAIL: gcc.target/i386/mask-unpack.c scan-tree-dump-times vect "vectorized 
1 loops" 10
FAIL: gcc.target/i386/pr70021.c scan-tree-dump-times vect "vectorized 1 
loops" 2

so somehow bool patterns mess up things here (I didn't investigate).
The final patch will enable the above path again, avoiding the regression.

Yuri - I suppose you have a larger set of testcases using OMP simd
or other forced vectorization you added ifcvt_repair_bool_pattern for.
I'd appreciate testing (and testcases if anything fails unexpectedly).

Testing on other targets is of course appreciated as well.

Bootstrapped and tested on x86_64-unknown-linux-gnu (with the #if 0).

Comments?

I agree with Ilya elsewhere to remove bool patterns completely
(as a first step making them apply to individual stmts).  We do
currently not support mixed cond/non-cond uses anyway, like

_Bool a[64];
unsigned char b[64];

void foo (void)
{
  for (int i = 0; i < 64; ++i)
{
  _Bool x = a[i] && b[i] < 10;
  a[i] = x;
}
}

and stmt-local "patterns" can be added when vectorizing the stmt
(like in the above case a tem = a[i] != 0 ? -1 : 0).  Doing
bool "promotion" optimally requires a better vectorizer IL
(similar to placing of shuffles).

Thanks,
Richard.

2016-05-30  Richard Biener  

PR tree-optimization/71261
* tree-vect-patterns.c (check_bool_pattern): Gather a hash-set
of stmts successfully put in the bool pattern.  Remove
single-use restriction.
(adjust_bool_pattern_cast): Add cast at the use site via the
pattern def sequence.
(adjust_bool_pattern): Remove recursion, maintain a hash-map
of patterned defs.  Use the pattern def seqence instead of
multiple independent patterns.
(sort_after_uid): New qsort compare function.
(adjust_bool_stmts): New function to process stmts in the bool
pattern in IL order.
(vect_recog_bool_pattern): Adjust.
* tree-if-conv.c (ifcvt_split_def_stmt): Remove.
(ifcvt_walk_pattern_tree): Likewise.
(stmt_is_root_of_bool_pattern): Likewise.
(ifcvt_repair_bool_pattern): Likewise.
(tree_if_conversion): Do not call ifcvt_repair_bool_pattern.

* gcc.dg/torture/vect-bool-1.c: New testcase.

Index: gcc/tree-vect-patterns.c
===
*** gcc/tree-vect-patterns.c.orig   2016-05-27 14:32:49.778686018 +0200
--- gcc/tree-vect-patterns.c2016-05-30 11:06:23.563532173 +0200
*** vect_recog_mixed_size_cond_pattern (vec<
*** 2888,2897 
  /* Helper function of vect_recog_bool_pattern.  Called recursively, return
 true if bool VAR can and should be optimized that way.  Assume it shouldn't
 in case it's a result of a comparison which can be directly vectorized into
!a vector comparison.  */
  
  static bool
! check_bool_pattern (tree var, vec_info *vinfo)
  {
gimple *def_stmt;
enum vect_def_type dt;
--- 2888,2898 
  /* Helper function of vect_recog_bool_pattern.  Called recursively, return
 true if bool VAR can and should be optimized that way.  Assume it shouldn't
 in case it's a result of a comparison which can be directly vectorized into
!a vector comparison.  Fills in STMTS with all stmts visited during the
!walk.  */
  
  static bool
! check_bool_pattern (tree var, vec_info *vinfo, hash_set )
  {
gimple *def_stmt;
enum vect_def_type dt;
*** check_bool_pattern (tree var, vec_info *
*** 2907,2943 
if (!is_gimple_assign (def_stmt))
  return false;
  
!   if (!has_single_use (var))
! return false;
  
rhs1 = gimple_assign_rhs1 (def_stmt);
rhs_code = gimple_assign_rhs_code (def_stmt);
switch (rhs_code)
  {
  case SSA_NAME:
!   return check_bool_pattern (rhs1, vinfo);
  
  CASE_CONVERT:
if ((TYPE_PRECISION (TREE_TYPE (rhs1)) != 1
   || !TYPE_UNSIGNED (TREE_TYPE (rhs1)))
  && TREE_CODE