[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison

2016-03-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #10 from Jakub Jelinek  ---
Author: jakub
Date: Wed Mar 16 13:34:36 2016
New Revision: 234258

URL: https://gcc.gnu.org/viewcvs?rev=234258&root=gcc&view=rev
Log:
PR tree-optimization/68714
* gcc.dg/tree-ssa/pr68714.c: Add -w -Wno-psabi to dg-options.

Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr68714.c

[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison

2016-03-19 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #11 from Richard Henderson  ---
Author: rth
Date: Wed Mar 16 23:53:01 2016
New Revision: 234271

URL: https://gcc.gnu.org/viewcvs?rev=234271&root=gcc&view=rev
Log:
Gimplify vec_cond_expr with condition inside

  PR middle-end/70240
  PR middle-end/68215
  PR tree-opt/68714
  * gimplify.c (gimplify_expr) [VEC_COND_EXPR]: Gimplify the
  first operand as is_gimple_condexpr.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimplify.c

[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison

2016-03-14 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

Richard Henderson  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Richard Henderson  ---
Fixed.

[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison

2016-03-14 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #8 from Richard Henderson  ---
Author: rth
Date: Mon Mar 14 20:48:15 2016
New Revision: 234196

URL: https://gcc.gnu.org/viewcvs?rev=234196&root=gcc&view=rev
Log:
PR tree-opt/68714

  * tree-ssa-reassoc.c (ovce_extract_ops, optimize_vec_cond_expr): New.
  (can_reassociate_p): Allow ANY_INTEGRAL_TYPE_P.
  (reassociate_bb): Use optimize_vec_cond_expr; avoid
  optimize_range_tests, attempt_builtin_copysign and attempt_builtin_powi
  on vectors.

Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr68714.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-ssa-reassoc.c

[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison

2016-03-02 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #7 from Marc Glisse  ---
I find it strange that we do all operations on masks and not on "booleans" for
vectors.

typedef int T;
T f(T a,T b,T c,T d){
  return (a<b)&(c<d);
}

we generate:

  _3 = a_1(D) < b_2(D);
  _6 = c_4(D) < d_5(D);
  _7 = _3 & _6;
  _8 = (T) _7;
  return _8;

that is, we are happy to do the bit_and on booleans. However, with

typedef int T __attribute__((vector_size(64)));

we now generate (-mavx512f):

  _3 = VEC_COND_EXPR <a_1(D) < b_2(D), { -1, ... }, { 0, ... }>;
  _6 = VEC_COND_EXPR <c_4(D) < d_5(D), { -1, ... }, { 0, ... }>;
  _7 = _3 & _6;
  return _7;

yielding this code:

vpcmpgtd    %zmm0, %zmm1, %k1
vpternlogd  $0xFF, %zmm4, %zmm4, %zmm4
vmovdqa32   %zmm4, %zmm0{%k1}{z}
vpcmpgtd    %zmm2, %zmm3, %k1
vmovdqa32   %zmm4, %zmm2{%k1}{z}
vpandd      %zmm2, %zmm0, %zmm0

We perform the bit_and on the mask type, whereas it would be better to do it on
the boolean type and use 'kandw'. For most platforms, (vec_cnd x -1 0) should
be a NOP so it doesn't really matter, and for the few remaining (AVX512 and
Sparc IIRC) we want to use "booleans" as much as possible and only convert to a
mask late. I think that implies that we should pull operations on masks into
operations on booleans (as in the original patch in comment #1 maybe, plus
canonicalizing (vec_cnd x 0 -1)), and probably that forwarding conditions into
the first argument of vec_cond should only be done late (around expand).

But it is quite possible that my intuition is completely bogus here.
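
For readers who want to try the vector case, here is a minimal sketch using GCC's generic vector extensions. It uses a 16-byte vector rather than the 64-byte one above so that it builds without -mavx512f; the names v4si and both_less are illustrative and do not come from the PR.

```c
/* Sketch only: v4si and both_less are illustrative names, not from the PR.  */
typedef int v4si __attribute__((vector_size(16)));

/* Each vector comparison yields a lane mask: -1 where the comparison holds
   and 0 where it does not.  The bitwise AND of two such masks is the mask
   of the combined condition.  */
v4si both_less(v4si a, v4si b, v4si c, v4si d)
{
  return (a < b) & (c < d);
}
```

Compiling this with -O2 -fdump-tree-optimized is one way to see whether the AND is performed on the comparison results or on the materialized masks.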

[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison

2016-02-26 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

Richard Henderson  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |rth at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org|rth at gcc dot gnu.org

[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison

2016-01-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #6 from Richard Biener  ---
(In reply to Andrew Pinski from comment #5)
> (In reply to Jakub Jelinek from comment #4)
> > I'd add this regressed with r229128, and indeed before that change reassoc
> > has been able to optimize the comparisons, but now it is not.  So, either we
> > defer the creation of vec_cond_expr until later time, or need to teach at
> > least reassoc pass about COND_EXPRs and VEC_COND_EXPRs.
> 
> More than that, vec_cond_expr for non x86_64 AVX targets here is useless and
> makes it harder to optimize otherwise.
> 
> Why again do we need the vec_cond_expr for those expressions again?

The same reason you need a conversion to do int i = a < b; in gimple.
Comparisons have a boolean type.

Btw, the patch from comment #1 would also help

int f(int x, int y)
{
 return (x [...]
  D.1767 = VEC_COND_EXPR [...];
  D.1765 = D.1766 | D.1767;
  return D.1765;
}

forwprop is also supposed to re-store this.

[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison

2016-01-04 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #5 from Andrew Pinski  ---
(In reply to Jakub Jelinek from comment #4)
> I'd add this regressed with r229128, and indeed before that change reassoc
> has been able to optimize the comparisons, but now it is not.  So, either we
> defer the creation of vec_cond_expr until later time, or need to teach at
> least reassoc pass about COND_EXPRs and VEC_COND_EXPRs.

More than that, vec_cond_expr for non x86_64 AVX targets here is useless and
makes it harder to optimize otherwise.

Why again do we need the vec_cond_expr for those expressions again?

[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison

2015-12-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
I'd add this regressed with r229128, and indeed before that change reassoc has
been able to optimize the comparisons, but now it is not.  So, either we defer
the creation of vec_cond_expr until later time, or need to teach at least
reassoc pass about COND_EXPRs and VEC_COND_EXPRs.
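
As an illustration of the kind of folding at stake, the sketch below pairs two comparisons joined by a bitwise OR with the single comparison they are equivalent to. The type and function names are made up for this example, and it is an assumption that this shape matches what reassoc handled before r229128.

```c
/* Sketch only: names are illustrative, not taken from the testsuite.  */
typedef int v4si __attribute__((vector_size(16)));

/* Two comparisons combined with a bitwise OR of their lane masks.  */
v4si less_or_equal_split(v4si x, v4si y)
{
  return (x < y) | (x == y);
}

/* The single comparison the expression above is equivalent to.  */
v4si less_or_equal_folded(v4si x, v4si y)
{
  return x <= y;
}
```

Both functions return the same lane mask for any inputs, so an optimizer that can see through the mask conversion is free to emit only one comparison for the first form.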

[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison

2015-12-07 Thread ienkovich at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #3 from Ilya Enkovich  ---
(In reply to Marc Glisse from comment #1)
> Helps, but then we have:
> 
>   _8 = x_1(D) <= y_2(D);
>   _6 = VEC_COND_EXPR <_8, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
> 
> vector lowering calls expand_vec_cond_expr_p using the type of _8 (when the
> comparison is inside rhs1, it uses the type of x) which goes through
> get_vcond_mask_icode so it answers false (on everything but x86), and the
> VEC_COND_EXPR is lowered to a horrible sequence of
> 
>   _5 = BIT_FIELD_REF <_8, 32, 0>;
>   _3 = _5 != 0;
>   _4 = _3 ? -1 : 0;
> [...]
>   _6 = {_4, _11, _14, _17};

expand_vec_cond_expr_p is not in sync with expand_vec_cond_expr right now.
expand_vec_cond_expr allows VEC_COND_EXPR with no embedded comparison even if
vcond_mask_optab doesn't have it.  This patch should help:

diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index d887619..3c9c485 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -343,8 +343,13 @@ expand_vec_cond_expr_p (tree value_type, tree cmp_op_type)
   machine_mode value_mode = TYPE_MODE (value_type);
   machine_mode cmp_op_mode = TYPE_MODE (cmp_op_type);
   if (VECTOR_BOOLEAN_TYPE_P (cmp_op_type))
-    return get_vcond_mask_icode (TYPE_MODE (value_type),
-                                 TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing;
+    {
+      if (get_vcond_mask_icode (TYPE_MODE (value_type),
+                                TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing)
+        return true;
+      if (GET_MODE_CLASS (TYPE_MODE (cmp_op_type)) != MODE_VECTOR_INT)
+        return false;
+    }
   if (GET_MODE_SIZE (value_mode) != GET_MODE_SIZE (cmp_op_mode)
       || GET_MODE_NUNITS (value_mode) != GET_MODE_NUNITS (cmp_op_mode)
       || get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),

[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison

2015-12-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1

[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison

2015-12-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-12-07
                 CC|                            |ienkovich at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
   Target Milestone|---                         |6.0
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener  ---
Confirmed.  I suppose we also need to canonicalize VEC_COND_EXPRs to have
-1 in the true and 0 in the false arm.  Note that the optimization also
applies to COND_EXPRs with all_ones/zero arms and bitops.

We also should handle bit_not of course.

Not sure why you guard on !lvec, the optab query is done independent
of the comparison code.

As for the

  _8 = x_1(D) <= y_2(D);
  _6 = VEC_COND_EXPR <_8, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;

issue it might be "easiest" to force a target canonical variant during
vector lowering.  That is, forward the condition into the vec_cond_expr
if that's what the target understands (no bool vectors).  Doing this
at expansion time only may fall foul of coalescing and TER limitations.
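
The canonicalization mentioned above can be checked by hand at the source level. The sketch below is an illustration only: select_lanes is a hypothetical helper that emulates a VEC_COND_EXPR with bit operations, and it assumes integer lanes, where inverting x < y to x >= y is always valid.

```c
/* Sketch only: all names here are illustrative.  */
typedef int v4si __attribute__((vector_size(16)));

/* Lane-wise select written with bit operations.  Every lane of mask must
   be either -1 or 0, as produced by a vector comparison.  */
static v4si select_lanes(v4si mask, v4si if_true, v4si if_false)
{
  return (mask & if_true) | (~mask & if_false);
}

/* Non-canonical form: 0 in the true arm, -1 in the false arm.  */
v4si swapped_arms(v4si x, v4si y)
{
  v4si ones = {-1, -1, -1, -1};
  v4si zeros = {0, 0, 0, 0};
  return select_lanes(x < y, zeros, ones);
}

/* Canonical form: inverted comparison, -1 in the true arm, 0 in the
   false arm.  */
v4si canonical_arms(v4si x, v4si y)
{
  v4si ones = {-1, -1, -1, -1};
  v4si zeros = {0, 0, 0, 0};
  return select_lanes(x >= y, ones, zeros);
}
```

The two functions agree lane for lane, which is also why a bit_not of a -1/0 mask can be absorbed by inverting the comparison.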

[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison

2015-12-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #1 from Marc Glisse  ---
+/* Sink logical operations below the transformation from a bool vector to a
+   mask.  */
+(if (!(cfun->curr_properties & PROP_gimple_lvec))
+ (for bitop (bit_and bit_ior bit_xor)
+  (simplify
+   (bitop (vec_cond @0 integer_all_onesp@2 integer_zerop@3)
+          (vec_cond @1 integer_all_onesp integer_zerop))
+   (vec_cond (bitop @0 @1) @2 @3)))))

Helps, but then we have:

  _8 = x_1(D) <= y_2(D);
  _6 = VEC_COND_EXPR <_8, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;

vector lowering calls expand_vec_cond_expr_p using the type of _8 (when the
comparison is inside rhs1, it uses the type of x) which goes through
get_vcond_mask_icode so it answers false (on everything but x86), and the
VEC_COND_EXPR is lowered to a horrible sequence of

  _5 = BIT_FIELD_REF <_8, 32, 0>;
  _3 = _5 != 0;
  _4 = _3 ? -1 : 0;
[...]
  _6 = {_4, _11, _14, _17};

Easiest might be to get expand_vector_condition to look at the defining
statement of rhs1 (and make sure expand does the same). And maybe ping all
target maintainers with a vector mode that they may want to implement
vcond_mask (when I look at the x86 implementation, it uses vec_merge with a
third argument of vector type, while the doc still says that it has to be a
const_int bit mask), or maybe provide a default for platforms where the bool
and mask vector types are essentially the same.
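
To show what the piecewise lowering above amounts to, here is a hand-written C equivalent. It is a sketch under stated assumptions: the function name is hypothetical, the vector has four 32-bit lanes, and the loop stands in for the unrolled per-lane statements in the dump.

```c
/* Sketch only: lowered_select is an illustrative name.  */
typedef int v4si __attribute__((vector_size(16)));

/* Hand-written equivalent of the piecewise lowering: each lane of the
   comparison result is extracted, tested against zero, and turned back
   into -1 or 0 before the vector is rebuilt.  */
v4si lowered_select(v4si cmp)
{
  v4si result;
  for (int i = 0; i < 4; i++)
    {
      int lane = cmp[i];               /* like BIT_FIELD_REF <cmp, 32, 32*i> */
      result[i] = lane != 0 ? -1 : 0;  /* like _4 = _3 ? -1 : 0 */
    }
  return result;
}
```

When the input already holds -1 or 0 in every lane, the function returns its argument unchanged, which is why the lowered sequence is pure overhead on targets where the boolean vector and the mask vector share a representation.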