[Bug tree-optimization/68714] [6 Regression] less folding of vector comparison
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #10 from Jakub Jelinek ---
Author: jakub
Date: Wed Mar 16 13:34:36 2016
New Revision: 234258

URL: https://gcc.gnu.org/viewcvs?rev=234258&root=gcc&view=rev
Log:
        PR tree-optimization/68714
        * gcc.dg/tree-ssa/pr68714.c: Add -w -Wno-psabi to dg-options.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr68714.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #11 from Richard Henderson ---
Author: rth
Date: Wed Mar 16 23:53:01 2016
New Revision: 234271

URL: https://gcc.gnu.org/viewcvs?rev=234271&root=gcc&view=rev
Log:
Gimplify vec_cond_expr with condition inside

        PR middle-end/70240
        PR middle-end/68215
        PR tree-opt/68714
        * gimplify.c (gimplify_expr) [VEC_COND_EXPR]: Gimplify the
        first operand as is_gimple_condexpr.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gimplify.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

Richard Henderson changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                      |RESOLVED
         Resolution|---                           |FIXED

--- Comment #9 from Richard Henderson ---
Fixed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #8 from Richard Henderson ---
Author: rth
Date: Mon Mar 14 20:48:15 2016
New Revision: 234196

URL: https://gcc.gnu.org/viewcvs?rev=234196&root=gcc&view=rev
Log:
        PR tree-opt/68714
        * tree-ssa-reassoc.c (ovce_extract_ops, optimize_vec_cond_expr): New.
        (can_reassociate_p): Allow ANY_INTEGRAL_TYPE_P.
        (reassociate_bb): Use optimize_vec_cond_expr; avoid
        optimize_range_tests, attempt_builtin_copysign and attempt_builtin_powi
        on vectors.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr68714.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-reassoc.c
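As an illustration of the kind of combination this lets reassoc perform on vectors (the contents of pr68714.c are not reproduced here, so the example is an assumption, not the testcase): two vector comparisons OR-ed together can collapse into one comparison when they form a single range test, as (x < y) | (x == y) does into x <= y. The sketch only checks the lane-wise equivalence at the C level using GCC's vector extension; the function names are hypothetical.

```c
/* Four 32-bit ints, using GCC's generic vector extension. */
typedef int v4si __attribute__((vector_size(16)));

/* Two comparisons combined with a bitwise OR: each comparison yields a
   -1/0 mask per lane, and '|' merges the masks. */
v4si lt_or_eq(v4si x, v4si y)
{
  return (x < y) | (x == y);
}

/* The single comparison the pair is equivalent to. */
v4si le(v4si x, v4si y)
{
  return x <= y;
}
```

Calling both functions on the same inputs and comparing the lanes shows they agree; whether the compiler actually merges them is up to the optimizer, not the source.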
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #7 from Marc Glisse ---
I find it strange that we do all operations on masks and not on "booleans" for vectors.

typedef int T;
T f(T a,T b,T c,T d){ return (a:
  _3 = a_1(D) < b_2(D);
  _6 = c_4(D) < d_5(D);
  _7 = _3 & _6;
  _8 = (T) _7;
  return _8;

that is, we are happy to do the bit_and on booleans. However, with

typedef int T __attribute__((vector_size(64)));

we now generate (-mavx512f):

  _3 = VEC_COND_EXPR ;
  _6 = VEC_COND_EXPR ;
  _7 = _3 & _6;
  return _7;

yielding this code:

        vpcmpgtd        %zmm0, %zmm1, %k1
        vpternlogd      $0xFF, %zmm4, %zmm4, %zmm4
        vmovdqa32       %zmm4, %zmm0{%k1}{z}
        vpcmpgtd        %zmm2, %zmm3, %k1
        vmovdqa32       %zmm4, %zmm2{%k1}{z}
        vpandd          %zmm2, %zmm0, %zmm0

We perform the bit_and on the mask type, whereas it would be better to do it on the boolean type and use 'kandw'. For most platforms, (vec_cnd x -1 0) should be a NOP so it doesn't really matter, and for the few remaining (AVX512 and Sparc IIRC) we want to use "booleans" as much as possible and only convert to a mask late. I think that implies that we should pull operations on masks into operations on booleans (as in the original patch in comment #1 maybe, plus canonicalizing (vec_cnd x 0 -1)), and probably that forwarding conditions into the first argument of vec_cond should only be done late (around expand). But it is quite possible that my intuition is completely bogus here.
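The two evaluation orders discussed here compute the same values, which can be checked at the source level with GCC's vector extension: a vector comparison yields a -1/0 mask per lane, and AND-ing two masks gives the same bits as AND-ing the booleans first and converting to a mask late. The sketch uses a 16-byte vector instead of the 64-byte one from the comment (an assumption, to keep it portable), and the function names are hypothetical. It says nothing about which instructions (vpandd or kandw) a target selects.

```c
#include <stdbool.h>

/* Four 32-bit ints; the comment uses vector_size(64) with -mavx512f. */
typedef int v4si __attribute__((vector_size(16)));

/* Operate on masks: each comparison produces -1/0 per lane, then the
   masks are combined with a bitwise AND. */
v4si and_of_masks(v4si a, v4si b, v4si c, v4si d)
{
  return (a < b) & (c < d);
}

/* Operate on booleans: do the logical AND per lane first and only
   widen the result to a -1/0 mask at the end. */
v4si mask_of_and(v4si a, v4si b, v4si c, v4si d)
{
  v4si r;
  for (int i = 0; i < 4; i++)
    {
      bool t = a[i] < b[i] && c[i] < d[i];
      r[i] = t ? -1 : 0;
    }
  return r;
}

/* True when two vectors agree in every lane. */
bool v4si_equal(v4si x, v4si y)
{
  for (int i = 0; i < 4; i++)
    if (x[i] != y[i])
      return false;
  return true;
}
```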
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

Richard Henderson changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
                 CC|                              |rth at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org |rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #6 from Richard Biener ---
(In reply to Andrew Pinski from comment #5)
> (In reply to Jakub Jelinek from comment #4)
> > I'd add this regressed with r229128, and indeed before that change reassoc
> > has been able to optimize the comparisons, but now it is not. So, either we
> > defer the creation of vec_cond_expr until later time, or need to teach at
> > least reassoc pass about COND_EXPRs and VEC_COND_EXPRs.
>
> More than that, vec_cond_expr for non x86_64 AVX targets here is useless and
> makes it harder to optimize otherwise.
>
> Why again do we need the vec_cond_expr for those expressions again?

The same reason you need a conversion to do

  int i = a < b;

in gimple. Comparisons have a boolean type.

Btw, the patch from comment #1 would also help

int f(int x, int y) { return (x;
  D.1767 = VEC_COND_EXPR ;
  D.1765 = D.1766 | D.1767;
  return D.1765;
}

forwprop is also supposed to restore this.
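The answer rests on the difference between a boolean and its integer encoding, and at the C level that difference is visible in the values: converting a scalar comparison to int gives 1 or 0, while a comparison of GCC generic vectors gives -1 (all bits set) or 0 in each lane, as documented for GCC's vector extension. So the vector counterpart of "int i = a < b;" is a select between all-ones and zero, not a plain widening. A minimal sketch, assuming a 4 x int vector and hypothetical function names:

```c
/* Four 32-bit ints, using GCC's generic vector extension. */
typedef int v4si __attribute__((vector_size(16)));

/* Scalar case: the boolean result of the comparison is converted to
   int, which yields 1 for true and 0 for false. */
int scalar_lt(int a, int b)
{
  int i = a < b;
  return i;
}

/* Vector case: the comparison yields a per-lane mask, -1 for true and
   0 for false. */
v4si vector_lt(v4si a, v4si b)
{
  return a < b;
}
```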
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #5 from Andrew Pinski ---
(In reply to Jakub Jelinek from comment #4)
> I'd add this regressed with r229128, and indeed before that change reassoc
> has been able to optimize the comparisons, but now it is not. So, either we
> defer the creation of vec_cond_expr until later time, or need to teach at
> least reassoc pass about COND_EXPRs and VEC_COND_EXPRs.

More than that, vec_cond_expr for non x86_64 AVX targets here is useless and makes it harder to optimize otherwise.

Why again do we need the vec_cond_expr for those expressions again?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

Jakub Jelinek changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
                 CC|                              |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek ---
I'd add this regressed with r229128, and indeed before that change reassoc has been able to optimize the comparisons, but now it is not. So, either we defer the creation of vec_cond_expr until later time, or need to teach at least reassoc pass about COND_EXPRs and VEC_COND_EXPRs.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #3 from Ilya Enkovich ---
(In reply to Marc Glisse from comment #1)
> Helps, but then we have:
>
>   _8 = x_1(D) <= y_2(D);
>   _6 = VEC_COND_EXPR <_8, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
>
> vector lowering calls expand_vec_cond_expr_p using the type of _8 (when the
> comparison is inside rhs1, it uses the type of x) which goes through
> get_vcond_mask_icode so it answers false (on everything but x86), and the
> VEC_COND_EXPR is lowered to a horrible sequence of
>
>   _5 = BIT_FIELD_REF <_8, 32, 0>;
>   _3 = _5 != 0;
>   _4 = _3 ? -1 : 0;
>   [...]
>   _6 = {_4, _11, _14, _17};

expand_vec_cond_expr_p is not in sync with expand_vec_cond_expr right now. expand_vec_cond_expr allows VEC_COND_EXPR with no embedded comparison even if vcond_mask_optab doesn't have it. This patch should help:

diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index d887619..3c9c485 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -343,8 +343,13 @@ expand_vec_cond_expr_p (tree value_type, tree cmp_op_type)
   machine_mode value_mode = TYPE_MODE (value_type);
   machine_mode cmp_op_mode = TYPE_MODE (cmp_op_type);
   if (VECTOR_BOOLEAN_TYPE_P (cmp_op_type))
-    return get_vcond_mask_icode (TYPE_MODE (value_type),
-                                 TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing;
+    {
+      if (get_vcond_mask_icode (TYPE_MODE (value_type),
+                                TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing)
+        return true;
+      if (GET_MODE_CLASS (TYPE_MODE (cmp_op_type)) != MODE_VECTOR_INT)
+        return false;
+    }
   if (GET_MODE_SIZE (value_mode) != GET_MODE_SIZE (cmp_op_mode)
       || GET_MODE_NUNITS (value_mode) != GET_MODE_NUNITS (cmp_op_mode)
       || get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
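What the scalarized sequence quoted in this comment computes is simple: extract each lane, test it against zero, select -1 or 0, and rebuild the vector, all of which a single vector comparison can express. The sketch below models the boolean vector _8 as a 4 x int vector (an assumption; GIMPLE boolean vectors need not have that layout) and checks that the lane-by-lane form and the one-operation form agree. Function names are hypothetical.

```c
#include <stdbool.h>

/* Four 32-bit ints, standing in for the boolean vector _8. */
typedef int v4si __attribute__((vector_size(16)));

/* Lane-by-lane form, mirroring the lowered GIMPLE:
     _5 = BIT_FIELD_REF <_8, 32, 0>;   extract one 32-bit lane
     _3 = _5 != 0;                     test it
     _4 = _3 ? -1 : 0;                 select all-ones or zero
     _6 = {_4, _11, _14, _17};         rebuild the vector  */
v4si lowered_select(v4si cond)
{
  int lane[4];
  for (int i = 0; i < 4; i++)
    {
      int bits = cond[i];
      bool t = bits != 0;
      lane[i] = t ? -1 : 0;
    }
  return (v4si){ lane[0], lane[1], lane[2], lane[3] };
}

/* The same selection written as one whole-vector comparison. */
v4si vector_select(v4si cond)
{
  return cond != (v4si){ 0, 0, 0, 0 };
}
```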
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Priority|P3                            |P1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |NEW
   Last reconfirmed|                              |2015-12-07
                 CC|                              |ienkovich at gcc dot gnu.org,
                   |                              |rguenth at gcc dot gnu.org
   Target Milestone|---                           |6.0
     Ever confirmed|0                             |1

--- Comment #2 from Richard Biener ---
Confirmed. I suppose we also need to canonicalize VEC_COND_EXPRs to have -1 in the true and 0 in the false arm. Note that the optimization also applies to COND_EXPRs with all_ones/zero arms and bitops. We also should handle bit_not of course.

Not sure why you guard on !lvec, the optab query is done independent of the comparison code.

As for the

  _8 = x_1(D) <= y_2(D);
  _6 = VEC_COND_EXPR <_8, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;

issue it might be "easiest" to force a target canonical variant during vector lowering. That is, forward the condition into the vec_cond_expr if that's what the target understands (no bool vectors). Doing this at expansion time only may fall foul of coalescing and TER limitations.
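The identities behind the suggested canonicalization can be checked exhaustively on scalars, which also covers the remark that the optimization applies to COND_EXPRs with all-ones/zero arms. Writing mask(c) for c ? -1 : 0, bitwise AND, IOR and XOR on masks equal the mask of the corresponding boolean operation, and BIT_NOT of a canonical mask equals the swapped-arm form c ? 0 : -1, which is mask(!c). A sketch with hypothetical helper names:

```c
#include <stdbool.h>

/* Scalar COND_EXPR with all-ones/zero arms: c ? -1 : 0. */
static int mask(bool c)
{
  return c ? -1 : 0;
}

/* Tries all four combinations of two boolean conditions and returns
   true if every identity holds. */
bool mask_identities_hold(void)
{
  for (int i = 0; i < 4; i++)
    {
      bool c0 = i & 1, c1 = (i >> 1) & 1;
      /* Bitops on masks equal the mask of the boolean bitop. */
      if ((mask(c0) & mask(c1)) != mask(c0 && c1))
        return false;
      if ((mask(c0) | mask(c1)) != mask(c0 || c1))
        return false;
      if ((mask(c0) ^ mask(c1)) != mask(c0 != c1))
        return false;
      /* bit_not of a canonical mask is the swapped-arm form... */
      if (~mask(c0) != (c0 ? 0 : -1))
        return false;
      /* ...which is the canonical mask of the inverted condition. */
      if ((c0 ? 0 : -1) != mask(!c0))
        return false;
    }
  return true;
}
```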
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714

--- Comment #1 from Marc Glisse ---
+/* Sink logical operations below the transformation from a bool vector to a
+   mask.  */
+(if (!(cfun->curr_properties & PROP_gimple_lvec))
+ (for bitop (bit_and bit_ior bit_xor)
+  (simplify
+   (bitop (vec_cond @0 integer_all_onesp@2 integer_zerop@3)
+          (vec_cond @1 integer_all_onesp integer_zerop))
+   (vec_cond (bitop @0 @1) @2 @3))))

Helps, but then we have:

  _8 = x_1(D) <= y_2(D);
  _6 = VEC_COND_EXPR <_8, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;

vector lowering calls expand_vec_cond_expr_p using the type of _8 (when the comparison is inside rhs1, it uses the type of x) which goes through get_vcond_mask_icode so it answers false (on everything but x86), and the VEC_COND_EXPR is lowered to a horrible sequence of

  _5 = BIT_FIELD_REF <_8, 32, 0>;
  _3 = _5 != 0;
  _4 = _3 ? -1 : 0;
  [...]
  _6 = {_4, _11, _14, _17};

Easiest might be to get expand_vector_condition to look at the defining statement of rhs1 (and make sure expand does the same). And maybe ping all target maintainers with a vector mode that they may want to implement vcond_mask (when I look at the x86 implementation, it uses vec_merge with a third argument of vector type, while the doc still says that it has to be a const_int bit mask), or maybe provide a default for platforms where the bool and mask vector types are essentially the same.
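The pattern relies on the lane-wise identity (vec_cond c0 -1 0) op (vec_cond c1 -1 0) == vec_cond (c0 op c1) -1 0 for op in bit_and, bit_ior and bit_xor. The sketch below models VEC_COND_EXPR with a hypothetical lane-wise helper and checks all three cases. It assumes the conditions are canonical -1/0 masks, as vector comparisons produce; with arbitrary nonzero lane values (say 1 and 2) the bit_and case would not hold.

```c
#include <stdbool.h>

/* Four 32-bit ints, using GCC's generic vector extension. */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical lane-wise model of VEC_COND_EXPR <c, t, f>: take t[i]
   where c[i] is nonzero and f[i] otherwise. */
static v4si vec_cond(v4si c, v4si t, v4si f)
{
  v4si r;
  for (int i = 0; i < 4; i++)
    r[i] = c[i] ? t[i] : f[i];
  return r;
}

/* True when two vectors agree in every lane. */
static bool v4si_equal(v4si x, v4si y)
{
  for (int i = 0; i < 4; i++)
    if (x[i] != y[i])
      return false;
  return true;
}

/* Checks the bit_and, bit_ior and bit_xor instances of the pattern for
   two condition vectors, which must be -1/0 masks. */
bool sink_bitops_holds(v4si c0, v4si c1)
{
  const v4si ones = {-1, -1, -1, -1}, zeros = {0, 0, 0, 0};
  v4si m0 = vec_cond(c0, ones, zeros), m1 = vec_cond(c1, ones, zeros);
  return v4si_equal(m0 & m1, vec_cond(c0 & c1, ones, zeros))
         && v4si_equal(m0 | m1, vec_cond(c0 | c1, ones, zeros))
         && v4si_equal(m0 ^ m1, vec_cond(c0 ^ c1, ones, zeros));
}
```

Feeding it two comparison results that together cover all four true/false combinations exercises every case of each identity.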