On Wed, Nov 11, 2015 at 2:13 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Richard, > > What we should do to cope with this problem (structure size increasing)? > Should we return to vector comparison version?
Ok, given this constraint I think the cleanest approach is to allow integer(!) vector equality(!) compares with scalar result. This should then expand via cmp_optab and not via vec_cmp_optab. On gimple you can then have if (mask_vec_1 != {0, 0, .... }) ... Note that a fallback expansion (for optabs.c to try) would be the suggested view-conversion (aka, subreg) variant using a same-sized integer mode. Target maintainers can then choose what is a better fit for their target (and instruction set as register set constraints may apply). The patch you posted seems to do this but not restrict the compares to integer ones (please do that). if (TREE_CODE (op0_type) == VECTOR_TYPE || TREE_CODE (op1_type) == VECTOR_TYPE) { - error ("vector comparison returning a boolean"); - debug_generic_expr (op0_type); - debug_generic_expr (op1_type); - return true; + /* Allow vector comparison returning boolean if operand types + are equal and CODE is EQ/NE. */ + if ((code != EQ_EXPR && code != NE_EXPR) + || TREE_CODE (op0_type) != TREE_CODE (op1_type) + || TYPE_VECTOR_SUBPARTS (op0_type) + != TYPE_VECTOR_SUBPARTS (op1_type) + || GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0_type))) + != GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1_type)))) These are all checked with the useless_type_conversion_p checks done earlier. As said I'd like to see a || ! VECTOR_INTEGER_TYPE_P (op0_type) check added so we and targets do not need to worry about using EQ/NE vs. CMP and worry about signed zeros and friends. + { + error ("type mismatch for vector comparison returning a boolean"); + debug_generic_expr (op0_type); + debug_generic_expr (op1_type); + return true; + } --- a/gcc/tree-ssa-forwprop.c +++ b/gcc/tree-ssa-forwprop.c @@ -422,6 +422,15 @@ forward_propagate_into_comparison_1 (gimple *stmt, enum tree_code def_code = gimple_assign_rhs_code (def_stmt); bool invariant_only_p = !single_use0_p; + /* Can't combine vector comparison with scalar boolean type of + the result and VEC_COND_EXPR having vector type of comparison. */ + if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE + && INTEGRAL_TYPE_P (type) + && (TREE_CODE (type) == BOOLEAN_TYPE + || TYPE_PRECISION (type) == 1) + && def_code == VEC_COND_EXPR) + return NULL_TREE; this hints at larger fallout you paper over here. So this effectively means we're trying combining (vec1 != vec2) != 0 for example and that fails miserably? If so then the solution is to fix whatever does not expect this (valid) GENERIC tree. + if (ENABLE_ZERO_TEST_FOR_MASK_STORE == 0) + return; not sure if I like a param more than a target hook ... :/ + /* Create vector comparison with boolean result. */ + vectype = TREE_TYPE (mask); + zero = build_zero_cst (TREE_TYPE (vectype)); + zero = build_vector_from_val (vectype, zero); build_zero_cst (vectype); + stmt = gimple_build_cond (EQ_EXPR, mask, zero, NULL_TREE, NULL_TREE); you can omit the NULL_TREE operands. + gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME); please omit the assert. + gimple_set_vdef (last, new_vdef); do this before you create the PHI. + /* Put definition statement of stored value in STORE_BB + if possible. */ + arg3 = gimple_call_arg (last, 3); + if (TREE_CODE (arg3) == SSA_NAME && has_single_use (arg3)) + { ... is this really necessary? It looks incomplete to me anyway. I'd rather have a late sink pass if this shows necessary. Btw,... + it is legal. */ + if (gimple_bb (def_stmt) == bb + && is_valid_sink (def_stmt, last_store)) with the implementation of is_valid_sink this is effectively && (!gimple_vuse (def_stmt) || gimple_vuse (def_stmt) == gimple_vdef (last_store)) I still think this "pass" is quite a hack, esp. as it appears as generic code in a GIMPLE pass. And esp. as this hack seems to be needed for Haswell only, not Boradwell or Skylake. Thanks, Richard. > Thanks. > Yuri. > > 2015-11-11 12:18 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >> On Tue, Nov 10, 2015 at 3:56 PM, Ilya Enkovich <enkovich....@gmail.com> >> wrote: >>> 2015-11-10 17:46 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>> On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich <enkovich....@gmail.com> >>>> wrote: >>>>> 2015-11-10 15:33 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> wrote: >>>>>>> Richard, >>>>>>> >>>>>>> I tried it but 256-bit precision integer type is not yet supported. >>>>>> >>>>>> What's the symptom? The compare cannot be expanded? Just add a pattern >>>>>> then. >>>>>> After all we have modes up to XImode. >>>>> >>>>> I suppose problem may be in: >>>>> >>>>> gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128) >>>>> >>>>> which doesn't allow to create constants of bigger size. Changing it >>>>> to maximum vector size (512) would mean we increase wide_int structure >>>>> size significantly. New patterns are probably also needed. >>>> >>>> Yes, new patterns are needed but wide-int should be fine (we only need to >>>> create >>>> a literal zero AFACS). The "new pattern" would be equality/inequality >>>> against zero >>>> compares only. >>> >>> Currently 256bit integer creation fails because wide_int for max and >>> min values cannot be created. >> >> Hmm, indeed: >> >> #1 0x000000000072dab5 in wi::extended_tree<192>::extended_tree ( >> this=0x7fffffffd950, t=0x7ffff6a000b0) >> at /space/rguenther/src/svn/trunk/gcc/tree.h:5125 >> 5125 gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N); >> >> but that's not that the constants fail to be created but >> >> #5 0x00000000010d8828 in build_nonstandard_integer_type (precision=512, >> unsignedp=65) at /space/rguenther/src/svn/trunk/gcc/tree.c:8051 >> 8051 if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype))) >> (gdb) l >> 8046 fixup_unsigned_type (itype); >> 8047 else >> 8048 fixup_signed_type (itype); >> 8049 >> 8050 ret = itype; >> 8051 if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype))) >> 8052 ret = type_hash_canon (tree_to_uhwi (TYPE_MAX_VALUE >> (itype)), itype); >> >> thus the integer type hashing being "interesting". tree_fits_uhwi_p >> fails because >> it does >> >> 7289 bool >> 7290 tree_fits_uhwi_p (const_tree t) >> 7291 { >> 7292 return (t != NULL_TREE >> 7293 && TREE_CODE (t) == INTEGER_CST >> 7294 && wi::fits_uhwi_p (wi::to_widest (t))); >> 7295 } >> >> and wi::to_widest () fails with doing >> >> 5121 template <int N> >> 5122 inline wi::extended_tree <N>::extended_tree (const_tree t) >> 5123 : m_t (t) >> 5124 { >> 5125 gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N); >> 5126 } >> >> fixing the hashing then runs into type_cache_hasher::equal doing >> tree_int_cst_equal >> which again uses to_widest (it should be easier and cheaper to do the >> compare on >> the actual tree representation, but well, seems to be just the first >> of various issues >> we'd run into). >> >> We eventually could fix the assert above (but then need to hope we assert >> when a computation overflows the narrower precision of widest_int) or use >> a special really_widest_int (ugh). >> >>> It is fixed by increasing MAX_BITSIZE_MODE_ANY_INT, but it increases >>> WIDE_INT_MAX_ELTS >>> and thus increases wide_int structure. If we use 512 for >>> MAX_BITSIZE_MODE_ANY_INT then >>> wide_int structure would grow by 48 bytes (16 bytes if use 256 for >>> MAX_BITSIZE_MODE_ANY_INT). >>> Is it OK for such narrow usage? >> >> widest_int is used in some long-living structures (which is the reason for >> MAX_BITSIZE_MODE_ANY_INT in the first place). So I don't think so. >> >> Richard. >> >>> Ilya >>> >>>> >>>> Richard. >>>> >>>>> Ilya >>>>> >>>>>> >>>>>> Richard. >>>>>> >>>>>>> Yuri. >>>>>>> >>>>>>>