On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich <enkovich....@gmail.com> wrote: > Adding CCs. > > 2015-09-03 15:03 GMT+03:00 Ilya Enkovich <enkovich....@gmail.com>: >> 2015-09-01 17:25 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>> On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich <enkovich....@gmail.com> >>> wrote: >>>> On 27 Aug 09:55, Richard Biener wrote: >>>>> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich....@gmail.com> >>>>> wrote: >>>>> > >>>>> > Yes, I want to try it. But getting rid of bool patterns would mean >>>>> > support for all targets currently supporting vec_cond. Would it be OK >>>>> > to have vector<bool> mask co-exist with bool patterns for some time? >>>>> >>>>> No, I'd like to remove the bool patterns anyhow - the vectorizer should >>>>> be able >>>>> to figure out the correct vector type (or mask type) from the uses. >>>>> Currently >>>>> it simply looks at the stmts LHS type but as all stmt operands already >>>>> have >>>>> vector types it can as well compute the result type from those. We'd >>>>> want to >>>>> have a helper function that does this result type computation as I figure >>>>> it >>>>> will be needed in multiple places. >>>>> >>>>> This is now on my personal TODO list (but that's already quite long for >>>>> GCC 6), >>>>> so if you manage to get to that... see >>>>> tree-vect-loop.c:vect_determine_vectorization_factor >>>>> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their >>>>> vector type set from data-ref analysis already - there 'bool' loads >>>>> correctly get >>>>> VNQImode). There is a basic-block / SLP part as well that would need to >>>>> use >>>>> the helper function (eventually with some SLP discovery order issue). >>>>> >>>>> > Thus first step would be to require vector<bool> for MASK_LOAD and >>>>> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and >>>>> > MASK_STORE). >>>>> >>>>> You can certainly try that first, but as soon as you hit complications >>>>> with >>>>> needing to adjust bool patterns then I'd rather get rid of them. >>>>> >>>>> > >>>>> > I can directly build a vector type with specified mode to avoid it. >>>>> > Smth. like: >>>>> > >>>>> > mask_mode = targetm.vectorize.get_mask_mode (nunits, >>>>> > current_vector_size); >>>>> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode); >>>>> >>>>> Hmm, indeed, that might be a (good) solution. Btw, in this case >>>>> target attribute >>>>> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change >>>>> depending >>>>> on the active target). There would also be no way for the user to >>>>> declare vector<bool> >>>>> in source (which is good because of that target attribute issue...). >>>>> >>>>> So yeah. Adding a tree.c:build_truth_vector_type (unsigned nunits) >>>>> and adjusting >>>>> truth_type_for is the way to go. >>>>> >>>>> I suggest you try modifying those parts first according to this scheme >>>>> that will most >>>>> likely uncover issues we missed. >>>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>> >>>> I tried to implement this scheme and apply it for MASK_LOAD and >>>> MASK_STORE. There were no major issues (for now). >>>> >>>> build_truth_vector_type and get_mask_type_for_scalar_type were added to >>>> build a mask type. It is always a vector of bools but its mode is >>>> determined by a target using number of units and currently used vector >>>> length. >>>> >>>> As previously I fixed if-conversion to apply boolean masks for loads and >>>> stores which automatically disables bool patterns for them and flow goes >>>> by a mask path. Vectorization factor computation is fixed to have a >>>> separate computation for mask types. Comparison is now handled separately >>>> by vectorizer and is vectorized into vector comparison. >>>> >>>> Optabs for masked loads and stores were transformed into convert optabs. >>>> Now it is checked using both value and mask modes. >>>> >>>> Optabs for comparison were added. These are also convert optabs checking >>>> value and result type. >>>> >>>> I had to introduce significant number of new patterns in i386 target to >>>> support new optabs. The reason was vector compare was never expanded >>>> separately and always was a part of a vec_cond expansion. >>> >>> Indeed. >>> >>>> As a result it's possible to use the sage GIMPLE representation for both >>>> vector and scalar masks target type. Here is an example I used as a >>>> simple test: >>>> >>>> for (i=0; i<N; i++) >>>> { >>>> float t = a[i]; >>>> if (t > 0.0f && t < 1.0e+2f) >>>> if (c[i] != 0) >>>> c[i] = 1; >>>> } >>>> >>>> Produced vector GIMPLE (before expand): >>>> >>>> vect_t_5.22_105 = MEM[base: _256, offset: 0B]; >>>> mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >>>> 0.0 }; >>>> mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, >>>> 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 }; >>>> mask__8.27_110 = mask__6.23_107 & mask__7.25_109; >>>> vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110); >>>> mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 }; >>>> mask__37.35_120 = mask__8.27_110 & mask__36.33_119; >>>> MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, >>>> 1 }); >>> >>> Looks good to me. >>> >>>> Produced assembler on AVX-512: >>>> >>>> vmovups (%rdi), %zmm0 >>>> vcmpps $25, %zmm5, %zmm0, %k1 >>>> vcmpps $22, %zmm3, %zmm0, %k1{%k1} >>>> vmovdqa32 -64(%rdx), %zmm2{%k1} >>>> vpcmpd $4, %zmm1, %zmm2, %k1{%k1} >>>> vmovdqa32 %zmm4, (%rcx){%k1} >>>> >>>> Produced assembler on AVX-2: >>>> >>>> vmovups (%rdx), %xmm1 >>>> vinsertf128 $0x1, -16(%rdx), %ymm1, %ymm1 >>>> vcmpltps %ymm1, %ymm3, %ymm0 >>>> vcmpltps %ymm5, %ymm1, %ymm1 >>>> vpand %ymm0, %ymm1, %ymm0 >>>> vpmaskmovd -32(%rcx), %ymm0, %ymm1 >>>> vpcmpeqd %ymm2, %ymm1, %ymm1 >>>> vpcmpeqd %ymm2, %ymm1, %ymm1 >>>> vpand %ymm0, %ymm1, %ymm0 >>>> vpmaskmovd %ymm4, %ymm0, (%rax) >>>> >>>> BTW AVX-2 code produced by trunk compiler is 4 insns longer: >>>> >>>> vmovups (%rdx), %xmm0 >>>> vinsertf128 $0x1, -16(%rdx), %ymm0, %ymm0 >>>> vcmpltps %ymm0, %ymm6, %ymm1 >>>> vcmpltps %ymm7, %ymm0, %ymm0 >>>> vpand %ymm1, %ymm5, %ymm2 >>>> vpand %ymm0, %ymm2, %ymm1 >>>> vpcmpeqd %ymm3, %ymm1, %ymm0 >>>> vpandn %ymm4, %ymm0, %ymm0 >>>> vpmaskmovd -32(%rcx), %ymm0, %ymm0 >>>> vpcmpeqd %ymm3, %ymm0, %ymm0 >>>> vpandn %ymm1, %ymm0, %ymm0 >>>> vpcmpeqd %ymm3, %ymm0, %ymm0 >>>> vpandn %ymm4, %ymm0, %ymm0 >>>> vpmaskmovd %ymm5, %ymm0, (%rax) >>>> >>>> >>>> For now I still don't disable bool patterns, thus new masks apply to >>>> masked loads and stores only. Patch is also not tested and tried on >>>> several small tests only. Could you please look at what I currently have >>>> and say if it's in sync with your view on vector masking? >>> >>> So apart from bool patterns and maybe implementation details (didn't >>> look too closely at the patch yet, maybe tomorrow), there is >>> >>> + /* Or a boolean vector type with the same element count >>> + as the comparison operand types. */ >>> + else if (TREE_CODE (type) == VECTOR_TYPE >>> + && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE) >>> + { >>> >>> so we now allow both, integer typed and boolean typed comparison >>> results? I was hoping that on GIMPLE >>> we can canonicalize to a single form, the boolean one and for the >>> "old" style force the use of VEC_COND exprs >>> (which we did anyway, AFAIK). The comparison in the VEC_COND would >>> still have vector bool result type. >>> >>> I expect the vectorization factor changes to "vanish" if we remove >>> bool patterns and re-org vector type deduction >>> >>> Richard. >>> >> >> Totally disabling old style vector comparison and bool pattern is a >> goal but doing hat would mean a lot of regressions for many targets. >> Do you want to it to be tried to estimate amount of changes required >> and reveal possible issues? What would be integration plan for these >> changes? Do you want to just introduce new vector<bool> in GIMPLE >> disabling bool patterns and then resolving vectorization regression on >> all targets or allow them live together with following target switch >> one by one from bool patterns with finally removing them? Not all >> targets are likely to be adopted fast I suppose.
Well, the frontends already create vec_cond exprs I believe. So for bool patterns the vectorizer would have to do the same, but the comparison result in there would still use vec<bool>. Thus the scalar _Bool a = b < c; _Bool c = a || d; if (c) would become vec<int> a = VEC_COND <a < b ? -1 : 0>; vec<int> c = a | d; when the target does not have vec<bool>s directly and otherwise vec<boo> directly (dropping the VEC_COND). Just the vector comparison inside the VEC_COND would always have vec<bool> type. And the "bool patterns" I am talking about are those in tree-vect-patterns.c, not any targets instruction patterns. Richard. >> >> Ilya