Re: [RFC] Try vector as a new representation for vector masks

Richard Biener Thu, 03 Sep 2015 05:12:25 -0700

On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich <enkovich....@gmail.com> wrote:
> Adding CCs.
>
> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich <enkovich....@gmail.com>:
>> 2015-09-01 17:25 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>> On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich <enkovich....@gmail.com> 
>>> wrote:
>>>> On 27 Aug 09:55, Richard Biener wrote:
>>>>> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich....@gmail.com> 
>>>>> wrote:
>>>>> >
>>>>> > Yes, I want to try it. But getting rid of bool patterns would mean
>>>>> > support for all targets currently supporting vec_cond. Would it be OK
>>>>> > to have vector<bool> mask co-exist with bool patterns for some time?
>>>>>
>>>>> No, I'd like to remove the bool patterns anyhow - the vectorizer should 
>>>>> be able
>>>>> to figure out the correct vector type (or mask type) from the uses.  
>>>>> Currently
>>>>> it simply looks at the stmts LHS type but as all stmt operands already 
>>>>> have
>>>>> vector types it can as well compute the result type from those.  We'd 
>>>>> want to
>>>>> have a helper function that does this result type computation as I figure 
>>>>> it
>>>>> will be needed in multiple places.
>>>>>
>>>>> This is now on my personal TODO list (but that's already quite long for 
>>>>> GCC 6),
>>>>> so if you manage to get to that...  see
>>>>> tree-vect-loop.c:vect_determine_vectorization_factor
>>>>> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
>>>>> vector type set from data-ref analysis already - there 'bool' loads
>>>>> correctly get
>>>>> VNQImode).  There is a basic-block / SLP part as well that would need to 
>>>>> use
>>>>> the helper function (eventually with some SLP discovery order issue).
>>>>>
>>>>> > Thus first step would be to require vector<bool> for MASK_LOAD and
>>>>> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
>>>>> > MASK_STORE).
>>>>>
>>>>> You can certainly try that first, but as soon as you hit complications 
>>>>> with
>>>>> needing to adjust bool patterns then I'd rather get rid of them.
>>>>>
>>>>> >
>>>>> > I can directly build a vector type with specified mode to avoid it. 
>>>>> > Smth. like:
>>>>> >
>>>>> > mask_mode = targetm.vectorize.get_mask_mode (nunits, 
>>>>> > current_vector_size);
>>>>> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);
>>>>>
>>>>> Hmm, indeed, that might be a (good) solution.  Btw, in this case
>>>>> target attribute
>>>>> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change 
>>>>> depending
>>>>> on the active target).  There would also be no way for the user to
>>>>> declare vector<bool>
>>>>> in source (which is good because of that target attribute issue...).
>>>>>
>>>>> So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits)
>>>>> and adjusting
>>>>> truth_type_for is the way to go.
>>>>>
>>>>> I suggest you try modifying those parts first according to this scheme
>>>>> that will most
>>>>> likely uncover issues we missed.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>
>>>> I tried to implement this scheme and apply it for MASK_LOAD and 
>>>> MASK_STORE.  There were no major issues (for now).
>>>>
>>>> build_truth_vector_type and get_mask_type_for_scalar_type were added to 
>>>> build a mask type.  It is always a vector of bools but its mode is 
>>>> determined by a target using number of units and currently used vector 
>>>> length.
>>>>
>>>> As previously I fixed if-conversion to apply boolean masks for loads and 
>>>> stores which automatically disables bool patterns for them and flow goes 
>>>> by a mask path.  Vectorization factor computation is fixed to have a 
>>>> separate computation for mask types.  Comparison is now handled separately 
>>>> by vectorizer and is vectorized into vector comparison.
>>>>
>>>> Optabs for masked loads and stores were transformed into convert optabs.  
>>>> Now it is checked using both value and mask modes.
>>>>
>>>> Optabs for comparison were added.  These are also convert optabs checking 
>>>> value and result type.
>>>>
>>>> I had to introduce significant number of new patterns in i386 target to 
>>>> support new optabs.  The reason was vector compare was never expanded 
>>>> separately and always was a part of a vec_cond expansion.
>>>
>>> Indeed.
>>>
>>>> As a result it's possible to use the sage GIMPLE representation for both 
>>>> vector and scalar masks target type.  Here is an example I used as a 
>>>> simple test:
>>>>
>>>>   for (i=0; i<N; i++)
>>>>   {
>>>>     float t = a[i];
>>>>     if (t > 0.0f && t < 1.0e+2f)
>>>>       if (c[i] != 0)
>>>>         c[i] = 1;
>>>>   }
>>>>
>>>> Produced vector GIMPLE (before expand):
>>>>
>>>>   vect_t_5.22_105 = MEM[base: _256, offset: 0B];
>>>>   mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
>>>> 0.0 };
>>>>   mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 
>>>> 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 };
>>>>   mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
>>>>   vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
>>>>   mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 };
>>>>   mask__37.35_120 = mask__8.27_110 & mask__36.33_119;
>>>>   MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 
>>>> 1 });
>>>
>>> Looks good to me.
>>>
>>>> Produced assembler on AVX-512:
>>>>
>>>>         vmovups (%rdi), %zmm0
>>>>         vcmpps  $25, %zmm5, %zmm0, %k1
>>>>         vcmpps  $22, %zmm3, %zmm0, %k1{%k1}
>>>>         vmovdqa32       -64(%rdx), %zmm2{%k1}
>>>>         vpcmpd  $4, %zmm1, %zmm2, %k1{%k1}
>>>>         vmovdqa32       %zmm4, (%rcx){%k1}
>>>>
>>>> Produced assembler on AVX-2:
>>>>
>>>>         vmovups (%rdx), %xmm1
>>>>         vinsertf128     $0x1, -16(%rdx), %ymm1, %ymm1
>>>>         vcmpltps        %ymm1, %ymm3, %ymm0
>>>>         vcmpltps        %ymm5, %ymm1, %ymm1
>>>>         vpand   %ymm0, %ymm1, %ymm0
>>>>         vpmaskmovd      -32(%rcx), %ymm0, %ymm1
>>>>         vpcmpeqd        %ymm2, %ymm1, %ymm1
>>>>         vpcmpeqd        %ymm2, %ymm1, %ymm1
>>>>         vpand   %ymm0, %ymm1, %ymm0
>>>>         vpmaskmovd      %ymm4, %ymm0, (%rax)
>>>>
>>>> BTW AVX-2 code produced by trunk compiler is 4 insns longer:
>>>>
>>>>         vmovups (%rdx), %xmm0
>>>>         vinsertf128     $0x1, -16(%rdx), %ymm0, %ymm0
>>>>         vcmpltps        %ymm0, %ymm6, %ymm1
>>>>         vcmpltps        %ymm7, %ymm0, %ymm0
>>>>         vpand   %ymm1, %ymm5, %ymm2
>>>>         vpand   %ymm0, %ymm2, %ymm1
>>>>         vpcmpeqd        %ymm3, %ymm1, %ymm0
>>>>         vpandn  %ymm4, %ymm0, %ymm0
>>>>         vpmaskmovd      -32(%rcx), %ymm0, %ymm0
>>>>         vpcmpeqd        %ymm3, %ymm0, %ymm0
>>>>         vpandn  %ymm1, %ymm0, %ymm0
>>>>         vpcmpeqd        %ymm3, %ymm0, %ymm0
>>>>         vpandn  %ymm4, %ymm0, %ymm0
>>>>         vpmaskmovd      %ymm5, %ymm0, (%rax)
>>>>
>>>>
>>>> For now I still don't disable bool patterns, thus new masks apply to 
>>>> masked loads and stores only.  Patch is also not tested and tried on 
>>>> several small tests only.  Could you please look at what I currently have 
>>>> and say if it's in sync with your view on vector masking?
>>>
>>> So apart from bool patterns and maybe implementation details (didn't
>>> look too closely at the patch yet, maybe tomorrow), there is
>>>
>>> +  /* Or a boolean vector type with the same element count
>>> +     as the comparison operand types.  */
>>> +  else if (TREE_CODE (type) == VECTOR_TYPE
>>> +          && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
>>> +    {
>>>
>>> so we now allow both, integer typed and boolean typed comparison
>>> results?  I was hoping that on GIMPLE
>>> we can canonicalize to a single form, the boolean one and for the
>>> "old" style force the use of VEC_COND exprs
>>> (which we did anyway, AFAIK).  The comparison in the VEC_COND would
>>> still have vector bool result type.
>>>
>>> I expect the vectorization factor changes to "vanish" if we remove
>>> bool patterns and re-org vector type deduction
>>>
>>> Richard.
>>>
>>
>> Totally disabling old style vector comparison and bool pattern is a
>> goal but doing hat would mean a lot of regressions for many targets.
>> Do you want to it to be tried to estimate amount of changes required
>> and reveal possible issues? What would be integration plan for these
>> changes? Do you want to just introduce new vector<bool> in GIMPLE
>> disabling bool patterns and then resolving vectorization regression on
>> all targets or allow them live together with following target switch
>> one by one from bool patterns with finally removing them? Not all
>> targets are likely to be adopted fast I suppose.


Well, the frontends already create vec_cond exprs I believe.  So for
bool patterns the vectorizer would have to do the same, but the
comparison result in there would still use vec<bool>.  Thus the scalar

 _Bool a = b < c;
 _Bool c = a || d;
 if (c)

would become

 vec<int> a = VEC_COND <a < b ? -1 : 0>;
 vec<int> c = a | d;

when the target does not have vec<bool>s directly and otherwise
vec<boo> directly (dropping the VEC_COND).

Just the vector comparison inside the VEC_COND would always
have vec<bool> type.

And the "bool patterns" I am talking about are those in
tree-vect-patterns.c, not any targets instruction patterns.

Richard.

>>
>> Ilya

Re: [RFC] Try vector as a new representation for vector masks

Reply via email to