On Fri, Sep 16, 2011 at 6:20 PM, Jakub Jelinek <ja...@redhat.com> wrote:
>> Surprisingly with -mavx2 the integer loops aren't vectorized with
>> 32-byte vectors, wonder why.  But looking at the integer
>> umin/umax/smin/smax 16-byte reductions, they generate good code even
>> without reduc_* patterns, apparently using vector shifts.
>
> Seems on that testcase the integer loops weren't using 32-byte vectors
> because there were no expanders for 32-byte integer min/max.
> The following patch adds them (and also the related 32-byte integer
> condition expanders vcond/vcondu).  With this, all the integer loops
> in that testcase are nicely vectorized with 32-byte vectors with
> -mavx2; unfortunately the reductions look terrible.
>
> The problem is that AVX2 doesn't have a 32-byte whole-vector shift
> right (well, in theory it has one if the shift count is exactly 128 -
> vextractf128).  For shift counts > 128 we could in theory handle it as
> two instructions, vextractf128 plus a 16-byte whole-vector shift by
> count - 128, but reductions don't actually need the two steps: we only
> care about the bottom bits after the shifts, and the upper bits can
> contain anything.
>
> So we can fix this either by adding
> reduc_{smin,smax,umin,umax}_v{32q,16h,8s,4d}i patterns (at which point
> I guess I should just macroize them together with
> reduc_{smin,smax,umin,umax}_v{4sf,8sf,4df}) and handling the four
> 32-byte integer modes in ix86_expand_reduc as well, or by coming up
> with some new optab for an operation like whole-vector shift right
> which would allow the upper bits to be undefined and would only allow
> shifts by vector size / 2, / 4, / 8 down to the element size, with a
> corresponding tree code.  What do you prefer?

I think the former approach is better.  We don't have a full-vector
shift in this case, so faking it with some very constrained optab would
IMO be pointless.
> OT: seems the AVX2 support put the avx2_<code><mode>3 and
> *avx2_<code><mode>3 patterns (the former after this patch
> <code><mode>3) in a wrong spot, in between the vec_shr_<mode> expander
> and the sse2_lshrv1ti3 insn which implements what the expander expands
> to.  Uros, would you like to move them elsewhere?  Where exactly?

I'd put these after the sse4_1 umaxmin patterns, just before:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; Parallel integral comparisons
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

> This patch has been tested on x86_64-linux and i686-linux on
> SandyBridge.
>
> 2011-09-16  Jakub Jelinek  <ja...@redhat.com>
>
> 	* config/i386/i386.c (ix86_build_const_vector): Handle V8SImode
> 	and V4DImode.
> 	(ix86_build_signbit_mask): Likewise.
> 	(ix86_expand_int_vcond): Likewise.  Handle V16HImode and
> 	V32QImode.
> 	(bdesc_args): Use CODE_FOR_{s,u}m{ax,in}v{32q,16h,8s}i3
> 	instead of CODE_FOR_avx2_{s,u}m{ax,in}v{32q,16h,8s}i3.
> 	* config/i386/sse.md (avx2_<code><mode>3 umaxmin expand): Rename
> 	to...
> 	(<code><mode>3): ... this.
> 	(avx2_<code><mode>3 smaxmin expand): Rename to...
> 	(<code><mode>3): ... this.
> 	(smax<mode>3, smin<mode>3): Macroize using smaxmin code iterator.
> 	(smaxv2di3, sminv2di3): Macroize using smaxmin code iterator and
> 	VI8_AVX2 mode iterator.
> 	(umaxv2di3, uminv2di3): Macroize using umaxmin code iterator and
> 	VI8_AVX2 mode iterator.
> 	(vcond<V_256:mode><VI_256:mode>, vcondu<V_256:mode><VI_256:mode>):
> 	New expanders.

This is OK for mainline SVN.

Thanks,
Uros.